From gokoproject at gmail.com Mon Jun 1 01:32:56 2015 From: gokoproject at gmail.com (John Wong) Date: Sun, 31 May 2015 19:32:56 -0400 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: Sorry, I am on mobile, but I'd like to chime in with a concern: David wrote: > The problem that I have with virtualenv is that it requires quite a bit of configuration and a great deal of awareness by the user of what is going on and how things are configured. This is his response to Stephen's comment. I'd like to point out that personally I haven't found a use case in development where I would have trouble with source bin/activate and continuing with my life. But in various places, including the #python channel, I would hear helpers strongly advise running /PATH/TO/VIRTENV/bin/python, which seems like a great idea, especially in the case of writing an app startup script. So I'm not sure how autoenv and virtualenvwrapper are likeable if they configure something on behalf of users, and if users run into trouble they still have to unfold the docs to find the "Ohhh" moment. What I'm suggesting is that I feel these recommendations are kind of contradictory. Maybe I am not convinced that source activate is bad yet because I have not really seen the pain with a concrete example, just always someone telling me that it is a bad idea. Frankly, I never like to poison my source directory with npm_modules and have to reconfigure where to save npm modules. So I still don't gain much. But having pip recognize that requirements.txt is there and install it can be helpful. But helpful has to come with a price... Is it better for a user to learn what they are doing now, or to have them enjoy an easy ride and then find a mud hole? On Sunday, May 31, 2015, David Townshend wrote: > > On Sun, May 31, 2015 at 9:00 PM, Andrew Barnert > wrote: > >> On May 31, 2015, at 09:19, David Townshend > > wrote: >> >> >>> The default for npm is that your package dir is attached directly to the >>> project. You can get more flexibility by setting an environment variable or >>> creating a symlink, but normally you don't. It has about the same >>> flexibility as virtualenvwrapper, with about the same amount of effort. So >>> if virtualenvwrapper isn't flexible enough for you, my guess is that your >>> take on npm won't be flexible enough either, it'll just come preconfigured >>> for your own idiosyncratic use and everyone else will have to adjust... >>> >> >> You have a point. Maybe lack of flexibility is not actually the issue - >> it's too much flexibility. >> >> >> I think Python needs that kind of flexibility, because it's used in a >> much wider range of use cases, from binary end-user applications to OS >> components to "just run this script against your system environment" to >> conda packages, not just web apps managed by a deployment team and other >> things that fall into the same model. And it needs to be backward >> compatible with the different ways people have come up with for handling >> all those models. >> >> While it's possible to rebuild all of those models around the npm model, >> and the node community is gradually coming up with ways of doing so >> (although notice that much of the node community is instead relying on >> docker or VMs...), you'd have to be able to transparently replace all of >> the current Python use cases today if you wanted to change Python today. 
>> >> Also, as Nick pointed out, making things easier for the developer comes >> at the cost of making things harder for the user--which is acceptable when >> the user is the developer himself or a deployment team that sits at the >> next set of cubicles, but may not be acceptable when the user is someone >> who just wants to run a script he found online. Again, the Node community >> is coming to terms with this, but they haven't got to the same level as the >> Python community, and, even if they had, it still wouldn't work as a >> drop-in replacement without a lot of work. >> >> What someone _could_ do is make it easier to set up a dev-friendly >> environment based on virtualenvwrapper and virtualenvwrapperhelper. >> Currently, you have to know what you're looking for and find a blog page >> somewhere that tells you how to install and configure all the tools and >> follow three or four steps. That's obvious less than ideal. It would be >> nice if there were a single "pip install envstuff" that got you ready out >> of the box (including working for Windows cmd and PowerShell), and if links >> to that were included in the basic Python docs. It would also be nice if >> there were a way to transfer your own custom setup to a new machine. But I >> don't see why that can't all be built as improvements on the existing tools >> (and a new package that just included requirements and configuration and no >> new tools). >> >> The problem that I have with virtualenv is that it requires quite a bit >> of configuration and a great deal of awareness by the user of what is going >> on and how things are configured. As stated on it's home page While there >> is nothing specifically wrong with this, I usually just want a way to do >> something in a venv without thinking too much about where it is or when or >> how to activate it. >> >> >> But again, if that's what you want, that's what you have with >> virtualenvwrapper or autoenv. You just cd into the directory (whether a new >> one you just created with the wrapper or an old one you just pulled from >> git) and it's set up for you. And setting up a new environment or cloning >> an existing one is just a single command, too. Sure, you can make your >> configuration more complicated than that, but if you don't want to, you >> don't have to. >> >> If you've had a look at the details of the sort of tool I'm proposing, it >> is completely transparent. Perhaps the preconfiguration is just to my own >> idiosyncrasies, but if it serves its use 90% of the time then maybe that is >> good enough. >> >> >> Some of what I'm proposing could be incorporated in to pip (i.e. better >> requirements) and some could possibly be incorporated into >> virtualenvwrapper (although I still think that my proposal for handling >> venvs is just too different from that of virtualenvwrapper to be worth >> pursuing that course), but one of the main aims is to merge it all into one >> tool that manages both the venv and the requirements. >> >> >> There are major advantages in not splitting the Python community between >> two different sets of tools. We've only recently gotten past easy_install >> vs. pip and distribute vs. setuptools, which has finally enabled a clean >> story for everyone who wants to distribute packages to get it right, which >> has finally started to happen (although there are people still finding and >> following blog posts that tell them to install distribute or not to use >> virtualenv because it doesn't play nice with py2app or whatever). 
>> >> I'm quite sure that this proposal is not going to accepted without a >> trial period on pypi, so maybe that will be the test of whether this is >> useful. >> >> Is this the right place for this, or would distutils-sig be better? >> >> >> Other people have made the case for both sides of that earlier in the >> thread and I'm not sure which one is more compelling... >> >> Also, the pure pip enhancement of coming up with something better than >> freeze/-r may belong on distutils-sig while the environment-aware launcher >> and/or environment-managing tools may belong here. (Notice that Python >> includes venv and the py launcher, but doesn't include setuptools or pip...) >> > > Just to be clear, I'm not suggesting changing the python executable > itself, or any of the other tools already in existence. My proposal is a > separate wrapper around existing python, pip and venv which would not > change anything about the way it works currently. A dev environment set up > using it could still be deployed in the same way it would be now, and there > would still be the option of using virtualenvwrapper, or something else for > those that want to. It is obviously way too early to try to get it > included in the next python release (apart form anything else, pip would > need to be added first), so really this proposal is meant more to gauge > interest in the concept so that if it is popular I can carry on developing > it and preparing it for inclusion in the stdlib, or at least a serious > discussion about including it, once it is mature. > > That said, Andrew's arguments have convinced me that much could be done to > improve existing tools before creating a new one, although I still don't > believe virtualenvwrapper can be squashed into the shape I'm aiming for > without fundamental changes. Also, from the other responses so far it > seems that the general feeling is that handling of requirements could > definitely be improved, but that anything too prescriptive with venvs would > be problematic. Unfortunately for my proposal, if something like what I'm > suggesting were officially supported via inclusion in the stdlib it would > quickly become, at best, the "strongly recommended" way of working and at > worst the One Obvious Way. With all this in mind, I'll withdraw my > proposal, but continue development on my version and see if it goes > anywhere. I'll also see how much of it's functionality I can put into > other tools (specifically pip's requirements handling) instead. > -- Sent from Jeff Dean's printf() mobile console -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.subbarao1 at gmail.com Mon Jun 1 04:25:46 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Sun, 31 May 2015 19:25:46 -0700 Subject: [Python-ideas] Python Float Update Message-ID: Dear Python Developers: I will be presenting a modification to the float class, which will improve its speed and accuracy (reduce floating point errors). This is applicable because Python uses a numerator and denominator rather than a sign and mantissa to represent floats. First, I propose that a float's integer ratio should be accurate. For example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it returns(6004799503160661, 18014398509481984). Second of all, even though 1 * 3 = 3 (last example), 6004799503160661 * 3 does not equal 18014398509481984. Instead, it equals 1801439850948198**3**, one less than the value in the ratio. 
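(For reference, the behaviour described above can be reproduced in any stock CPython 3 session; this is just a transcript of the claim, not part of the proposal:)

>>> (1 / 3).as_integer_ratio()
(6004799503160661, 18014398509481984)
>>> 6004799503160661 * 3
18014398509481983
>>> 18014398509481984 - 6004799503160661 * 3
1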
This means the ratio is inaccurate, as well as completely not simplified. [inline image: interpreter screenshot, attached as pythonfloats.PNG] Even if the value displayed for a float is a rounded value, the internal numerator and denominator should divide to equal the completely accurate value. Thanks for considering this improvement! Sincerely, u8y7541 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: pythonfloats.PNG Type: image/png Size: 16278 bytes Desc: not available URL: From njs at pobox.com Mon Jun 1 04:37:14 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 31 May 2015 19:37:14 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On May 31, 2015 7:26 PM, "u8y7541 The Awesome Person" < surya.subbarao1 at gmail.com> wrote: > > Dear Python Developers: > > I will be presenting a modification to the float class, which will improve its speed and accuracy (reduce floating point errors). This is applicable because Python uses a numerator and denominator rather than a sign and mantissa to represent floats. Python's floats are in fact ieee754 floats, using sign/mantissa/exponent, as provided by all popular CPU floating point hardware. This is why you're getting the results you see -- 1/3 cannot be exactly represented as a float, so it gets rounded to the closest representable float, and then as_integer_ratio shows you an exact representation of this rounded value. It sounds like you're instead looking for an exact fraction representation, which in Python is available in the standard "fractions" module: https://docs.python.org/3.5/library/fractions.html -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Mon Jun 1 04:48:12 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 1 Jun 2015 12:48:12 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person wrote: > > I will be presenting a modification to the float class, which will improve its speed and accuracy (reduce floating point errors). This is applicable because Python uses a numerator and denominator rather than a sign and mantissa to represent floats. > > First, I propose that a float's integer ratio should be accurate. For example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it returns (6004799503160661, 18014398509481984). > I think you're misunderstanding the as_integer_ratio method. That isn't how Python works internally; that's a service provided for parsing out float internals into something more readable. What you _actually_ are working with is IEEE 754 binary64. (Caveat: I have no idea what Python-the-language stipulates, nor what other Python implementations use, but that's what CPython uses, and you did your initial experiments with CPython. None of this discussion applies *at all* if a Python implementation doesn't use IEEE 754.) So internally, 1/3 is stored as: 0 <-- sign bit (positive) 01111111101 <-- exponent (1021) 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 bits, repeating) The exponent is offset by 1023, so this means 1.010101.... divided by 2²; the original repeating value is exactly equal to 4/3, so this is correct, but as soon as it's squeezed into a finite-sized mantissa, it gets rounded - in this case, rounded down. That's where your result comes from. 
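(If you want to check those three fields for yourself, here is a rough sketch that pulls them out of the packed bytes using nothing but the struct module; the field breakdown follows the binary64 layout, and the names in the comments are just labels, not anything Python exposes:)

>>> import struct
>>> bits = struct.unpack('>Q', struct.pack('>d', 1/3))[0]   # the raw 64 bits of the float
>>> bits >> 63                        # sign bit
0
>>> (bits >> 52) & 0x7FF              # 11-bit exponent field
1021
>>> format(bits & (2**52 - 1), '052b')   # 52-bit mantissa field
'0101010101010101010101010101010101010101010101010101'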
It's been rounded such that it fits inside IEEE 754, and then converted back to a fraction afterwards. You're never going to get an exact result for anything with a denominator that isn't a power of two. Fortunately, Python does offer a solution: store your number as a pair of integers, rather than as a packed floating point value, and all calculations truly will be exact (at the cost of performance): >>> one_third = fractions.Fraction(1, 3) >>> one_eighth = fractions.Fraction(1, 8) >>> one_third + one_eighth Fraction(11, 24) This is possibly more what you want to work with. ChrisA From random832 at fastmail.us Mon Jun 1 05:14:06 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 31 May 2015 23:14:06 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: > First, I propose that a float's integer ratio should be accurate. For > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > returns(6004799503160661, 18014398509481984). Even though he's mistaken about the core premise, I do think there's a kernel of a good idea here - it would be nice to have a method (maybe as_integer_ratio, maybe with some parameter added, maybe a different method) to return with the smallest denominator that would result in exactly the original float if divided out, rather than merely the smallest power of two. From jim.witschey at gmail.com Mon Jun 1 05:21:36 2015 From: jim.witschey at gmail.com (Jim Witschey) Date: Mon, 01 Jun 2015 03:21:36 +0000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: Teachable moments about the implementation of floating-point aside, something in this neighborhood has been considered and rejected before, in PEP 240. However, that was in 2001 - it was apparently created the same day as PEP 237, which introduced transparent conversion of machine ints to bignums in the int type. I think hiding hardware number implementations has been a success for integers - it's a far superior API. It could be for rationals as well. Has something like this thread's original proposal - interpeting decimal-number literals as fractional values and using fractions as the result of integer arithmetic - been seriously discussed more recently than PEP 240? If so, why haven't they been implemented? Perhaps enough has changed that it's worth reconsidering. On Sun, May 31, 2015 at 22:49 Chris Angelico wrote: > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person > wrote: > > > > I will be presenting a modification to the float class, which will > improve its speed and accuracy (reduce floating point errors). This is > applicable because Python uses a numerator and denominator rather than a > sign and mantissa to represent floats. > > > > First, I propose that a float's integer ratio should be accurate. For > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > returns(6004799503160661, 18014398509481984). > > > > I think you're misunderstanding the as_integer_ratio method. That > isn't how Python works internally; that's a service provided for > parsing out float internals into something more readable. What you > _actually_ are working with is IEEE 754 binary64. (Caveat: I have no > idea what Python-the-language stipulates, nor what other Python > implementations use, but that's what CPython uses, and you did your > initial experiments with CPython. 
None of this discussion applies *at > all* if a Python implementation doesn't use IEEE 754.) So internally, > 1/3 is stored as: > > 0 <-- sign bit (positive) > 01111111101 <-- exponent (1021) > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 > bits, repeating) > > The exponent is offset by 1023, so this means 1.010101.... divided by > 2?; the original repeating value is exactly equal to 4/3, so this is > correct, but as soon as it's squeezed into a finite-sized mantissa, it > gets rounded - in this case, rounded down. > > That's where your result comes from. It's been rounded such that it > fits inside IEEE 754, and then converted back to a fraction > afterwards. You're never going to get an exact result for anything > with a denominator that isn't a power of two. Fortunately, Python does > offer a solution: store your number as a pair of integers, rather than > as a packed floating point value, and all calculations truly will be > exact (at the cost of performance): > > >>> one_third = fractions.Fraction(1, 3) > >>> one_eighth = fractions.Fraction(1, 8) > >>> one_third + one_eighth > Fraction(11, 24) > > This is possibly more what you want to work with. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Mon Jun 1 05:27:47 2015 From: mertz at gnosis.cx (David Mertz) Date: Sun, 31 May 2015 20:27:47 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> References: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> Message-ID: On Sun, May 31, 2015 at 8:14 PM, wrote: > Even though he's mistaken about the core premise, I do think there's a > kernel of a good idea here - it would be nice to have a method (maybe > as_integer_ratio, maybe with some parameter added, maybe a different > method) to return with the smallest denominator that would result in > exactly the original float if divided out, rather than merely the > smallest power of two. > What is the computational complexity of a hypothetical float.as_simplest_integer_ratio() method? How hard that is to find is not obvious to me (probably it should be, but I'm not sure). -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Jun 1 05:37:17 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 31 May 2015 23:37:17 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> On Sun, May 31, 2015, at 23:21, Jim Witschey wrote: > I think hiding hardware number implementations has been a success for > integers - it's a far superior API. It could be for rationals as well. I'd worry about unbounded complexity. For rationals, unlike integers, values don't have to be large for their bignum representation to be large. 
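A small sketch of the growth I mean -- exact rational arithmetic on unremarkable starting values, with the denominator roughly doubling in size at every step:

>>> from fractions import Fraction
>>> x = Fraction(1, 3)
>>> for _ in range(5):
...     x = x * x + Fraction(1, 7)   # a few ordinary exact operations
...     print(x.denominator.bit_length(), "bits in the denominator")
...
6 bits in the denominator
12 bits in the denominator
24 bits in the denominator
48 bits in the denominator
96 bits in the denominator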
> > Has something like this thread's original proposal - interpreting > decimal-number literals as fractional values and using fractions as the > result of integer arithmetic - been seriously discussed more recently > than > PEP 240? If so, why haven't they been implemented? Perhaps enough has > changed that it's worth reconsidering. Also, it raises a question of string representation. Granted, "1/3" becomes much more defensible as the repr of Fraction(1, 3) if it in fact evaluates to that value, but how much do you like "6/5" as the repr of 1.2? Or are we going to use Fractions for integer division and Decimals for literals? And, what of decimal division? Right now you can't even mix Fraction and Decimal in arithmetic operations. And are we going to add %e %f and %g support for both types? Directly so, without any detour to float and its limitations (i.e. %.100f gets you 100 true decimal digits of precision)? Current reality: >>> '%.50f' % Fraction(1, 3) '0.33333333333333331482961625624739099293947219848633' >>> '%.50f' % Decimal('0.3333333333333333333333333333333333333') '0.33333333333333331482961625624739099293947219848633' >>> '{:.50f}'.format(Fraction(1, 3)) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: non-empty format string passed to object.__format__ >>> '{:.50f}'.format(Decimal('0.3333333333333333333333333333333333')) '0.33333333333333333333333333333333330000000000000000' Okay, that's one case right out of four. From bussonniermatthias at gmail.com Mon Jun 1 05:46:03 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Sun, 31 May 2015 20:46:03 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> > On May 31, 2015, at 20:21, Jim Witschey wrote: > > Teachable moments about the implementation of floating-point aside, something in this neighborhood has been considered and rejected before, in PEP 240. However, that was in 2001 - it was apparently created the same day as PEP 237, which introduced transparent conversion of machine ints to bignums in the int type. > > I think hiding hardware number implementations has been a success for integers - it's a far superior API. It could be for rationals as well. > > Has something like this thread's original proposal - interpreting decimal-number literals as fractional values and using fractions as the result of integer arithmetic - been seriously discussed more recently than PEP 240? If so, why haven't they been implemented? Perhaps enough has changed that it's worth reconsidering. While I see the interest, does it really belong in core Python? What would be the advantages? IIRC (during | after) the language summit at PyCon this year, it was said that maybe the stdlib should get fewer features, not more. Side note: SymPy has an IPython ast-hook that will wrap all your integers into SymPy Integers and hence give you rationals of whatever you like, if you want to SymPy-plify your life. But for the majority of uses, will it be useful? What would be the performance costs? If you start storing rationals, then why not continued fractions, as they are just an N-tuple instead of a 2-tuple? But then you are limited to non-infinite continued fractions, so you improve by using a generator... I love Python for doing science and math, but please stay away from putting too much in the standard lib, or we will end up with Cholesky matrix decomposition in Python 4.0 like Julia does -- and I'm not sure that is a good idea. 
I would much rather have a core set of library ?blessed? by CPython that provide features like this one, that are deemed ?important?. ? M From ron3200 at gmail.com Mon Jun 1 05:55:43 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 31 May 2015 23:55:43 -0400 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: References: Message-ID: On 05/30/2015 11:45 AM, Ron Adam wrote: > > The solution I found was to call a function to explicitly set the shared > items in the imported module. A bit of an improvement... def export_to(module, **d): """ Explitely share objects with imported module. Use this_module.item in the sub-module after item is exported to it. """ from collections import namedtuple namespace = namedtuple("exported", d.keys()) for k, v in d.items(): setattr(namespace, k, v) # Not sure about this. Possibly sys.get_frame would be better. setattr(module, __loader__.name, namespace) And used like this. import sub_mod export_to(sub_mod, foo=foo, bar=bar) Then functions in sub-mod can access the objects as if the sub-module imported the parent module, but only the exported items are visible to the sub module. Again, this is for closely dependent modules that can't easily be split by moving common objects into a mutually imported file, or if it is desired to split a larger module by functionality rather than dependency. There are some limitations, but I think they are actually desirable features. The sub-module can't use exported objects at the top level, and it can't alter the parent modules name space directly. Of course, it could just be my own preferences. I like the pattern of control (the specifying of what gets imported/shared) flowing from the top down. Cheers, Ron From tjreedy at udel.edu Mon Jun 1 06:37:53 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 01 Jun 2015 00:37:53 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On 5/31/2015 11:21 PM, Jim Witschey wrote: > Has something like this thread's original proposal - interpeting > decimal-number literals as fractional values and using fractions as the > result of integer arithmetic - been seriously discussed more recently > than PEP 240? The competing proposal is to treat decimal literals as decimal.Decimal values. -- Terry Jan Reedy From casevh at gmail.com Mon Jun 1 06:39:03 2015 From: casevh at gmail.com (Case Van Horsen) Date: Sun, 31 May 2015 21:39:03 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> References: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> Message-ID: On Sun, May 31, 2015 at 8:14 PM, wrote: > On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: >> First, I propose that a float's integer ratio should be accurate. For >> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it >> returns(6004799503160661, 18014398509481984). > > Even though he's mistaken about the core premise, I do think there's a > kernel of a good idea here - it would be nice to have a method (maybe > as_integer_ratio, maybe with some parameter added, maybe a different > method) to return with the smallest denominator that would result in > exactly the original float if divided out, rather than merely the > smallest power of two. The gmpy2 library already supports such a method. 
>>> import gmpy2 >>> gmpy2.version() '2.0.3' >>> a=gmpy2.mpfr(1)/3 >>> a.as_integer_ratio() (mpz(6004799503160661), mpz(18014398509481984)) >>> a.as_simple_fraction() mpq(1,3) >>> gmpy2 uses a version of the Stern-Brocot algorithm to find the shortest fraction that, when converted to a floating point value, will return the same value as the original floating point value. The implementation was originally done by Alex Martelli; I have just maintained it over the years. The algorithm is quite fast. If there is a consensus to add this method to Python, I would be willing to help implement it. casevh > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From cs at zip.com.au Mon Jun 1 06:37:23 2015 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 1 Jun 2015 14:37:23 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: <20150601043723.GA81627@cskk.homeip.net> On 31May2015 20:27, David Mertz wrote: >On Sun, May 31, 2015 at 8:14 PM, wrote: > >> Even though he's mistaken about the core premise, I do think there's a >> kernel of a good idea here - it would be nice to have a method (maybe >> as_integer_ratio, maybe with some parameter added, maybe a different >> method) to return with the smallest denominator that would result in >> exactly the original float if divided out, rather than merely the >> smallest power of two. >> > >What is the computational complexity of a hypothetical >float.as_simplest_integer_ratio() method? How hard that is to find is not >obvious to me (probably it should be, but I'm not sure). Probably the same as Euler's greatest common factor method. About log(n) I think. Take as_integer_ratio, find greatest common factor, divide both by that. Cheers, Cameron Simpson In the desert, you can remember your name, 'cause there ain't no one for to give you no pain. - America From random832 at fastmail.us Mon Jun 1 07:11:10 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 01 Jun 2015 01:11:10 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150601043723.GA81627@cskk.homeip.net> References: <20150601043723.GA81627@cskk.homeip.net> Message-ID: <1433135470.64534.283159801.6D1501C7@webmail.messagingengine.com> On Mon, Jun 1, 2015, at 00:37, Cameron Simpson wrote: > Probably the same as Euler's greatest common factor method. About log(n) > I > think. Take as_integer_ratio, find greatest common factor, divide both by > that. Er, no, because (6004799503160661, 18014398509481984) are already mutually prime, and we want (1, 3). This is a different problem from finding a reduced fraction. There are algorithms, I know, for constraining the denominator to a specific range (Fraction.limit_denominator does this), but that's not *quite* the same as finding the lowest one that will still convert exactly to the original float From jim.witschey at gmail.com Mon Jun 1 07:19:26 2015 From: jim.witschey at gmail.com (Jim Witschey) Date: Mon, 1 Jun 2015 01:19:26 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> Message-ID: On Sun, May 31, 2015 at 11:37 PM, wrote: > I'd worry about unbounded complexity. For rationals, unlike integers, > values don't have to be large for their bignum representation to be > large. 
I'd expect rational representations to be reasonably small until a value was operated on many times, in which case you're using more space, but representing the result very precisely. It's a tradeoff, but with a small cost in the common case. I'm no expert, though -- am I not considering some case? > how much do you like "6/5" as the repr of 1.2? 6/5 is an ugly representation of 1.2, but consider the current state of affairs: >>> 1.2 1.2 "1.2" is imprecisely interpreted as 1.2000000476837158 * (2^0), which is then imprecisely represented as 1.2. I recognize this is the way we've dealt with non-integer numbers for a long time, but "1.2" => SomeKindOfRational(6, 5) => "6/5" is conceptually cleaner. > Or are we going to use Fractions for integer division and Decimals > for literals? I had been thinking of rationals built on bignums all around, a la Haskell. Is Fraction as it exists today up to it? I don't know. I agree that some principled decisions would have to be made for, e.g., interpretation by format strings. From casevh at gmail.com Mon Jun 1 07:42:21 2015 From: casevh at gmail.com (Case Van Horsen) Date: Sun, 31 May 2015 22:42:21 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> Message-ID: On Sun, May 31, 2015 at 8:27 PM, David Mertz wrote: > What is the computational complexity of a hypothetical > float.as_simplest_integer_ratio() method? How hard that is to find is not > obvious to me (probably it should be, but I'm not sure). > Here is a (barely tested) implementation based on the Stern-Brocot tree: def as_simple_integer_ratio(x): x = abs(float(x)) left = (int(x), 1) right = (1, 0) while True: mediant = (left[0] + right[0], left[1] + right[1]) test = mediant[0] / mediant[1] print(left, right, mediant, test) if test == x: return mediant elif test < x: left = mediant else: right = mediant print(as_simple_integer_ratio(41152/263)) The approximations are printed so you can watch the convergence. casevh From cs at zip.com.au Mon Jun 1 07:27:45 2015 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 1 Jun 2015 15:27:45 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433135470.64534.283159801.6D1501C7@webmail.messagingengine.com> References: <1433135470.64534.283159801.6D1501C7@webmail.messagingengine.com> Message-ID: <20150601052745.GA58480@cskk.homeip.net> On 01Jun2015 01:11, random832 at fastmail.us wrote: >On Mon, Jun 1, 2015, at 00:37, Cameron Simpson wrote: >> Probably the same as Euler's greatest common factor method. About log(n) >> I >> think. Take as_integer_ratio, find greatest common factor, divide both by >> that. > >Er, no, because (6004799503160661, 18014398509481984) are already >mutually prime, and we want (1, 3). This is a different problem from >finding a reduced fraction. Ah, you want the simplest fraction that _also_ gives the same float representation? >There are algorithms, I know, for >constraining the denominator to a specific range >(Fraction.limit_denominator does this), but that's not *quite* the same >as finding the lowest one that will still convert exactly to the >original float Hmm. Thanks for this clarification. Cheers, Cameron Simpson The Design View editor of Visual InterDev 6.0 is currently incompatible with Compatibility Mode, and may not function correctly. 
- George Politis , 22apr1999, quoting http://msdn.microsoft.com/vstudio/technical/ie5.asp From jim.witschey at gmail.com Mon Jun 1 07:59:39 2015 From: jim.witschey at gmail.com (Jim Witschey) Date: Mon, 1 Jun 2015 01:59:39 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On Sun, May 31, 2015 at 11:46 PM, Matthias Bussonnier wrote: > IIRC (during | after) the language submit at PyCon this year, it was said that maybe the stdlib should get > less features, not more. Rationals (and Decimals) already exist in the standard library. The original proposal (as I read it, anyway) is more about the default interpretation of, e.g., integer division and decimal-number literals. > Side note, Sympy as a IPython ast-hook that will wrap all your integers into SymPy Integers and hence > give you rationals of whatever you like, if you want to SymPy-plify your life. Thank you for the pointer -- that's really cool. > But for majority of use will it be useful ? I believe interpreting "0.1" as 1/10 is more ergonomic than representing it as 1.600000023841858 * (2^-4). I see it as being more useful -- a better fit -- in most use cases because it's simpler, more precise, and more understandable. > What would be the performance costs ? I don't know. Personally, I'd be willing to pay a performance penalty to avoid reasoning about floating-point arithmetic most of the time, then "drop into" floats when I need the speed. From jim.witschey at gmail.com Mon Jun 1 08:02:21 2015 From: jim.witschey at gmail.com (Jim Witschey) Date: Mon, 1 Jun 2015 02:02:21 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On Mon, Jun 1, 2015 at 12:37 AM, Terry Reedy wrote: > The competing proposal is to treat decimal literals as decimal.Decimal > values. Is that an existing PEP? I couldn't find any such proposal. From nicholas.chammas at gmail.com Mon Jun 1 08:27:57 2015 From: nicholas.chammas at gmail.com (Nicholas Chammas) Date: Mon, 01 Jun 2015 06:27:57 +0000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: I don?t know. Personally, I?d be willing to pay a performance penalty to avoid reasoning about floating-point arithmetic most of the time, then ?drop into? floats when I need the speed. This is perhaps a bit off topic for the thread, but +9000 for this. Having decimal literals or something similar by default, though perhaps problematic from a backwards compatibility standpoint, is a) user friendly, b) easily understandable, and c) not surprising to beginners. None of these qualities apply to float literals. I always assumed that float literals were mostly an artifact of history or of some performance limitations. Free of those, why would a language choose them over decimal literals? When does someone ever expect floating-point madness, unless they are doing something that is almost certainly not common, or unless they have been burned in the past? Every day another programmer gets bitten by floating point stupidities like this one . It would be a big win to kill this lame ?programmer rite of passage? and give people numbers that work more like how they learned them in school. The competing proposal is to treat decimal literals as decimal.Decimal values. I?m interested in learning more about such a proposal. Nick ? 
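(The canonical example being referred to above, in plain CPython -- and what exact decimal arithmetic gives instead:)

>>> 0.1 + 0.2
0.30000000000000004
>>> 0.1 + 0.2 == 0.3
False
>>> from decimal import Decimal
>>> Decimal('0.1') + Decimal('0.2') == Decimal('0.3')
True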
On Mon, Jun 1, 2015 at 2:03 AM Jim Witschey wrote: > On Sun, May 31, 2015 at 11:46 PM, Matthias Bussonnier > wrote: > > IIRC (during | after) the language submit at PyCon this year, it was > said that maybe the stdlib should get > > less features, not more. > > Rationals (and Decimals) already exist in the standard library. The > original proposal (as I read it, anyway) is more about the default > interpretation of, e.g., integer division and decimal-number literals. > > > Side note, Sympy as a IPython ast-hook that will wrap all your integers > into SymPy Integers and hence > > give you rationals of whatever you like, if you want to SymPy-plify your > life. > > Thank you for the pointer -- that's really cool. > > > But for majority of use will it be useful ? > > I believe interpreting "0.1" as 1/10 is more ergonomic than > representing it as 1.600000023841858 * (2^-4). I see it as being more > useful -- a better fit -- in most use cases because it's simpler, more > precise, and more understandable. > > > What would be the performance costs ? > > I don't know. Personally, I'd be willing to pay a performance penalty > to avoid reasoning about floating-point arithmetic most of the time, > then "drop into" floats when I need the speed. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jun 1 08:37:27 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Jun 2015 16:37:27 +1000 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: References: Message-ID: On 1 June 2015 at 13:55, Ron Adam wrote: > Of course, it could just be my own preferences. I like the pattern of > control (the specifying of what gets imported/shared) flowing from the top > down. This is actually how we bootstrap the import system in 3.3+ (we inject the sys and os modules *after* the top level execution of the bootstrap module is done, since the "import" statement doesn't work yet at the point where that module is running). However, this trick is never a desirable answer, just sometimes the least wrong choice out of multiple bad options :) Cheers, Nick. P.S. Python 3.5 is also more tolerant of circular imports than has historically been the case: https://bugs.python.org/issue17636 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Jun 1 08:39:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 31 May 2015 23:39:34 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> Message-ID: <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> On May 31, 2015, at 20:37, random832 at fastmail.us wrote: > > Also, it raises a question of string representation. Granted, "1/3" > becomes much more defensible as the repr of Fraction(1, 3) if it in fact > evaluates to that value, but how much do you like "6/5" as the repr of > 1.2? Or are we going to use Fractions for integer division and Decimals > for literals? That's the big problem. There's no one always-right answer. If you interpret the literal 1.20 a Fraction, it's going to be more confusing, not less, to people who are just trying to add up dollars and cents. 
Do a long financial computation and, instead of $691.05 as you expected or $691.0500000237 as you get today, you've got 10215488088 / 14782560. Not to mention that financial calculations often tend to involve things like e or exponentiation to non-integral powers, and what happens then? And then of course there's the unbounded size issue. If you do a long chain of operations that can theoretically be represented exactly followed by one that can't, you're wasting a ton of time and space for those intermediate values (and, unlike Haskell, Python can't look at the whole expression in advance and determine what the final type will be). On other other hand, if you interpret 1.20 it as a Decimal, now you can't sensibly mix 1.20 * 3/4 without coming up with a rule for how decimal and fraction types should interact. (OK, there's an obvious right answer for multiplication, but what about for addition?) And either one leads to people asking why the code they ported from Java or Ruby is broken on Python. You could make it configurable, so integer division is your choice of float, fraction, or decimal and decimal literals are your separate choice of the same three (and maybe also let fraction exponentiation be your choice of decimal and float), but then which setting is the default? Also, where do you set that? It has to be available at compile time, unless you want to add new types like "decimal literal" at compile time that are interpreted appropriately at runtime (which some languages do, and it works, but it definitely adds complexity). Maybe the answer is just to make it easier to be explicit, using something like C++ literal suffixes, so you can write, e.g., 1.20d or 1/3f (and I guess 1.2f) instead of Decimal('1.20') or Fraction(1, 3) (and Fraction(12, 10)). > And, what of decimal division? Right now you can't even > mix Fraction and Decimal in arithmetic operations. > > And are we going to add %e %f and %g support for both types? Directly > so, without any detour to float and its limitations (i.e. %.100f gets > you 100 true decimal digits of precision)? At least here I think the answer is clear. %-substitution is printf-like, and shouldn't change. If you want formatting that can be overloaded by the type, you use {}, which already works. From ncoghlan at gmail.com Mon Jun 1 09:08:40 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 1 Jun 2015 17:08:40 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On 1 June 2015 at 16:27, Nicholas Chammas wrote: > I always assumed that float literals were mostly an artifact of history or > of some performance limitations. Free of those, why would a language choose > them over decimal literals? In a world of binary computers, no programming language is free of those constraints - if you choose decimal literals as your default, you take a *big* performance hit, because computers are designed as binary systems. 
(Some languages, like IBM's REXX, do choose to use decimal integers by default) For CPython, we offer C-accelerated decimal support by default since 3.3 (available as pip install cdecimal in Python 2), but it still comes at a high cost in speed: $ python3 -m timeit -s "n = 1.0; d = 3.0" "n / d" 10000000 loops, best of 3: 0.0382 usec per loop $ python3 -m timeit -s "from decimal import Decimal as D; n = D(1); d = D(3)" "n / d" 10000000 loops, best of 3: 0.129 usec per loop And this isn't even like the situation with integers, where the semantics of long integers are such that native integers can be used transparently as an optimisation technique - IEEE754 (which defines the behaviour of native binary floats) and the General Decimal Arithmetic Specification (which defines the behaviour of the decimal module) are genuinely different ways of doing floating point arithmetic, since the choice of base 2 or base 10 has far reaching ramifications for the way various operations work and how various errors accumulate. We aren't even likely to see widespread proliferation of hardware level decimal arithmetic units, because the "binary arithmetic is easier to implement than decimal arithmetic" consideration extends down to the hardware layer as well - a decimal arithmetic unit takes more silicon, and hence more power, than a similarly capable binary unit. With battery conscious mobile device design and environmentally conscious data centre design being two of the most notable current trends in CPU design, this makes it harder than ever to justify providing hardware support for both in general purpose computing devices. For some use cases (e.g. financial math), it's worth paying the price in speed to get the base 10 arithmetic semantics, or the cost in hardware to accelerate it, but for most other situations, we end up being better off teaching humans to cope with the fact that binary logic is the native language of our computational machines. Binary vs decimal floating point is a lot like the Unicode bytes/text distinction in that regard: while Unicode is a better model for representing human communications, there's no avoiding the fact that that text eventually has to be rendered as a bitstream in order to be saved or transmitted. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Mon Jun 1 09:10:24 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 01 Jun 2015 09:10:24 +0200 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> References: <1433128446.31560.283106753.6D60F98F@webmail.messagingengine.com> Message-ID: random832 at fastmail.us schrieb am 01.06.2015 um 05:14: > On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: >> First, I propose that a float's integer ratio should be accurate. For >> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it >> returns(6004799503160661, 18014398509481984). > > Even though he's mistaken about the core premise, I do think there's a > kernel of a good idea here - it would be nice to have a method (maybe > as_integer_ratio, maybe with some parameter added, maybe a different > method) to return with the smallest denominator that would result in > exactly the original float if divided out, rather than merely the > smallest power of two. The fractions module seems the obvious place to put this. Consider opening a feature request. Target version would be Python 3.6. 
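In the meantime, a rough sketch of such a helper is already possible on top of Fraction.limit_denominator -- the name and API below are only an illustration, and the linear search on the denominator bound favours clarity over speed:

    from fractions import Fraction

    def simplest_integer_ratio(x):
        # Smallest-denominator fraction that converts back to exactly
        # the finite float x.  Brute-force search on the denominator bound.
        exact = Fraction(x)          # the exact value of the binary float
        bound = 1
        while float(exact.limit_denominator(bound)) != x:
            bound += 1
        return exact.limit_denominator(bound)

which gives, for example:

    >>> simplest_integer_ratio(1 / 3)
    Fraction(1, 3)
    >>> simplest_integer_ratio(0.1)
    Fraction(1, 10)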
Stefan From mal at egenix.com Mon Jun 1 11:05:14 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 01 Jun 2015 11:05:14 +0200 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> Message-ID: <556C204A.3070003@egenix.com> On 31.05.2015 18:19, David Townshend wrote: >> >> The default for npm is that your package dir is attached directly to the >> project. You can get more flexibility by setting an environment variable or >> creating a symlink, but normally you don't. It has about the same >> flexibility as virtualenvwrapper, with about the same amount of effort. So >> if virtualenvwrapper isn't flexible enough for you, my guess is that your >> take on npm won't be flexible enough either, it'll just come preconfigured >> for your own idiosyncratic use and everyone else will have to adjust... >> > > You have a point. Maybe lack of flexibility is not actually the issue - > it's too much flexibility. The problem that I have with virtualenv is that > it requires quite a bit of configuration and a great deal of awareness by > the user of what is going on and how things are configured. As stated on > it's home page While there is nothing specifically wrong with this, I > usually just want a way to do something in a venv without thinking too much > about where it is or when or how to activate it. If you've had a look at > the details of the sort of tool I'm proposing, it is completely > transparent. Perhaps the preconfiguration is just to my own > idiosyncrasies, but if it serves its use 90% of the time then maybe that is > good enough. If you want to have a system that doesn't require activation, you may want to take a look at what we've done with PyRun: http://www.egenix.com/products/python/PyRun/ It basically takes the "virtual" out of virtualenvs. Instead of creating a local symlinked copy of your host Python installation, you create a completely separate Python installation (which isn't much heavier than a virtualenv due to the way this is done). Once installed, everything works relative to the PyRun binary, so you don't need to activate anything when running code inside your installation: you just need to run the right PyRun binary and this automatically gives you access to everything else you installed in your environment. In our latest release, we've added requirements.txt support to the installation helper install-pyrun, so that you can run install-pyrun -r requirements.txt . to bootstrap a complete project environment with one command. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 01 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. 
Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From steve at pearwood.info Mon Jun 1 14:09:02 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 1 Jun 2015 22:09:02 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> Message-ID: <20150601120902.GZ932@ando.pearwood.info> On Sun, May 31, 2015 at 11:37:17PM -0400, random832 at fastmail.us wrote: > On Sun, May 31, 2015, at 23:21, Jim Witschey wrote: > > I think hiding hardware number implementations has been a success for > > integers - it's a far superior API. It could be for rationals as well. > > I'd worry about unbounded complexity. For rationals, unlike integers, > values don't have to be large for their bignum representation to be > large. You and Guido both. ABC used exact integer fractions as their numeric type, and Guido has spoken many times about the cost in both time and space (memory) of numeric calculations using rationals. -- Steve From random832 at fastmail.us Mon Jun 1 14:59:47 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 01 Jun 2015 08:59:47 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> Message-ID: <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> On Mon, Jun 1, 2015, at 02:39, Andrew Barnert wrote: > At least here I think the answer is clear. %-substitution is printf-like, > and shouldn't change. If you want formatting that can be overloaded by > the type, you use {}, which already works. The original proposal was for *getting rid of* float as we know it. Which, unless the floating format specifiers for % are likewise removed, means their semantics have to be defined in terms of types that still exist. From liik.joonas at gmail.com Mon Jun 1 16:52:35 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Mon, 1 Jun 2015 17:52:35 +0300 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> Message-ID: Having some sort of decimal literal would have some advantages of its own, for one it could help against this sillyness: >>> Decimal(1.3) Decimal('1.3000000000000000444089209850062616169452667236328125') >>> Decimal('1.3') Decimal('1.3') I'm not saying that the actual data type needs to be a decimal ( might well be a float but say shove the string repr next to it so it can be accessed when needed) ..but this is one really common pitfall for new users, i know its easy to fix the code above, but this behavior is very unintuitive.. you essentially get a really expensive float when you do the obvious thing. Not sure if this is worth the effort but it would help smooth some corners potentially.. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Mon Jun 1 16:58:06 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Jun 2015 00:58:06 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: <20150601145806.GB932@ando.pearwood.info> On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote: > Having decimal literals or something similar by default, though perhaps > problematic from a backwards compatibility standpoint, is a) user friendly, > b) easily understandable, and c) not surprising to beginners. None of these > qualities apply to float literals. I wish this myth about Decimals would die, because it isn't true. The only advantage of base-10 floats over base-2 floats -- and I'll admit it can be a big advantage -- is that many of the numbers we commonly care about can be represented in Decimal exactly, but not as base-2 floats. In every other way, Decimals are no more user friendly, understandable, or unsurprising than floats. Decimals violate all the same rules of arithmetic that floats do. This should not come as a surprise, since decimals *are* floats, they merely use base 10 rather than base 2. In the past, I've found that people are very resistant to this fact, so I'm going to show a few examples of how Decimals violate the fundamental laws of mathematics just as floats do. For those who already know this, please forgive me belabouring the obvious. In mathematics, adding anything other than zero to a number must give you a different number. Decimals violate that expectation just as readily as binary floats: py> from decimal import Decimal as D py> x = D(10)**30 py> x == x + 100 # should be False True Apart from zero, multiplying a number by its inverse should always give one. Again, violated by decimals: py> one_third = 1/D(3) py> 3*one_third == 1 False Inverting a number twice should give the original number back: py> 1/(1/D(7)) == 7 False Here's a violation of the Associativity Law, which states that (a+b)+c should equal a+(b+c) for any values a, b, c: py> a = D(1)/17 py> b = D(5)/7 py> c = D(12)/13 py> (a + b) + c == a + (b+c) False (For the record, it only took me two attempts, and a total of about 30 seconds, to find that example, so it's not particularly difficult to come across such violations.) Here's a violation of the Distributive Law, which states that a*(b+c) should equal a*b + a*c: py> a = D(15)/2 py> b = D(15)/8 py> c = D(1)/14 py> a*(b+c) == a*b + a*c False (I'll admit that was a bit trickier to find.) This one is a bit subtle, and to make it easier to see what is going on I will reduce the number of digits used. When you take the average of two numbers x and y, mathematically the average must fall *between* x and y. With base-2 floats, we can't guarantee that the average will be strictly between x and y, but we can be sure that it will be either between the two values, or equal to one of them. But base-10 Decimal floats cannot even guarantee that. Sometimes the calculated average falls completely outside of the inputs. py> from decimal import getcontext py> getcontext().prec = 3 py> x = D('0.516') py> y = D('0.518') py> (x+y)/2 # should be 0.517 Decimal('0.515') This one is even worse: py> getcontext().prec = 1 py> x = D('51.6') py> y = D('51.8') py> (x+y)/2 # should be 51.7 Decimal('5E+1') Instead of the correct answer of 51.7, Decimal calculates the answer as 50 exactly. 
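(For comparison, base-2 floats fail the same associativity test, just with different trigger values:

py> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
False
py> (0.1 + 0.2) + 0.3
0.6000000000000001
py> 0.1 + (0.2 + 0.3)
0.6

So this is not a binary-versus-decimal issue; it is what happens whenever precision is finite.)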
> I always assumed that float literals were mostly an artifact of history or > of some performance limitations. Free of those, why would a language choose > them over decimal literals? Performance and accuracy will always be better for binary floats. Binary floats are faster, and have stronger error bounds and slower-growing errors. Decimal floats suffer from the same problems as binary floats, only more so, and are slower to boot. > When does someone ever expect floating-point > madness, unless they are doing something that is almost certainly not > common, or unless they have been burned in the past? > Every day another programmer gets bitten by floating point stupidities like > this one . It would be a big win > to kill this lame ?programmer rite of passage? and give people numbers that > work more like how they learned them in school. There's a lot wrong with that. - The sorts of errors we see with floats are not "madness", but the completely logical consequences of what happens when you try to do arithmetic in anything less than the full mathematical abstraction. - And they aren't rare either -- they're incredibly common. Fortunately, most of the time they don't matter, or aren't obvious, or both. - Decimals don't behave like the numbers you learn in school either. Floats are not real numbers, regardless of which base you use. And in fact, the smaller the base, the smaller the errors. Binary floats are better than decimals in this regard. (Decimals *only* win out due to human bias: we don't care too much that 1/7 cannot be expressed exactly as a float using *either* binary or decimal, but we do care about 1/10. And we conveniently ignore the case of 1/3, because familiarity breeds contempt.) - Being at least vaguely aware of floating point issues shouldn't be difficult for anyone who has used a pocket calculator. And yet every day brings in another programmer surprised by floats. - It's not really a rite of passage, that implies that it is arbitrary and imposed culturally. Float issues aren't arbitrary, they are baked into the very nature of the universe. You cannot hope to perform infinitely precise real-number arithmetic using just a finite number of bits of storage, no matter what system you use. Fixed-point maths has its own problems, as does rational maths. All you can do is choose to shift the errors from some calculations to other calculations, you cannot eliminate them altogether. -- Steve From mertz at gnosis.cx Mon Jun 1 16:54:13 2015 From: mertz at gnosis.cx (David Mertz) Date: Mon, 1 Jun 2015 07:54:13 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: Decimal literals are far from as obvious as suggested. We *have* the `decimal` module after all, and it defines all sorts of parameters on precision, rounding rules, etc. that one can provide context for. decimal.ROUND_HALF_DOWN is "the obvious way" for some users, while decimal.ROUND_CEILING is "the obvious way" for others. I like decimals, but they don't simply make all the mathematical answers result in what all users would would consider "do what I mean" either. On Sun, May 31, 2015 at 11:27 PM, Nicholas Chammas < nicholas.chammas at gmail.com> wrote: > I don?t know. Personally, I?d be willing to pay a performance penalty > to avoid reasoning about floating-point arithmetic most of the time, > then ?drop into? floats when I need the speed. > > This is perhaps a bit off topic for the thread, but +9000 for this. 
> > Having decimal literals or something similar by default, though perhaps > problematic from a backwards compatibility standpoint, is a) user friendly, > b) easily understandable, and c) not surprising to beginners. None of these > qualities apply to float literals. > > I always assumed that float literals were mostly an artifact of history or > of some performance limitations. Free of those, why would a language choose > them over decimal literals? When does someone ever expect floating-point > madness, unless they are doing something that is almost certainly not > common, or unless they have been burned in the past? > > Every day another programmer gets bitten by floating point stupidities > like this one . It would be a > big win to kill this lame ?programmer rite of passage? and give people > numbers that work more like how they learned them in school. > > The competing proposal is to treat decimal literals as decimal.Decimal > values. > > I?m interested in learning more about such a proposal. > > Nick > ? > > On Mon, Jun 1, 2015 at 2:03 AM Jim Witschey > wrote: > >> On Sun, May 31, 2015 at 11:46 PM, Matthias Bussonnier >> wrote: >> > IIRC (during | after) the language submit at PyCon this year, it was >> said that maybe the stdlib should get >> > less features, not more. >> >> Rationals (and Decimals) already exist in the standard library. The >> original proposal (as I read it, anyway) is more about the default >> interpretation of, e.g., integer division and decimal-number literals. >> >> > Side note, Sympy as a IPython ast-hook that will wrap all your integers >> into SymPy Integers and hence >> > give you rationals of whatever you like, if you want to SymPy-plify >> your life. >> >> Thank you for the pointer -- that's really cool. >> >> > But for majority of use will it be useful ? >> >> I believe interpreting "0.1" as 1/10 is more ergonomic than >> representing it as 1.600000023841858 * (2^-4). I see it as being more >> useful -- a better fit -- in most use cases because it's simpler, more >> precise, and more understandable. >> >> > What would be the performance costs ? >> >> I don't know. Personally, I'd be willing to pay a performance penalty >> to avoid reasoning about floating-point arithmetic most of the time, >> then "drop into" floats when I need the speed. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From liik.joonas at gmail.com Mon Jun 1 17:12:52 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Mon, 1 Jun 2015 18:12:52 +0300 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: I'm sorry.. 
What I meant was not a literal that results in a Decimal; what I meant was a special literal proxy object that usually acts like a float except you can ask for its original string form.

e.g.:

flit = 1.3
flit*3 == float(flit)*3
str(flit) == '1.3'

Thus, in cases where the intermediate float conversion loses precision, you can get at the original string that the programmer actually typed in. Decimal constructors are one case that would probably like to use the original string whenever possible to avoid conversion losses, but by no means are they the only ones.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From p.f.moore at gmail.com Mon Jun 1 17:19:37 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 1 Jun 2015 16:19:37 +0100 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150601145806.GB932@ando.pearwood.info> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: 
On 1 June 2015 at 15:58, Steven D'Aprano wrote:
> (Decimals *only* win out due to human bias: we don't care too much that
> 1/7 cannot be expressed exactly as a float using *either* binary or
> decimal, but we do care about 1/10. And we conveniently ignore the case
> of 1/3, because familiarity breeds contempt.)

There is one other "advantage" to decimals - they behave like electronic calculators (which typically used decimal arithmetic). This is a variation of "human bias" - we (if we're of a certain age, maybe today's youngsters are less used to the vagaries of electronic calculators :-)) are used to seeing 1/3 displayed as 0.33333333, and showing that 1/3*3 = 0.99999999 was a "fun calculator fact" when I was at school.

Paul

From rosuav at gmail.com Mon Jun 1 18:20:04 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 2 Jun 2015 02:20:04 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150601145806.GB932@ando.pearwood.info> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: 
On Tue, Jun 2, 2015 at 12:58 AM, Steven D'Aprano wrote:
> This one is even worse:
>
> py> getcontext().prec = 1
> py> x = D('51.6')
> py> y = D('51.8')
> py> (x+y)/2 # should be 51.7
> Decimal('5E+1')
>
> Instead of the correct answer of 51.7, Decimal calculates the answer as
> 50 exactly.

To be fair, you've actually destroyed precision so much that your numbers start out effectively equal:

>>> from decimal import Decimal as D, getcontext
>>> getcontext().prec = 1
>>> x = D('51.6')
>>> y = D('51.8')
>>> x == y
False
>>> x + 0 == y + 0
True

They're not actually showing up as equal, but only because the precision setting doesn't (apparently) apply to the constructor. If adding zero to both sides of an equation makes it equal when it wasn't before, something seriously screwy is going on. (Actually, this behaviour of decimal.Decimal reminds me very much of REXX. Since there are literally no data types in REXX (everything is a string), the numeric precision setting ("NUMERIC DIGITS n") applies only to arithmetic operations, so the same thing of adding zero to both sides can happen.)

So what you're really doing here is averaging 5E+1 and 5E+1, with an unsurprising result of... 5E+1. Your other example is more significant here, because your numbers actually do fit inside the precision limits - and then the end result slips outside the bounds.
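Concretely, a quick sketch of that case (same figures as before) shows the damage happens in the addition, and the division just carries it along:

>>> from decimal import Decimal as D, getcontext
>>> getcontext().prec = 3
>>> x = D('0.516')
>>> y = D('0.518')
>>> x + y            # the exact sum 1.034 gets rounded to three digits
Decimal('1.03')
>>> (x + y) / 2      # so the "average" is already doomed before the divide
Decimal('0.515')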
ChrisA From random832 at fastmail.us Mon Jun 1 18:43:43 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 01 Jun 2015 12:43:43 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150601145806.GB932@ando.pearwood.info> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: <1433177023.3063935.283721313.7C625856@webmail.messagingengine.com> On Mon, Jun 1, 2015, at 10:58, Steven D'Aprano wrote: > I wish this myth about Decimals would die, because it isn't true. The > only advantage of base-10 floats over base-2 floats -- and I'll admit it > can be a big advantage -- is that many of the numbers we commonly care > about can be represented in Decimal exactly, but not as base-2 floats. > In every other way, Decimals are no more user friendly, understandable, > or unsurprising than floats. Decimals violate all the same rules of > arithmetic that floats do. But people have been learning about those rules, as apply to decimals, since they were small children. They know intuitively that 2/3 rounds to ...6667 at some point because they've done exactly that by hand. "user friendly" and "understandable to beginners" don't arise in a vacuum. From techtonik at gmail.com Mon Jun 1 17:46:34 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 1 Jun 2015 18:46:34 +0300 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: <20150530001811.GS932@ando.pearwood.info> References: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com> <20150530001811.GS932@ando.pearwood.info> Message-ID: On Sat, May 30, 2015 at 3:18 AM, Steven D'Aprano wrote: > > As far as I can see, he has been given the solution, or at least a > potential solution, on python-list, but as far as I can tell he either > hasn't read it, or doesn't like the solutions offerred and so is > ignoring them. Let me update you on this. There was no solution given. Only the pointers to go read some pointers on the internets again. So, yes, I read replies. But I have very little time to analyse and follow up. The idea I wanted to convey in this thread is that encode/decode is confusing, so if you agree with that, I can start to propose alternatives. And just to make you understand the importance of the question with translating from bytes to unicode and back, let me just tell that this question is the third one voted with 221k views on SO in Python 3 tag. http://stackoverflow.com/questions/tagged/python-3.x -- anatoly t. From nicholas.chammas at gmail.com Mon Jun 1 19:24:32 2015 From: nicholas.chammas at gmail.com (Nicholas Chammas) Date: Mon, 1 Jun 2015 13:24:32 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: Well, I learned a lot about decimals today. :) On Mon, Jun 1, 2015 at 3:08 AM, Nick Coghlan ncoghlan at gmail.com wrote: In a world of binary computers, no programming language is free of those constraints - if you choose decimal literals as your default, you take a *big* performance hit, because computers are designed as binary systems. (Some languages, like IBM?s REXX, do choose to use decimal integers by default) I guess it?s a non-trivial tradeoff. But I would lean towards considering people likely to be affected by the performance hit as doing something ?not common?. Like, if they are doing that many calculations that it matters, perhaps it makes sense to ask them to explicitly ask for floats vs. 
decimals, in exchange for giving the majority who wouldn't notice a performance difference a better user experience.

On Mon, Jun 1, 2015 at 10:58 AM, Steven D'Aprano steve at pearwood.info wrote:

I wish this myth about Decimals would die, because it isn't true.

Your email had a lot of interesting information about decimals that would make a good blog post, actually. Writing one up will perhaps help kill this myth in the long run :)

In the past, I've found that people are very resistant to this fact, so I'm going to show a few examples of how Decimals violate the fundamental laws of mathematics just as floats do.

How many of your examples are inherent limitations of decimals vs. problems that can be improved upon?

Admittedly, the only place where I've played with decimals extensively is on Microsoft's SQL Server (where they are the default literal). I've stumbled in the past on my own decimal gotchas, but looking at your examples and trying them on SQL Server I suspect that most of the problems you show are problems of precision and scale.

Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server's, for example), or better defaults when they are not specified?

Anyway, here's what happens on SQL Server for some of the examples you provided.

Adding 100:

py> from decimal import Decimal as D
py> x = D(10)**30
py> x == x + 100  # should be False
True

DECLARE @x DECIMAL(38,0) = '1' + REPLICATE(0, 30);

IF @x = @x + 100
    SELECT 'equal' AS adding_100
ELSE
    SELECT 'not equal' AS adding_100

Gives "not equal". Leaving out the precision when declaring @x (i.e. going with the default precision of 18) immediately yields an understandable data truncation error.

Associativity:

py> a = D(1)/17
py> b = D(5)/7
py> c = D(12)/13
py> (a + b) + c == a + (b+c)
False

DECLARE @a DECIMAL = 1.0/17;
DECLARE @b DECIMAL = 5.0/7;
DECLARE @c DECIMAL = 12.0/13;

IF (@a + @b) + @c = @a + (@b + @c)
    SELECT 'equal' AS associative
ELSE
    SELECT 'not equal' AS associative

Gives "equal".

Distributivity:

py> a = D(15)/2
py> b = D(15)/8
py> c = D(1)/14
py> a*(b+c) == a*b + a*c
False

DECLARE @a DECIMAL = 15.0/2;
DECLARE @b DECIMAL = 15.0/8;
DECLARE @c DECIMAL = 1.0/14;

IF @a * (@b + @c) = @a*@b + @a*@c
    SELECT 'equal' AS distributive
ELSE
    SELECT 'not equal' AS distributive

Gives "equal".

I think some of the other decimal examples you provide, though definitely not 100% beginner friendly, are still way more human-friendly because they are explainable in terms of precision and scale, which we can understand more simply ("there aren't enough decimal places to carry the result") and which have parallels in other areas of life as Paul pointed out.

- The sorts of errors we see with floats are not "madness", but the completely logical consequences of what happens when you try to do arithmetic in anything less than the full mathematical abstraction.

I don't mean madness as in incorrect; I mean madness as in difficult to predict and difficult to understand. Your examples do show that it isn't all roses and honey with decimals, but do you find it easier to understand and explain all the weirdness of floats vs. decimals? Understanding float weirdness (and disclaimer: I don't) seems to require understanding some hairy stuff, and even then it is not predictable because there are platform-dependent issues. Understanding decimal "weirdness" seems to require only understanding precision and scale, and after that it is mostly predictable.
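For what it's worth, a quick sketch suggests Python's decimal module agrees with SQL Server here once you hand it a comparable precision (38 digits, roughly DECIMAL(38,0)), so this particular gotcha looks like a default-context problem rather than a decimal problem:

py> from decimal import Decimal as D, getcontext
py> getcontext().prec = 38   # emulate DECIMAL(38,0)-ish precision
py> x = D(10)**30
py> x == x + 100             # now False, matching the SQL Server result
False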
Nick On Mon, Jun 1, 2015 at 11:19 AM Paul Moore wrote: On 1 June 2015 at 15:58, Steven D'Aprano wrote: > > (Decimals *only* win out due to human bias: we don't care too much that > > 1/7 cannot be expressed exactly as a float using *either* binary or > > decimal, but we do care about 1/10. And we conveniently ignore the case > > of 1/3, because familiarity breeds contempt.) > > There is one other "advantage" to decimals - they behave like > electronic calculators (which typically used decimal arithmetic). This > is a variation of "human bias" - we (if we're of a certain age, maybe > today's youngsters are less used to the vagaries of electronic > calculators :-)) are used to seeing 1/3 displayed as 0.33333333, and > showing that 1/3*3 = 0.99999999 was a "fun calculator fact" when I was > at school. > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Mon Jun 1 19:58:42 2015 From: carl at oddbird.net (Carl Meyer) Date: Mon, 01 Jun 2015 11:58:42 -0600 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <556C204A.3070003@egenix.com> References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> <556C204A.3070003@egenix.com> Message-ID: <556C9D52.3090007@oddbird.net> Hi, On 06/01/2015 03:05 AM, M.-A. Lemburg wrote: > If you want to have a system that doesn't require activation, > you may want to take a look at what we've done with PyRun: Virtualenv doesn't require activation either. Activation is a convenience for running repeated commands in the virtualenv context, but all it does is change your shell PATH; you can explicitly specify the virtualenv's python binary and never use activation, if you wish. > http://www.egenix.com/products/python/PyRun/ > > It basically takes the "virtual" out of virtualenvs. Instead > of creating a local symlinked copy of your host Python installation, > you create a completely separate Python installation (which isn't > much heavier than a virtualenv due to the way this is done). Virtualenv doesn't create "a local symlinked copy of your host Python installation." It copies the binary, symlinks a few key stdlib modules that are necessary to bootstrap site.py, and then its custom site.py finds the host Python's stdlib directory and adds it to `sys.path`. > Once installed, everything works relative to the PyRun binary, > so you don't need to activate anything when running code inside > your installation: you just need to run the right PyRun binary > and this automatically gives you access to everything else > you installed in your environment. This is exactly how virtualenv (and pyvenv in Python 3.3+) works. Everything is relative to the Python binary in the virtualenv (this behavior is built into the Python executable, actually). You can just directly run the virtualenv's Python binary (or any script with that Python binary in its shebang, which includes all pip or easy-installed scripts in the virtualenv's bin/ dir), without ever activating anything. It seems the main difference between virtualenv and PyRun is in how much of the standard library is bundled with each environment, and that I guess PyRun doesn't come with any convenience activation shell script? 
But the method by which "activation" actually occurs is identical (at least as far as you're described it here.) Carl > In our latest release, we've added requirements.txt support > to the installation helper install-pyrun, so that you can > run > > install-pyrun -r requirements.txt . > > to bootstrap a complete project environment with one command. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From breamoreboy at yahoo.co.uk Mon Jun 1 20:32:03 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 01 Jun 2015 19:32:03 +0100 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> Message-ID: On 01/06/2015 15:52, Joonas Liik wrote: > Having some sort of decimal literal would have some advantages of its > own, for one it could help against this sillyness: > > >>> Decimal(1.3) > Decimal('1.3000000000000000444089209850062616169452667236328125') > > >>> Decimal('1.3') > Decimal('1.3') > > I'm not saying that the actual data type needs to be a decimal ( > might well be a float but say shove the string repr next to it so it can > be accessed when needed) > > ..but this is one really common pitfall for new users, i know its easy > to fix the code above, > but this behavior is very unintuitive.. you essentially get a really > expensive float when you do the obvious thing. > > Not sure if this is worth the effort but it would help smooth some > corners potentially.. > Far easier to point them to https://docs.python.org/3/library/decimal.html and/or https://docs.python.org/3/tutorial/floatingpoint.html -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From mal at egenix.com Mon Jun 1 20:32:56 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 01 Jun 2015 20:32:56 +0200 Subject: [Python-ideas] npm-style venv-aware launcher In-Reply-To: <556C9D52.3090007@oddbird.net> References: <1136321E-6C0F-4B7A-B6E5-8E60917EEDAC@yahoo.com> <06BB7C80-33A5-4339-A908-522918C1F1B5@yahoo.com> <556C204A.3070003@egenix.com> <556C9D52.3090007@oddbird.net> Message-ID: <556CA558.5000804@egenix.com> On 01.06.2015 19:58, Carl Meyer wrote: > Hi, > > On 06/01/2015 03:05 AM, M.-A. Lemburg wrote: >> If you want to have a system that doesn't require activation, >> you may want to take a look at what we've done with PyRun: > > Virtualenv doesn't require activation either. > > Activation is a convenience for running repeated commands in the > virtualenv context, but all it does is change your shell PATH; you can > explicitly specify the virtualenv's python binary and never use > activation, if you wish. Ok, I was always under the impression that the activation script also does other magic to have the virtualenv Python find the right settings. That's good to know, thanks. >> http://www.egenix.com/products/python/PyRun/ >> >> It basically takes the "virtual" out of virtualenvs. Instead >> of creating a local symlinked copy of your host Python installation, >> you create a completely separate Python installation (which isn't >> much heavier than a virtualenv due to the way this is done). > > Virtualenv doesn't create "a local symlinked copy of your host Python > installation." 
It copies the binary, symlinks a few key stdlib modules > that are necessary to bootstrap site.py, and then its custom site.py > finds the host Python's stdlib directory and adds it to `sys.path`. Well, this is what I call a symlinked copy :-) It still points to the system installed Python for the stdlib, shared mods and include files. >> Once installed, everything works relative to the PyRun binary, >> so you don't need to activate anything when running code inside >> your installation: you just need to run the right PyRun binary >> and this automatically gives you access to everything else >> you installed in your environment. > > This is exactly how virtualenv (and pyvenv in Python 3.3+) works. > Everything is relative to the Python binary in the virtualenv (this > behavior is built into the Python executable, actually). You can just > directly run the virtualenv's Python binary (or any script with that > Python binary in its shebang, which includes all pip or easy-installed > scripts in the virtualenv's bin/ dir), without ever activating anything. > > It seems the main difference between virtualenv and PyRun is in how much > of the standard library is bundled with each environment, The main difference is that PyRun is a stand-alone Python runtime which doesn't depend on the system Python installation at all. We created it to no longer have to worry about supporting dozens of different Python installation variants on Unix platforms and it turned out to be small enough to just always use instead of virtualenv. > and that I > guess PyRun doesn't come with any convenience activation shell script? > But the method by which "activation" actually occurs is identical (at > least as far as you're described it here.) After what you've explained, the sys.path setup is indeed very similar (well, PyRun doesn't really need much of it since almost the whole Python stdlib is baked into the binary). What virtualenv doesn't appear to do is update sysconfig to point to the virtualenv environment instead of the host system. > Carl > >> In our latest release, we've added requirements.txt support >> to the installation helper install-pyrun, so that you can >> run >> >> install-pyrun -r requirements.txt . >> >> to bootstrap a complete project environment with one command. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 01 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From surya.subbarao1 at gmail.com Mon Jun 1 21:13:44 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Mon, 1 Jun 2015 12:13:44 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 2 In-Reply-To: References: Message-ID: Floats internally use numerator and denominator (float.as_integer_ratio().) 
It makes no sense to have sign and mantissa while displaying numerator and denominator. Perhaps a redo of the class? I believe fractions should be the standard, and just keep the ieee754 floats as a side option. On Sun, May 31, 2015 at 7:37 PM, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Python Float Update (u8y7541 The Awesome Person) > 2. Re: Python Float Update (Nathaniel Smith) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 31 May 2015 19:25:46 -0700 > From: u8y7541 The Awesome Person > To: python-ideas at python.org > Subject: [Python-ideas] Python Float Update > Message-ID: > < > CA+o1fZONRG_dZzct_VZwcUvtwD-5rJ6zOFxNR34jWFrpUXiw9Q at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Dear Python Developers: > > I will be presenting a modification to the float class, which will improve > its speed and accuracy (reduce floating point errors). This is applicable > because Python uses a numerator and denominator rather than a sign and > mantissa to represent floats. > > First, I propose that a float's integer ratio should be accurate. For > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > returns(6004799503160661, 18014398509481984). > > Second of all, even though 1 * 3 = 3 (last example), 6004799503160661 * 3 > does not equal 18014398509481984. Instead, it equals 1801439850948198**3**, > one less than the value in the ratio. This means the ratio is inaccurate, > as well as completely not simplified. > > > [image: Inline image 1] > > > Even if the value displayed for a float is a rounded value, the internal > numerator and denominator should divide to equal to completely accurate > value. > > Thanks for considering this improvement! > > Sincerely, > u8y7541 > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150531/019029a6/attachment-0001.html > > > -------------- next part -------------- > A non-text attachment was scrubbed... > Name: pythonfloats.PNG > Type: image/png > Size: 16278 bytes > Desc: not available > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150531/019029a6/attachment-0001.png > > > > ------------------------------ > > Message: 2 > Date: Sun, 31 May 2015 19:37:14 -0700 > From: Nathaniel Smith > To: u8y7541 The Awesome Person > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > wE1PWDSrquzg at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On May 31, 2015 7:26 PM, "u8y7541 The Awesome Person" < > surya.subbarao1 at gmail.com> wrote: > > > > Dear Python Developers: > > > > I will be presenting a modification to the float class, which will > improve its speed and accuracy (reduce floating point errors). This is > applicable because Python uses a numerator and denominator rather than a > sign and mantissa to represent floats. 
> > Python's floats are in fact ieee754 floats, using sign/mantissa/exponent, > as provided by all popular CPU floating point hardware. This is why you're > getting the results you see -- 1/3 cannot be exactly represented as a > float, so it gets rounded to the closest representable float, and then > as_integer_ratio shows you an exact representation of this rounded value. > It sounds like you're instead looking for an exact fraction representation, > which in python is available in the standard "fractions" module: > > https://docs.python.org/3.5/library/fractions.html > > -n > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150531/d3147de9/attachment.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 103, Issue 2 > ******************************************** > -- -Surya Subbarao -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.subbarao1 at gmail.com Mon Jun 1 21:22:40 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Mon, 1 Jun 2015 12:22:40 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 3 In-Reply-To: References: Message-ID: Maybe we could make a C implementation of the Fraction module? That would be nice. On Sun, May 31, 2015 at 8:28 PM, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Re: Python Float Update (Chris Angelico) > 2. Re: Python Float Update (random832 at fastmail.us) > 3. Re: Python Float Update (Jim Witschey) > 4. Re: Python Float Update (David Mertz) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 1 Jun 2015 12:48:12 +1000 > From: Chris Angelico > Cc: python-ideas > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > W5h6etG5TscV5uU6zWhxVbgQ at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person > wrote: > > > > I will be presenting a modification to the float class, which will > improve its speed and accuracy (reduce floating point errors). This is > applicable because Python uses a numerator and denominator rather than a > sign and mantissa to represent floats. > > > > First, I propose that a float's integer ratio should be accurate. For > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > returns(6004799503160661, 18014398509481984). > > > > I think you're misunderstanding the as_integer_ratio method. That > isn't how Python works internally; that's a service provided for > parsing out float internals into something more readable. What you > _actually_ are working with is IEEE 754 binary64. 
(Caveat: I have no > idea what Python-the-language stipulates, nor what other Python > implementations use, but that's what CPython uses, and you did your > initial experiments with CPython. None of this discussion applies *at > all* if a Python implementation doesn't use IEEE 754.) So internally, > 1/3 is stored as: > > 0 <-- sign bit (positive) > 01111111101 <-- exponent (1021) > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 > bits, repeating) > > The exponent is offset by 1023, so this means 1.010101.... divided by > 2?; the original repeating value is exactly equal to 4/3, so this is > correct, but as soon as it's squeezed into a finite-sized mantissa, it > gets rounded - in this case, rounded down. > > That's where your result comes from. It's been rounded such that it > fits inside IEEE 754, and then converted back to a fraction > afterwards. You're never going to get an exact result for anything > with a denominator that isn't a power of two. Fortunately, Python does > offer a solution: store your number as a pair of integers, rather than > as a packed floating point value, and all calculations truly will be > exact (at the cost of performance): > > >>> one_third = fractions.Fraction(1, 3) > >>> one_eighth = fractions.Fraction(1, 8) > >>> one_third + one_eighth > Fraction(11, 24) > > This is possibly more what you want to work with. > > ChrisA > > > ------------------------------ > > Message: 2 > Date: Sun, 31 May 2015 23:14:06 -0400 > From: random832 at fastmail.us > To: python-ideas at python.org > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > <1433128446.31560.283106753.6D60F98F at webmail.messagingengine.com> > Content-Type: text/plain > > On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: > > First, I propose that a float's integer ratio should be accurate. For > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > > returns(6004799503160661, 18014398509481984). > > Even though he's mistaken about the core premise, I do think there's a > kernel of a good idea here - it would be nice to have a method (maybe > as_integer_ratio, maybe with some parameter added, maybe a different > method) to return with the smallest denominator that would result in > exactly the original float if divided out, rather than merely the > smallest power of two. > > > ------------------------------ > > Message: 3 > Date: Mon, 01 Jun 2015 03:21:36 +0000 > From: Jim Witschey > To: Chris Angelico > Cc: python-ideas > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > kg at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Teachable moments about the implementation of floating-point aside, > something in this neighborhood has been considered and rejected before, in > PEP 240. However, that was in 2001 - it was apparently created the same day > as PEP 237, which introduced transparent conversion of machine ints to > bignums in the int type. > > I think hiding hardware number implementations has been a success for > integers - it's a far superior API. It could be for rationals as well. > > Has something like this thread's original proposal - interpeting > decimal-number literals as fractional values and using fractions as the > result of integer arithmetic - been seriously discussed more recently than > PEP 240? If so, why haven't they been implemented? Perhaps enough has > changed that it's worth reconsidering. 
> > > On Sun, May 31, 2015 at 22:49 Chris Angelico wrote: > > > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person > > wrote: > > > > > > I will be presenting a modification to the float class, which will > > improve its speed and accuracy (reduce floating point errors). This is > > applicable because Python uses a numerator and denominator rather than a > > sign and mantissa to represent floats. > > > > > > First, I propose that a float's integer ratio should be accurate. For > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > > returns(6004799503160661, 18014398509481984). > > > > > > > I think you're misunderstanding the as_integer_ratio method. That > > isn't how Python works internally; that's a service provided for > > parsing out float internals into something more readable. What you > > _actually_ are working with is IEEE 754 binary64. (Caveat: I have no > > idea what Python-the-language stipulates, nor what other Python > > implementations use, but that's what CPython uses, and you did your > > initial experiments with CPython. None of this discussion applies *at > > all* if a Python implementation doesn't use IEEE 754.) So internally, > > 1/3 is stored as: > > > > 0 <-- sign bit (positive) > > 01111111101 <-- exponent (1021) > > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 > > bits, repeating) > > > > The exponent is offset by 1023, so this means 1.010101.... divided by > > 2?; the original repeating value is exactly equal to 4/3, so this is > > correct, but as soon as it's squeezed into a finite-sized mantissa, it > > gets rounded - in this case, rounded down. > > > > That's where your result comes from. It's been rounded such that it > > fits inside IEEE 754, and then converted back to a fraction > > afterwards. You're never going to get an exact result for anything > > with a denominator that isn't a power of two. Fortunately, Python does > > offer a solution: store your number as a pair of integers, rather than > > as a packed floating point value, and all calculations truly will be > > exact (at the cost of performance): > > > > >>> one_third = fractions.Fraction(1, 3) > > >>> one_eighth = fractions.Fraction(1, 8) > > >>> one_third + one_eighth > > Fraction(11, 24) > > > > This is possibly more what you want to work with. > > > > ChrisA > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150601/5e792f59/attachment-0001.html > > > > ------------------------------ > > Message: 4 > Date: Sun, 31 May 2015 20:27:47 -0700 > From: David Mertz > To: random832 at fastmail.us > Cc: python-ideas > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > < > CAEbHw4biWOxjYR0vtu8ykwSt63y9dcsR2-FtLPGfyUvqx0GgsQ at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Sun, May 31, 2015 at 8:14 PM, wrote: > > > Even though he's mistaken about the core premise, I do think there's a > > kernel of a good idea here - it would be nice to have a method (maybe > > as_integer_ratio, maybe with some parameter added, maybe a different > > method) to return with the smallest denominator that would result in > > exactly the original float if divided out, rather than merely the > > smallest power of two. 
> > > > What is the computational complexity of a hypothetical > float.as_simplest_integer_ratio() method? How hard that is to find is not > obvious to me (probably it should be, but I'm not sure). > > -- > Keeping medicines from the bloodstreams of the sick; food > from the bellies of the hungry; books from the hands of the > uneducated; technology from the underdeveloped; and putting > advocates of freedom in prisons. Intellectual property is > to the 21st century what the slave trade was to the 16th. > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150531/49264b3d/attachment.html > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 103, Issue 3 > ******************************************** > -- -Surya Subbarao -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Mon Jun 1 22:18:49 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 01 Jun 2015 22:18:49 +0200 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: u8y7541 The Awesome Person schrieb am 01.06.2015 um 21:22: > Maybe we could make a C implementation of the Fraction module? That would > be nice. See the quicktions module: https://pypi.python.org/pypi/quicktions Stefan From random832 at fastmail.us Mon Jun 1 23:09:40 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Mon, 01 Jun 2015 17:09:40 -0400 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 2 In-Reply-To: References: Message-ID: <1433192980.60608.284042721.405C9843@webmail.messagingengine.com> On Mon, Jun 1, 2015, at 15:13, u8y7541 The Awesome Person wrote: > Floats internally use numerator and denominator > (float.as_integer_ratio().) The fact that this method exists is not actually evidence that this form is used internally. This is a utility method, provided, I suspect, for the use of the fractions.Fraction constructor (which class does use a numerator and denominator). From abarnert at yahoo.com Mon Jun 1 23:56:46 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 14:56:46 -0700 Subject: [Python-ideas] Why decode()/encode() name is harmful In-Reply-To: References: <659FCF6A-91F0-4D7D-A88E-28CD1D18EC38@yahoo.com> <20150530001811.GS932@ando.pearwood.info> Message-ID: On Jun 1, 2015, at 08:46, anatoly techtonik wrote: > >> On Sat, May 30, 2015 at 3:18 AM, Steven D'Aprano wrote: >> >> As far as I can see, he has been given the solution, or at least a >> potential solution, on python-list, but as far as I can tell he either >> hasn't read it, or doesn't like the solutions offerred and so is >> ignoring them. > > Let me update you on this. There was no solution given. Only the > pointers to go read some pointers on the internets again. So, yes, > I read replies. But I have very little time to analyse and follow up. Hold on. You had a question, you don't have time to read the answers you were given, so instead you think Python needs to change? > The idea I wanted to convey in this thread is that encode/decode > is confusing, so if you agree with that, I can start to propose > alternatives. 
> > And just to make you understand the importance of the question > with translating from bytes to unicode and back, let me just tell > that this question is the third one voted with 221k views on SO in > Python 3 tag. First, as multiple people including the OP say in the comments to that question, what's confusing to novices is that subprocess pipes are the first thing they've used that are binary by default instead of text by default. (For other novices that will instead happen with sockets. But it will eventually happen somewhere.) So, maybe the subprocess docs need a prominent link to, say, the Unicode HOWTO, which is what the OP of that question seems to be proposing. Or maybe it should just be easier to open subprocess pipes in text mode, as it is for files. But I don't see how renaming the methods could possibly help anything. The problem is not that the OP saw the answer and didn't understand or believe it, it's that he didn't know how to search for it. When told the right answer, he immediately said "Thanks, that does it" not "Whatchootalkinbout Willis, I don't have any crypto here". I've never heard of anyone besides you having that reaction. Also, your own answer there is a really bad idea. It was an intentional part of the design of UTF-8 that decoding non-UTF-8 non-ASCII text as if it were UTF-8 will almost always signal an error. It's not a good thing to silently get mojibake instead of getting an error--it just pushes the problem back further, to someone it's harder to understand, find, and debug. In the worst case, it just pushes the problem all the way to the end user, who's even less equipped to deal with it than you when his Russian characters get turned into box graphics. If you have bytes and you want text, the only solution to that is to find out the encoding and decode it. That's not a problem with Python, it's a problem with the proliferation of incompatible encodings that people have used without any in-band or out-of-band indications over the past few decades. Of course there are cases where you want to smuggle bytes with text, or degrade as gracefully as possible on errors, or whatever. That's why decode takes an error handler. But in the usual case, if you try to interpret something as UTF-8 when it's really cp1252, or interpret something as Big5 when it's really Shift-JIS, or whatever, an error is exactly what you should hope for, to tell you that you guessed wrong. That's why it's the default. > http://stackoverflow.com/questions/tagged/python-3.x > > -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Tue Jun 2 00:15:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 15:15:16 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: On Jun 1, 2015, at 10:24, Nicholas Chammas wrote: > > Well, I learned a lot about decimals today. :) > > On Mon, Jun 1, 2015 at 3:08 AM, Nick Coghlan ncoghlan at gmail.com wrote: > > In a world of binary computers, no programming language is free of > those constraints - if you choose decimal literals as your default, > you take a big performance hit, because computers are designed as > binary systems. 
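A ten-second illustration at the prompt (cp1251 here is just a stand-in for "whatever encoding the bytes really are"):

>>> data = "Привет".encode('cp1251')        # Russian text, non-UTF-8 bytes
>>> data.decode('utf-8')                     # guessing wrong fails loudly
Traceback (most recent call last):
  ...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte
>>> data.decode('utf-8', errors='replace')   # opt-in graceful degradation
'������'
>>> data.decode('cp1251')                    # the real fix: know the encoding
'Привет'

That's the failure you want at the boundary, not six replacement characters silently flowing through the rest of the program.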
(Some languages, like IBM?s REXX, do choose to use > decimal integers by default) > > I guess it?s a non-trivial tradeoff. But I would lean towards considering people likely to be affected by the performance hit as doing something ?not common?. Like, if they are doing that many calculations that it matters, perhaps it makes sense to ask them to explicitly ask for floats vs. decimals, in exchange for giving the majority who wouldn?t notice a performance difference a better user experience. > > On Mon, Jun 1, 2015 at 10:58 AM, Steven D?Aprano steve at pearwood.info wrote: > > I wish this myth about Decimals would die, because it isn?t true. > > Your email had a lot of interesting information about decimals that would make a good blog post, actually. Writing one up will perhaps help kill this myth in the long run :) > > In the past, I?ve found that people are very resistant to this fact, so > I?m going to show a few examples of how Decimals violate the fundamental > laws of mathematics just as floats do. > > How many of your examples are inherent limitations of decimals vs. problems that can be improved upon? > > Admittedly, the only place where I?ve played with decimals extensively is on Microsoft?s SQL Server (where they are the default literal). I?ve stumbled in the past on my own decimal gotchas, but looking at your examples and trying them on SQL Server I suspect that most of the problems you show are problems of precision and scale. > > Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server?s, for example), or better defaults when they are not specified? > > Anyway, here?s what happens on SQL Server for some of the examples you provided. > > Adding 100: > > py> from decimal import Decimal as D > py> x = D(10)**30 > py> x == x + 100 # should be False > True > > DECLARE @x DECIMAL(38,0) = '1' + REPLICATE(0, 30); > > IF @x = @x + 100 > SELECT 'equal' AS adding_100 > ELSE > SELECT 'not equal' AS adding_100 > Gives ?not equal?. Leaving out the precision when declaring @x (i.e. going with the default precision of 18) immediately yields an understandable data truncation error. > Obviously if you know the maximum precision needed before you start and explicitly set it to something big enough (or 7 places bigger than needed) you won't have any problem. Steven chose a low precision just to make the problems easy to see and understand; he could just as easily have constructed examples for a precision of 18. Unfortunately, even in cases where it is both possible and sufficiently efficient to work out and set the precision high enough to make all of your calculations exact, that's not something most people know how to do reliably. In the fully general case, it's as hard as calculating error propagation. As for the error: Python's decimal flags that too; the difference is that the flag is ignored by default. You can change it to warn or error instead. Maybe the solution is to make that easier--possibly just changing the docs. If you read the whole thing you will eventually learn that the default context ignores most such errors, but a one-liner gets you a different context that acts like SQL Server, but who reads the whole module docs (especially when they already believe they understand how decimal arithmetic works)? Maybe moving that up near the top would be useful? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tjreedy at udel.edu Tue Jun 2 00:20:19 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 01 Jun 2015 18:20:19 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On 6/1/2015 2:02 AM, Jim Witschey wrote: > On Mon, Jun 1, 2015 at 12:37 AM, Terry Reedy wrote: >> The competing proposal is to treat decimal literals as decimal.Decimal >> values. > > Is that an existing PEP? I couldn't find any such proposal. No, it is an idea presented here and other python lists. Example: just today, Laura Creighton wrote on python-list (Re: What is considered an "advanced" topic in Python?) > But I am a bad arguer. > When incompatibilites were going into Python 3.0 I wanted > y = 1.3 to give you a decimal, not a float. > If you wanted a float you would have to write y = 1.3f or something. > I lost that one too. I still think it would be great. > But, hell, I write accounting and bookkeeping systems. Your milage > may vary. :) There is no PEP AFAIK because no one has bothered to write one sure to be rejected. -- Terry Jan Reedy From tjreedy at udel.edu Tue Jun 2 00:26:46 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 01 Jun 2015 18:26:46 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> Message-ID: On 6/1/2015 10:52 AM, Joonas Liik wrote: > Having some sort of decimal literal would have some advantages of its > own, for one it could help against this sillyness: > > >>> Decimal(1.3) > Decimal('1.3000000000000000444089209850062616169452667236328125') > > >>> Decimal('1.3') > Decimal('1.3') > > I'm not saying that the actual data type needs to be a decimal ( > might well be a float but say shove the string repr next to it so it can > be accessed when needed) > > ..but this is one really common pitfall for new users, i know its easy > to fix the code above, > but this behavior is very unintuitive.. you essentially get a really > expensive float when you do the obvious thing. > > Not sure if this is worth the effort but it would help smooth some > corners potentially.. Since Decimal is designed specifically for money calculations, $ could be used as a generic money suffix. 1.3$ == Decimal(1.3) .0101$ (money multiplier, as with interest) -- Terry Jan Reedy From abarnert at yahoo.com Tue Jun 2 00:30:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 15:30:08 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On Jun 1, 2015, at 08:12, Joonas Liik wrote: > > I'm sorry.. > > what i meant was not a literal that results in a Decimal, what i meant was a special literal proxy object that usualyl acts like a float except you can ask for its original string form. This is essentially what I was saying with new "literal constant" types. Swift is probably the most prominent language with this feature. http://nshipster.com/swift-literal-convertible/ is a good description of how it works. Many of the reasons Swift needed this don't apply in Python. For example, in Swift, it's how you can build a Set at compile time from an ArrayLiteral instead of building an Array and converting it to Set at compile time. Or how you can use 0 as a default value for a non-integer type without getting a TypeError or a runtime conversion. 
Or how you can build an Optional that acts like a real ADT but assign it nil instead of a special enumeration value. Or how you can decode UTF-8 source text to store in UTF-16 or UTF-32 or grapheme-cluster at compile time. And so on. > > eg: > > flit = 1.3 > flit*3 == float(flit)*3 > str(flit) == '1.3' > > thus in cases where the intermediate float conversion loses precision you can get at the original string that the programmer actually typed in. > > Decimal constructors are one case that woudl probably like to use the original string whenever possible to avoid conversion losses, > but by no means are they the only ones. > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Tue Jun 2 00:40:55 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 15:40:55 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: Sorry, I accidentally sent that before it was done... Sent from my iPhone > On Jun 1, 2015, at 15:30, Andrew Barnert via Python-ideas wrote: > >> On Jun 1, 2015, at 08:12, Joonas Liik wrote: >> >> I'm sorry.. >> >> what i meant was not a literal that results in a Decimal, what i meant was a special literal proxy object that usualyl acts like a float except you can ask for its original string form. > > This is essentially what I was saying with new "literal constant" types. Swift is probably the most prominent language with this feature. http://nshipster.com/swift-literal-convertible/ is a good description of how it works. Many of the reasons Swift needed this don't apply in Python. For example, in Swift, it's how you can build a Set at compile time from an ArrayLiteral instead of building an Array and converting it to Set at compile time. Or how you can use 0 as a default value for a non-integer type without getting a TypeError or a runtime conversion. Or how you can build an Optional that acts like a real ADT but assign it nil instead of a special enumeration value. Or how you can decode UTF-8 source text to store in UTF-16 or UTF-32 or grapheme-cluster at compile time. And so on. Anyway, my point was that the Swift feature is complicated, and has some controversial downsides (e.g., see the example at the end of silently using a string literal as if it were a URL by accessing an attribute of the NSURL class--which works given the Smalltalk-derived style of OO, but many people still find it confusing). But the basic idea can be extracted out and Pythonified: The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. 
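To make that concrete, here's a very rough sketch of the runtime half of the idea (glossing over how the compiler would actually hand the source text to the object; the to_decimal helper is purely hypothetical):

from decimal import Decimal

class FloatLiteral(float):
    """A float that also remembers the literal text it was built from."""
    def __new__(cls, text):
        self = super().__new__(cls, text)   # normal binary float value
        self.text = text                    # plus the original spelling
        return self

def to_decimal(value):
    # hypothetical consumer: prefer the exact spelling when it's available
    return Decimal(value.text) if isinstance(value, FloatLiteral) else Decimal(value)

x = FloatLiteral('1.3')
print(x * 3 == float(x) * 3)            # True: arithmetic degrades to plain float
print(to_decimal(x) == Decimal('1.3'))  # True: no detour through binary

Anything that doesn't care can keep treating it as a float; only consumers that explicitly check for the subclass, like the hypothetical to_decimal above, ever look at the stored text.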
The advantage of C++-style user-defined literal suffixes is that the absence of a suffix is something the compiler can see, so 1.23d might still require a runtime call, but 1.23 just is compiled as a float constant the same as it's been since Python 1.x. From surya.subbarao1 at gmail.com Tue Jun 2 00:46:46 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Mon, 1 Jun 2015 15:46:46 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 10 In-Reply-To: References: Message-ID: Thanks Stefan for quicktions! On Mon, Jun 1, 2015 at 1:18 PM, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Re: Python-ideas Digest, Vol 103, Issue 3 > (u8y7541 The Awesome Person) > 2. Re: Python Float Update (Stefan Behnel) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 1 Jun 2015 12:22:40 -0700 > From: u8y7541 The Awesome Person > To: python-ideas at python.org > Subject: Re: [Python-ideas] Python-ideas Digest, Vol 103, Issue 3 > Message-ID: > AKBncpAKXy+QO+vhF8ENtAXL6qg at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > Maybe we could make a C implementation of the Fraction module? That would > be nice. > > On Sun, May 31, 2015 at 8:28 PM, wrote: > > > Send Python-ideas mailing list submissions to > > python-ideas at python.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > https://mail.python.org/mailman/listinfo/python-ideas > > or, via email, send a message with subject or body 'help' to > > python-ideas-request at python.org > > > > You can reach the person managing the list at > > python-ideas-owner at python.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of Python-ideas digest..." > > > > > > Today's Topics: > > > > 1. Re: Python Float Update (Chris Angelico) > > 2. Re: Python Float Update (random832 at fastmail.us) > > 3. Re: Python Float Update (Jim Witschey) > > 4. Re: Python Float Update (David Mertz) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Mon, 1 Jun 2015 12:48:12 +1000 > > From: Chris Angelico > > Cc: python-ideas > > Subject: Re: [Python-ideas] Python Float Update > > Message-ID: > > > W5h6etG5TscV5uU6zWhxVbgQ at mail.gmail.com> > > Content-Type: text/plain; charset=UTF-8 > > > > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person > > wrote: > > > > > > I will be presenting a modification to the float class, which will > > improve its speed and accuracy (reduce floating point errors). This is > > applicable because Python uses a numerator and denominator rather than a > > sign and mantissa to represent floats. > > > > > > First, I propose that a float's integer ratio should be accurate. For > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > > returns(6004799503160661, 18014398509481984). > > > > > > > I think you're misunderstanding the as_integer_ratio method. 
That > > isn't how Python works internally; that's a service provided for > > parsing out float internals into something more readable. What you > > _actually_ are working with is IEEE 754 binary64. (Caveat: I have no > > idea what Python-the-language stipulates, nor what other Python > > implementations use, but that's what CPython uses, and you did your > > initial experiments with CPython. None of this discussion applies *at > > all* if a Python implementation doesn't use IEEE 754.) So internally, > > 1/3 is stored as: > > > > 0 <-- sign bit (positive) > > 01111111101 <-- exponent (1021) > > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 > > bits, repeating) > > > > The exponent is offset by 1023, so this means 1.010101.... divided by > > 2?; the original repeating value is exactly equal to 4/3, so this is > > correct, but as soon as it's squeezed into a finite-sized mantissa, it > > gets rounded - in this case, rounded down. > > > > That's where your result comes from. It's been rounded such that it > > fits inside IEEE 754, and then converted back to a fraction > > afterwards. You're never going to get an exact result for anything > > with a denominator that isn't a power of two. Fortunately, Python does > > offer a solution: store your number as a pair of integers, rather than > > as a packed floating point value, and all calculations truly will be > > exact (at the cost of performance): > > > > >>> one_third = fractions.Fraction(1, 3) > > >>> one_eighth = fractions.Fraction(1, 8) > > >>> one_third + one_eighth > > Fraction(11, 24) > > > > This is possibly more what you want to work with. > > > > ChrisA > > > > > > ------------------------------ > > > > Message: 2 > > Date: Sun, 31 May 2015 23:14:06 -0400 > > From: random832 at fastmail.us > > To: python-ideas at python.org > > Subject: Re: [Python-ideas] Python Float Update > > Message-ID: > > <1433128446.31560.283106753.6D60F98F at webmail.messagingengine.com > > > > Content-Type: text/plain > > > > On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: > > > First, I propose that a float's integer ratio should be accurate. For > > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > > > returns(6004799503160661, 18014398509481984). > > > > Even though he's mistaken about the core premise, I do think there's a > > kernel of a good idea here - it would be nice to have a method (maybe > > as_integer_ratio, maybe with some parameter added, maybe a different > > method) to return with the smallest denominator that would result in > > exactly the original float if divided out, rather than merely the > > smallest power of two. > > > > > > ------------------------------ > > > > Message: 3 > > Date: Mon, 01 Jun 2015 03:21:36 +0000 > > From: Jim Witschey > > To: Chris Angelico > > Cc: python-ideas > > Subject: Re: [Python-ideas] Python Float Update > > Message-ID: > > > kg at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > > Teachable moments about the implementation of floating-point aside, > > something in this neighborhood has been considered and rejected before, > in > > PEP 240. However, that was in 2001 - it was apparently created the same > day > > as PEP 237, which introduced transparent conversion of machine ints to > > bignums in the int type. > > > > I think hiding hardware number implementations has been a success for > > integers - it's a far superior API. It could be for rationals as well. 
> > > > Has something like this thread's original proposal - interpeting > > decimal-number literals as fractional values and using fractions as the > > result of integer arithmetic - been seriously discussed more recently > than > > PEP 240? If so, why haven't they been implemented? Perhaps enough has > > changed that it's worth reconsidering. > > > > > > On Sun, May 31, 2015 at 22:49 Chris Angelico wrote: > > > > > On Mon, Jun 1, 2015 at 12:25 PM, u8y7541 The Awesome Person > > > wrote: > > > > > > > > I will be presenting a modification to the float class, which will > > > improve its speed and accuracy (reduce floating point errors). This is > > > applicable because Python uses a numerator and denominator rather than > a > > > sign and mantissa to represent floats. > > > > > > > > First, I propose that a float's integer ratio should be accurate. For > > > example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it > > > returns(6004799503160661, 18014398509481984). > > > > > > > > > > I think you're misunderstanding the as_integer_ratio method. That > > > isn't how Python works internally; that's a service provided for > > > parsing out float internals into something more readable. What you > > > _actually_ are working with is IEEE 754 binary64. (Caveat: I have no > > > idea what Python-the-language stipulates, nor what other Python > > > implementations use, but that's what CPython uses, and you did your > > > initial experiments with CPython. None of this discussion applies *at > > > all* if a Python implementation doesn't use IEEE 754.) So internally, > > > 1/3 is stored as: > > > > > > 0 <-- sign bit (positive) > > > 01111111101 <-- exponent (1021) > > > 0101010101010101010101010101010101010101010101010101 <-- mantissa (52 > > > bits, repeating) > > > > > > The exponent is offset by 1023, so this means 1.010101.... divided by > > > 2?; the original repeating value is exactly equal to 4/3, so this is > > > correct, but as soon as it's squeezed into a finite-sized mantissa, it > > > gets rounded - in this case, rounded down. > > > > > > That's where your result comes from. It's been rounded such that it > > > fits inside IEEE 754, and then converted back to a fraction > > > afterwards. You're never going to get an exact result for anything > > > with a denominator that isn't a power of two. Fortunately, Python does > > > offer a solution: store your number as a pair of integers, rather than > > > as a packed floating point value, and all calculations truly will be > > > exact (at the cost of performance): > > > > > > >>> one_third = fractions.Fraction(1, 3) > > > >>> one_eighth = fractions.Fraction(1, 8) > > > >>> one_third + one_eighth > > > Fraction(11, 24) > > > > > > This is possibly more what you want to work with. > > > > > > ChrisA > > > _______________________________________________ > > > Python-ideas mailing list > > > Python-ideas at python.org > > > https://mail.python.org/mailman/listinfo/python-ideas > > > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- > > An HTML attachment was scrubbed... 
> > URL: < > > > http://mail.python.org/pipermail/python-ideas/attachments/20150601/5e792f59/attachment-0001.html > > > > > > > ------------------------------ > > > > Message: 4 > > Date: Sun, 31 May 2015 20:27:47 -0700 > > From: David Mertz > > To: random832 at fastmail.us > > Cc: python-ideas > > Subject: Re: [Python-ideas] Python Float Update > > Message-ID: > > < > > CAEbHw4biWOxjYR0vtu8ykwSt63y9dcsR2-FtLPGfyUvqx0GgsQ at mail.gmail.com> > > Content-Type: text/plain; charset="utf-8" > > > > On Sun, May 31, 2015 at 8:14 PM, wrote: > > > > > Even though he's mistaken about the core premise, I do think there's a > > > kernel of a good idea here - it would be nice to have a method (maybe > > > as_integer_ratio, maybe with some parameter added, maybe a different > > > method) to return with the smallest denominator that would result in > > > exactly the original float if divided out, rather than merely the > > > smallest power of two. > > > > > > > What is the computational complexity of a hypothetical > > float.as_simplest_integer_ratio() method? How hard that is to find is > not > > obvious to me (probably it should be, but I'm not sure). > > > > -- > > Keeping medicines from the bloodstreams of the sick; food > > from the bellies of the hungry; books from the hands of the > > uneducated; technology from the underdeveloped; and putting > > advocates of freedom in prisons. Intellectual property is > > to the 21st century what the slave trade was to the 16th. > > -------------- next part -------------- > > An HTML attachment was scrubbed... > > URL: < > > > http://mail.python.org/pipermail/python-ideas/attachments/20150531/49264b3d/attachment.html > > > > > > > ------------------------------ > > > > Subject: Digest Footer > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > > > > > ------------------------------ > > > > End of Python-ideas Digest, Vol 103, Issue 3 > > ******************************************** > > > > > > -- > -Surya Subbarao > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: < > http://mail.python.org/pipermail/python-ideas/attachments/20150601/694e9c92/attachment-0001.html > > > > ------------------------------ > > Message: 2 > Date: Mon, 01 Jun 2015 22:18:49 +0200 > From: Stefan Behnel > To: python-ideas at python.org > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > Content-Type: text/plain; charset=utf-8 > > u8y7541 The Awesome Person schrieb am 01.06.2015 um 21:22: > > Maybe we could make a C implementation of the Fraction module? That would > > be nice. > > See the quicktions module: > > https://pypi.python.org/pypi/quicktions > > Stefan > > > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 103, Issue 10 > ********************************************* > -- -Surya Subbarao -------------- next part -------------- An HTML attachment was scrubbed... 
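For what it's worth, the smallest-denominator operation random832 describes can be prototyped on top of fractions.Fraction, which also gives a rough feel for the cost: a binary search over denominator bounds, each step doing a continued-fraction walk (the function name below is purely hypothetical):

from fractions import Fraction

def as_simplest_integer_ratio(x):
    # Smallest-denominator fraction that still converts back to exactly the float x.
    exact = Fraction(x)        # exact binary value, e.g. Fraction(6004799503160661, 18014398509481984) for 1/3
    lo, hi = 1, exact.denominator
    while lo < hi:
        mid = (lo + hi) // 2
        # If the closest fraction with denominator <= mid round-trips, shrink the bound.
        if float(exact.limit_denominator(mid)) == x:
            hi = mid
        else:
            lo = mid + 1
    best = exact.limit_denominator(lo)
    return best.numerator, best.denominator

>>> as_simplest_integer_ratio(1 / 3)
(1, 3)
>>> as_simplest_integer_ratio(0.1)
(1, 10)

That is roughly O(log^2) in the exact denominator; presumably a real method would do a single Stern-Brocot-style walk instead, but it at least suggests the cost is modest.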
URL: From ncoghlan at gmail.com Tue Jun 2 00:58:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Jun 2015 08:58:14 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On 2 Jun 2015 01:04, "David Mertz" wrote: > > Decimal literals are far from as obvious as suggested. We *have* the `decimal` module after all, and it defines all sorts of parameters on precision, rounding rules, etc. that one can provide context for. decimal.ROUND_HALF_DOWN is "the obvious way" for some users, while decimal.ROUND_CEILING is "the obvious way" for others. > > I like decimals, but they don't simply make all the mathematical answers result in what all users would would consider "do what I mean" either. The last time we had a serious discussion about decimal literals, we realised the fact their behaviour is context dependent posed a significant problem for providing a literal form. With largely hardware provided IEEE754 semantics, binary floats are predictable, albeit somewhat surprising if you're expecting abstract math behaviour (i.e. no rounding errors), or finite base 10 representation behaviour. By contrast, decimal arithmetic deliberately allows for configurable contexts, presumably because financial regulations sometimes place strict constraints on how arithmetic is to be handled (e.g. "round half even" is also known as "banker's rounding", since it eliminates statistical bias in rounding financial transactions to the smallest supported unit of currency). That configurability makes decimal more fit for its primary intended use case (i.e. financial math), but also makes local reasoning harder - the results of some operations (even something as simple as unary plus) may vary based on the configured context (the precision, in particular). Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicholas.chammas at gmail.com Tue Jun 2 00:53:33 2015 From: nicholas.chammas at gmail.com (Nicholas Chammas) Date: Mon, 01 Jun 2015 22:53:33 +0000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: On Mon, Jun 1, 2015 at 6:15 PM Andrew Barnert abarnert at yahoo.com wrote: Obviously if you know the maximum precision needed before you start and > explicitly set it to something big enough (or 7 places bigger than needed) > you won't have any problem. Steven chose a low precision just to make the > problems easy to see and understand; he could just as easily have > constructed examples for a precision of 18. > > Unfortunately, even in cases where it is both possible and sufficiently > efficient to work out and set the precision high enough to make all of your > calculations exact, that's not something most people know how to do > reliably. In the fully general case, it's as hard as calculating error > propagation. > > As for the error: Python's decimal flags that too; the difference is that > the flag is ignored by default. You can change it to warn or error instead. > Maybe the solution is to make that easier--possibly just changing the docs. > If you read the whole thing you will eventually learn that the default > context ignores most such errors, but a one-liner gets you a different > context that acts like SQL Server, but who reads the whole module docs > (especially when they already believe they understand how decimal > arithmetic works)? 
Maybe moving that up near the top would be useful? > > This angle of discussion is what I was getting at when I wrote: Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server?s , for example), or better defaults when they are not specified? It sounds like there are perhaps several improvements that can be made to how decimals are handled, documented, and configured by default, that could possibly address the majority of gotchas for the majority of people in a more user friendly way than can be accomplished with floats. For all the problems presented with decimals by Steven and others, I?m not seeing how overall they are supposed to be *worse* than the problems with floats. We can explain precision and scale to people when they are using decimals and give them a basic framework for understanding how they affect calculations, and we can pick sensible defaults so that people won?t hit nasty gotchas easily. So we have some leverage there for making the experience better for most people most of the time. What?s our leverage for improving the experience of working with floats? And is the result really something better than decimals? Nick ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jun 2 01:26:08 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 16:26:08 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: <9964F6D3-E428-4161-B982-0D30710679A0@yahoo.com> On Jun 1, 2015, at 15:53, Nicholas Chammas wrote: > >> On Mon, Jun 1, 2015 at 6:15 PM Andrew Barnert abarnert at yahoo.com wrote: >> >> Obviously if you know the maximum precision needed before you start and explicitly set it to something big enough (or 7 places bigger than needed) you won't have any problem. Steven chose a low precision just to make the problems easy to see and understand; he could just as easily have constructed examples for a precision of 18. >> >> Unfortunately, even in cases where it is both possible and sufficiently efficient to work out and set the precision high enough to make all of your calculations exact, that's not something most people know how to do reliably. In the fully general case, it's as hard as calculating error propagation. >> >> As for the error: Python's decimal flags that too; the difference is that the flag is ignored by default. You can change it to warn or error instead. Maybe the solution is to make that easier--possibly just changing the docs. If you read the whole thing you will eventually learn that the default context ignores most such errors, but a one-liner gets you a different context that acts like SQL Server, but who reads the whole module docs (especially when they already believe they understand how decimal arithmetic works)? Maybe moving that up near the top would be useful? > > This angle of discussion is what I was getting at when I wrote: > > Perhaps Python needs better rules for how precision and scale are affected by calculations (here are SQL Server?s, for example), or better defaults when they are not specified? > I definitely agree that some edits to the decimal module docs, plus maybe a new HOWTO, and maybe some links to outside resources that explain things to people who are used to decimals in MSSQLServer or REXX or whatever, would be helpful. 
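(For reference, the behaviour being described -- silent flags by default, with traps available as an opt-in error -- is all existing decimal API; nothing here is new:)

>>> from decimal import Decimal, getcontext, Inexact
>>> getcontext().prec                    # default context: 28 significant digits
28
>>> Decimal(1) / Decimal(3)              # rounded silently; only a flag records the information loss
Decimal('0.3333333333333333333333333333')
>>> getcontext().flags[Inexact]
True
>>> getcontext().traps[Inexact] = True   # opt in: turn the ignored flag into an exception
>>> Decimal(1) / Decimal(3)              # now raises decimal.Inexact instead of rounding silently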
The question is, who has the sufficient knowledge, skill, and time/inclination to do it? > It sounds like there are perhaps several improvements that can be made to how decimals are handled, documented, and configured by default, that could possibly address the majority of gotchas for the majority of people in a more user friendly way than can be accomplished with floats. > > For all the problems presented with decimals by Steven and others, I?m not seeing how overall they are supposed to be worse than the problems with floats. > They're not worse than the problems with floats, they're the same problems... But the _effect_ of those problems can be worse, because: * The magnitude of the rounding errors is larger. * People mistakenly think they understand everything relevant about decimals, and the naive tests they try work out, so the problems may blindside them. * Being much more detailed and configurable means the best solution may be harder to find. * There's a lot of correct but potentially-misleading information out there. For example, any StackOverflow answer that says "you can solve this particular problem by using Decimal instead of float" can be very easily misinterpreted as applying to a much wider range of problems than it actually does. * Sometimes performance matters. On the other hand, the effect can also be less bad, because: * Once people do finally understand a given problem, at least for many people and many problems, working out a solution is easier in decimal. For some uses (in particular, many financial uses, and some kinds of engineering problems), it's even trivial. * Being more detailed and more configurable means the best solution may be better than any solution involving float. I don't think there's any obvious answer to the tradeoff, short of making it easier for people to choose appropriately: a good HOWTO, decimal literals or Swift-style float-convertibles, making it easier to find/construct decimal64 or DECIMAL(18) or Money types, speeding up decimal (already done, but maybe more could be done), etc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 2 02:08:37 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Jun 2015 10:08:37 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" wrote: >But the basic idea can be extracted out and Pythonified: > > The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. > > The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. Joonas's suggestion of storing the original text representation passed to the float constructor is at least a novel one - it's only the idea of actual decimal literals that was ruled out in the past. 
Aside from the practical implementation question, the main concern I have with it is that we'd be trading the status quo for a situation where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. It seems to me that a potentially better option might be to adjust the implicit float->Decimal conversion in the Decimal constructor to use the same algorithm as we now use for float.__repr__ [1], where we look for the shortest decimal representation that gives the same answer when rendered as a float. At the moment you have to indirect through str() or repr() to get that behaviour: >>> from decimal import Decimal as D >>> 1.3 1.3 >>> D('1.3') Decimal('1.3') >>> D(1.3) Decimal('1.3000000000000000444089209850062616169452667236328125') >>> D(str(1.3)) Decimal('1.3') Cheers, Nick. [1] http://bugs.python.org/issue1580 From abarnert at yahoo.com Tue Jun 2 03:27:32 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 18:27:32 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: On Jun 1, 2015, at 17:08, Nick Coghlan wrote: > > On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" > wrote: >> But the basic idea can be extracted out and Pythonified: >> >> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. >> >> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. > > Joonas's suggestion of storing the original text representation passed > to the float constructor is at least a novel one - it's only the idea > of actual decimal literals that was ruled out in the past. I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily. Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral. This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...). But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work, and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality. I'll be glad to hack it up when I get a chance tonight. 
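To make the "replaces every Num" step concrete, the transform half can be sketched with ast.NodeTransformer -- purely an illustration of the approach, not an actual implementation, assuming a FloatLiteral class along the lines described, and with the same caveat as above: by AST time the literal has already been through float(), so all you can hand along is its repr.

import ast

class LiteralRewriter(ast.NodeTransformer):
    # Python 3.8+ spells number literals ast.Constant; earlier releases used ast.Num.
    def visit_Constant(self, node):
        if isinstance(node.value, float):
            call = ast.Call(func=ast.Name(id='FloatLiteral', ctx=ast.Load()),
                            args=[ast.Constant(value=repr(node.value))],
                            keywords=[])
            return ast.copy_location(call, node)
        return node

tree = LiteralRewriter().visit(ast.parse("total = 1.3 * rate"))
ast.fix_missing_locations(tree)
print(ast.unparse(tree))    # total = FloatLiteral('1.3') * rate   (ast.unparse needs 3.9+)

The import-hook part would wrap this transform around the normal source-to-code step (e.g. a loader overriding source_to_code) and make a FloatLiteral name visible to the rewritten module.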
But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d). > Aside from the practical implementation question, the main concern I > have with it is that we'd be trading the status quo for a situation > where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) > It seems to me that a potentially better option might be to adjust the > implicit float->Decimal conversion in the Decimal constructor to use > the same algorithm as we now use for float.__repr__ [1], where we look > for the shortest decimal representation that gives the same answer > when rendered as a float. At the moment you have to indirect through > str() or repr() to get that behaviour: > >>>> from decimal import Decimal as D >>>> 1.3 > 1.3 >>>> D('1.3') > Decimal('1.3') >>>> D(1.3) > Decimal('1.3000000000000000444089209850062616169452667236328125') >>>> D(str(1.3)) > Decimal('1.3') > > Cheers, > Nick. > > [1] http://bugs.python.org/issue1580 From abarnert at yahoo.com Tue Jun 2 04:00:48 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 19:00:48 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: <90691306-98E3-421B-ABEB-BA2DE05962C6@yahoo.com> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas wrote: > >> On Jun 1, 2015, at 17:08, Nick Coghlan wrote: >> >> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" >> wrote: >>> But the basic idea can be extracted out and Pythonified: >>> >>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. >>> >>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. >> >> Joonas's suggestion of storing the original text representation passed >> to the float constructor is at least a novel one - it's only the idea >> of actual decimal literals that was ruled out in the past. > > I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily. > > Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. 
Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral. > > This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...). > > But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work, Make that 15 minutes. https://github.com/abarnert/floatliteralhack > and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality. > > I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d). > >> Aside from the practical implementation question, the main concern I >> have with it is that we'd be trading the status quo for a situation >> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. > > Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) > >> It seems to me that a potentially better option might be to adjust the >> implicit float->Decimal conversion in the Decimal constructor to use >> the same algorithm as we now use for float.__repr__ [1], where we look >> for the shortest decimal representation that gives the same answer >> when rendered as a float. At the moment you have to indirect through >> str() or repr() to get that behaviour: >> >>>>> from decimal import Decimal as D >>>>> 1.3 >> 1.3 >>>>> D('1.3') >> Decimal('1.3') >>>>> D(1.3) >> Decimal('1.3000000000000000444089209850062616169452667236328125') >>>>> D(str(1.3)) >> Decimal('1.3') >> >> Cheers, >> Nick. 
>> >> [1] http://bugs.python.org/issue1580 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Tue Jun 2 03:58:09 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Jun 2015 11:58:09 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: <20150602015809.GE932@ando.pearwood.info> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote: > It seems to me that a potentially better option might be to adjust the > implicit float->Decimal conversion in the Decimal constructor to use > the same algorithm as we now use for float.__repr__ [1], where we look > for the shortest decimal representation that gives the same answer > when rendered as a float. At the moment you have to indirect through > str() or repr() to get that behaviour: Apart from the questions of whether such a change would be allowed by the Decimal specification, and the breaking of backwards compatibility, I would really hate that change for another reason. At the moment, a good, cheap way to find out what a binary float "really is" (in some sense) is to convert it to Decimal and see what you get: Decimal(1.3) -> Decimal('1.3000000000000000444089209850062616169452667236328125') If you want conversion from repr, then you can be explicit about it: Decimal(repr(1.3)) -> Decimal('1.3') ("Explicit is better than implicit", as they say...) Although in fairness I suppose that if this change happens, we could keep the old behaviour in the from_float method: # hypothetical future behaviour Decimal(1.3) -> Decimal('1.3') Decimal.from_float(1.3) -> Decimal('1.3000000000000000444089209850062616169452667236328125') But all things considered, I don't think we're doing people any favours by changing the behaviour of float->Decimal conversions to implicitly use the repr() instead of being exact. I expect this strategy is like trying to flatten a bubble under wallpaper: all you can do is push the gotchas and surprises to somewhere else. Oh, another thought... Decimals could gain yet another conversion method, one which implicitly uses the float repr, but signals if it was an inexact conversion or not. Explicitly calling repr can never signal, since the conversion occurs outside of the Decimal constructor and Decimal sees only the string: Decimal(repr(1.3)) cannot signal Inexact. But: Decimal.from_nearest_float(1.5) # exact Decimal.from_nearest_float(1.3) # signals Inexact That might be useful, but probably not to beginners. -- Steve From steve at pearwood.info Tue Jun 2 03:37:48 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Jun 2015 11:37:48 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> Message-ID: <20150602013748.GD932@ando.pearwood.info> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: > Having some sort of decimal literal would have some advantages of its own, > for one it could help against this sillyness: > > >>> Decimal(1.3) > Decimal('1.3000000000000000444089209850062616169452667236328125') Why is that silly? That's the actual value of the binary float 1.3 converted into base 10. 
If you want 1.3 exactly, you can do this: > >>> Decimal('1.3') > Decimal('1.3') Is that really so hard for people to learn? > I'm not saying that the actual data type needs to be a decimal ( > might well be a float but say shove the string repr next to it so it can be > accessed when needed) You want Decimals to *lie* about what value they have? I think that's a terrible idea, one which would lead to a whole set of new and exciting surprises when using Decimal. Let me try to predict a few of the questions on Stackoverflow which would follow this change... Why is equality so inaccurate in Python? py> x = Decimal(1.3) py> y = Decimal('1.3') py> x, y (Decimal('1.3'), Decimal('1.3')) py> x == y False Why does Python insert extra digits into numbers when I multiply? py> x = Decimal(1.3) py> x Decimal('1.3') py> y = 10000000000000000*x py> y - 13000000000000000 Decimal('0.444089209850062616169452667236328125') > ..but this is one really common pitfall for new users, i know its easy to > fix the code above, > but this behavior is very unintuitive.. you essentially get a really > expensive float when you do the obvious thing. Then don't do the obvious thing. Sometimes there really is no good alternative to actually knowing what you are doing. Floating point maths is inherently hard, but that's not the problem. There are all sorts of things in programming which are hard, and people learn how to deal with them. The problem is that people *imagine* that floating point is simple, when it is not and can never be. We don't do them any favours by enabling that delusion. If your needs are light, then you can ignore the complexities of floating point. You really can go a very long way by just rounding the results of your calculations when displaying them. But for anything more than that, we cannot just paper over the floating point complexities without creating new complexities that will burn people. You don't have to become a floating point guru, but it really isn't onerous to expect people who are programming to learn a few basic programming skills, and that includes a few basic coping strategies for floating point. -- Steve From abarnert at yahoo.com Tue Jun 2 04:21:47 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 19:21:47 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <90691306-98E3-421B-ABEB-BA2DE05962C6@yahoo.com> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <90691306-98E3-421B-ABEB-BA2DE05962C6@yahoo.com> Message-ID: <5E8271BF-183E-496D-A556-81C407977FFE@yahoo.com> On Jun 1, 2015, at 19:00, Andrew Barnert wrote: > >> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas wrote: >> >>> On Jun 1, 2015, at 17:08, Nick Coghlan wrote: >>> >>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" >>> wrote: >>>> But the basic idea can be extracted out and Pythonified: >>>> >>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. >>>> >>>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. 
That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. >>> >>> Joonas's suggestion of storing the original text representation passed >>> to the float constructor is at least a novel one - it's only the idea >>> of actual decimal literals that was ruled out in the past. >> >> I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily. >> >> Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral. >> >> This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...). >> >> But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work, > > Make that 15 minutes. > > https://github.com/abarnert/floatliteralhack And as it turns out, hacking the tokens is no harder than hacking the AST (in fact, it's a little easier; I'd just never done it before), so now it does that, meaning you really get the actual literal string from the source, not the repr of the float of that string literal. Turning this into a real implementation would obviously be more than half an hour's work, but not more than a day or two. Again, I don't think anyone would actually want this, but now people who think they do have an implementation to play with to prove me wrong. >> and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality. >> >> I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d). >> >>> Aside from the practical implementation question, the main concern I >>> have with it is that we'd be trading the status quo for a situation >>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. >> >> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) 
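(Spelling the three variants out under today's rules -- this is current behaviour, not the hypothetical change:)

>>> from decimal import Decimal
>>> Decimal(1.3)
Decimal('1.3000000000000000444089209850062616169452667236328125')
>>> Decimal(13/10)                # 13/10 is already the binary float 1.3 by the time Decimal sees it
Decimal('1.3000000000000000444089209850062616169452667236328125')
>>> Decimal(13) / Decimal(10)     # the division itself happens in decimal, so this really is 1.3
Decimal('1.3')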
>> >>> It seems to me that a potentially better option might be to adjust the >>> implicit float->Decimal conversion in the Decimal constructor to use >>> the same algorithm as we now use for float.__repr__ [1], where we look >>> for the shortest decimal representation that gives the same answer >>> when rendered as a float. At the moment you have to indirect through >>> str() or repr() to get that behaviour: >>> >>>>>> from decimal import Decimal as D >>>>>> 1.3 >>> 1.3 >>>>>> D('1.3') >>> Decimal('1.3') >>>>>> D(1.3) >>> Decimal('1.3000000000000000444089209850062616169452667236328125') >>>>>> D(str(1.3)) >>> Decimal('1.3') >>> >>> Cheers, >>> Nick. >>> >>> [1] http://bugs.python.org/issue1580 >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Tue Jun 2 05:00:40 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 2 Jun 2015 13:00:40 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150601145806.GB932@ando.pearwood.info> Message-ID: <20150602030040.GF932@ando.pearwood.info> Nicholas, Your email client appears to not be quoting text you quote. It is a conventional to use a leading > for quoting, perhaps you could configure your mail program to do so? The good ones even have a "Paste As Quote" command. On with the substance of your post... On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote: > I guess it?s a non-trivial tradeoff. But I would lean towards considering > people likely to be affected by the performance hit as doing something ?not > common?. Like, if they are doing that many calculations that it matters, > perhaps it makes sense to ask them to explicitly ask for floats vs. > decimals, in exchange for giving the majority who wouldn?t notice a > performance difference a better user experience. Changing from binary floats to decimal floats by default is a big, backwards incompatible change. Even if it's a good idea, we're constrained by backwards compatibility: I would imagine we wouldn't want to even introduce this feature until the majority of people are using Python 3 rather than Python 2, and then we'd probably want to introduce it using a "from __future__ import decimal_floats" directive. So I would guess this couldn't happen until probably 2020 or so. But we could introduce a decimal literal, say 1.1d for Decimal("1.1"). The first prerequisite is that we have a fast Decimal implementation, which we now have. Next we would have to decide how the decimal literals would interact with the decimal module. Do we include full support of the entire range of decimal features, including globally configurable precision and other modes? Or just a subset? How will these decimals interact with other numeric types, like float and Fraction? At the moment, Decimal isn't even part of the numeric tower. There's a lot of ground to cover, it's not a trivial change, and will definitely need a PEP. > How many of your examples are inherent limitations of decimals vs. problems > that can be improved upon? In one sense, they are inherent limitations of floating point numbers regardless of base. Whether binary, decimal, hexadecimal as used in some IBM computers, or something else, you're going to see the same problems. Only the specific details will vary, e.g. 
1/3 cannot be represented exactly in base 2 or base 10, but if you constructed a base 3 float, it would be exact. In another sense, Decimal has a big advantage that it is much more configurable than Python's floats. Decimal lets you configure the precision, rounding mode, error handling and more. That's not inherent to base 10 calculations, you can do exactly the same thing for binary floats too, but Python doesn't offer that feature for floats, only for Decimals. But no matter how you configure Decimal, all you can do is shift the gotchas around. The issue really is inherent to the nature of the problem, and you cannot defeat the universe. Regardless of what base you use, binary or decimal or something else, or how many digits precision, you're still trying to simulate an uncountably infinite continuous, infinitely divisible number line using a finite, discontinuous set of possible values. Something has to give. (For the record, when I say "uncountably infinite", I don't just mean "too many to count", it's a technical term. To oversimplify horribly, it means "larger than infinity" in some sense. It's off-topic for here, but if anyone is interested in learning more, you can email me off-list, or google for "countable vs uncountable infinity".) Basically, you're trying to squeeze an infinite number of real numbers into a finite amount of memory. It can't be done. Consequently, there will *always* be some calculations where the true value simply cannot be calculated and the answer you get is slightly too big or slightly too small. All the other floating point gotchas follow from that simple fact. > Admittedly, the only place where I?ve played with decimals extensively is > on Microsoft?s SQL Server (where they are the default literal > ). I?ve stumbled in > the past on my own decimal gotchas > , but looking at your examples > and trying them on SQL Server I suspect that most of the problems you show > are problems of precision and scale. No. Change the precision and scale, and some *specific* problems goes away, but they reappear with other numbers. Besides, at the point that you're talking about setting the precision, we're really not talking about making things easy for beginners any more. And not all floating point issues are related to precision and scale in decimal. You cannot divide a cake into exactly three equal pieces in Decimal any more than you can divide a cake into exactly three equal pieces in binary. All you can hope for is to choose a precision were the rounding errors in one part of your calculation will be cancelled by the rounding errors in another part of your calculation. And that precision will be different for any two arbitrary calculations. -- Steve From abarnert at yahoo.com Tue Jun 2 05:10:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 20:10:29 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150602015809.GE932@ando.pearwood.info> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150602015809.GE932@ando.pearwood.info> Message-ID: <79C16144-8BF7-4260-A356-DD4E8D97BAAD@yahoo.com> On Jun 1, 2015, at 18:58, Steven D'Aprano wrote: > >> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote: >> >> It seems to me that a potentially better option might be to adjust the >> implicit float->Decimal conversion in the Decimal constructor to use >> the same algorithm as we now use for float.__repr__ [1], where we look >> for the shortest decimal representation that gives the same answer >> when rendered as a float. 
At the moment you have to indirect through >> str() or repr() to get that behaviour: > > Apart from the questions of whether such a change would be allowed by > the Decimal specification, As far as I know, GDAS doesn't specify anything about implicit conversion from floats. As long as the required explicit conversion function (which I think is from_float?) exists and does the required thing. As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case. > and the breaking of backwards compatibility, > I would really hate that change for another reason. > > At the moment, a good, cheap way to find out what a binary float "really > is" (in some sense) is to convert it to Decimal and see what you get: > > Decimal(1.3) > -> Decimal('1.3000000000000000444089209850062616169452667236328125') > > If you want conversion from repr, then you can be explicit about it: > > Decimal(repr(1.3)) > -> Decimal('1.3') > > ("Explicit is better than implicit", as they say...) > > Although in fairness I suppose that if this change happens, we could > keep the old behaviour in the from_float method: > > # hypothetical future behaviour > Decimal(1.3) > -> Decimal('1.3') > Decimal.from_float(1.3) > -> Decimal('1.3000000000000000444089209850062616169452667236328125') > > But all things considered, I don't think we're doing people any favours > by changing the behaviour of float->Decimal conversions to implicitly > use the repr() instead of being exact. I expect this strategy is like > trying to flatten a bubble under wallpaper: all you can do is push the > gotchas and surprises to somewhere else. > > Oh, another thought... Decimals could gain yet another conversion > method, one which implicitly uses the float repr, but signals if it was > an inexact conversion or not. Explicitly calling repr can never signal, > since the conversion occurs outside of the Decimal constructor and > Decimal sees only the string: > > Decimal(repr(1.3)) cannot signal Inexact. > > But: > > Decimal.from_nearest_float(1.5) # exact > Decimal.from_nearest_float(1.3) # signals Inexact > > That might be useful, but probably not to beginners. I think this might be worth having whether the default constructor is changed or not. I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it. From surya.subbarao1 at gmail.com Tue Jun 2 05:48:51 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Mon, 1 Jun 2015 20:48:51 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 14 In-Reply-To: References: Message-ID: That patch sounds nice, I don't have to edit my Python distribution! We'll have to do with this. 
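(A from_nearest_float along the lines Steven suggested can be prototyped today with the public decimal API; the name is hypothetical, and the flag/trap handling below is just one plausible way to "signal":)

from decimal import Decimal, Inexact, getcontext

def from_nearest_float(f):
    # Hypothetical constructor: convert via the shortest repr, but signal Inexact
    # whenever that differs from the exact binary value of the float.
    nearest = Decimal(repr(f))
    if nearest != Decimal(f):        # Decimal(f) is the exact binary value
        ctx = getcontext()
        ctx.flags[Inexact] = True
        if ctx.traps[Inexact]:
            raise Inexact
    return nearest

>>> from_nearest_float(1.5)          # exact, no signal
Decimal('1.5')
>>> from_nearest_float(1.3)          # inexact: flag set (or an exception, if Inexact is trapped)
Decimal('1.3')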
On Mon, Jun 1, 2015 at 7:03 PM, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. Re: Python Float Update (Nick Coghlan) > 2. Re: Python Float Update (Andrew Barnert) > 3. Re: Python Float Update (Andrew Barnert) > 4. Re: Python Float Update (Steven D'Aprano) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Tue, 2 Jun 2015 10:08:37 +1000 > From: Nick Coghlan > To: Andrew Barnert > Cc: python-ideas > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > < > CADiSq7fjhS_XrKe3QfF58hXdhLSSbX6NvsFZZKjRq-+OLOQ-eQ at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" > wrote: > >But the basic idea can be extracted out and Pythonified: > > > > The literal 1.23 no longer gives you a float, but a FloatLiteral, which > is either a subclass of float, or an unrelated class that has a __float__ > method. Doing any calculation on it gives you a float. But as long as you > leave it alone as a FloatLiteral, it has its literal characters available > for any function that wants to distinguish FloatLiteral from float, like > the Decimal constructor. > > > > The problem that Python faces that Swift doesn't is that Python doesn't > use static typing and implicit compile-time conversions. So in Python, > you'd be passing around these larger values and doing the slow conversions > at runtime. That may or may not be unacceptable; without actually building > it and testing some realistic programs it's pretty hard to guess. > > Joonas's suggestion of storing the original text representation passed > to the float constructor is at least a novel one - it's only the idea > of actual decimal literals that was ruled out in the past. > > Aside from the practical implementation question, the main concern I > have with it is that we'd be trading the status quo for a situation > where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. > > It seems to me that a potentially better option might be to adjust the > implicit float->Decimal conversion in the Decimal constructor to use > the same algorithm as we now use for float.__repr__ [1], where we look > for the shortest decimal representation that gives the same answer > when rendered as a float. At the moment you have to indirect through > str() or repr() to get that behaviour: > > >>> from decimal import Decimal as D > >>> 1.3 > 1.3 > >>> D('1.3') > Decimal('1.3') > >>> D(1.3) > Decimal('1.3000000000000000444089209850062616169452667236328125') > >>> D(str(1.3)) > Decimal('1.3') > > Cheers, > Nick. 
> > [1] http://bugs.python.org/issue1580 > > > ------------------------------ > > Message: 2 > Date: Mon, 1 Jun 2015 18:27:32 -0700 > From: Andrew Barnert > To: Nick Coghlan > Cc: python-ideas > Subject: Re: [Python-ideas] Python Float Update > Message-ID: > Content-Type: text/plain; charset=us-ascii > > On Jun 1, 2015, at 17:08, Nick Coghlan wrote: > > > > On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" > > wrote: > >> But the basic idea can be extracted out and Pythonified: > >> > >> The literal 1.23 no longer gives you a float, but a FloatLiteral, which > is either a subclass of float, or an unrelated class that has a __float__ > method. Doing any calculation on it gives you a float. But as long as you > leave it alone as a FloatLiteral, it has its literal characters available > for any function that wants to distinguish FloatLiteral from float, like > the Decimal constructor. > >> > >> The problem that Python faces that Swift doesn't is that Python doesn't > use static typing and implicit compile-time conversions. So in Python, > you'd be passing around these larger values and doing the slow conversions > at runtime. That may or may not be unacceptable; without actually building > it and testing some realistic programs it's pretty hard to guess. > > > > Joonas's suggestion of storing the original text representation passed > > to the float constructor is at least a novel one - it's only the idea > > of actual decimal literals that was ruled out in the past. > > I actually built about half an implementation of something like Swift's > LiteralConvertible protocol back when I was teaching myself Swift. But I > think I have a simpler version that I could implement much more easily. > > Basically, FloatLiteral is just a subclass of float whose __new__ stores > its constructor argument. Then decimal.Decimal checks for that stored > string and uses it instead of the float value if present. Then there's an > import hook that replaces every Num with a call to FloatLiteral. > > This design doesn't actually fix everything; in effect, 1.3 actually > compiles to FloatLiteral(str(float('1.3')) (because by the time you get to > the AST it's too late to avoid that first conversion). Which does actually > solve the problem with 1.3, but doesn't solve everything in general (e.g., > just feed in a number that has more precision than a double can hold but > less than your current decimal context can...). > > But it just lets you test whether the implementation makes sense and what > the performance effects are, and it's only an hour of work, and doesn't > require anyone to patch their interpreter to play with it. If it seems > promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') > may be worth doing for a test of the actual functionality. > > I'll be glad to hack it up when I get a chance tonight. But personally, I > think decimal literals are a better way to go here. Decimal(1.20) magically > doing what you want still has all the same downsides as 1.20d (or implicit > decimal literals), plus it's more complex, adds performance costs, and > doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little > nicer than Decimal('1.20'), but only a little--and nowhere near as nice as > 1.20d). > > > Aside from the practical implementation question, the main concern I > > have with it is that we'd be trading the status quo for a situation > > where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. > > Yes, to solve that you really need Decimal(13)/Decimal(10)... 
Which > implies that maybe the simplification in Decimal(1.3) is more misleading > than helpful. (Notice that this problem also doesn't arise for decimal > literals--13/10d is int vs. Decimal division, which is correct out of the > box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) > > > It seems to me that a potentially better option might be to adjust the > > implicit float->Decimal conversion in the Decimal constructor to use > > the same algorithm as we now use for float.__repr__ [1], where we look > > for the shortest decimal representation that gives the same answer > > when rendered as a float. At the moment you have to indirect through > > str() or repr() to get that behaviour: > > > >>>> from decimal import Decimal as D > >>>> 1.3 > > 1.3 > >>>> D('1.3') > > Decimal('1.3') > >>>> D(1.3) > > Decimal('1.3000000000000000444089209850062616169452667236328125') > >>>> D(str(1.3)) > > Decimal('1.3') > > > > Cheers, > > Nick. > > > > [1] http://bugs.python.org/issue1580 > > > ------------------------------ > > Message: 3 > Date: Mon, 1 Jun 2015 19:00:48 -0700 > From: Andrew Barnert > To: Andrew Barnert > Cc: Nick Coghlan , python-ideas > > Subject: Re: [Python-ideas] Python Float Update > Message-ID: <90691306-98E3-421B-ABEB-BA2DE05962C6 at yahoo.com> > Content-Type: text/plain; charset=us-ascii > > On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > > > >> On Jun 1, 2015, at 17:08, Nick Coghlan wrote: > >> > >> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" > >> wrote: > >>> But the basic idea can be extracted out and Pythonified: > >>> > >>> The literal 1.23 no longer gives you a float, but a FloatLiteral, > which is either a subclass of float, or an unrelated class that has a > __float__ method. Doing any calculation on it gives you a float. But as > long as you leave it alone as a FloatLiteral, it has its literal characters > available for any function that wants to distinguish FloatLiteral from > float, like the Decimal constructor. > >>> > >>> The problem that Python faces that Swift doesn't is that Python > doesn't use static typing and implicit compile-time conversions. So in > Python, you'd be passing around these larger values and doing the slow > conversions at runtime. That may or may not be unacceptable; without > actually building it and testing some realistic programs it's pretty hard > to guess. > >> > >> Joonas's suggestion of storing the original text representation passed > >> to the float constructor is at least a novel one - it's only the idea > >> of actual decimal literals that was ruled out in the past. > > > > I actually built about half an implementation of something like Swift's > LiteralConvertible protocol back when I was teaching myself Swift. But I > think I have a simpler version that I could implement much more easily. > > > > Basically, FloatLiteral is just a subclass of float whose __new__ stores > its constructor argument. Then decimal.Decimal checks for that stored > string and uses it instead of the float value if present. Then there's an > import hook that replaces every Num with a call to FloatLiteral. > > > > This design doesn't actually fix everything; in effect, 1.3 actually > compiles to FloatLiteral(str(float('1.3')) (because by the time you get to > the AST it's too late to avoid that first conversion). 
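The core of the design described here -- a float subclass that remembers its source text, plus a constructor-side check for it -- fits in a few lines. This is a simplified illustration of the idea only, not Andrew's actual implementation; the import hook is omitted, and to_decimal below merely stands in for the real Decimal constructor:

    from decimal import Decimal

    class FloatLiteral(float):
        # A float that remembers the literal text it was created from.
        def __new__(cls, text):
            self = super().__new__(cls, text)
            self.literal = str(text)
            return self

    def to_decimal(x):
        # Stand-in for the Decimal constructor check: prefer the stored
        # literal text when it is available, fall back to the float value.
        if isinstance(x, FloatLiteral):
            return Decimal(x.literal)
        return Decimal(x)

    x = FloatLiteral('1.3')
    x + 1                   # arithmetic just produces ordinary floats
    print(to_decimal(x))    # Decimal('1.3')
    print(to_decimal(1.3))  # the exact binary expansion, as today
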
Which does actually > solve the problem with 1.3, but doesn't solve everything in general (e.g., > just feed in a number that has more precision than a double can hold but > less than your current decimal context can...). > > > > But it just lets you test whether the implementation makes sense and > what the performance effects are, and it's only an hour of work, > > Make that 15 minutes. > > https://github.com/abarnert/floatliteralhack > > > and doesn't require anyone to patch their interpreter to play with it. > If it seems promising, then hacking the compiler so 2.3 compiles to > FloatLiteral('2.3') may be worth doing for a test of the actual > functionality. > > > > I'll be glad to hack it up when I get a chance tonight. But personally, > I think decimal literals are a better way to go here. Decimal(1.20) > magically doing what you want still has all the same downsides as 1.20d (or > implicit decimal literals), plus it's more complex, adds performance costs, > and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little > nicer than Decimal('1.20'), but only a little--and nowhere near as nice as > 1.20d). > > > >> Aside from the practical implementation question, the main concern I > >> have with it is that we'd be trading the status quo for a situation > >> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. > > > > Yes, to solve that you really need Decimal(13)/Decimal(10)... Which > implies that maybe the simplification in Decimal(1.3) is more misleading > than helpful. (Notice that this problem also doesn't arise for decimal > literals--13/10d is int vs. Decimal division, which is correct out of the > box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) > > > >> It seems to me that a potentially better option might be to adjust the > >> implicit float->Decimal conversion in the Decimal constructor to use > >> the same algorithm as we now use for float.__repr__ [1], where we look > >> for the shortest decimal representation that gives the same answer > >> when rendered as a float. At the moment you have to indirect through > >> str() or repr() to get that behaviour: > >> > >>>>> from decimal import Decimal as D > >>>>> 1.3 > >> 1.3 > >>>>> D('1.3') > >> Decimal('1.3') > >>>>> D(1.3) > >> Decimal('1.3000000000000000444089209850062616169452667236328125') > >>>>> D(str(1.3)) > >> Decimal('1.3') > >> > >> Cheers, > >> Nick. > >> > >> [1] http://bugs.python.org/issue1580 > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > ------------------------------ > > Message: 4 > Date: Tue, 2 Jun 2015 11:58:09 +1000 > From: Steven D'Aprano > To: python-ideas at python.org > Subject: Re: [Python-ideas] Python Float Update > Message-ID: <20150602015809.GE932 at ando.pearwood.info> > Content-Type: text/plain; charset=us-ascii > > On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote: > > > It seems to me that a potentially better option might be to adjust the > > implicit float->Decimal conversion in the Decimal constructor to use > > the same algorithm as we now use for float.__repr__ [1], where we look > > for the shortest decimal representation that gives the same answer > > when rendered as a float. 
At the moment you have to indirect through > > str() or repr() to get that behaviour: > > Apart from the questions of whether such a change would be allowed by > the Decimal specification, and the breaking of backwards compatibility, > I would really hate that change for another reason. > > At the moment, a good, cheap way to find out what a binary float "really > is" (in some sense) is to convert it to Decimal and see what you get: > > Decimal(1.3) > -> Decimal('1.3000000000000000444089209850062616169452667236328125') > > If you want conversion from repr, then you can be explicit about it: > > Decimal(repr(1.3)) > -> Decimal('1.3') > > ("Explicit is better than implicit", as they say...) > > Although in fairness I suppose that if this change happens, we could > keep the old behaviour in the from_float method: > > # hypothetical future behaviour > Decimal(1.3) > -> Decimal('1.3') > Decimal.from_float(1.3) > -> Decimal('1.3000000000000000444089209850062616169452667236328125') > > But all things considered, I don't think we're doing people any favours > by changing the behaviour of float->Decimal conversions to implicitly > use the repr() instead of being exact. I expect this strategy is like > trying to flatten a bubble under wallpaper: all you can do is push the > gotchas and surprises to somewhere else. > > Oh, another thought... Decimals could gain yet another conversion > method, one which implicitly uses the float repr, but signals if it was > an inexact conversion or not. Explicitly calling repr can never signal, > since the conversion occurs outside of the Decimal constructor and > Decimal sees only the string: > > Decimal(repr(1.3)) cannot signal Inexact. > > But: > > Decimal.from_nearest_float(1.5) # exact > Decimal.from_nearest_float(1.3) # signals Inexact > > That might be useful, but probably not to beginners. > > > -- > Steve > > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 103, Issue 14 > ********************************************* > -- -Surya Subbarao -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.subbarao1 at gmail.com Tue Jun 2 05:55:55 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Mon, 1 Jun 2015 20:55:55 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 16 In-Reply-To: References: Message-ID: Thanks for making that patch! On Mon, Jun 1, 2015 at 8:48 PM, wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > https://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > > > Today's Topics: > > 1. 
Re: Python-ideas Digest, Vol 103, Issue 14 > (u8y7541 The Awesome Person) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Mon, 1 Jun 2015 20:48:51 -0700 > From: u8y7541 The Awesome Person > To: python-ideas at python.org, abarnert at yahoo.com > Subject: Re: [Python-ideas] Python-ideas Digest, Vol 103, Issue 14 > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > That patch sounds nice, I don't have to edit my Python distribution! We'll > have to do with this. > > -- > -Surya Subbarao > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > > ------------------------------ > > Subject: Digest Footer > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > > ------------------------------ > > End of Python-ideas Digest, Vol 103, Issue 16 > ********************************************* -- -Surya Subbarao From stephen at xemacs.org Tue Jun 2 06:21:48 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 02 Jun 2015 13:21:48 +0900 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> Message-ID: <87h9qqu4df.fsf@uwakimon.sk.tsukuba.ac.jp> Joonas Liik writes: > Having some sort of decimal literal would have some advantages of its own, > for one it could help against this sillyness: > I'm not saying that the actual data type needs to be a decimal ( > might well be a float but say shove the string repr next to it so > it can be accessed when needed) That *would* be a different type from float. You may as well go all the way to Decimal. > ..but this is one really common pitfall for new users, To fix it, you really need to change the parser, i.e., make Decimal the default type for non-integral numbers. "Decimal('1.3')" isn't that much harder to remember than "1.3$" (although it's quite a bit more to type). But people are going to continue writing things like pennies = 13 pennies_per_dollar = 100 dollars = pennies / pennies_per_dollar # Much later ... future_value = dollars * Decimal('1.07') And in real applications you're going to be using Decimal in code like def inputDecimals(file): for row, line in enumerate(file): for col, value in enumerate(line.strip().split()): matrix[row][col] = Decimal(value) or def what_if(): principal = Decimal(input("Principal ($): ")) rate = Decimal(input("Interest rate (%): ")) print("Future value is ", principal * (1 + rate/100), ".", sep="") and the whole issue evaporates. From guido at python.org Tue Jun 2 06:31:47 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Jun 2015 21:31:47 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <87h9qqu4df.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <87h9qqu4df.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 1, 2015 at 9:21 PM, Stephen J. Turnbull wrote: > Joonas Liik writes: > > > Having some sort of decimal literal would have some advantages of its > own, > > for one it could help against this sillyness: > > > I'm not saying that the actual data type needs to be a decimal ( > > might well be a float but say shove the string repr next to it so > > it can be accessed when needed) > > That *would* be a different type from float. Shudder indeed. > You may as well go all > the way to Decimal. > Or perhaps switch to decimal64 ( http://en.wikipedia.org/wiki/Decimal64_floating-point_format)? (Or its bigger cousing, decimal128) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Jun 2 06:41:02 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Tue, 02 Jun 2015 13:41:02 +0900 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> Message-ID: <87fv6au3hd.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > the main concern I have with [a FloatLiteral that carries the > original repr around] is that we'd be trading the status quo for a > situation where "Decimal(1.3)" and "Decimal(13/10)" gave different > answers. Yeah, and that kills the deal for me. Either Decimal is the default representation for non-integers, or this is a no-go. And that isn't going to happen. From random832 at fastmail.us Tue Jun 2 06:47:02 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 02 Jun 2015 00:47:02 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <87h9qqu4df.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <1433220422.818880.284335641.7348DADC@webmail.messagingengine.com> On Tue, Jun 2, 2015, at 00:31, Guido van Rossum wrote: > Or perhaps switch to decimal64 ( > http://en.wikipedia.org/wiki/Decimal64_floating-point_format)? (Or its > bigger cousing, decimal128) Does anyone know if any common computer architectures have any hardware support for this? Are there any known good implementations for all the functions in math/cmath for these types? Moving to a fixed-size floating point type does have the advantage of not requiring making all these decisions about environments and precision and potentially unbounded growth etc. From ncoghlan at gmail.com Tue Jun 2 07:10:22 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Jun 2015 15:10:22 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <79C16144-8BF7-4260-A356-DD4E8D97BAAD@yahoo.com> References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150602015809.GE932@ando.pearwood.info> <79C16144-8BF7-4260-A356-DD4E8D97BAAD@yahoo.com> Message-ID: On 2 June 2015 at 13:10, Andrew Barnert via Python-ideas wrote: > On Jun 1, 2015, at 18:58, Steven D'Aprano wrote: >> Apart from the questions of whether such a change would be allowed by >> the Decimal specification, > > As far as I know, GDAS doesn't specify anything about implicit conversion from floats. As long as the required explicit conversion function (which I think is from_float?) exists and does the required thing. > > As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case. As far as I know, nobody has looked into it. If there aren't any meaningful differences, we should just switch, if there are differences, we should probably switch anyway, but it will be more work (and hence will require volunteers willing to do that work). Either way, the starting point would be an assessment of what the differences are, and whether or not they have any implications for the decimal module and cdecimal. Cheers, Nick. 
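For what it's worth, the arithmetic rules of decimal64 (though not its storage format or any hardware acceleration) can already be approximated with the existing decimal module by pinning a context to decimal64's parameters. A rough sketch, using the commonly cited decimal64 parameters -- 16 coefficient digits and an exponent range of -383..384 -- as assumptions:

    from decimal import Context, Decimal, ROUND_HALF_EVEN

    DEC64 = Context(prec=16, rounding=ROUND_HALF_EVEN,
                    Emin=-383, Emax=384, clamp=1)

    print(DEC64.divide(Decimal(1), Decimal(3)))       # 0.3333333333333333 (16 digits)
    print(DEC64.add(Decimal('1.1'), Decimal('2.2')))  # 3.3

Whether that gets close enough to true decimal64 semantics for the cases asked about here (math/cmath coverage, hardware support) is a separate question.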
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Tue Jun 2 07:15:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 22:15:35 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433220422.818880.284335641.7348DADC@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <87h9qqu4df.fsf@uwakimon.sk.tsukuba.ac.jp> <1433220422.818880.284335641.7348DADC@webmail.messagingengine.com> Message-ID: On Jun 1, 2015, at 21:47, random832 at fastmail.us wrote: > >> On Tue, Jun 2, 2015, at 00:31, Guido van Rossum wrote: >> Or perhaps switch to decimal64 ( >> http://en.wikipedia.org/wiki/Decimal64_floating-point_format)? (Or its >> bigger cousing, decimal128) > > Does anyone know if any common computer architectures have any hardware > support for this? IBM's RS/POWER architecture supports decimal32, 64, and 128. The PowerPC and Cell offshoots only support them in some models, not all. Is that common enough? (Is _anything_ common enough besides x86, x86_64, ARM7, ARM8, and various less-capable things like embedded 68k variants?) > Are there any known good implementations for all the > functions in math/cmath for these types? Intel wrote a reference implementation for IEEE 754-2008 as part of the standardization process. And since then, they've focused on improvements geared at making it possible to write highly-optimized financial applications in C or C++ that run on x86_64 hardware. And I think it's BSD-licensed. It's available somewhere on netlib, but searching that repo is no fun on my phone (plus, most of Intel's code, you can't see the license or the detailed README until you unpack it...), so I'll leave it to someone else to find it. Of course 754-2008 isn't necessarily identical to GDAS (which is what POWER implements, and Python's decimal module). > Moving to a fixed-size floating point type does have the advantage of > not requiring making all these decisions about environments and > precision and potentially unbounded growth etc. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From random832 at fastmail.us Tue Jun 2 07:23:57 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 02 Jun 2015 01:23:57 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <74FC1DDF-9DBA-4046-AED0-1B7617F4B173@gmail.com> <20150602015809.GE932@ando.pearwood.info> <79C16144-8BF7-4260-A356-DD4E8D97BAAD@yahoo.com> Message-ID: <1433222637.830982.284351489.524D9FCF@webmail.messagingengine.com> On Tue, Jun 2, 2015, at 01:10, Nick Coghlan wrote: > Either way, the starting point would be an assessment of what the > differences are, and whether or not they have any implications for the > decimal module and cdecimal. Does IEEE even have anything about arbitrary-precision decimal types (which are what decimal/cdecimal are)? From abarnert at yahoo.com Tue Jun 2 07:22:07 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 1 Jun 2015 22:22:07 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15 In-Reply-To: References: Message-ID: On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person wrote: > > I think you're right. I was also considering ... 
"editing" my Python distribution. If they didn't implement my suggestion for correcting floats, at least they can fix this, instead of making people hack Python for good results! If you're going to reply to digests, please learn how to reply inline instead of top-posting (and how to trim out all the irrelevant stuff). It's next to impossible to tell which part of which of the messages you're replying to even in simple cases like this one, with only 4 messages in the digest. >> On Mon, Jun 1, 2015 at 8:10 PM, wrote: >> Send Python-ideas mailing list submissions to >> python-ideas at python.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://mail.python.org/mailman/listinfo/python-ideas >> or, via email, send a message with subject or body 'help' to >> python-ideas-request at python.org >> >> You can reach the person managing the list at >> python-ideas-owner at python.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Python-ideas digest..." >> >> >> Today's Topics: >> >> 1. Re: Python Float Update (Steven D'Aprano) >> 2. Re: Python Float Update (Andrew Barnert) >> 3. Re: Python Float Update (Steven D'Aprano) >> 4. Re: Python Float Update (Andrew Barnert) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Tue, 2 Jun 2015 11:37:48 +1000 >> From: Steven D'Aprano >> To: python-ideas at python.org >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <20150602013748.GD932 at ando.pearwood.info> >> Content-Type: text/plain; charset=us-ascii >> >> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: >> >> > Having some sort of decimal literal would have some advantages of its own, >> > for one it could help against this sillyness: >> > >> > >>> Decimal(1.3) >> > Decimal('1.3000000000000000444089209850062616169452667236328125') >> >> Why is that silly? That's the actual value of the binary float 1.3 >> converted into base 10. If you want 1.3 exactly, you can do this: >> >> > >>> Decimal('1.3') >> > Decimal('1.3') >> >> Is that really so hard for people to learn? >> >> >> > I'm not saying that the actual data type needs to be a decimal ( >> > might well be a float but say shove the string repr next to it so it can be >> > accessed when needed) >> >> You want Decimals to *lie* about what value they have? >> >> I think that's a terrible idea, one which would lead to a whole set of >> new and exciting surprises when using Decimal. Let me try to predict a >> few of the questions on Stackoverflow which would follow this change... >> >> Why is equality so inaccurate in Python? >> >> py> x = Decimal(1.3) >> py> y = Decimal('1.3') >> py> x, y >> (Decimal('1.3'), Decimal('1.3')) >> py> x == y >> False >> >> Why does Python insert extra digits into numbers when I multiply? >> >> py> x = Decimal(1.3) >> py> x >> Decimal('1.3') >> py> y = 10000000000000000*x >> py> y - 13000000000000000 >> Decimal('0.444089209850062616169452667236328125') >> >> >> > ..but this is one really common pitfall for new users, i know its easy to >> > fix the code above, >> > but this behavior is very unintuitive.. you essentially get a really >> > expensive float when you do the obvious thing. >> >> Then don't do the obvious thing. >> >> Sometimes there really is no good alternative to actually knowing what >> you are doing. Floating point maths is inherently hard, but that's not >> the problem. 
There are all sorts of things in programming which are >> hard, and people learn how to deal with them. The problem is that people >> *imagine* that floating point is simple, when it is not and can never >> be. We don't do them any favours by enabling that delusion. >> >> If your needs are light, then you can ignore the complexities of >> floating point. You really can go a very long way by just rounding the >> results of your calculations when displaying them. But for anything more >> than that, we cannot just paper over the floating point complexities >> without creating new complexities that will burn people. >> >> You don't have to become a floating point guru, but it really isn't >> onerous to expect people who are programming to learn a few basic >> programming skills, and that includes a few basic coping strategies for >> floating point. >> >> >> >> -- >> Steve >> >> >> ------------------------------ >> >> Message: 2 >> Date: Mon, 1 Jun 2015 19:21:47 -0700 >> From: Andrew Barnert >> To: Andrew Barnert >> Cc: Nick Coghlan , python-ideas >> >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <5E8271BF-183E-496D-A556-81C407977FFE at yahoo.com> >> Content-Type: text/plain; charset=us-ascii >> >> On Jun 1, 2015, at 19:00, Andrew Barnert wrote: >> > >> >> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas wrote: >> >> >> >>> On Jun 1, 2015, at 17:08, Nick Coghlan wrote: >> >>> >> >>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" >> >>> wrote: >> >>>> But the basic idea can be extracted out and Pythonified: >> >>>> >> >>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, which is either a subclass of float, or an unrelated class that has a __float__ method. Doing any calculation on it gives you a float. But as long as you leave it alone as a FloatLiteral, it has its literal characters available for any function that wants to distinguish FloatLiteral from float, like the Decimal constructor. >> >>>> >> >>>> The problem that Python faces that Swift doesn't is that Python doesn't use static typing and implicit compile-time conversions. So in Python, you'd be passing around these larger values and doing the slow conversions at runtime. That may or may not be unacceptable; without actually building it and testing some realistic programs it's pretty hard to guess. >> >>> >> >>> Joonas's suggestion of storing the original text representation passed >> >>> to the float constructor is at least a novel one - it's only the idea >> >>> of actual decimal literals that was ruled out in the past. >> >> >> >> I actually built about half an implementation of something like Swift's LiteralConvertible protocol back when I was teaching myself Swift. But I think I have a simpler version that I could implement much more easily. >> >> >> >> Basically, FloatLiteral is just a subclass of float whose __new__ stores its constructor argument. Then decimal.Decimal checks for that stored string and uses it instead of the float value if present. Then there's an import hook that replaces every Num with a call to FloatLiteral. >> >> >> >> This design doesn't actually fix everything; in effect, 1.3 actually compiles to FloatLiteral(str(float('1.3')) (because by the time you get to the AST it's too late to avoid that first conversion). Which does actually solve the problem with 1.3, but doesn't solve everything in general (e.g., just feed in a number that has more precision than a double can hold but less than your current decimal context can...). 
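The limitation just mentioned -- a value with more precision than a double can carry but less than the current decimal context -- is easy to see concretely. A small illustration; the 21-digit value is just an arbitrary example:

    from decimal import Decimal

    text = '1.23456789012345678901'         # 21 significant digits: more than a double holds,
                                            # fewer than the default 28-digit context
    via_float = Decimal(repr(float(text)))  # what the repr-based hack can preserve
    exact = Decimal(text)                   # what a true decimal literal would preserve

    print(via_float)            # only the ~17 digits that survived the double
    print(exact)                # Decimal('1.23456789012345678901')
    print(via_float == exact)   # False
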
>> >> >> >> But it just lets you test whether the implementation makes sense and what the performance effects are, and it's only an hour of work, >> > >> > Make that 15 minutes. >> > >> > https://github.com/abarnert/floatliteralhack >> >> And as it turns out, hacking the tokens is no harder than hacking the AST (in fact, it's a little easier; I'd just never done it before), so now it does that, meaning you really get the actual literal string from the source, not the repr of the float of that string literal. >> >> Turning this into a real implementation would obviously be more than half an hour's work, but not more than a day or two. Again, I don't think anyone would actually want this, but now people who think they do have an implementation to play with to prove me wrong. >> >> >> and doesn't require anyone to patch their interpreter to play with it. If it seems promising, then hacking the compiler so 2.3 compiles to FloatLiteral('2.3') may be worth doing for a test of the actual functionality. >> >> >> >> I'll be glad to hack it up when I get a chance tonight. But personally, I think decimal literals are a better way to go here. Decimal(1.20) magically doing what you want still has all the same downsides as 1.20d (or implicit decimal literals), plus it's more complex, adds performance costs, and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little nicer than Decimal('1.20'), but only a little--and nowhere near as nice as 1.20d). >> >> >> >>> Aside from the practical implementation question, the main concern I >> >>> have with it is that we'd be trading the status quo for a situation >> >>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. >> >> >> >> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which implies that maybe the simplification in Decimal(1.3) is more misleading than helpful. (Notice that this problem also doesn't arise for decimal literals--13/10d is int vs. Decimal division, which is correct out of the box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) >> >> >> >>> It seems to me that a potentially better option might be to adjust the >> >>> implicit float->Decimal conversion in the Decimal constructor to use >> >>> the same algorithm as we now use for float.__repr__ [1], where we look >> >>> for the shortest decimal representation that gives the same answer >> >>> when rendered as a float. At the moment you have to indirect through >> >>> str() or repr() to get that behaviour: >> >>> >> >>>>>> from decimal import Decimal as D >> >>>>>> 1.3 >> >>> 1.3 >> >>>>>> D('1.3') >> >>> Decimal('1.3') >> >>>>>> D(1.3) >> >>> Decimal('1.3000000000000000444089209850062616169452667236328125') >> >>>>>> D(str(1.3)) >> >>> Decimal('1.3') >> >>> >> >>> Cheers, >> >>> Nick. >> >>> >> >>> [1] http://bugs.python.org/issue1580 >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> ------------------------------ >> >> Message: 3 >> Date: Tue, 2 Jun 2015 13:00:40 +1000 >> From: Steven D'Aprano >> To: python-ideas at python.org >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <20150602030040.GF932 at ando.pearwood.info> >> Content-Type: text/plain; charset=utf-8 >> >> Nicholas, >> >> Your email client appears to not be quoting text you quote. 
It is a >> conventional to use a leading > for quoting, perhaps you could configure >> your mail program to do so? The good ones even have a "Paste As Quote" >> command. >> >> On with the substance of your post... >> >> On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote: >> >> > I guess it?s a non-trivial tradeoff. But I would lean towards considering >> > people likely to be affected by the performance hit as doing something ?not >> > common?. Like, if they are doing that many calculations that it matters, >> > perhaps it makes sense to ask them to explicitly ask for floats vs. >> > decimals, in exchange for giving the majority who wouldn?t notice a >> > performance difference a better user experience. >> >> Changing from binary floats to decimal floats by default is a big, >> backwards incompatible change. Even if it's a good idea, we're >> constrained by backwards compatibility: I would imagine we wouldn't want >> to even introduce this feature until the majority of people are using >> Python 3 rather than Python 2, and then we'd probably want to introduce >> it using a "from __future__ import decimal_floats" directive. >> >> So I would guess this couldn't happen until probably 2020 or so. >> >> But we could introduce a decimal literal, say 1.1d for Decimal("1.1"). >> The first prerequisite is that we have a fast Decimal implementation, >> which we now have. Next we would have to decide how the decimal literals >> would interact with the decimal module. Do we include full support of >> the entire range of decimal features, including globally configurable >> precision and other modes? Or just a subset? How will these decimals >> interact with other numeric types, like float and Fraction? At the >> moment, Decimal isn't even part of the numeric tower. >> >> There's a lot of ground to cover, it's not a trivial change, and will >> definitely need a PEP. >> >> >> > How many of your examples are inherent limitations of decimals vs. problems >> > that can be improved upon? >> >> In one sense, they are inherent limitations of floating point numbers >> regardless of base. Whether binary, decimal, hexadecimal as used in some >> IBM computers, or something else, you're going to see the same problems. >> Only the specific details will vary, e.g. 1/3 cannot be represented >> exactly in base 2 or base 10, but if you constructed a base 3 float, it >> would be exact. >> >> In another sense, Decimal has a big advantage that it is much more >> configurable than Python's floats. Decimal lets you configure the >> precision, rounding mode, error handling and more. That's not inherent >> to base 10 calculations, you can do exactly the same thing for binary >> floats too, but Python doesn't offer that feature for floats, only for >> Decimals. >> >> But no matter how you configure Decimal, all you can do is shift the >> gotchas around. The issue really is inherent to the nature of the >> problem, and you cannot defeat the universe. Regardless of what >> base you use, binary or decimal or something else, or how many digits >> precision, you're still trying to simulate an uncountably infinite >> continuous, infinitely divisible number line using a finite, >> discontinuous set of possible values. Something has to give. >> >> (For the record, when I say "uncountably infinite", I don't just mean >> "too many to count", it's a technical term. To oversimplify horribly, it >> means "larger than infinity" in some sense. 
It's off-topic for here, >> but if anyone is interested in learning more, you can email me off-list, >> or google for "countable vs uncountable infinity".) >> >> Basically, you're trying to squeeze an infinite number of real numbers >> into a finite amount of memory. It can't be done. Consequently, there >> will *always* be some calculations where the true value simply cannot be >> calculated and the answer you get is slightly too big or slightly too >> small. All the other floating point gotchas follow from that simple >> fact. >> >> >> > Admittedly, the only place where I?ve played with decimals extensively is >> > on Microsoft?s SQL Server (where they are the default literal >> > ). I?ve stumbled in >> > the past on my own decimal gotchas >> > , but looking at your examples >> > and trying them on SQL Server I suspect that most of the problems you show >> > are problems of precision and scale. >> >> No. Change the precision and scale, and some *specific* problems goes >> away, but they reappear with other numbers. >> >> Besides, at the point that you're talking about setting the precision, >> we're really not talking about making things easy for beginners any >> more. >> >> And not all floating point issues are related to precision and scale in >> decimal. You cannot divide a cake into exactly three equal pieces in >> Decimal any more than you can divide a cake into exactly three equal >> pieces in binary. All you can hope for is to choose a precision were the >> rounding errors in one part of your calculation will be cancelled by the >> rounding errors in another part of your calculation. And that precision >> will be different for any two arbitrary calculations. >> >> >> >> -- >> Steve >> >> >> ------------------------------ >> >> Message: 4 >> Date: Mon, 1 Jun 2015 20:10:29 -0700 >> From: Andrew Barnert >> To: Steven D'Aprano >> Cc: "python-ideas at python.org" >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <79C16144-8BF7-4260-A356-DD4E8D97BAAD at yahoo.com> >> Content-Type: text/plain; charset=us-ascii >> >> On Jun 1, 2015, at 18:58, Steven D'Aprano wrote: >> > >> >> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote: >> >> >> >> It seems to me that a potentially better option might be to adjust the >> >> implicit float->Decimal conversion in the Decimal constructor to use >> >> the same algorithm as we now use for float.__repr__ [1], where we look >> >> for the shortest decimal representation that gives the same answer >> >> when rendered as a float. At the moment you have to indirect through >> >> str() or repr() to get that behaviour: >> > >> > Apart from the questions of whether such a change would be allowed by >> > the Decimal specification, >> >> As far as I know, GDAS doesn't specify anything about implicit conversion from floats. As long as the required explicit conversion function (which I think is from_float?) exists and does the required thing. >> >> As a side note, has anyone considered whether it's worth switching to IEEE-754-2008 as the controlling specification? There may be a good reason not to do so; I'm just curious whether someone has thought it through and made the case. >> >> > and the breaking of backwards compatibility, >> > I would really hate that change for another reason. 
>> > >> > At the moment, a good, cheap way to find out what a binary float "really >> > is" (in some sense) is to convert it to Decimal and see what you get: >> > >> > Decimal(1.3) >> > -> Decimal('1.3000000000000000444089209850062616169452667236328125') >> > >> > If you want conversion from repr, then you can be explicit about it: >> > >> > Decimal(repr(1.3)) >> > -> Decimal('1.3') >> > >> > ("Explicit is better than implicit", as they say...) >> > >> > Although in fairness I suppose that if this change happens, we could >> > keep the old behaviour in the from_float method: >> > >> > # hypothetical future behaviour >> > Decimal(1.3) >> > -> Decimal('1.3') >> > Decimal.from_float(1.3) >> > -> Decimal('1.3000000000000000444089209850062616169452667236328125') >> > >> > But all things considered, I don't think we're doing people any favours >> > by changing the behaviour of float->Decimal conversions to implicitly >> > use the repr() instead of being exact. I expect this strategy is like >> > trying to flatten a bubble under wallpaper: all you can do is push the >> > gotchas and surprises to somewhere else. >> > >> > Oh, another thought... Decimals could gain yet another conversion >> > method, one which implicitly uses the float repr, but signals if it was >> > an inexact conversion or not. Explicitly calling repr can never signal, >> > since the conversion occurs outside of the Decimal constructor and >> > Decimal sees only the string: >> > >> > Decimal(repr(1.3)) cannot signal Inexact. >> > >> > But: >> > >> > Decimal.from_nearest_float(1.5) # exact >> > Decimal.from_nearest_float(1.3) # signals Inexact >> > >> > That might be useful, but probably not to beginners. >> >> I think this might be worth having whether the default constructor is changed or not. >> >> I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it. >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> >> ------------------------------ >> >> End of Python-ideas Digest, Vol 103, Issue 15 >> ********************************************* > > > > -- > -Surya Subbarao -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Tue Jun 2 08:40:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 06:40:34 +0000 (UTC) Subject: [Python-ideas] Python Float Update In-Reply-To: <1433222637.830982.284351489.524D9FCF@webmail.messagingengine.com> References: <1433222637.830982.284351489.524D9FCF@webmail.messagingengine.com> Message-ID: <2146195060.3229211.1433227234150.JavaMail.yahoo@mail.yahoo.com> On Monday, June 1, 2015 10:23 PM, "random832 at fastmail.us" wrote: >Does IEEE even have anything about arbitrary-precision decimal types >(which are what decimal/cdecimal are)? Yes. When many people say "IEEE float" they still mean 754-1985. This is what C90 was designed to "support without quite supporting", and what C99 explicitly supports, and what many consumer FPUs support (or, in the case of the 8087 and its successors, a preliminary version of the 1985 standard). 
That standard did not cover either arbitrary precision or decimals; both of those were only part of the companion standard 854 (which isn't complete enough to base an implementation on). But the current version of the standard, 754-2008, does cover arbitrary-precision decimal types. If I understand the relationship between the standards: 754-2008 was designed to merge 754-1985 and 854-1987, fill in the gaps, and fix any bugs; GDAS was a major influence (the committee chair was GDAS's author); and since 2009 GDAS has gone from being a de facto independent standard to being a more-specific specification of the relevant subset of 754-2008. IBM's hardware and Java library implement GDAS (and therefore implicitly the relevant part of 754-2008); Itanium (partly), C11, the gcc extensions, and Intel's C library implement 754-2008 (or IEC 60559, which is just a republished 754-2008). So, my guess is that GDAS makes perfect sense to follow unless Python wants to expose C11's native fixed decimals, or the newer math.h functions from C99/C11/C14, or the other parts of 754-2008 that it doesn't support (like arbitrary-precision binary). My question was just whether someone had actually made that decision, or whether decimal is following GDAS just because that was the obvious decision to make in 2003. From mal at egenix.com Tue Jun 2 09:53:03 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 02 Jun 2015 09:53:03 +0200 Subject: [Python-ideas] Python Float Update In-Reply-To: <2146195060.3229211.1433227234150.JavaMail.yahoo@mail.yahoo.com> References: <1433222637.830982.284351489.524D9FCF@webmail.messagingengine.com> <2146195060.3229211.1433227234150.JavaMail.yahoo@mail.yahoo.com> Message-ID: <556D60DF.9050904@egenix.com> On 02.06.2015 08:40, Andrew Barnert via Python-ideas wrote: > On Monday, June 1, 2015 10:23 PM, "random832 at fastmail.us" wrote: > >> Does IEEE even have anything about arbitrary-precision decimal types > >> (which are what decimal/cdecimal are)? > > Yes. > > When many people say "IEEE float" they still mean 754-1985. This is what C90 was designed to "support without quite supporting", and what C99 explicitly supports, and what many consumer FPUs support (or, in the case of the 8087 and its successors, a preliminary version of the 1985 standard). That standard did not cover either arbitrary precision or decimals; both of those were only part of the companion standard 854 (which isn't complete enough to base an implementation on). > > But the current version of the standard, 754-2008, does cover arbitrary-precision decimal types. > > If I understand the relationship between the standards: 754-2008 was designed to merge 754-1985 and 854-1987, fill in the gaps, and fix any bugs; GDAS was a major influence (the committee chair was GDAS's author); and since 2009 GDAS has gone from being a de facto independent standard to being a more-specific specification of the relevant subset of 754-2008. IBM's hardware and Java library implement GDAS (and therefore implicitly the relevant part of 754-2008); Itanium (partly), C11, the gcc extensions, and Intel's C library implement 754-2008 (or IEC 60559, which is just a republished 754-2008). > > So, my guess is that GDAS makes perfect sense to follow unless Python wants to expose C11's native fixed decimals, or the newer math.h functions from C99/C11/C14, or the other parts of 754-2008 that it doesn't support (like arbitrary-precision binary). 
My question was just whether someone had actually made that decision, or whether decimal is following GDAS just because that was the obvious decision to make in 2003. The IBM decimal implementation by Mike Cowlishaw was chosen as basis for the Python's decimal implementation at the time, so yes, this was an explicit design choice at the time: http://legacy.python.org/dev/peps/pep-0327/ http://speleotrove.com/decimal/ According to the PEP, decimal implements IEEE 854-1987 (with some restrictions). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 02 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From p.f.moore at gmail.com Tue Jun 2 10:13:39 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 2 Jun 2015 09:13:39 +0100 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150602013748.GD932@ando.pearwood.info> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> Message-ID: On 2 June 2015 at 02:37, Steven D'Aprano wrote: > Sometimes there really is no good alternative to actually knowing what > you are doing. +1 QOTW From mal at egenix.com Tue Jun 2 10:19:39 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Tue, 02 Jun 2015 10:19:39 +0200 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150602013748.GD932@ando.pearwood.info> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> Message-ID: <556D671B.6040803@egenix.com> On 02.06.2015 03:37, Steven D'Aprano wrote: > On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: > >> Having some sort of decimal literal would have some advantages of its own, >> for one it could help against this sillyness: >> >>>>> Decimal(1.3) >> Decimal('1.3000000000000000444089209850062616169452667236328125') > > Why is that silly? That's the actual value of the binary float 1.3 > converted into base 10. If you want 1.3 exactly, you can do this: > >>>>> Decimal('1.3') >> Decimal('1.3') > > Is that really so hard for people to learn? Joonas, I think you're approaching this from the wrong angle. People who want to get an exact decimal from a literal, will use the string representation to define it, not a float representation. In practice, you typically read the data from some file or stream anyway, so it already comes as string value and if you want to convert an actual float to a decimal, this will most likely not be done in a literal way, but instead by passed in to the Decimal constructor as variable, so there's no literal involved. It may be good to provide some alternative ways of converting a float to a decimal, e.g. 
one which uses the float repr logic to overcome things like repr(float(1.1)) == '1.1000000000000001' instead of a direct conversion: >>> Decimal(1.1) Decimal('1.100000000000000088817841970012523233890533447265625') >>> Decimal(repr(1.1)) Decimal('1.1') These could be added as parameter to the Decimal constructor. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 02 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dennis at kaarsemaker.net Tue Jun 2 10:23:35 2015 From: dennis at kaarsemaker.net (Dennis Kaarsemaker) Date: Tue, 02 Jun 2015 10:23:35 +0200 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: References: Message-ID: <1433233415.13986.54.camel@kaarsemaker.net> On za, 2015-05-30 at 11:45 -0400, Ron Adam wrote: > The solution I found was to call a function to explicitly set the shared > items in the imported module. This reminds me of my horrible april fools hack of 2013 to make Python look more like perl: http://seveas.net/exporter/ -- Dennis Kaarsemaker http://www.kaarsemaker.net From tritium-list at sdamon.com Tue Jun 2 11:38:47 2015 From: tritium-list at sdamon.com (Alexander Walters) Date: Tue, 02 Jun 2015 05:38:47 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <556D671B.6040803@egenix.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> Message-ID: <556D79A7.7080904@sdamon.com> I think there is another discussion to have here, and that is making Decimal part of the language (__builtin(s)__) vs. part of the library (which implementations can freely omit). If it were part of the language, then maybe, just maybe, a literal syntax should be considered. As it stands, Decimal and Fraction are libraries - implementations of python are free to omit them (as I think some of the embedded platform implementations do), and it currently does not make a lick of sense to add syntax for something that is only in the library. On 6/2/2015 04:19, M.-A. Lemburg wrote: > On 02.06.2015 03:37, Steven D'Aprano wrote: >> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: >> >>> Having some sort of decimal literal would have some advantages of its own, >>> for one it could help against this sillyness: >>> >>>>>> Decimal(1.3) >>> Decimal('1.3000000000000000444089209850062616169452667236328125') >> Why is that silly? That's the actual value of the binary float 1.3 >> converted into base 10. If you want 1.3 exactly, you can do this: >> >>>>>> Decimal('1.3') >>> Decimal('1.3') >> Is that really so hard for people to learn? > Joonas, I think you're approaching this from the wrong angle. > > People who want to get an exact decimal from a literal, will > use the string representation to define it, not a float > representation. 
> > In practice, you typically read the data from some file or stream > anyway, so it already comes as string value and if you want to > convert an actual float to a decimal, this will most likely > not be done in a literal way, but instead by passed in to > the Decimal constructor as variable, so there's no literal > involved. > > It may be good to provide some alternative ways of converting > a float to a decimal, e.g. one which uses the float repr logic > to overcome things like repr(float(1.1)) == '1.1000000000000001' > instead of a direct conversion: > >>>> Decimal(1.1) > Decimal('1.100000000000000088817841970012523233890533447265625') >>>> Decimal(repr(1.1)) > Decimal('1.1') > > These could be added as parameter to the Decimal constructor. > From ncoghlan at gmail.com Tue Jun 2 14:34:30 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Jun 2015 22:34:30 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <556D79A7.7080904@sdamon.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> Message-ID: On 2 June 2015 at 19:38, Alexander Walters wrote: > I think there is another discussion to have here, and that is making Decimal > part of the language (__builtin(s)__) vs. part of the library (which > implementations can freely omit). If it were part of the language, then > maybe, just maybe, a literal syntax should be considered. For decimal, the issues that keep it from becoming a literal are similar to those that keep it from becoming a builtin: configurable contexts are a core part of the decimal module's capabilities, and making a builtin type context dependent causes various problems when it comes to reasoning about a piece of code based on purely local information. Those problems affect human readers regardless, but once literals enter the mix, they affect all compile time processing as well. On that front, I also finally found the (mammoth) thread from last year about the idea of using base 10 for floating point values by default: https://mail.python.org/pipermail/python-ideas/2014-March/026436.html One of the things we eventually realised in that thread is that the context dependence problem, while concerning for a builtin type, is an absolute deal breaker for literals, because it means you *can't constant fold them* by calculating the results of expressions at compile time and store the result directly into the code object (https://mail.python.org/pipermail/python-ideas/2014-March/026998.html). This problem is illustrated by asking the following question: What is the result of "Decimal('1.0') + Decimal('1e70')"? Correct answer? Insufficient data (since we don't know the current decimal precision). With the current decimal module, the configurable rounding behaviour is something you just need to learn about as part of adopting the module. Since that configurability is one of the main reasons for using it over binary floating point, that's generally not a big deal. It becomes a much bigger deal when the question being asked is: What is the result of "1.0d + 1e70d"? Those look like they should be numeric constants, and hence the compiler should be able to constant fold them at compile time. 
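To see the context dependence concretely, here's a quick interactive sketch (nothing here is new API; the digits in the results depend only on the precision in effect, which is exactly the problem):

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 5
>>> Decimal('1.0') + Decimal('1e70')
Decimal('1.0000E+70')
>>> getcontext().prec = 10
>>> Decimal('1.0') + Decimal('1e70')
Decimal('1.000000000E+70')

Two different answers from the same expression, so there is nothing the compiler could safely fold into the code object.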
That's possible if we were to pick a single IEEE decimal type as a builtin (either decimal64 or decimal128), but not possible if we tried to use the current variable precision decimal type. One of the other "fun" discrepancies introduced by the context sensitive processing in decimals is that unary plus and minus are context-sensitive, which means that any literal format can't express arbitrary negative decimal values without a parser hack to treat the minus sign as part of the trailing literal. This is one of the other main reasons why decimal64 or decimal128 are better candidates for a builtin decimal type than decimal.Decimal as it exists today (as well as being potentially more amenable to hardware acceleration on some platforms). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Tue Jun 2 14:40:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 05:40:42 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <556D79A7.7080904@sdamon.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> Message-ID: <429B7BCC-57E6-43C2-9D0D-09FD1B8468E2@yahoo.com> On Jun 2, 2015, at 02:38, Alexander Walters wrote: > > I think there is another discussion to have here, and that is making Decimal part of the language (__builtin(s)__) vs. part of the library (which implementations can freely omit). I don't think there is any such distinction in Python. Neither the language reference nor the library reference claims to be a specification. The library documentation specifically says that it "describes the standard library that is distributed with Python" and "also describes some of the optional components that are commonly included in Python distributions", which implies that, except for the handful of modules that are described as optional or platform-specific, everything should always be there. (There is special dispensation for Unix systems to split up Python into separate packages, but even that is specifically limited to "some or all of the optional components".) Historically, implementations that haven't included the entire stdlib also haven't included parts of the language (Jython 2.2 and early 2.5 versions, early versions of PyPy, the various browser-based implementations, MicroPython and PyMite, etc.). Also, both the builtins module and the actual built-in functions, constants, types, and exceptions it contains are documented as part of the library, just like decimal, not as part of the language. So, Python isn't like C, with separate specifications for "freestanding" vs. "hosted" implementations, and it doesn't have a separate specification for an "embedded" subset like C++ used to. > If it were part of the language, then maybe, just maybe, a literal syntax should be considered. Since there is no such distinction between language and library, I think we're free to define a literal syntax for decimals and fractions. From a practical point of view (which beats purity, of course), it's probably not reasonable for CPython to define such literals unless there's a C implementation that defines the numeric type slot (and maybe even has a C API concrete type interface, although maybe not), and which can be "frozen" at build time. 
(See past discussions on adding an OrderedDict literal for why these things are important.) That's currently true for Decimal, but not for Fraction. So, that might be an argument against fraction literals, or for providing a C implementation of the fraction module. > As it stands, Decimal and Fraction are libraries - implementations of python are free to omit them (as I think some of the embedded platform implementations do), and it currently does not make a lick of sense to add syntax for something that is only in the library. Even besides the four library sections on the various kinds of built-in things, plenty of other things are syntax for something that's "only in the library". The import statement is defined in terms of functionality in importlib, and (at least in CPython) actually implemented that way. In fact, numeric values, as defined in the data model section of the language reference, are defined in terms of types from the library docs, both in stdtypes and in the numbers module. Defining decimal values in terms of types defined in the decimal module library section would be no different. (Numeric _literals_ don't seem to have their semantics defined anywhere, just their syntax, but it's pretty obvious from the wording that they're intended to have int, float, and complex values as defined by the data model--which, again, means as defined by the library.) So, while there are potentially compelling arguments against a decimal literal (how it interacts with contexts may be confusing, the idea may be too bikesheddable to come up with one true design that everyone will like, or may be an attractive nuisance, it may add too much complexity to the implementation for the benefit, etc.), "decimal is only a library" doesn't seem to be one. >> On 6/2/2015 04:19, M.-A. Lemburg wrote: >>> On 02.06.2015 03:37, Steven D'Aprano wrote: >>>> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: >>>> >>>> Having some sort of decimal literal would have some advantages of its own, >>>> for one it could help against this sillyness: >>>> >>>>>>> Decimal(1.3) >>>> Decimal('1.3000000000000000444089209850062616169452667236328125') >>> Why is that silly? That's the actual value of the binary float 1.3 >>> converted into base 10. If you want 1.3 exactly, you can do this: >>> >>>>>>> Decimal('1.3') >>>> Decimal('1.3') >>> Is that really so hard for people to learn? >> Joonas, I think you're approaching this from the wrong angle. >> >> People who want to get an exact decimal from a literal, will >> use the string representation to define it, not a float >> representation. >> >> In practice, you typically read the data from some file or stream >> anyway, so it already comes as string value and if you want to >> convert an actual float to a decimal, this will most likely >> not be done in a literal way, but instead by passed in to >> the Decimal constructor as variable, so there's no literal >> involved. >> >> It may be good to provide some alternative ways of converting >> a float to a decimal, e.g. one which uses the float repr logic >> to overcome things like repr(float(1.1)) == '1.1000000000000001' >> instead of a direct conversion: >> >>>>> Decimal(1.1) >> Decimal('1.100000000000000088817841970012523233890533447265625') >>>>> Decimal(repr(1.1)) >> Decimal('1.1') >> >> These could be added as parameter to the Decimal constructor. 
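As a rough sketch of the idea quoted above, the alternative conversion could just as well live in a plain helper function rather than a constructor parameter (the name decimal_from_float and its use_repr flag are invented here purely for illustration):

from decimal import Decimal

def decimal_from_float(x, use_repr=False):
    # use_repr=True opts in to the shortest-repr conversion instead of
    # the exact expansion of the binary float.
    return Decimal(repr(x)) if use_repr else Decimal(x)

decimal_from_float(1.1)
# Decimal('1.100000000000000088817841970012523233890533447265625')
decimal_from_float(1.1, use_repr=True)
# Decimal('1.1')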
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Tue Jun 2 14:44:02 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 02 Jun 2015 08:44:02 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> Message-ID: <1433249042.1139401.284643833.6ACF01EC@webmail.messagingengine.com> On Tue, Jun 2, 2015, at 08:34, Nick Coghlan wrote: > For decimal, the issues that keep it from becoming a literal are > similar to those that keep it from becoming a builtin: configurable > contexts are a core part of the decimal module's capabilities, and > making a builtin type context dependent causes various problems when > it comes to reasoning about a piece of code based on purely local > information. Those problems affect human readers regardless, but once > literals enter the mix, they affect all compile time processing as > well. Why do contexts exist? Why isn't this an issue for float, despite floating point contexts being something that exists in IEEE 754? As for constant folding - well, maybe python needs a -ffast-math equivalent. From abarnert at yahoo.com Tue Jun 2 14:50:42 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 05:50:42 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <1433249042.1139401.284643833.6ACF01EC@webmail.messagingengine.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> <1433249042.1139401.284643833.6ACF01EC@webmail.messagingengine.com> Message-ID: <923B13D8-395A-4E80-9E83-3DB46DD19D5E@yahoo.com> On Jun 2, 2015, at 05:44, random832 at fastmail.us wrote: > >> On Tue, Jun 2, 2015, at 08:34, Nick Coghlan wrote: >> For decimal, the issues that keep it from becoming a literal are >> similar to those that keep it from becoming a builtin: configurable >> contexts are a core part of the decimal module's capabilities, and >> making a builtin type context dependent causes various problems when >> it comes to reasoning about a piece of code based on purely local >> information. Those problems affect human readers regardless, but once >> literals enter the mix, they affect all compile time processing as >> well. > > Why do contexts exist? Why isn't this an issue for float, despite > floating point contexts being something that exists in IEEE 754? The issue here isn't really binary vs. decimal, but rather that float implements a specific fixed-precision (binary) float type, and Decimal implements a configurable-precision (decimal) float type. As Nick explained elsewhere in that message, decimal64 or decimal128 wouldn't have the context problem. And similarly, a binary.Binary type designed like decimal.Decimal would have the context problem. 
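A small interactive illustration of that difference (assuming the default context of 28 significant digits before the precision is lowered):

>>> from decimal import Decimal, localcontext
>>> 1 / 7                      # float: precision is fixed by the format
0.14285714285714285
>>> Decimal(1) / Decimal(7)    # Decimal: precision comes from the context
Decimal('0.1428571428571428571428571429')
>>> with localcontext() as ctx:
...     ctx.prec = 6
...     Decimal(1) / Decimal(7)
...
Decimal('0.142857')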
(This is a slight oversimplification; there's also the fact that Decimal implements the full set of 754-2008 context features, while float implements a subset of 754-1985 features, and even that only if the underlying C lib does so, and nobody ever uses them anyway.) From abarnert at yahoo.com Tue Jun 2 15:05:32 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 06:05:32 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> Message-ID: <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> On Jun 2, 2015, at 05:34, Nick Coghlan wrote: > > This is one of the other > main reasons why decimal64 or decimal128 are better candidates for a > builtin decimal type than decimal.Decimal as it exists today (as well > as being potentially more amenable to hardware acceleration on some > platforms). OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them? I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?), how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial), how it fits into the numeric tower ABCs, and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes and into math, and whether we need decimal complex values, and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...); then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing? (Of course while we're at it, it would be nice to have arbitrary-precision IEEE binary floats as well, modeled on the decimal module, and to add all the missing 754-2008/C11 methods/math functions for the existing float type, but those seem like separate proposals from fixed-precision decimal floats.) From oscar.j.benjamin at gmail.com Tue Jun 2 16:05:07 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 2 Jun 2015 15:05:07 +0100 Subject: [Python-ideas] Python Float Update In-Reply-To: <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> Message-ID: On 2 June 2015 at 14:05, Andrew Barnert via Python-ideas wrote: > On Jun 2, 2015, at 05:34, Nick Coghlan wrote: >> >> This is one of the other >> main reasons why decimal64 or decimal128 are better candidates for a >> builtin decimal type than decimal.Decimal as it exists today (as well >> as being potentially more amenable to hardware acceleration on some >> platforms). 
> > OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them? > > I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?), I would argue that it should be as simple as float. If someone wants the rest of it they've got the Decimal module which is more than enough for their needs. > how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial), Interaction between decimalN and Decimal coerces to Decimal. Interaction with floats is a TypeError. > how it fits into the numeric tower ABCs, Does anyone really use these for anything? I haven't really found them to be very useful since no third-party numeric types use them and they don't really define the kind of information that you might really want in any carefully written numerical algorithm. I don't see any gain in adding any decimal types to e.g Real as the ABCs seem irrelevant to me. > and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes It's not essential to incorporate them here. If they become commonly used in C then it would be good to have these for binary compatibility. > and into math, and whether we need decimal complex values, It's not common to use the math-style functions with the decimal module unless you're using it as a multi-precision library and then you'd really want the full Decimal type. There's no advantage in using decimal for e.g. sin, cos etc. so there's not much really lost in converting to binary and back. It's in the simple arithmetic where it makes a difference so I'd say that decimal should stick to that. As for complex decimals this would only really be worth it if the ultimate plan was to have decimals become the default floating point type. Laura suggested that earlier and I probably agree that it would have been a good idea at some earlier time but it's a bit late for that. > and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...); Presumably CPython would have to write it's own implementation e.g.: PyDecimal64_FromIntExponentAndLongSignificand ... or something like that. > then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing? I don't think anyone has proposed to add all of the things that you suggested. Of course if there are decimal literals and a fixed-width decimal type then over time people will suggest some of the other things. That doesn't mean that they'd need to be incorporated though. A year ago I said I'd write a PEP for decimal literals but then I got clobbered at work and a number of other things happened so that I didn't even have time to read threads like this. Maybe it's worth revisiting... 
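For what it's worth, you can already get a feel for those semantics at the Python level by pinning a context to the decimal64 interchange parameters. This is only a sketch - the D64 context and the d64 helper are made-up names, and a real built-in type would route every operation through the fixed context itself rather than leaving that to the caller:

from decimal import Context, ROUND_HALF_EVEN

# IEEE 754-2008 decimal64: 16 significant digits, exponent range -383..384.
D64 = Context(prec=16, Emin=-383, Emax=384,
              rounding=ROUND_HALF_EVEN, clamp=1)

def d64(value):
    # Build a value already rounded to the decimal64 parameters.
    return D64.create_decimal(value)

D64.add(d64('1.1'), d64('2.2'))     # Decimal('3.3') - no binary fuzz
D64.divide(d64(1), d64(3))          # Decimal('0.3333333333333333')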
Oscar From abarnert at yahoo.com Tue Jun 2 17:14:14 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 08:14:14 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> Message-ID: <8BC3C92B-7C84-491B-823C-6A7704D9EB26@yahoo.com> On Jun 2, 2015, at 07:05, Oscar Benjamin wrote: > > On 2 June 2015 at 14:05, Andrew Barnert via Python-ideas > wrote: >> On Jun 2, 2015, at 05:34, Nick Coghlan wrote: >>> >>> This is one of the other >>> main reasons why decimal64 or decimal128 are better candidates for a >>> builtin decimal type than decimal.Decimal as it exists today (as well >>> as being potentially more amenable to hardware acceleration on some >>> platforms). >> >> OK, so what are the stumbling blocks to adding decimal32/64/128 (or just one of the three), either in builtins/stdtypes or in decimal, and then adding literals for them? >> >> I can imagine a few: someone has to work out exactly what features to support (the same things as float, or everything in the standard?), > > I would argue that it should be as simple as float. If someone wants > the rest of it they've got the Decimal module which is more than > enough for their needs. But decimal64 and Decimal are not the same types. So, if you want to, e.g., get the next decimal64 value after the current value, how would you do that? (Unless you're suggesting there should be a builtin decimal64 and a separate decimal.decimal64 or something, but I don't think you are.) Also, with float, we can get away with saying we're supporting the 1985 standard and common practice among C90 implementations; with decimal64, the justification for arbitrarily implementing part of the 2008 standard but not the rest is not as clear-cut. >> how it interacts with Decimal and float (which is defined by the standard, but translating that to Python isn't quite trivial), > > Interaction between decimalN and Decimal coerces to Decimal. Even when the current decimal context is too small to hold a decimalN? Does that raise any flags? > Interaction with floats is a TypeError. > >> how it fits into the numeric tower ABCs, > > Does anyone really use these for anything? I haven't really found them > to be very useful since no third-party numeric types The NumPy native types do. (Of course they also subclass int and float even relevant.) > use them and they > don't really define the kind of information that you might really want > in any carefully written numerical algorithm. I don't see any gain in > adding any decimal types to e.g Real as the ABCs seem irrelevant to > me. Even if they are completely irrelevant, unless they're deprecated they pretty much have to be supported by any new types. There might be a good argument that decimal64 doesn't fit into the numeric tower, but you'd have to make that argument. >> and what syntax to use for the literals, and if/how it fits into things like array/struct/ctypes > > It's not essential to incorporate them here. If they become commonly > used in C then it would be good to have these for binary > compatibility. 
For ctypes, sure (although even there, ctypes is a relatively simple way to share values between pure-Python child processes with multiprocessing.shared_ctypes). But for array, that's generally not about compatibility with existing C code, it's about efficiently packing zillions of homogenous simple values into as little memory as possible. >> and into math, and whether we need decimal complex values, > > It's not common to use the math-style functions with the decimal > module Well, math is mostly about double functions from the C90 stdlib, so it's not common to use them with decimal. But that doesn't mean you wouldn't want decimal64 implementations of some of the functions in math. > unless you're using it as a multi-precision library and then > you'd really want the full Decimal type. But again, the full Decimal type isn't just an expansion on decimal64, it's a completely different type, with context-sensitive precision. > There's no advantage in using > decimal for e.g. sin, cos etc. > so there's not much really lost in > converting to binary and back. There's still rounding error. Sure, usually that won't make a difference--but when it does, it will be surprising and frustrating if you didn't explicitly ask for it. > It's in the simple arithmetic where it > makes a difference so I'd say that decimal should stick to that. > > As for complex decimals this would only really be worth it if the > ultimate plan was to have decimals become the default floating point > type. Why? > Laura suggested that earlier and I probably agree that it would > have been a good idea at some earlier time but it's a bit late for > that. > >> and what the C API looks like (it would be nice if PyDecimal64_AsDecimal64 worked as expected on C11 platforms, but you could still use decimal64 on C90 platforms and just not get such functions...); > > Presumably CPython would have to write it's own implementation e.g.: > > PyDecimal64_FromIntExponentAndLongSignificand > > ... or something like that. Sure, if you want a C API for C90 platforms at all. But you may not even need that. When would you need to write C code that deals with decimal64 values as exponent and significant? Dealing with them as abstract numbers, general Python objects, native decimal64, and maybe even opaque values that I can pass around in C without being able to interpret them, I can see, but what C code needs the exponent and significand? >> then write a PEP; then write an implementation; and after all that work, the result may be seen as too much extra complexity (either in the language or in the implementation) for the benefits. But is that it, or is there even more that I'm missing? > > I don't think anyone has proposed to add all of the things that you > suggested. I think in many (but maybe not all) of these cases the simplest answer is the best, but a PEP would have to actually make that case for each thing. > Of course if there are decimal literals and a fixed-width > decimal type then over time people will suggest some of the other > things. That doesn't mean that they'd need to be incorporated though. > > A year ago I said I'd write a PEP for decimal literals but then I got > clobbered at work and a number of other things happened so that I > didn't even have time to read threads like this. Maybe it's worth > revisiting... Maybe we need a PEP for the decimalN type(s) first, then if someone has time and inclination they can write a PEP for literals for those types, either as a companion or as a followup. 
That would probably cut out 30-50% of the work, and maybe even more of the room for argument and bikeshedding. From tjreedy at udel.edu Tue Jun 2 19:29:14 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 02 Jun 2015 13:29:14 -0400 Subject: [Python-ideas] Python Float Update In-Reply-To: <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> References: <1433129837.38160.283114145.0AFD5C3F@webmail.messagingengine.com> <64B749C1-501A-4A9E-B254-6C288CDC7257@yahoo.com> <1433163587.208341.283463505.046453FA@webmail.messagingengine.com> <20150602013748.GD932@ando.pearwood.info> <556D671B.6040803@egenix.com> <556D79A7.7080904@sdamon.com> <80A3779D-F68E-4573-924F-142AE59AB0D2@yahoo.com> Message-ID: On 6/2/2015 9:05 AM, Andrew Barnert via Python-ideas wrote: > OK, so what are the stumbling blocks to adding decimal32/64/128 (or > just one of the three), either in builtins/stdtypes or in decimal, > and then adding literals for them? A compelling rationale. Python exposes the two basic number types used by the kinds of computers it runs on: integers (extended) and floats (binary in practice, though the language definition would all a decimal float machine). The first killer app for Python was scientific numerical computing. The first numerical package developed for this exposed the entire gamut of integer and float types available in C. Numpy is the third numerical package. (Even so, none of the packages have been distributed with CPython -- and properly so.) Numbers pre-wrapped as dates, times, and datetimes with specialized methods are not essential (Python once managed without) but are enormously useful in a wide variety of application areas. Decimals, another class of pre-wrapped numbers, greatly simplify money calculations, including those that must follow legal or contractual rules. It is no accident that the decimal specification is a product of what was once International Business Machines. Contexts and specialized rounding rules are an essential part of fulfilling the purpose of the module. What application area would be opened up by adding a fixed-precision float? The only thing I have seen presented is making interactive python act even more* like a generic (decimal) calculator, so that newbies will find python floats less surprising that those of other languages. (Of course, a particular decimal## might not exactly any existing calculator.) *The int division change solved the biggest discrepancy: 1/10 is not .1 instead of 0. Representation changes improved things also. -- Terry Jan Reedy From abarnert at yahoo.com Tue Jun 2 21:03:25 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 12:03:25 -0700 Subject: [Python-ideas] User-defined literals Message-ID: This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery. 
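For concreteness, the status quo in such an app is a context configured once at startup and quoted strings wherever a constant is needed (the names below are made up just for illustration):

from decimal import Decimal, getcontext

getcontext().prec = 9            # the app's single fixed context

TAX_RATE = Decimal('0.0825')     # today's spelling; a decimal literal is the goal
price = Decimal('19.99')
price * TAX_RATE                 # Decimal('1.649175')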
I explored the convertible literals a while ago, and I'm pretty sure that doesn't work in a duck-typed language. But the C++ design does work, as long as you're willing to have the conversion (including the lookup of the conversion function itself) done at runtime. Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`. Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice. Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ? import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal. Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary. Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. I've built a quick&dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it. My feeling is that this would be useful, but the problems are not surmountable without much bigger changes, and there's no obvious better design that avoids them. But I'm interested to see what others think. From me at the-compiler.org Tue Jun 2 21:33:46 2015 From: me at the-compiler.org (Florian Bruhin) Date: Tue, 2 Jun 2015 21:33:46 +0200 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: <20150602193346.GG26357@tonks> * Andrew Barnert via Python-ideas [2015-06-02 12:03:25 -0700]: > This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. I actually had the exact same thing in mind recently, and never brought it up because it seemed too crazy to me. It seems I'm not the only one! :D > Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. 
For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`. I think a big issue is that it's non-obvious syntactic sugar. You wouldn't expect 1.2x to actually be a function call, and for newcomers this might be rather confusing... > Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. That actually was the use-case I had in mind. I think {'spam': 1, 'eggs': 2}_o is less ugly (and less error-prone!) than OrderedDict([('spam', 1), ('eggs': 2)]) Also, it's immediately apparent that it is some kind of dict. > I've built a quick&dirty toy implementation (at https://github.com/abarnert/userliteralhack). Unlike the real proposal, this only handles numbers, and allows whitespace between the numbers and the names, and is a terrible hack. But it's enough to play with the idea, and you don't need to patch and recompile CPython to use it. Wow! I'm always amazed at how malleable Python is. Florian -- http://www.the-compiler.org | me at the-compiler.org (Mail/XMPP) GPG: 916E B0C8 FD55 A072 | http://the-compiler.org/pubkey.asc I love long mails! | http://email.is-not-s.ms/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: not available URL: From rosuav at gmail.com Tue Jun 2 21:40:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 3 Jun 2015 05:40:01 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas wrote: > Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice. > There's probably no solution to the literal_imal problem, but the easiest fix for literal_ump is to have 21j be parsed the same way - it's a 21 modified by j, same as 21jump is a 21 modified by jump. > Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ? import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal. > I'd much rather see it be done at compile time. Something like this: compile("x = 1d/10", "<>", "exec") would immediately call literal_d("1") and embed its return value in the resulting code as a literal. (Since the peephole optimizer presumably doesn't currently understand Decimals, this would probably keep the division, but if it got enhanced, this could end up constant-folding to Decimal("0.1") before returning the code object.) So it's only the compilation step that needs to know about all those literal_* functions. 
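Spelled out, the runtime half of that is tiny - a sketch, where literal_d is simply whatever function of that name is in scope in the module using the literal:

from decimal import Decimal

def literal_d(s):
    # Receives the token text as a string when a trailing 'd' suffix is seen.
    return Decimal(s)

# Under the proposal, `x = 1.2d` would be equivalent to:
x = literal_d('1.2')    # Decimal('1.2'), with no detour through binary float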
Should there be a way to globally register them for default usage, or is this too much action-at-a-distance? > Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary. > I'd be inclined to simply always provide a string. The special case would be that the quotes can sometimes be omitted, same as redundant parens on genexprs can sometimes be omitted. Otherwise, 1.2d might still produce wrong results. > Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. > I thought there was no such thing as a dict/list/set literal, only display syntax? In any case, that can always be left for a future extension to the proposal. ChrisA From njs at pobox.com Tue Jun 2 21:40:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 2 Jun 2015 12:40:50 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas wrote: > This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. > > In the thread on decimals, a number of people suggested that they'd like to have decimal literals. Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery. Are there any use cases besides decimals? Wouldn't it be easier to just add, say, a fixed "0d" prefix for decimals? 0x1001 # hex 0b1001 # binary 0d1.001 # decimal > Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. Also there's the idea floating around of making *all* dicts ordered (as PyPy has done), which would be much cleaner if it can be managed, so I'm guessing that would have to be tried and fail before any new syntax would be added for this use case. -n -- Nathaniel J. 
Smith -- http://vorpus.org From ckaynor at zindagigames.com Tue Jun 2 22:30:27 2015 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Tue, 2 Jun 2015 13:30:27 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith wrote: > On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas > wrote: > > This is a straw-man proposal for user-defined literal suffixes, similar > to the design in C++. > > > > In the thread on decimals, a number of people suggested that they'd like > to have decimal literals. Nick Coghlan explained why decimal.Decimal > literals don't make sense in general (primarily, but not solely, because > they're inherently context-sensitive), so unless we first add a fixed type > like decimal64, that idea is a non-starter. However, there was some > interest in either having Swift-style convertible literals or C++-style > user-defined literals. Either one would allow users who want decimal > literals for a particular app where it makes sense (because there's a > single fixed context, and the performance cost of Decimal('1.2') vs. a real > constant is irrelevant) to add them without too much hassle or hackery. > > Are there any use cases besides decimals? Wouldn't it be easier to > just add, say, a fixed "0d" prefix for decimals? > > 0x1001 # hex > 0b1001 # binary > 0d1.001 # decimal > In terms of other included useful options, you also have fractions. There could also be benefit of using such a system for cases of numbers with units, such as having the language understand 23.49MB. That said, very similar results could be achieved in most cases by merely using a normal function, without the need for special syntax. Decimal and Fraction are probably the only two major cases where you will see any actual benefit, though there may be libraries that may provide other number formats that could benefit (perhaps a base-3 number?). > Similarly, this idea could be extended to handle all literal types, so > you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but > I think that's ugly enough to not be worth proposing. (A prefix looks > better there... but a prefix doesn't work for numbers or strings. And I'm > not sure it's unambiguously parseable even for list/set/dict.) Plus, > there's the problem that comprehensions and actual literals are both parsed > as displays, but you wouldn't want user-defined comprehensions. > > Also there's the idea floating around of making *all* dicts ordered > (as PyPy has done), which would be much cleaner if it can be managed, > so I'm guessing that would have to be tried and fail before any new > syntax would be added for this use case. One benefit of the proposal is that it can be readily generalized to all literal syntax, so custom behaviors for native support of ordered dicts, trees, ordered sets, multi-sets, counters, and so forth could all be added via libraries, with little to no additional need for Python to be updated to support them directly. All-in-all, I'd be very mixed on such a feature. I can see plenty of cases where it would provide benefit, however it also adds quite a bit of complexity to the language, and could easily result in code with nasty action-at-a-distance issues. If such a feature were implemented, Python would probably also want to reserve some set of the names for future language features, similar to how dunder names are reserved. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Tue Jun 2 23:26:22 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 2 Jun 2015 14:26:22 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Jun 2, 2015 1:32 PM, "Chris Kaynor" wrote: > > On Tue, Jun 2, 2015 at 12:40 PM, Nathaniel Smith wrote: >> >> Are there any use cases besides decimals? Wouldn't it be easier to >> just add, say, a fixed "0d" prefix for decimals? >> >> 0x1001 # hex >> 0b1001 # binary >> 0d1.001 # decimal > > > In terms of other included useful options, you also have fractions. > > There could also be benefit of using such a system for cases of numbers with units, such as having the language understand 23.49MB. The unit libraries I've seen just spell this as "23.49 * MB" (or "22.49 * km / h" for a speed, say). And crucially they don't have any need to override the parsing rules for float literals. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Tue Jun 2 23:46:13 2015 From: ron3200 at gmail.com (Ron Adam) Date: Tue, 02 Jun 2015 17:46:13 -0400 Subject: [Python-ideas] Explicitly shared objects with sub modules vs import In-Reply-To: <1433233415.13986.54.camel@kaarsemaker.net> References: <1433233415.13986.54.camel@kaarsemaker.net> Message-ID: On 06/02/2015 04:23 AM, Dennis Kaarsemaker wrote: > On za, 2015-05-30 at 11:45 -0400, Ron Adam wrote: > >> The solution I found was to call a function to explicitly set the shared >> items in the imported module. > > This reminds me of my horrible april fools hack of 2013 to make Python > look more like perl: http://seveas.net/exporter/ It's the reverse of what this suggestion does, if I'm reading it correctly. It allows called code to alter the callers frame. Obviously that wouldn't be good to do. I think what makes the suggestion in this thread "not good", is that modules have no formal order of dependency. If they did, then it could be restricted to only work in one direction, which means sub-modules couldn't effect parent modules. But python isn't organised that way. All modules are at the same level. Which means they can import from each other... and possibly export to each other too. So it's up to the programmer to restrict what parts effect other parts as if it did have a formal dependency order. Cheers, Ron From surya.subbarao1 at gmail.com Wed Jun 3 00:28:33 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Tue, 2 Jun 2015 15:28:33 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15 In-Reply-To: References: Message-ID: What do you mean by replying inine? On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert wrote: > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person > wrote: > > I think you're right. I was also considering ... "editing" my Python > distribution. If they didn't implement my suggestion for correcting floats, > at least they can fix this, instead of making people hack Python for good > results! > > > If you're going to reply to digests, please learn how to reply inline > instead of top-posting (and how to trim out all the irrelevant stuff). It's > next to impossible to tell which part of which of the messages you're > replying to even in simple cases like this one, with only 4 messages in the > digest. 
> > On Mon, Jun 1, 2015 at 8:10 PM, wrote: >> >> Send Python-ideas mailing list submissions to >> python-ideas at python.org >> >> To subscribe or unsubscribe via the World Wide Web, visit >> https://mail.python.org/mailman/listinfo/python-ideas >> or, via email, send a message with subject or body 'help' to >> python-ideas-request at python.org >> >> You can reach the person managing the list at >> python-ideas-owner at python.org >> >> When replying, please edit your Subject line so it is more specific >> than "Re: Contents of Python-ideas digest..." >> >> >> Today's Topics: >> >> 1. Re: Python Float Update (Steven D'Aprano) >> 2. Re: Python Float Update (Andrew Barnert) >> 3. Re: Python Float Update (Steven D'Aprano) >> 4. Re: Python Float Update (Andrew Barnert) >> >> >> ---------------------------------------------------------------------- >> >> Message: 1 >> Date: Tue, 2 Jun 2015 11:37:48 +1000 >> From: Steven D'Aprano >> To: python-ideas at python.org >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <20150602013748.GD932 at ando.pearwood.info> >> Content-Type: text/plain; charset=us-ascii >> >> On Mon, Jun 01, 2015 at 05:52:35PM +0300, Joonas Liik wrote: >> >> > Having some sort of decimal literal would have some advantages of its >> > own, >> > for one it could help against this sillyness: >> > >> > >>> Decimal(1.3) >> > Decimal('1.3000000000000000444089209850062616169452667236328125') >> >> Why is that silly? That's the actual value of the binary float 1.3 >> converted into base 10. If you want 1.3 exactly, you can do this: >> >> > >>> Decimal('1.3') >> > Decimal('1.3') >> >> Is that really so hard for people to learn? >> >> >> > I'm not saying that the actual data type needs to be a decimal ( >> > might well be a float but say shove the string repr next to it so it can >> > be >> > accessed when needed) >> >> You want Decimals to *lie* about what value they have? >> >> I think that's a terrible idea, one which would lead to a whole set of >> new and exciting surprises when using Decimal. Let me try to predict a >> few of the questions on Stackoverflow which would follow this change... >> >> Why is equality so inaccurate in Python? >> >> py> x = Decimal(1.3) >> py> y = Decimal('1.3') >> py> x, y >> (Decimal('1.3'), Decimal('1.3')) >> py> x == y >> False >> >> Why does Python insert extra digits into numbers when I multiply? >> >> py> x = Decimal(1.3) >> py> x >> Decimal('1.3') >> py> y = 10000000000000000*x >> py> y - 13000000000000000 >> Decimal('0.444089209850062616169452667236328125') >> >> >> > ..but this is one really common pitfall for new users, i know its easy >> > to >> > fix the code above, >> > but this behavior is very unintuitive.. you essentially get a really >> > expensive float when you do the obvious thing. >> >> Then don't do the obvious thing. >> >> Sometimes there really is no good alternative to actually knowing what >> you are doing. Floating point maths is inherently hard, but that's not >> the problem. There are all sorts of things in programming which are >> hard, and people learn how to deal with them. The problem is that people >> *imagine* that floating point is simple, when it is not and can never >> be. We don't do them any favours by enabling that delusion. >> >> If your needs are light, then you can ignore the complexities of >> floating point. You really can go a very long way by just rounding the >> results of your calculations when displaying them. 
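For example (a minimal illustration of that advice; the values are arbitrary):

    x = 0.1 + 0.2
    print(x)                   # 0.30000000000000004 -- repr exposes the binary artifact
    print("{:.2f}".format(x))  # 0.30 -- rounding only in the display layer
    print(round(x, 2) == 0.3)  # True -- rounding the value itself also hides it
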
But for anything more >> than that, we cannot just paper over the floating point complexities >> without creating new complexities that will burn people. >> >> You don't have to become a floating point guru, but it really isn't >> onerous to expect people who are programming to learn a few basic >> programming skills, and that includes a few basic coping strategies for >> floating point. >> >> >> >> -- >> Steve >> >> >> ------------------------------ >> >> Message: 2 >> Date: Mon, 1 Jun 2015 19:21:47 -0700 >> From: Andrew Barnert >> To: Andrew Barnert >> Cc: Nick Coghlan , python-ideas >> >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <5E8271BF-183E-496D-A556-81C407977FFE at yahoo.com> >> Content-Type: text/plain; charset=us-ascii >> >> On Jun 1, 2015, at 19:00, Andrew Barnert wrote: >> > >> >> On Jun 1, 2015, at 18:27, Andrew Barnert via Python-ideas >> >> wrote: >> >> >> >>> On Jun 1, 2015, at 17:08, Nick Coghlan wrote: >> >>> >> >>> On 2 Jun 2015 08:44, "Andrew Barnert via Python-ideas" >> >>> wrote: >> >>>> But the basic idea can be extracted out and Pythonified: >> >>>> >> >>>> The literal 1.23 no longer gives you a float, but a FloatLiteral, >> >>>> which is either a subclass of float, or an unrelated class that has a >> >>>> __float__ method. Doing any calculation on it gives you a float. But as long >> >>>> as you leave it alone as a FloatLiteral, it has its literal characters >> >>>> available for any function that wants to distinguish FloatLiteral from >> >>>> float, like the Decimal constructor. >> >>>> >> >>>> The problem that Python faces that Swift doesn't is that Python >> >>>> doesn't use static typing and implicit compile-time conversions. So in >> >>>> Python, you'd be passing around these larger values and doing the slow >> >>>> conversions at runtime. That may or may not be unacceptable; without >> >>>> actually building it and testing some realistic programs it's pretty hard to >> >>>> guess. >> >>> >> >>> Joonas's suggestion of storing the original text representation passed >> >>> to the float constructor is at least a novel one - it's only the idea >> >>> of actual decimal literals that was ruled out in the past. >> >> >> >> I actually built about half an implementation of something like Swift's >> >> LiteralConvertible protocol back when I was teaching myself Swift. But I >> >> think I have a simpler version that I could implement much more easily. >> >> >> >> Basically, FloatLiteral is just a subclass of float whose __new__ >> >> stores its constructor argument. Then decimal.Decimal checks for that stored >> >> string and uses it instead of the float value if present. Then there's an >> >> import hook that replaces every Num with a call to FloatLiteral. >> >> >> >> This design doesn't actually fix everything; in effect, 1.3 actually >> >> compiles to FloatLiteral(str(float('1.3')) (because by the time you get to >> >> the AST it's too late to avoid that first conversion). Which does actually >> >> solve the problem with 1.3, but doesn't solve everything in general (e.g., >> >> just feed in a number that has more precision than a double can hold but >> >> less than your current decimal context can...). >> >> >> >> But it just lets you test whether the implementation makes sense and >> >> what the performance effects are, and it's only an hour of work, >> > >> > Make that 15 minutes. 
>> > >> > https://github.com/abarnert/floatliteralhack >> >> And as it turns out, hacking the tokens is no harder than hacking the AST >> (in fact, it's a little easier; I'd just never done it before), so now it >> does that, meaning you really get the actual literal string from the source, >> not the repr of the float of that string literal. >> >> Turning this into a real implementation would obviously be more than half >> an hour's work, but not more than a day or two. Again, I don't think anyone >> would actually want this, but now people who think they do have an >> implementation to play with to prove me wrong. >> >> >> and doesn't require anyone to patch their interpreter to play with it. >> >> If it seems promising, then hacking the compiler so 2.3 compiles to >> >> FloatLiteral('2.3') may be worth doing for a test of the actual >> >> functionality. >> >> >> >> I'll be glad to hack it up when I get a chance tonight. But personally, >> >> I think decimal literals are a better way to go here. Decimal(1.20) >> >> magically doing what you want still has all the same downsides as 1.20d (or >> >> implicit decimal literals), plus it's more complex, adds performance costs, >> >> and doesn't provide nearly as much benefit. (Yes, Decimal(1.20) is a little >> >> nicer than Decimal('1.20'), but only a little--and nowhere near as nice as >> >> 1.20d). >> >> >> >>> Aside from the practical implementation question, the main concern I >> >>> have with it is that we'd be trading the status quo for a situation >> >>> where "Decimal(1.3)" and "Decimal(13/10)" gave different answers. >> >> >> >> Yes, to solve that you really need Decimal(13)/Decimal(10)... Which >> >> implies that maybe the simplification in Decimal(1.3) is more misleading >> >> than helpful. (Notice that this problem also doesn't arise for decimal >> >> literals--13/10d is int vs. Decimal division, which is correct out of the >> >> box. Or, if you want prefixes, d13/10 is Decimal vs. int division.) >> >> >> >>> It seems to me that a potentially better option might be to adjust the >> >>> implicit float->Decimal conversion in the Decimal constructor to use >> >>> the same algorithm as we now use for float.__repr__ [1], where we look >> >>> for the shortest decimal representation that gives the same answer >> >>> when rendered as a float. At the moment you have to indirect through >> >>> str() or repr() to get that behaviour: >> >>> >> >>>>>> from decimal import Decimal as D >> >>>>>> 1.3 >> >>> 1.3 >> >>>>>> D('1.3') >> >>> Decimal('1.3') >> >>>>>> D(1.3) >> >>> Decimal('1.3000000000000000444089209850062616169452667236328125') >> >>>>>> D(str(1.3)) >> >>> Decimal('1.3') >> >>> >> >>> Cheers, >> >>> Nick. >> >>> >> >>> [1] http://bugs.python.org/issue1580 >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> https://mail.python.org/mailman/listinfo/python-ideas >> >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> ------------------------------ >> >> Message: 3 >> Date: Tue, 2 Jun 2015 13:00:40 +1000 >> From: Steven D'Aprano >> To: python-ideas at python.org >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <20150602030040.GF932 at ando.pearwood.info> >> Content-Type: text/plain; charset=utf-8 >> >> Nicholas, >> >> Your email client appears to not be quoting text you quote. It is a >> conventional to use a leading > for quoting, perhaps you could configure >> your mail program to do so? 
The good ones even have a "Paste As Quote" >> command. >> >> On with the substance of your post... >> >> On Mon, Jun 01, 2015 at 01:24:32PM -0400, Nicholas Chammas wrote: >> >> > I guess it?s a non-trivial tradeoff. But I would lean towards >> > considering >> > people likely to be affected by the performance hit as doing something >> > ?not >> > common?. Like, if they are doing that many calculations that it matters, >> > perhaps it makes sense to ask them to explicitly ask for floats vs. >> > decimals, in exchange for giving the majority who wouldn?t notice a >> > performance difference a better user experience. >> >> Changing from binary floats to decimal floats by default is a big, >> backwards incompatible change. Even if it's a good idea, we're >> constrained by backwards compatibility: I would imagine we wouldn't want >> to even introduce this feature until the majority of people are using >> Python 3 rather than Python 2, and then we'd probably want to introduce >> it using a "from __future__ import decimal_floats" directive. >> >> So I would guess this couldn't happen until probably 2020 or so. >> >> But we could introduce a decimal literal, say 1.1d for Decimal("1.1"). >> The first prerequisite is that we have a fast Decimal implementation, >> which we now have. Next we would have to decide how the decimal literals >> would interact with the decimal module. Do we include full support of >> the entire range of decimal features, including globally configurable >> precision and other modes? Or just a subset? How will these decimals >> interact with other numeric types, like float and Fraction? At the >> moment, Decimal isn't even part of the numeric tower. >> >> There's a lot of ground to cover, it's not a trivial change, and will >> definitely need a PEP. >> >> >> > How many of your examples are inherent limitations of decimals vs. >> > problems >> > that can be improved upon? >> >> In one sense, they are inherent limitations of floating point numbers >> regardless of base. Whether binary, decimal, hexadecimal as used in some >> IBM computers, or something else, you're going to see the same problems. >> Only the specific details will vary, e.g. 1/3 cannot be represented >> exactly in base 2 or base 10, but if you constructed a base 3 float, it >> would be exact. >> >> In another sense, Decimal has a big advantage that it is much more >> configurable than Python's floats. Decimal lets you configure the >> precision, rounding mode, error handling and more. That's not inherent >> to base 10 calculations, you can do exactly the same thing for binary >> floats too, but Python doesn't offer that feature for floats, only for >> Decimals. >> >> But no matter how you configure Decimal, all you can do is shift the >> gotchas around. The issue really is inherent to the nature of the >> problem, and you cannot defeat the universe. Regardless of what >> base you use, binary or decimal or something else, or how many digits >> precision, you're still trying to simulate an uncountably infinite >> continuous, infinitely divisible number line using a finite, >> discontinuous set of possible values. Something has to give. >> >> (For the record, when I say "uncountably infinite", I don't just mean >> "too many to count", it's a technical term. To oversimplify horribly, it >> means "larger than infinity" in some sense. It's off-topic for here, >> but if anyone is interested in learning more, you can email me off-list, >> or google for "countable vs uncountable infinity".) 
>> >> Basically, you're trying to squeeze an infinite number of real numbers >> into a finite amount of memory. It can't be done. Consequently, there >> will *always* be some calculations where the true value simply cannot be >> calculated and the answer you get is slightly too big or slightly too >> small. All the other floating point gotchas follow from that simple >> fact. >> >> >> > Admittedly, the only place where I?ve played with decimals extensively >> > is >> > on Microsoft?s SQL Server (where they are the default literal >> > ). I?ve stumbled >> > in >> > the past on my own decimal gotchas >> > , but looking at your >> > examples >> > and trying them on SQL Server I suspect that most of the problems you >> > show >> > are problems of precision and scale. >> >> No. Change the precision and scale, and some *specific* problems goes >> away, but they reappear with other numbers. >> >> Besides, at the point that you're talking about setting the precision, >> we're really not talking about making things easy for beginners any >> more. >> >> And not all floating point issues are related to precision and scale in >> decimal. You cannot divide a cake into exactly three equal pieces in >> Decimal any more than you can divide a cake into exactly three equal >> pieces in binary. All you can hope for is to choose a precision were the >> rounding errors in one part of your calculation will be cancelled by the >> rounding errors in another part of your calculation. And that precision >> will be different for any two arbitrary calculations. >> >> >> >> -- >> Steve >> >> >> ------------------------------ >> >> Message: 4 >> Date: Mon, 1 Jun 2015 20:10:29 -0700 >> From: Andrew Barnert >> To: Steven D'Aprano >> Cc: "python-ideas at python.org" >> Subject: Re: [Python-ideas] Python Float Update >> Message-ID: <79C16144-8BF7-4260-A356-DD4E8D97BAAD at yahoo.com> >> Content-Type: text/plain; charset=us-ascii >> >> On Jun 1, 2015, at 18:58, Steven D'Aprano wrote: >> > >> >> On Tue, Jun 02, 2015 at 10:08:37AM +1000, Nick Coghlan wrote: >> >> >> >> It seems to me that a potentially better option might be to adjust the >> >> implicit float->Decimal conversion in the Decimal constructor to use >> >> the same algorithm as we now use for float.__repr__ [1], where we look >> >> for the shortest decimal representation that gives the same answer >> >> when rendered as a float. At the moment you have to indirect through >> >> str() or repr() to get that behaviour: >> > >> > Apart from the questions of whether such a change would be allowed by >> > the Decimal specification, >> >> As far as I know, GDAS doesn't specify anything about implicit conversion >> from floats. As long as the required explicit conversion function (which I >> think is from_float?) exists and does the required thing. >> >> As a side note, has anyone considered whether it's worth switching to >> IEEE-754-2008 as the controlling specification? There may be a good reason >> not to do so; I'm just curious whether someone has thought it through and >> made the case. >> >> > and the breaking of backwards compatibility, >> > I would really hate that change for another reason. 
>> > >> > At the moment, a good, cheap way to find out what a binary float "really >> > is" (in some sense) is to convert it to Decimal and see what you get: >> > >> > Decimal(1.3) >> > -> Decimal('1.3000000000000000444089209850062616169452667236328125') >> > >> > If you want conversion from repr, then you can be explicit about it: >> > >> > Decimal(repr(1.3)) >> > -> Decimal('1.3') >> > >> > ("Explicit is better than implicit", as they say...) >> > >> > Although in fairness I suppose that if this change happens, we could >> > keep the old behaviour in the from_float method: >> > >> > # hypothetical future behaviour >> > Decimal(1.3) >> > -> Decimal('1.3') >> > Decimal.from_float(1.3) >> > -> Decimal('1.3000000000000000444089209850062616169452667236328125') >> > >> > But all things considered, I don't think we're doing people any favours >> > by changing the behaviour of float->Decimal conversions to implicitly >> > use the repr() instead of being exact. I expect this strategy is like >> > trying to flatten a bubble under wallpaper: all you can do is push the >> > gotchas and surprises to somewhere else. >> > >> > Oh, another thought... Decimals could gain yet another conversion >> > method, one which implicitly uses the float repr, but signals if it was >> > an inexact conversion or not. Explicitly calling repr can never signal, >> > since the conversion occurs outside of the Decimal constructor and >> > Decimal sees only the string: >> > >> > Decimal(repr(1.3)) cannot signal Inexact. >> > >> > But: >> > >> > Decimal.from_nearest_float(1.5) # exact >> > Decimal.from_nearest_float(1.3) # signals Inexact >> > >> > That might be useful, but probably not to beginners. >> >> I think this might be worth having whether the default constructor is >> changed or not. >> >> I can't think of too many programs where I'm pretty sure I have an >> exactly-representable decimal as a float but want to check to be sure... but >> for interactive use in IPython (especially when I'm specifically trying to >> explain to someone why just using Decimal instead of float will/will not >> solve their problem) I could see using it. >> >> ------------------------------ >> >> Subject: Digest Footer >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> >> >> ------------------------------ >> >> End of Python-ideas Digest, Vol 103, Issue 15 >> ********************************************* > > > > > -- > -Surya Subbarao -- -Surya Subbarao From ethan at stoneleaf.us Wed Jun 3 00:32:21 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 02 Jun 2015 15:32:21 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 15 In-Reply-To: References: Message-ID: <556E2EF5.4040108@stoneleaf.us> On 06/01/2015 10:22 PM, Andrew Barnert via Python-ideas wrote: > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person > wrote: > >> I think you're right. I was also considering ... "editing" my Python distribution. If they didn't implement my suggestion for correcting floats, at least they can fix this, instead of making people >> hack Python for good results! > > If you're going to reply to digests, please learn how to reply inline instead of top-posting (and how to trim out all the irrelevant stuff). It's next to impossible to tell which part of which of the > messages you're replying to even in simple cases like this one, with only 4 messages in the digest. 
This would have been a better example had you trimmed the cruft yourself. ;) -- ~Ethan~ From phd at phdru.name Wed Jun 3 00:44:20 2015 From: phd at phdru.name (Oleg Broytman) Date: Wed, 3 Jun 2015 00:44:20 +0200 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: Message-ID: <20150602224420.GA9022@phdru.name> On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person wrote: > What do you mean by replying inine? https://en.wikipedia.org/wiki/Posting_style A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing in e-mail? > On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert wrote: > > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person > > wrote: > > > > I think you're right. I was also considering ... "editing" my Python > > distribution. If they didn't implement my suggestion for correcting floats, > > at least they can fix this, instead of making people hack Python for good > > results! > > > > > > If you're going to reply to digests, please learn how to reply inline > > instead of top-posting (and how to trim out all the irrelevant stuff). It's > > next to impossible to tell which part of which of the messages you're > > replying to even in simple cases like this one, with only 4 messages in the > > digest. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From tjreedy at udel.edu Wed Jun 3 02:05:42 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 02 Jun 2015 20:05:42 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On 6/2/2015 3:40 PM, Chris Angelico wrote: > On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas >> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. >> > > I thought there was no such thing as a dict/list/set literal, only > display syntax? Correct. Only number and string literals. Displays are atomic runtime expressions. 'expression_list' and 'comprehension' are alternate contents of a display. 6.2.4. Displays for lists, sets and dictionaries -- Terry Jan Reedy From abarnert at yahoo.com Wed Jun 3 02:36:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 17:36:22 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: <20150602193346.GG26357@tonks> References: <20150602193346.GG26357@tonks> Message-ID: <307E0C52-38E3-418A-806D-FFFC8922907A@yahoo.com> On Jun 2, 2015, at 12:33, Florian Bruhin wrote: > > * Andrew Barnert via Python-ideas [2015-06-02 12:03:25 -0700]: >> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. > > I actually had the exact same thing in mind recently, and never > brought it up because it seemed too crazy to me. It seems I'm not the > only one! :D > >> Any number or string token followed by a name (identifier) token is currently illegal. 
This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`. > > I think a big issue is that it's non-obvious syntactic sugar. You > wouldn't expect 1.2x to actually be a function call, and for newcomers > this might be rather confusing... Well, newcomers won't be creating user-defined literals, so they won't have to even know there's a function call (unless whoever wrote the library that supplies them has a bug). >> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. > > That actually was the use-case I had in mind. I think > > {'spam': 1, 'eggs': 2}_o > > is less ugly (and less error-prone!) than > > OrderedDict([('spam', 1), ('eggs': 2)]) Well, I suppose that's one advantage of the literals being user-defined: you can use _o in your project, and I can not use it. :) But you still have to deal with the other issue I mentioned if you want to extend it to collection literals: again, they aren't really literals, or even easy to define except as "displays that aren't comprehensions". A quick hack like this is actually pretty easy to write (especially because in a quick hack, who cares whether using it on a comprehension gives the wrong error, or accidentally "works"); a real design and implementation may be harder. > Also, it's immediately apparent that it is some kind of dict. That is a good point. Not that it isn't immediately apparent that OrderedDict(?) is some kind of dict as well... But compared to Swift using ArrayLiteralConvertible to define sets or C++ using array-like initializer lists to do the same thing, this is definitely not as bad. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Wed Jun 3 02:47:03 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 17:47:03 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Jun 2, 2015, at 12:40, Chris Angelico wrote: > > On Wed, Jun 3, 2015 at 5:03 AM, Andrew Barnert via Python-ideas > wrote: >> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice. > > There's probably no solution to the literal_imal problem, but the > easiest fix for literal_ump is to have 21j be parsed the same way - > it's a 21 modified by j, same as 21jump is a 21 modified by jump. Thanks; I should have thought of that--especially since that's exactly how C++ solves similar problems. (Although reserving all suffixes that don't start with an underscore for the implementation's use doesn't hurt...) >> Unlike C++, the lookup of that literal function happens at runtime, so `1.2z3` is no longer a SyntaxError, but a NameError on `literal_z3`. 
Also, this means `literal_d` has to be in scope in every module you want decimal literals in, which often means a `from ? import` (or something worse, like monkeypatching builtins). C++ doesn't have that problem because of argument-dependent lookup, but that doesn't work for any other language. I think this is the biggest flaw in the proposal. > > I'd much rather see it be done at compile time. Something like this: > > compile("x = 1d/10", "<>", "exec") > > would immediately call literal_d("1") and embed its return value in > the resulting code as a literal. (Since the peephole optimizer > presumably doesn't currently understand Decimals, this would probably > keep the division, but if it got enhanced, this could end up > constant-folding to Decimal("0.1") before returning the code object.) > So it's only the compilation step that needs to know about all those > literal_* functions. Should there be a way to globally register them > for default usage, or is this too much action-at-a-distance? It would definitely be nicer to have it done at compile time if possible. I'm just not sure there's a good design that makes it possible. In particular, with your suggestion (which I considered), it seems a bit opaque to me that 1.2d is an error unless you _or some other module_ first imported decimalliterals; it's definitely more explicit if you (not some other module) have to from decimalliterals import literal_d. (And when you really want to be implicit, you can inject it into other modules or into builtins, the same as any other rare case where you really want to be implicit.) But many real projects are either complex enough to need centralized organization or simple enough to fit in one script, so maybe it wouldn't turn out too "magical" in practice. >> Also unlike C++, there's no overloading on different kinds of literals; the conversion function has no way of knowing whether the user actually typed a string or a number. This could easily be changed (e.g., by using different names, or just by passing the repr of the string instead of the string itself), but I don't think it's necessary. > > I'd be inclined to simply always provide a string. The special case > would be that the quotes can sometimes be omitted, same as redundant > parens on genexprs can sometimes be omitted. Yes, that's what I thought too. The only real use case C++ has for this is allowing the same suffix to mean different things for different types, which I think would be more of a bug magnet than a feature if anyone actually did it... > Otherwise, 1.2d might > still produce wrong results. > >> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. > > I thought there was no such thing as a dict/list/set literal, only > display syntax? That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. 
I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem. > In any case, that can always be left for a future > extension to the proposal. > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Wed Jun 3 03:05:09 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 3 Jun 2015 11:05:09 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert wrote: >>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. >> >> I thought there was no such thing as a dict/list/set literal, only >> display syntax? > > That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem. > Yeah. The significance is that literals get snapshotted into the code object as constants and simply called up when they're needed, but displays are executable code: >>> dis.dis(lambda: "Literal") 1 0 LOAD_CONST 1 ('Literal') 3 RETURN_VALUE >>> dis.dis(lambda: ["List","Display"]) 1 0 LOAD_CONST 1 ('List') 3 LOAD_CONST 2 ('Display') 6 BUILD_LIST 2 9 RETURN_VALUE >>> dis.dis(lambda: ("Tuple","Literal")) 1 0 LOAD_CONST 3 (('Tuple', 'Literal')) 3 RETURN_VALUE My understanding of "literal" is something which can be processed entirely at compile time, and retained in the code object, just like strings are. Once the code's finished being compiled, there's no record of what type of string literal was used (raw, triple-quoted, etc), only the type of string object (bytes/unicode). Custom literals could be the same - come to think of it, it might be nice to have pathlib.Path literals, represented as p"/home/rosuav" or something. In any case, they'd be evaluated using only compile-time information, and would then be saved as constants. That implies that only immutables should have literal syntaxes. I'm not sure whether that's significant or not. ChrisA From abarnert at yahoo.com Wed Jun 3 03:35:09 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 18:35:09 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Jun 2, 2015, at 12:40, Nathaniel Smith wrote: > > On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas > wrote: >> This is a straw-man proposal for user-defined literal suffixes, similar to the design in C++. >> >> In the thread on decimals, a number of people suggested that they'd like to have decimal literals. 
Nick Coghlan explained why decimal.Decimal literals don't make sense in general (primarily, but not solely, because they're inherently context-sensitive), so unless we first add a fixed type like decimal64, that idea is a non-starter. However, there was some interest in either having Swift-style convertible literals or C++-style user-defined literals. Either one would allow users who want decimal literals for a particular app where it makes sense (because there's a single fixed context, and the performance cost of Decimal('1.2') vs. a real constant is irrelevant) to add them without too much hassle or hackery. > > Are there any use cases besides decimals? > Wouldn't it be easier to > just add, say, a fixed "0d" prefix for decimals? I suggested that on the other thread, but go back and read the first paragraph of this thread. We don't want a standard literal syntax for decimal.Decimal. Some users may want it for some projects, but they should have to do something explicit to get it. Meanwhile, a literal syntax for decimal64 would be very useful, but there's no such type in the stdlib, so anyone who wants it has to go get it on PyPI, which means the PyPI module, not Python itself, would have to supply the literal. And, since I don't know of any implementation of decimal64 without decimal32 and decimal128, I can easily imagine wanting separate literals for all three. And f or r for fraction came up in the other thread. Beyond that? I don't know. If you look at the C++ proposal (N2750) and the various blog posts written around 2008-2012, here's what comes up repeatedly, in (to me) decreasing order of usefulness in Python: * One or more decimal float types. * Custom string types, like a string that iterates graphemes clusters instead of code units (Java and Swift have these; I don't know of an implementation for Python), or a mutable rope-based implementation, or the bytes-that-knows-its-encoding type that Nick Coghlan suggested some time last year. * Integers specified in arbitrary bases. * Quaternions or other number-like types beyond complex. * Points or vectors represented as 3x + 4z. * Units. Which I'm not sure is a good idea. (200*km seems just as readable to me as 200km, and only the former extends in an obvious way to 200*km/sec...) And I think the same goes for similar things like CSS units (1.2*em seems as good as 1.2_em to me). * Various things Python already has (real string objects instead of char*, real Unicode strings, binary integers, arbitrary-precision integers, etc.). * Cases where a constructor call would actually be just as nice, except for some other deficiency of C++ (e.g., you can't use a constexpr constructor expression as a template argument in C++11). * Blatantly silly things, like integral radians or integral halfs (which people keep saying physicists could use, only for physicists to ask "where would I use that?"). > 0x1001 # hex > 0b1001 # binary > 0d1.001 # decimal > >> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. 
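For reference, the spelling such a literal would replace looks like this today (a minimal illustration, assuming the unordered dicts current at the time of this thread):

    from collections import OrderedDict

    # With plain dicts unordered, the obvious spelling loses any guarantee of
    # key order before OrderedDict ever sees it:
    OrderedDict({'spam': 1, 'eggs': 2})      # ordering not guaranteed
    # ...which is why the noisier pair-list form is the reliable one:
    OrderedDict([('spam', 1), ('eggs', 2)])  # ordering guaranteed
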
> > Also there's the idea floating around of making *all* dicts ordered > (as PyPy has done), which would be much cleaner if it can be managed, > so I'm guessing that would have to be tried and fail before any new > syntax would be added for this use case. Well, OrderedDict isn't the only container class, even in the stdlib. But the real point would be types outside the stdlib. You could construct a sorted dict using blist or SortedContainers without having to first construct a dict in arbitrary order and then copy-sort it. Or build a NumPy array without building a list. And so on. But, again, I think the additional problems with container literals (which, again, aren't really literals) mean it would be worth leaving this out of any 1.0 proposal (and if containers are the only good selling point for the whole thing, that may mean the whole thing isn't worth having). From bruce at leban.us Wed Jun 3 03:50:53 2015 From: bruce at leban.us (Bruce Leban) Date: Tue, 2 Jun 2015 18:50:53 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > Any number or string token followed by a name (identifier) token is > currently illegal. This would change so that, if there's no whitespace > between them, it's legal, and equivalent to a call to a function named > `literal_{name}({number-or-string})`. For example, `1.2d` becomes > `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also > becomes `literal_d('1.2')`. > > Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` > becomes `literal_ump('21j'), which are not at all useful, and potentially > confusing, but I don't think that would be a serious problem in practice. > You seem to suggest that the token should start with an underscore when you write 1.2_dec and {...}_o but not when you write 1.2d and 1.2jump. Requiring the underscore solves the ambiguity and would make literals more readable. I would also require an alphabetic character after the _ and prohibit _ inside the name to avoid confusion. 1.2_d => literal_d('1.2') 1.2j_ump => literal_ump('1.2j') 1.2_jump => literal_jump('1.2') 0x12dec_imal => literal_imal('0x12dec') 0x12_decimal => literal_decimal('0x12') "1.2"_ebcdic => literal_ebcdic('1.2') 1.2d => error 0x12decimal => error 1_a_b => error 1_2 => error I do think the namescape thing is an issue but requiring me to write from literals import literal_jump isn't necessarily that bad. Without an explicit import, how would I go about tracking down what exactly 21_jump means? The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value: (1, 3, 4)_xyzspace {'a': 1 + 2}_o {'a', 'b': 3}_o ("abc")_x ("abc", "def")_x "abc" "def"_x ("abc" "def")_x ("abc" "def",)_x --- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From abarnert at yahoo.com Wed Jun 3 03:56:13 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 18:56:13 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> On Jun 2, 2015, at 18:05, Chris Angelico wrote: > > On Wed, Jun 3, 2015 at 10:47 AM, Andrew Barnert wrote: >>>> Similarly, this idea could be extended to handle all literal types, so you can do `{'spam': 1, 'eggs': 2}_o` to create an OrderedDict literal, but I think that's ugly enough to not be worth proposing. (A prefix looks better there... but a prefix doesn't work for numbers or strings. And I'm not sure it's unambiguously parseable even for list/set/dict.) Plus, there's the problem that comprehensions and actual literals are both parsed as displays, but you wouldn't want user-defined comprehensions. >>> >>> I thought there was no such thing as a dict/list/set literal, only >>> display syntax? >> >> That's what I meant in the last sentence: technically, there's no such thing as a dict literal, just a dict display that isn't a comprehension. I don't think you want user-defined suffixes on comprehensions, and coming up with a principled and simply-implementable way to make them work on literal-type displays but not comprehension-type displays doesn't seem like an easy problem. > > Yeah. The significance is that literals get snapshotted into the code > object as constants and simply called up when they're needed, but > displays are executable code: > >>>> dis.dis(lambda: "Literal") > 1 0 LOAD_CONST 1 ('Literal') > 3 RETURN_VALUE >>>> dis.dis(lambda: ["List","Display"]) > 1 0 LOAD_CONST 1 ('List') > 3 LOAD_CONST 2 ('Display') > 6 BUILD_LIST 2 > 9 RETURN_VALUE >>>> dis.dis(lambda: ("Tuple","Literal")) > 1 0 LOAD_CONST 3 (('Tuple', 'Literal')) > 3 RETURN_VALUE > > My understanding of "literal" is something which can be processed > entirely at compile time, and retained in the code object, just like > strings are. The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.) Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway. This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine. > Once the code's finished being compiled, there's no > record of what type of string literal was used (raw, triple-quoted, > etc), only the type of string object (bytes/unicode). Custom literals > could be the same But how? 
Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here. And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file? > - come to think of it, it might be nice to have > pathlib.Path literals, represented as p"/home/rosuav" or something. In > any case, they'd be evaluated using only compile-time information, and > would then be saved as constants. > > That implies that only immutables should have literal syntaxes. I'm > not sure whether that's significant or not. But pathlib.Path isn't immutable. Meanwhile, that reminds me: one of the frequent selling points for Swift's related feature is for NSURL literals (which Cocoa uses for local paths as well as remote resources); I should go through the Swift selling points to see if they've found other things that the C++ community hasn't (but that can be ported to the C++ design, and that don't depend on peculiarities of Cocoa to be interesting). From steve at pearwood.info Wed Jun 3 04:52:07 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 3 Jun 2015 12:52:07 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: <20150603025206.GA1325@ando.pearwood.info> On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote: > I explored the convertible literals a while ago, and I'm pretty sure > that doesn't work in a duck-typed language. But the C++ design does > work, as long as you're willing to have the conversion (including the > lookup of the conversion function itself) done at runtime. I'm torn. On the one hand, some sort of extensible syntax for literals would be nice. I say "nice" rather than useful because there are advantages and disadvantages and there's no way of really knowing which outweighs the other. But, really, your proposal is in no way, shape or form syntax for *literals*, it's a new syntax for an unary postfix operator or function. The whole point of something being a literal is that it is parsed and converted at compile time. Now you might (and do) say that worrying about this is "premature optimization", but call me a pedant if you like, I don't think we should call something a literal if it's a runtime function call. Otherwise, we might as well say that from fractions import Fraction Fraction(2) is a literal, in which case I can say your proposal is unnecessary as we already have user-specified literals in Python. I can think of some interesting uses for postfix operators, or literals, or whatever we want to call them: 45? 10!! 23.5d 3d6 35'24" 15ell I've deliberately not explained what I mean by each of them. You can probably guess some, or all, but I hope it demonstrates one problem with this suggestion. Like operator overloading, it risks making code less clear rather than more. 
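Spelling two of those out under the thread's literal_<suffix> convention shows the ambiguity directly; the handler names below are hypothetical, and today they would have to be called explicitly:

    import random
    from decimal import Decimal

    # Hypothetical handlers following the literal_<name> convention from this
    # thread; nothing in current Python defines or calls these automatically.
    def literal_d(text):
        # 23.5d -> Decimal('23.5'), if "d" means decimal
        return Decimal(text)

    def literal_d6(text):
        # 3d6 -> literal_d6('3'), if the reader sees a dice roll instead
        return sum(random.randint(1, 6) for _ in range(int(text)))

    literal_d('23.5')   # what 23.5d would become under the proposal
    literal_d6('3')     # what 3d6 would become -- same syntax, very different meaning
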
-- Steve From rosuav at gmail.com Wed Jun 3 05:12:52 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 3 Jun 2015 13:12:52 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert wrote: > On Jun 2, 2015, at 18:05, Chris Angelico wrote: >> My understanding of "literal" is something which can be processed >> entirely at compile time, and retained in the code object, just like >> strings are. > > The problem is that Python doesn't really define what it means by "literal" anywhere, and the documentation is not consistent. There are at least two places (not counting tutorial and howtos) that Python 3.4 refers to list or dict literals. (That's not based on a search; someone wrote a StackOverflow question asking what those two places meant.) > > Which I don't actually think is much of a problem. It means that in cases like this proposal, you have to be explicit about exactly what you mean by "literal" because Python doesn't do it for you. And it comes up when teaching people about how the parser and compiler work. And... That's about it. You can (as the docs do) loosely use "literal" to include non-comprehension displays in some places but not others, or even to include -2 or 1+2j in some places but not others, and nobody gets confused, except in those special contexts where you're going to have to get into the details anyway. > > This is similar to the fact that Python doesn't actually define the semantics of numeric literals anywhere. It's still obvious to anyone what they're supposed to be. The Python docs are a language reference manual, not a rigorous specification, and that's fine. > Yes, it's a bit tricky. Part of the confusion comes from the peephole optimizer; "1+2j" looks like a constant, but it's actually a compile-time expression. It wouldn't be a big problem to have an uber-specific definition of "literal" that cuts out things like that; for the most part, it's not going to be a problem (eg if you define a fractions.Fraction literal, you could use "1/2frac" or "1frac/2" and you'd get back Fraction(1, 2) either way, simply because division of Fraction and int works correctly; you could even have a "mixed number literal" like "1+1/2frac" and it'd evaluate just fine). >> Once the code's finished being compiled, there's no >> record of what type of string literal was used (raw, triple-quoted, >> etc), only the type of string object (bytes/unicode). Custom literals >> could be the same > > But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here. > Well, an additional parameter to compile() would do it. I've no idea how hard it is to write an import hook, but my notion was that you could do it that way and alter the behaviour of the compilation process. But I haven't put a lot of thought into implementation, nor do I know enough of the internals to know what's plausible and what isn't. > And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) 
If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file? > It's to do with expectations. A literal should simply be itself, nothing else. When you have a string literal in your code, nothing can change what string that represents; at compilation time, it turns into a string object, and there it remains. Shadowing the name 'str' won't affect it. But if something that looks like a literal ends up being a function call, it could get extremely confusing - name lookups happening at run-time when the name doesn't occur in the code. Imagine the traceback: def calc_profit(hex): decimal = int(hex, 16) return 0.2d * decimal >>> calc_profit("1E2A") Traceback (most recent call last): File "", line 1, in File "", line 3, in calc_profit AttributeError: 'int' object has no attribute 'Decimal' Uhh... what? Sure, I shadowed the module name there, but I'm not *using* the decimal module! I'm just using a decimal literal! It's no problem to shadow the built-in function 'hex' there, because I'm not using the built-in function! Whatever name you use, there's the possibility that it'll have been changed at run-time, and that will cause no end of confusion. A literal shouldn't cause surprise function calls and name lookups. >> - come to think of it, it might be nice to have >> pathlib.Path literals, represented as p"/home/rosuav" or something. In >> any case, they'd be evaluated using only compile-time information, and >> would then be saved as constants. >> >> That implies that only immutables should have literal syntaxes. I'm >> not sure whether that's significant or not. > > But pathlib.Path isn't immutable. Huh, it isn't? That's a pity. In that case, I guess you can't have a path literal. In any case, I'm sure there'll be other string-like things that people can come up with literal syntaxes for. ChrisA From abarnert at yahoo.com Wed Jun 3 05:57:28 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 2 Jun 2015 20:57:28 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: Message-ID: On Jun 2, 2015, at 18:50, Bruce Leban wrote: > > >> On Tue, Jun 2, 2015 at 12:03 PM, Andrew Barnert via Python-ideas wrote: >> Any number or string token followed by a name (identifier) token is currently illegal. This would change so that, if there's no whitespace between them, it's legal, and equivalent to a call to a function named `literal_{name}({number-or-string})`. For example, `1.2d` becomes `literal_d('1.2')`, `1.2_dec` becomes `literal_dec('1.2')`, `"1.2"d` also becomes `literal_d('1.2')`. >> >> Of course `0x12decimal` becomes `literal_imal('0x12dec')`, and `21jump` becomes `literal_ump('21j'), which are not at all useful, and potentially confusing, but I don't think that would be a serious problem in practice. > > You seem to suggest that the token should start with an underscore when you write 1.2_dec and {...}_o but not when you write 1.2d and 1.2jump. Well, I was suggesting leaving it up to the user who defines the literals. Sure, it's possible to come up with confusing suffixes, but if we can trust end users to name a variable that holds an XML tree "root" instead "_12rawbytes", can't we trust library authors to name their suffixes appropriately? 
I think you will _often_ want the preceding underscore, at least for multi-character suffixes, and you will _almost never_ want multiple underscores, or strings of underscores and digits without letters, etc. But that seems more like something for PEP8 and other style guides and checkers than something the language would need to enforce. However, I noticed that I left off the extra underscore in literal__dec, and it really does look pretty ugly that way, so... Maybe you have a point here. > I do think the namescape thing is an issue but requiring me to write > > from literals import literal_jump > > isn't necessarily that bad. Without an explicit import, how would I go about tracking down what exactly 21_jump means? Thanks; that's the argument I was trying to make and not making very well. > The use of _o on a dict is strange since the thing you're attaching it to isn't a literal. I think there needs to be some more thought here if you want to apply it to anything other than a simple value: At least two people suggested that it's better to just explicitly put that whole question of collection "literals" off for the future (assuming the basic idea of numeric and string literal suffixes is worth considering at all), and I think they're right. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Wed Jun 3 06:48:47 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 03 Jun 2015 00:48:47 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote: > The problem is that Python doesn't really define what it means by > "literal" anywhere, The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". I should have all the information needed to write a new implementation. It starts "Literals are notations for constant values of some built-in types." The relevant subsections are: 2.4.1. String and Bytes literals 2.4.2. String literal concatenation 2.4.3. Numeric literals 2.4.4. Integer literals 2.4.5. Floating point literals 2.4.6. Imaginary literals > and the documentation is not consistent. I'd call it a bit sloppy in places. > There > are at least two places (not counting tutorial and howtos) that > Python 3.4 refers to list or dict literals. (That's not based on a > search; someone wrote a StackOverflow question asking what those two > places meant.) Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people. > Which I don't actually think is much of a problem. It means that in > cases like this proposal, you have to be explicit about exactly what > you mean by "literal" because Python doesn't do it for you. Again, the Language Reference seems sufficiently explicit and detailed to write another implementation. 2.4.3 says "There are three types of numeric literals: integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator ?-? and the literal 1." I will let you read the three specific subsections > This is similar to the fact that Python doesn't actually define the > semantics of numeric literals anywhere. 
I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex. There are 3 type of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0." Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float. -- Terry Jan Reedy From drekin at gmail.com Wed Jun 3 16:29:47 2015 From: drekin at gmail.com (drekin at gmail.com) Date: Wed, 03 Jun 2015 07:29:47 -0700 (PDT) Subject: [Python-ideas] Python Float Update In-Reply-To: <87fv6au3hd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <556f0f5b.c4c9c20a.331a.ffff99f7@mx.google.com> Stephen J. Turnbull writes: > Nick Coghlan writes: > > > the main concern I have with [a FloatLiteral that carries the > > original repr around] is that we'd be trading the status quo for a > > situation where "Decimal(1.3)" and "Decimal(13/10)" gave different > > answers. > > Yeah, and that kills the deal for me. Either Decimal is the default > representation for non-integers, or this is a no-go. And that isn't > going to happen. What if also 13/10 yielded a fraction? Anyway, what are the objections to integer division returning a fraction? They are coerced to floats when mixed with them. Also, the repr of Fraction class could be altered so repr(13 / 10) == "13 / 10" would hold. Regards, Drekin From drekin at gmail.com Wed Jun 3 16:44:06 2015 From: drekin at gmail.com (drekin at gmail.com) Date: Wed, 03 Jun 2015 07:44:06 -0700 (PDT) Subject: [Python-ideas] Python Float Update In-Reply-To: <79C16144-8BF7-4260-A356-DD4E8D97BAAD@yahoo.com> Message-ID: <556f12b6.4357b40a.42be.42ce@mx.google.com> Andrew Barnert wrote: >> Oh, another thought... Decimals could gain yet another conversion >> method, one which implicitly uses the float repr, but signals if it was >> an inexact conversion or not. Explicitly calling repr can never signal, >> since the conversion occurs outside of the Decimal constructor and >> Decimal sees only the string: >> >> Decimal(repr(1.3)) cannot signal Inexact. >> >> But: >> >> Decimal.from_nearest_float(1.5) # exact >> Decimal.from_nearest_float(1.3) # signals Inexact >> >> That might be useful, but probably not to beginners. > > I think this might be worth having whether the default constructor is changed or not. > > I can't think of too many programs where I'm pretty sure I have an exactly-representable decimal as a float but want to check to be sure... but for interactive use in IPython (especially when I'm specifically trying to explain to someone why just using Decimal instead of float will/will not solve their problem) I could see using it. How about more general Decimal.from_exact that does the same for argument of any type ??? float, int, Decimal object with possibly different precission, fraction, string. Just convert the argument to Decimal and signal if it cannot be done losslessly. The same constructor with the same semantics could be added to int, float, Fraction as well. 
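A rough sketch of the semantics I mean, written as a plain function rather than a classmethod (the decimal module already has the Inexact signal, so only the handling of fractions is new here; the name is of course made up):

from decimal import Decimal, Inexact, localcontext
from fractions import Fraction

def decimal_from_exact(value):
    # Convert value to Decimal at the current precision and raise
    # decimal.Inexact if the conversion loses information.
    with localcontext() as ctx:
        ctx.traps[Inexact] = True
        if isinstance(value, Fraction):
            return ctx.divide(Decimal(value.numerator), Decimal(value.denominator))
        # int, str, Decimal and float are converted exactly by the constructor;
        # ctx.plus() then rounds to the context precision and may signal.
        return ctx.plus(Decimal(value))

decimal_from_exact(Fraction(1, 4))   # Decimal('0.25')
decimal_from_exact(0.1)              # raises Inexact (the exact value needs 55 digits)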
Regards, Drekin From drekin at gmail.com Wed Jun 3 17:00:36 2015 From: drekin at gmail.com (drekin at gmail.com) Date: Wed, 03 Jun 2015 08:00:36 -0700 (PDT) Subject: [Python-ideas] Python Float Update In-Reply-To: <20150601145806.GB932@ando.pearwood.info> Message-ID: <556f1694.4358b40a.5fc1.ffffe817@mx.google.com> > On Mon, Jun 01, 2015 at 06:27:57AM +0000, Nicholas Chammas wrote: > >> Having decimal literals or something similar by default, though perhaps >> problematic from a backwards compatibility standpoint, is a) user friendly, >> b) easily understandable, and c) not surprising to beginners. None of these >> qualities apply to float literals. > > I wish this myth about Decimals would die, because it isn't true. The > only advantage of base-10 floats over base-2 floats -- and I'll admit it > can be a big advantage -- is that many of the numbers we commonly care > about can be represented in Decimal exactly, but not as base-2 floats. > In every other way, Decimals are no more user friendly, understandable, > or unsurprising than floats. Decimals violate all the same rules of > arithmetic that floats do. This should not come as a surprise, since > decimals *are* floats, they merely use base 10 rather than base 2. You are definitely right in "float vs. Decimal as representation of a real", but there is also a syntactical point that interpreting a float literal as Decimal rather than binary float is more natural since the literal itself *is* decimal. The there would be no counterpart of the following situation if the float literal was interpreted as Decimal rather than binary float. >>> 0.1 0.1 >>> Decimal(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625') Regards, Drekin From drekin at gmail.com Wed Jun 3 18:08:17 2015 From: drekin at gmail.com (drekin at gmail.com) Date: Wed, 03 Jun 2015 09:08:17 -0700 (PDT) Subject: [Python-ideas] Python Float Update In-Reply-To: Message-ID: <556f2671.c905c20a.6b43.ffffc3c7@mx.google.com> Stefan Behnel wrote: > random832 at fastmail.us schrieb am 01.06.2015 um 05:14: >> On Sun, May 31, 2015, at 22:25, u8y7541 The Awesome Person wrote: >>> First, I propose that a float's integer ratio should be accurate. For >>> example, (1 / 3).as_integer_ratio() should return (1, 3). Instead, it >>> returns(6004799503160661, 18014398509481984). >> >> Even though he's mistaken about the core premise, I do think there's a >> kernel of a good idea here - it would be nice to have a method (maybe >> as_integer_ratio, maybe with some parameter added, maybe a different >> method) to return with the smallest denominator that would result in >> exactly the original float if divided out, rather than merely the >> smallest power of two. > > The fractions module seems the obvious place to put this. Consider opening > a feature request. Target version would be Python 3.6. > > Stefan This makes sense for any floating point number, for example Decimal. It could be also a constructor of Fraction. 
>>> Fraction.simple_from(0.1) Fraction(1, 10) >>> Fraction.simple_from(Decimal(1) / Decimal(3)) Fraction(1, 3) Regards, Drekin From steve at pearwood.info Wed Jun 3 18:23:28 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Jun 2015 02:23:28 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: <556f2671.c905c20a.6b43.ffffc3c7@mx.google.com> References: <556f2671.c905c20a.6b43.ffffc3c7@mx.google.com> Message-ID: <20150603162327.GB1325@ando.pearwood.info> On Wed, Jun 03, 2015 at 09:08:17AM -0700, drekin at gmail.com wrote: > This makes sense for any floating point number, for example Decimal. > It could be also a constructor of Fraction. > > >>> Fraction.simple_from(0.1) > Fraction(1, 10) Guido's time machine strikes again: py> Fraction(0.1).limit_denominator(1000) Fraction(1, 10) > >>> Fraction.simple_from(Decimal(1) / Decimal(3)) > Fraction(1, 3) py> Fraction(Decimal(1)/Decimal(3)).limit_denominator(100) Fraction(1, 3) -- Steve From abarnert at yahoo.com Wed Jun 3 18:55:21 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 09:55:21 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On Jun 2, 2015, at 20:12, Chris Angelico wrote: > >> On Wed, Jun 3, 2015 at 11:56 AM, Andrew Barnert wrote: >>> On Jun 2, 2015, at 18:05, Chris Angelico wrote: > >>> Once the code's finished being compiled, there's no >>> record of what type of string literal was used (raw, triple-quoted, >>> etc), only the type of string object (bytes/unicode). Custom literals >>> could be the same >> >> But how? Without magic (like a registry or something similarly not locally visible in the source), how does the compiler know about user-defined literals at compile time? Python (unlike C++) doesn't have an extensible notion of "compile-time computation" to hook into here. > > Well, an additional parameter to compile() would do it. I don't understand what you mean. Sure, you can pass the magic registry a separate argument instead of leaving it in the local/global environment, but that doesn't really change anything. > I've no idea > how hard it is to write an import hook, but my notion was that you > could do it that way and alter the behaviour of the compilation > process. It's not _that_ hard to write an import hook. But what are you going to do in that hook? If you're trying to change the syntax of Python by adding a new literal suffix, you have to rewrite the parser. (My hack gets around that by tokenizing, modifying the token stream, untokenizing, and compiling. But you don't want to do that in real life.) So I assume your idea means something like: first we parse 2.3d into something like a new UserLiteral AST node, then if no hook translates that into something else before the AST is compiled, it's a SyntaxError? But that still means: * If you want to use a user-defined literal, you can't import it; you need another module to first import that literal's import hook and then import your module. * Your .pyc file won't get updated when that other module changes the hooks in place when your module gets imported. * That's a significant amount of boilerplate for each module that wants to offer a new literal. * While it isn't actually that hard, it is something most module developers have no idea how to write. (A HOWTO could maybe help here....) * Every import has to be hooked and transformed once for each literal you want to be available. 
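For the curious, the tokenizing approach looks roughly like this -- a toy sketch, not the actual hack, ignoring edge cases and error handling:

import io
import tokenize

def rewrite_literals(source):
    # Turn a NUMBER token immediately followed by a NAME token (no
    # whitespace in between) into a call to literal_<name>('<number>').
    result = []
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if (i + 1 < len(tokens)
                and tok.type == tokenize.NUMBER
                and tokens[i + 1].type == tokenize.NAME
                and tok.end == tokens[i + 1].start):
            result.extend([(tokenize.NAME, 'literal_' + tokens[i + 1].string),
                           (tokenize.OP, '('),
                           (tokenize.STRING, repr(tok.string)),
                           (tokenize.OP, ')')])
            i += 2
        else:
            result.append((tok.type, tok.string))
            i += 1
    return tokenize.untokenize(result)

rewrite_literals("x = 1.2d\n")   # roughly "x = literal_d ('1.2')"

That's enough to play with the idea, but, as I said, you don't want to ship anything like it.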
Meanwhile, what exactly could the hook _do_ at compile time? It could generate the expression `Decimal('1.2')`, but that's no more "literal" than `literal_d('1.2')`, and now it means your script has to import `Decimal` into its scope instead. I suppose your import hook could push that import into the top of the script, but that seems even more magical. Or maybe you could generate an actual Decimal object, pickle it, compile in the expression `pickle.loads(b'cdecimal\nDecimal\np0\n(V1.2\np1\tp2\nRp3\n.')`, and push in a pickle import, but that doesn't really solve anything. Really, trying to force something into a "compile-time computation" in a language that doesn't have a full compile-time sub-language is a losing proposition. C++03 had a sort of accidental minimal compile-time sub-language based on template expansion and required constant folding for integer and pointer arithmetic, and that really wasn't sufficient, which is why C++11 and D both added ways to use most of the language explicitly at compile time (and C++11 still didn't get it right, which is why C++14 had to redo it). In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`? > But I haven't put a lot of thought into implementation, nor > do I know enough of the internals to know what's plausible and what > isn't. > >> And why do you actually care that it happens at compile time? If it's for optimization, that may be premature and irrelevant. (Certainly 1.2d isn't going to be any _worse_ than Decimal('1.2'), it just may not be better.) If it's because you want to reflect on code objects or something, that's not normal end-user code. Why should a normal user ever even know, much less care, whether 1.2d is stored as a constant or an expression in memory or in a .pyc file? > > It's to do with expectations. A literal should simply be itself, > nothing else. When you have a string literal in your code, nothing can > change what string that represents; at compilation time, it turns into > a string object, and there it remains. Shadowing the name 'str' won't > affect it. But if something that looks like a literal ends up being a > function call, it could get extremely confusing - name lookups > happening at run-time when the name doesn't occur in the code. Imagine > the traceback: > > def calc_profit(hex): > decimal = int(hex, 16) > return 0.2d * decimal > >>>> calc_profit("1E2A") > Traceback (most recent call last): > File "", line 1, in > File "", line 3, in calc_profit > AttributeError: 'int' object has no attribute 'Decimal' But that _can't_ happen with my design: the `0.2d` is compiled to `literal_d('0.2')`. The call to `decimal.Decimal` is in that function's scope, so nothing you do in your function can interfere with it. Sure, you can still redefine `literal_d`, but (a) why would you, and (b) even if you do, the problem will be a lot more obvious (especially since you had to explicitly `from decimalliterals import literal_d` at the top of the script, while you didn't have to even mention `decimal` or `Decimal` anywhere). But your design, or any design that does the translation at compile time, _would_ have this problem. If you compile `0.2d` directly into `decimal.Decimal('0.2')`, then it's `decimal` that has to be in scope. 
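To spell that out with your own example (literal_d here is the hypothetical handler a decimalliterals module would provide):

from decimal import Decimal

def literal_d(text):        # what `from decimalliterals import literal_d` gives you
    return Decimal(text)    # Decimal is bound in the defining module, not the caller

def calc_profit(hex):
    decimal = int(hex, 16)              # shadows the module name locally
    return literal_d('0.2') * decimal   # what 0.2d compiles to under my design

calc_profit("1E2A")   # Decimal('1544.4') -- no surprise AttributeError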
Also, notice that my design leaves the door open for later coming up with a special bytecode to look up translation functions following different rules (a registry, an explicit global lookup that ignores local shadowing, etc.); translating into a normal constructor expression doesn't. > Uhh... what? Sure, I shadowed the module name there, but I'm not > *using* the decimal module! I'm just using a decimal literal! It's no > problem to shadow the built-in function 'hex' there, because I'm not > using the built-in function! > > Whatever name you use, there's the possibility that it'll have been > changed at run-time, and that will cause no end of confusion. A > literal shouldn't cause surprise function calls and name lookups. > >>> - come to think of it, it might be nice to have >>> pathlib.Path literals, represented as p"/home/rosuav" or something. In >>> any case, they'd be evaluated using only compile-time information, and >>> would then be saved as constants. >>> >>> That implies that only immutables should have literal syntaxes. I'm >>> not sure whether that's significant or not. >> >> But pathlib.Path isn't immutable. > > Huh, it isn't? That's a pity. In that case, I guess you can't have a > path literal. I don't understand why you think this is important. Literal values, compile-time-computable/accessible values, and run-time-constant values are certainly not unrelated, but they're not the same thing. Other languages don't try to force them to be the same. In C++, for example, a literal has to evaluate into a compile-time-computable expression that only uses constant compile-time-accessible values, but the value it doesn't have to be constant at runtime. In fact, it's quite common for it not to be. > In any case, I'm sure there'll be other string-like > things that people can come up with literal syntaxes for. From abarnert at yahoo.com Wed Jun 3 20:26:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 11:26:54 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: I think this is off-topic, but it's important enough to answer anyway. On Jun 2, 2015, at 21:48, Terry Reedy wrote: > >> On 6/2/2015 9:56 PM, Andrew Barnert via Python-ideas wrote: >> >> The problem is that Python doesn't really define what it means by >> "literal" anywhere, > > The reference manual seems quite definite to me. The definitive section is "Section 2.4. Literals". I should have all the information needed to write a new implementation. No, that defines what literals mean for the purpose of lexical analysis. > It starts "Literals are notations for constant values of some built-in types." By the rules in this section, ..., None, True, and False are not literals, even though they are called literals everywhere else they appear in the documentation except for the Lexical Analysis chapter. In fact, even within that chapter, in 2.6 Delimiters, it explains that "A sequence of three periods has a special meaning as an ellipsis literal." By the rules in this section, "-2" is not a literal, even though, e.g., in the data model section it says "co_consts is a tuple containing the literals used by the bytecode", and in every extant Python implementation -2 will be stored in co_consts. By the rules in this section, "()" and "{}" are not literals, even though, e.g., in the set displays section it says "An empty set cannot be constructed with {}; this literal constructs an empty dictionary." And so on. And that's fine. 
None of those things are literals for the purpose of lexical analysis, even though they are things that represent literal values. And using the word "literal" somewhat loosely isn't confusing anywhere. Where a more specific definition is needed, as when documenting the lexical analysis phase of the language, a specific definition is given. And this is what allows ast.literal_eval to refer to "the following Python literal structures: strings, bytes, numbers, tuples, dicts, sets, booleans, and None" instead of having to say "the following Python literal structures: strings, bytes, and numbers; the negation of a literal number; the addition or subtraction of a non-imaginary literal number and an imaginary literal number; expression lists containing at least one comma; empty parentheses; the following container displays when not containing comprehensions: lists, dicts, sets; the keywords True, False, and None". I don't think that's a bad thing. If you want to know what the "literal structure... None" means, it's easy to find out, and the fact that None is tokenized as a keyword rather than as a literal does not hamper you in any way. If you actually need to write a tokenizer, then the fact that None is tokenized as a keyword makes a difference--and you can find that out easily as well. > > and the documentation is not consistent. > > I'd call it a bit sloppy in places. I wouldn't call it sloppy. I'd call it somewhat loose and informal in places, but that's often a good thing. >> There >> are at least two places (not counting tutorial and howtos) that >> Python 3.4 refers to list or dict literals. (That's not based on a >> search; someone wrote a StackOverflow question asking what those two >> places meant.) > > Please open a tracker issue to correct the sloppiness and reference the SO issue as evidence that it confuses people. But it doesn't confuse people in any relevant way. The user who asked that question had no problem figuring out how to interpret code that includes a (), or even how that code should be and is compiled. He could have written a Python interpreter with the knowledge he had. Maybe he couldn't have written a specification, but who cares? He doesn't need to. >> This is similar to the fact that Python doesn't actually define the >> semantics of numeric literals anywhere. > > I am again puzzled by your claim. There are 3 builtin number classes: int, float, and complex. There are 3 type of numeric literals: integer, float, and imaginary. "An imaginary literal yields a complex number with a real part of 0.0." Anyone capable of programming Python should be able to match 'integer' with 'int' and 'float' with 'float. Yes, and they should also be able to tell that the integer literal "42" should evaluate to an int whose value is equal to 42, and that "the value may be approximated in the case of floating point" means that the literal "1.2" should evaluate to the float whose value is closest to 1.2 rather than some different approximation, and so on. But the documentation doesn't actually define any of that. It doesn't have to, because it assumes it's being read by a non-idiot who's capable of programming Python (and won't deliberately make stupid decisions in interpreting it just because he's technically allowed to). The C++ specification defines all of that, and more (that the digits are interpreted with the leftmost as most significant, that the runtime value of an integer literal is not an lvalue, that it counts as a compile-time constant value, and so on). 
It attempts to make no assumptions at all (and there have been cases where C++ compiler vendors _have_ made deliberately obtuse interpretations just to make a point about the standard). That's exactly why reference documentation is more useful than a specification: because it leaves out the things that should be obvious to anyone capable of programming Python. To learn how integer literals work in Python, I need to look at two short and accessible paragraphs; to learn how integer literals work in C++, I have to read 2 full-page sections plus parts of at least 2 others, all written in impenetrable legalese. From abarnert at yahoo.com Wed Jun 3 21:43:00 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 12:43:00 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: <20150603025206.GA1325@ando.pearwood.info> References: <20150603025206.GA1325@ando.pearwood.info> Message-ID: <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> On Jun 2, 2015, at 19:52, Steven D'Aprano wrote: > >> On Tue, Jun 02, 2015 at 12:03:25PM -0700, Andrew Barnert via Python-ideas wrote: >> >> I explored the convertible literals a while ago, and I'm pretty sure >> that doesn't work in a duck-typed language. But the C++ design does >> work, as long as you're willing to have the conversion (including the >> lookup of the conversion function itself) done at runtime. > > I'm torn. On the one hand, some sort of extensible syntax for literals > would be nice. I say "nice" rather than useful because there are > advantages and disadvantages and there's no way of really knowing > which outweighs the other. That's exactly why I came up with something I could hack up without any changes to the interpreter. It means anyone can try it out and see whether the advantages outweigh the disadvantages for them. (Of course there are additional disadvantages to the hack in efficiency, hackiness, and possibly debugability, so it may unfairly bias people who don't keep that in mind--but if so, it can only bias them in the conservative direction of rejecting the idea, which I think is ok.) > But, really, your proposal is in no way, shape or form syntax for > *literals*, It's a syntax for things that are somewhat like `2`, more like `-2`, even more like `(2,)`, but still not exactly the same as even that. If you don't like using the word "literal" for that, you can come up with a different word. I called it a "literal" because "user-defined literals" is what people were asking for when they asked for `2.3d`, and it has clear parallels with a very similar feature with the same name in other languages. But I'm fine calling it something different, as long as people who are looking for it will know how to find it. > it's a new syntax for an unary postfix operator That's fair; C++ in fact defines its user literal syntax in terms of special constexpr operator overloads, and points out the similarities to postfix operator++ in a note. > or function. > The whole point of something being a literal is that it is parsed and > converted at compile time. > Now you might (and do) say that worrying > about this is "premature optimization", but call me a pedant if you > like, I don't think we should call something a literal if it's a > runtime function call. I don't think this is the right distinction. A literal is a notation for expressing some value that means what it says in a sufficiently simple way. That concept has significant overlap with "compile-time evaluable", and with "constant", but they're not the same concepts. 
And this is especially true for a language that doesn't define any compile-time computation phase. In Python, `-2` may be compiled to UNARY_NEGATIVE on the compiled-in constant value 2, or just to the compiled-in constant value -2, depending on what the implementation wants to optimize. Do you want to call it a literal in some implementations but not others? No reasonable user code that isn't reflecting on the internals is going to care, or even know, what the implementation is doing. Being "user-defined" means that the "sufficiently simple way" the notation gets its meaning has to involve user code. In a language with a compile-time computation phase like C++, that can mean "constexpr" user code, but Python doesn't define a "constexpr"-like phase. At any rate, again, if you want to call it something different, that's fine, as long as people looking for "what does `1.2d` mean in this program" or "how do I do the Python equivalent of a C++ user-defined literal" will be able to understand it. > Otherwise, we might as well say that > > from fractions import Fraction > Fraction(2) > > is a literal, in which case I can say your proposal is unnecessary as we > already have user-specified literals in Python. In C++, a constructor expression like Fraction(2) may be evaluable at compile time, and may evaluate to something that's constant at both compile time and runtime, and yet it's still not a literal. Why? Because their rule for what counts as "sufficiently simple" includes constexpr postfix user-literal operators, but not constexpr function or constructor calls. I don't know of anyone who's confused by that. It's a useful (and intuitively useful) distinction, separate from the "constexpr" and "const" distinctions. > I can think of some interesting uses for postfix operators, or literals, > or whatever we want to call them: > > 45? > 10!! > 23.5d > 3d6 > 35'24" > 15ell > > I've deliberately not explained what I mean by each of them. You can > probably guess some, or all, but I hope it demonstrates one problem with > this suggestion. Like operator overloading, it risks making code less > clear rather than more. Sure. In fact, it's very closely analogous--both of them are ways to allow a user-defined type to act more like a builtin type, which can be abused to do completely different things instead. The C++ proposal specifically pointed out this comparison. I think the risk is lower in Python than in C++ just because Python idiomatically discourages magical or idiosyncratic programming much more strongly in general, and that means operator overloading is already used more consistently and less confusingly than in C++, so the same is more likely to be true with this new feature. But of course the risk isn't zero. Again, I'm hoping people will play around with it, come up with example code they can show to other people for impressions, etc., rather than trying to guess, or come up with some abstract argument. It's certainly possible that everything that looks like a good example when you think of it will look too magical to anyone who reads your code. Then the idea can be rejected, and if anyone thinks of a similar idea in the future, they can be pointed to the existing examples and asked, "Can your idea solve these problems?" 
From abarnert at yahoo.com Wed Jun 3 22:17:09 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 13:17:09 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <556f0f5b.c4c9c20a.331a.ffff99f7@mx.google.com> References: <556f0f5b.c4c9c20a.331a.ffff99f7@mx.google.com> Message-ID: On Jun 3, 2015, at 07:29, drekin at gmail.com wrote: > > Stephen J. Turnbull writes: > >> Nick Coghlan writes: >> >>> the main concern I have with [a FloatLiteral that carries the >>> original repr around] is that we'd be trading the status quo for a >>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different >>> answers. >> >> Yeah, and that kills the deal for me. Either Decimal is the default >> representation for non-integers, or this is a no-go. And that isn't >> going to happen. > > What if also 13/10 yielded a fraction? That was raised near the start of the thread. In fact, I think the initial proposal was that 13/10 evaluated to Fraction(13, 10) and 1.2 evaluated to something like Fraction(12, 10). > Anyway, what are the objections to integer division returning a fraction? They are coerced to floats when mixed with them. As mentioned earlier in the thread, the language that inspired Python, ABC, used exactly this design: computations were kept as exact rationals until you mixed them with floats or called irrational functions like root. So it's not likely Guido didn't think of this possibility; he deliberately chose not to do things this way. He even wrote about this a few years ago; search for "integer division" on his Python-history blog. So, what are the problems? When you stay with exact rationals through a long series of computations, the result can grow to be huge in memory, and processing time. (I'm ignoring the fact that CPython doesn't even have a fast fraction implementation, because one could be added easily. It's still going to be orders of magnitude slower to add two fractions with gigantic denominators than to add the equivalent floats or decimals.) Plus, it's not always obvious when you've lost exactness. For example, exponentiation between rationals is exact only if the power simplifies to a whole fraction (and hasn't itself become a float somewhere along the way). Since the fractions module doesn't have IEEE-style flags for inexactness/rounding, it's harder to notice when this happens. Except in very trivial cases, the repr would be much less human-readable and -debuggable, not more. (Or do you find 1728829813 / 2317409 easier to understand than 7460.181958816937?) Fractions and Decimals can't be mixed or interconverted directly. There are definitely cases where a rational type is the right thing to use (it wouldn't be in the stdlib otherwise), but I think they're less common than the cases where a floating-point type (whether binary or decimal) is the right thing to use. (And even many cases where you think you want rationals, what you actually want is SymPy-style symbolic computation--which can give you exact results for things with roots or sins or whatever as long as they cancel out in the end.) 
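To put a number on that first problem, here's a trivial illustration (not a benchmark, just a feel for the growth):

from fractions import Fraction

# Summing just the first 50 terms of the harmonic series exactly already
# gives a denominator of about 20 digits, and it keeps growing from there.
total = sum(Fraction(1, n) for n in range(1, 51))
len(str(total.denominator))   # about 20
float(total)                  # 4.499..., the value you usually wanted anyway

A float or Decimal running total, by contrast, stays a fixed size no matter how many terms you add.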
From rosuav at gmail.com Wed Jun 3 23:48:30 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Jun 2015 07:48:30 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert wrote: > In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`? > That's exactly the thing: 1.2d should be atomic. It should not be an expression. The three examples you gave are syntactically expressions, but they act very much like literals thanks to constant folding: >>> dis.dis(lambda: -2) 1 0 LOAD_CONST 2 (-2) 3 RETURN_VALUE >>> dis.dis(lambda: 1+2j) 1 0 LOAD_CONST 3 ((1+2j)) 3 RETURN_VALUE >>> dis.dis(lambda: (1, 2)) 1 0 LOAD_CONST 3 ((1, 2)) 3 RETURN_VALUE which means they behave the way people expect them to. There is no way for run-time changes to affect what any of those expressions yields. Whether you're talking about shadowing the name Decimal or the name literal_d, the trouble is that it's happening at run-time. Here's another confusing case: import decimal from fractionliterals import literal_fr # oops, forgot to import literal_d # If we miss off literal_fr, we get an immediate error, because # 1/2fr gets evaluated at def time. def do_stuff(x, y, portion=1/2fr): try: result = decimal.Decimal(x*y*portion) except OverflowError: return 0.0d You won't know that your literal has failed until something actually triggers the error. That is extremely unobvious, especially since the token "literal_d" doesn't occur anywhere in do_stuff(). Literals look like atoms, and if they behave like expressions, sooner or later there'll be a ton of Stack Overflow questions saying "Why doesn't my code work? I just changed this up here, and now I get this weird error". Is that how literals should work? No. ChrisA From surya.subbarao1 at gmail.com Thu Jun 4 00:05:24 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Wed, 3 Jun 2015 15:05:24 -0700 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 29 In-Reply-To: References: Message-ID: > > Stephen J. Turnbull writes: > >> Nick Coghlan writes: >> >>> the main concern I have with [a FloatLiteral that carries the >>> original repr around] is that we'd be trading the status quo for a >>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different >>> answers. >> >> Yeah, and that kills the deal for me. Either Decimal is the default >> representation for non-integers, or this is a no-go. And that isn't >> going to happen. > > What if also 13/10 yielded a fraction? Yeah, either Decimal becomes default or 13/10 is a fraction. If Decimal becomes default, we could have Decimal(13 / 10) = Decimal(13) / Decimal(10). We would have "expected" results. Also, >Fractions and Decimals can't be mixed or interconverted directly. If Decimals are default, Fractions can have a .divide() method which returns Decimal(Numerator) / Decimal(Denominator), which is used when Fractions and Decimals are mixed. 
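Something like this (just a sketch of the idea as a free function, not a real Fraction method):

from decimal import Decimal
from fractions import Fraction

def divide(frac):
    # Sketch of the proposed Fraction.divide(): fall back to decimal
    # division when a Fraction has to mix with Decimals (lossy for 1/3, etc.).
    return Decimal(frac.numerator) / Decimal(frac.denominator)

divide(Fraction(1, 4)) + Decimal('0.5')   # Decimal('0.75')
divide(Fraction(1, 3))                    # 0.333... rounded to the context precision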
-Surya Subbarao From surya.subbarao1 at gmail.com Thu Jun 4 00:17:23 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Wed, 3 Jun 2015 15:17:23 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update Message-ID: > I'm going to show a few examples of how Decimals violate the fundamental > laws of mathematics just as floats do. Decimal also uses sign and mantissa, except it's Base 10. I think Decimal should use numerators and denominators, because they are more accurate. That's why even Decimal defies the laws of mathematics. From surya.subbarao1 at gmail.com Thu Jun 4 00:18:51 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Wed, 3 Jun 2015 15:18:51 -0700 Subject: [Python-ideas] Python Float Update Message-ID: > I'm going to show a few examples of how Decimals violate the fundamental > laws of mathematics just as floats do. Decimal also uses sign and mantissa, except it's Base 10. I think Decimal should use numerators and denominators, because they are more accurate. That's why even Decimal defies the laws of mathematics. -Surya Subbarao From breamoreboy at yahoo.co.uk Thu Jun 4 00:23:17 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 03 Jun 2015 23:23:17 +0100 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: On 03/06/2015 23:18, u8y7541 The Awesome Person wrote: >> I'm going to show a few examples of how Decimals violate the fundamental >> laws of mathematics just as floats do. > > Decimal also uses sign and mantissa, except it's Base 10. I think > Decimal should use numerators and denominators, because they are more > accurate. That's why even Decimal defies the laws of mathematics. > > -Surya Subbarao Defying the laws of mathematics isn't a key issue here as practicality beats purity. Try beating the laws of the BDFL and the core devs and it's the Comfy Chair, terribly sorry and all that. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From abarnert at yahoo.com Thu Jun 4 00:35:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 15:35:34 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: Message-ID: <92A5B283-CB7D-45B3-8D71-FB6674128240@yahoo.com> On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person wrote: >> I'm going to show a few examples of how Decimals violate the fundamental >> laws of mathematics just as floats do. > > Decimal also uses sign and mantissa, except it's Base 10. I think > Decimal should use numerators and denominators, because they are more > accurate. So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM? > That's why even Decimal defies the laws of mathematics. From surya.subbarao1 at gmail.com Thu Jun 4 00:46:00 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Wed, 3 Jun 2015 15:46:00 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update Message-ID: On Wed, Jun 3, 2015 at 3:35 PM, Andrew Barnert wrote: > On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person wrote: > >>> I'm going to show a few examples of how Decimals violate the fundamental >>> laws of mathematics just as floats do. >> >> Decimal also uses sign and mantissa, except it's Base 10. I think >> Decimal should use numerators and denominators, because they are more >> accurate. > > So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM?
> You can't represent sqrt(2) exactly with sign and mantissa either. When Decimal detects a non-repeating decimal, it should round it, and assign it a numerator and denominator something like 14142135623730951 / 10000000000000000 simplified. That's better than sign and mantissa errors. Or an alternative could be a hybrid of sign and mantissa and fraction representation... I don't think that's a good idea though. -- -Surya Subbarao From abarnert at yahoo.com Thu Jun 4 01:03:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 16:03:35 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On Jun 3, 2015, at 14:48, Chris Angelico wrote: > >> On Thu, Jun 4, 2015 at 2:55 AM, Andrew Barnert wrote: >> In Python, it's perfectly fine that -2 and 1+2j and (1, 2) are all compiled into expressions, so why isn't it fine that 1.2d is compiled into an expression? And, once you accept that, what's wrong with the expression being `literal_d('1.2')` instead of `Decimal('1.2')`? > > That's exactly the thing: 1.2d should be atomic. It should not be an > expression. The three examples you gave are syntactically expressions, > but they act very much like literals thanks to constant folding: > >>>> dis.dis(lambda: -2) > 1 0 LOAD_CONST 2 (-2) > 3 RETURN_VALUE >>>> dis.dis(lambda: 1+2j) > 1 0 LOAD_CONST 3 ((1+2j)) > 3 RETURN_VALUE >>>> dis.dis(lambda: (1, 2)) > 1 0 LOAD_CONST 3 ((1, 2)) > 3 RETURN_VALUE > > which means they behave the way people expect them to. But that's not something that's guaranteed by Python. It's something that implementations are allowed to do, and that CPython happens to do. If user code actually relied on that optimization, that code would be nonportable. But the reason Python allows that optimization in the first place is that user code actually doesn't care whether these expressions are evaluated "atomically" or at compile time, so it's ok to do so behind users' backs. It's not surprising because no one is going to monkeypatch int.__neg__ between definition time and call time (which CPython doesn't, but some implementations do), or call dis and read the bytecode if they don't even understand what a compile-time optimization is, and so on. > There is no way > for run-time changes to affect what any of those expressions yields. > Whether you're talking about shadowing the name Decimal or the name > literal_d, the trouble is that it's happening at run-time. Here's > another confusing case: > > import decimal > from fractionliterals import literal_fr > # oops, forgot to import literal_d > > # If we miss off literal_fr, we get an immediate error, because > # 1/2fr gets evaluated at def time. > def do_stuff(x, y, portion=1/2fr): > try: result = decimal.Decimal(x*y*portion) > except OverflowError: return 0.0d > > You won't know that your literal has failed until something actually > triggers the error. If that's a problem, then you're using the wrong language. You also won't know that you've typo'd OvreflowError or reslt, or called d.sqrt() instead of decimal.sqrt(d), or all kinds of other errors until something actually triggers the error. Which means either executing the code, or running a static linter. Which would be exactly the same for 1.2d. > That is extremely unobvious, especially since the > token "literal_d" doesn't occur anywhere in do_stuff(). This really isn't going to be confusing in real life. You get an error saying you forgot to define literal_d. 
You say, "Nuh uh, I did define it right at the top, same way I did literal_fr, in this imp... Oops, looks like I forgot to import it". > Literals look > like atoms, and if they behave like expressions, sooner or later > there'll be a ton of Stack Overflow questions saying "Why doesn't my > code work? I just changed this up here, and now I get this weird > error". Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one. Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages. And if you're going to suggest "what if I just redefine literal_d for no reason", ask yourself who would ever do that? Redefining decimal makes sense, because that's a reasonable name for a variable; redefining literal_d is as silly as redefining __name__. (But if you think those are different because double underscores are special, I suppose __literal_d__ doesn't bother me.) From abarnert at yahoo.com Thu Jun 4 01:20:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 16:20:15 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: Message-ID: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> On Jun 3, 2015, at 15:46, u8y7541 The Awesome Person wrote: > >> On Wed, Jun 3, 2015 at 3:35 PM, Andrew Barnert wrote: >> On Jun 3, 2015, at 15:17, u8y7541 The Awesome Person wrote: >> >>>> I?m going to show a few examples of how Decimals violate the fundamental >>>> laws of mathematics just as floats do. >>> >>> Decimal is also uses sign and mantissa, except it's Base 10. I think >>> Decimal should use numerators and denominators, because they are more >>> accurate. >> >> So sqrt(2) should be represented as an exact fraction? Do you have infinite RAM? > > You can't represent sqrt(2) exactly with sign and mantissa either. That's exactly the point: Decimal never _pretends_ to be exact, and therefore there's no problem when it can't be. By the way, it's not just "sign and mantissa" (that just gives you an integer, or maybe a fixed-point number), it's sign, mantissa, _and exponent_. > When Decimal detects a non-repeating decimal, it should round it, and > assign it a numerator and denominator something like 14142135623730951 > / 10000000000000000 simplified. > That's better than sign and mantissa > errors. No, that's exactly the same value as mantissa 1.4142135623730951 and exponent 0, and therefore it has exactly the same error. You haven't gained anything over using Decimal. And meanwhile, you've lost some efficiency (it takes twice as much memory because you have to store all those zeroes, where in Decimal they're implied by the exponent), and you've lost the benefit of a well-designed standard to follow (how many digits should you keep? what rounding rule should you use? should there be some way to optionally signal the user that rounding has occurred? and so on...). And, again, you've made things more surprising, not less, because now you have a type that's always exact, except when it isn't. Meanwhile, when you asked about the problems, I gave you a whole list of them. Have you thought about the others, or only the third one on the list? 
For example, do you really want adding up a long string of simple numbers to give you a value that takes 500x as much memory to store and 500x as long to calculate with if you don't need the exactness? Or is there going to be another rounding rule that when the fraction gets "too big" you truncate it to a smaller approximation? And meanwhile, if you do need the exactness, why don't you need to be able to carry around exact rational multiplies of pi or an exact representation of 2 ** 0.5 (both of which SymPy can do for you, by representing numbers symbolically, the way humans do when they need to)? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Jun 4 01:40:13 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Jun 2015 09:40:13 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert wrote: > Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one. > > Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages. > Anything that causes a different code path to be executed can do this. ChrisA From surya.subbarao1 at gmail.com Thu Jun 4 02:19:01 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Wed, 3 Jun 2015 17:19:01 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> Message-ID: >(it takes twice as much memory because you have to store all those zeroes, where in Decimal they're implied by the exponent), and you've lost the benefit of a well-designed standard to follow (how many digits should you keep? what rounding rule should you use? should there be some way to optionally signal the user that rounding has occurred? and so on...). You are right about memory... LOL, I just thought about having something like representing it as a float / float for numerator / denominator! But that would be slower... There's got to be a workaround for those zeros. Especially if I'm dealing with stuff like 57 / 10^100 (57 is prime!). -- -Surya Subbarao From rosuav at gmail.com Thu Jun 4 02:24:05 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 4 Jun 2015 10:24:05 +1000 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> Message-ID: On Thu, Jun 4, 2015 at 10:19 AM, u8y7541 The Awesome Person wrote: > You are right about memory... > LOL, I just thought about having something like representing it as a > float / float for numerator / denominator! But that would be slower... How would that even help? ChrisA From guido at python.org Thu Jun 4 03:01:38 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 3 Jun 2015 18:01:38 -0700 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> Message-ID: At this point I feel compelled to explain why I'm against using fractions/rationals to represent numbers given as decimals. 
>From 1982 till 1886 I participated in the implementation of ABC ( http://homepages.cwi.nl/~steven/abc/) which did implement numbers as arbitrary precision fractions. (An earlier prototype implemented them as fractions of two floats, but that was wrong for many other reasons -- two floats are not better than one. :-) The design using arbitrary precision fractions was intended to avoid newbie issues with decimal numbers (these threads have elaborated plenty on those newbie issues). For reasons that should also be obvious by now, we converted these fractions back to decimal before printing them. But there was a big issue that we didn't anticipate. During the course of a simple program it was quite common for calculations to slow down dramatically, because numbers with ever-larger numerators and denominators were being computed (and rational arithmetic quickly slows down as those get bigger). So e.g. you might be computing your taxes with a precision of a million digits -- only to be rounding them down to dollars for display. These issues were quite difficult to debug because the normal approach to debugging ("just use print statements") didn't work -- unless you came up with the idea of printing the numbers as a fraction. For this reason I think that it's better not to use rational arithmetic by default. FWIW the same reasoning does *not* apply to using Decimal or something like decimal128. But then again those don't really address most issues with floating point -- the rounding issue exists for decimal as well as for binary. Anyway, that's a separate discussion to have. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Thu Jun 4 03:06:19 2015 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 04 Jun 2015 02:06:19 +0100 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> Message-ID: <556FA48B.3050909@mrabarnett.plus.com> On 2015-06-04 02:01, Guido van Rossum wrote: > At this point I feel compelled to explain why I'm against using > fractions/rationals to represent numbers given as decimals. > > From 1982 till 1886 I participated in the implementation of ABC > (http://homepages.cwi.nl/~steven/abc/) which did implement numbers as > arbitrary precision fractions. (An earlier prototype implemented them as > fractions of two floats, but that was wrong for many other reasons -- > two floats are not better than one. :-) > Was that when the time machine was first used? :-) [snip] From abarnert at yahoo.com Thu Jun 4 03:03:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Wed, 3 Jun 2015 18:03:43 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <84A948E3-222A-4056-84A6-F0C9065B23CA@yahoo.com> Message-ID: <042008B3-718E-437B-BFBD-81CE6DB89371@yahoo.com> On Jun 3, 2015, at 16:40, Chris Angelico wrote: > >> On Thu, Jun 4, 2015 at 9:03 AM, Andrew Barnert wrote: >> Can you come up with an actual example where changing this up here gives this weird error somewhere else? If not, I doubt even the intrepid noobs of StackOverflow will come up with one. >> >> Neither of the examples so far qualifies--the first one is an error that the design can never produce, and the second one is not weird or confusing any more than any other error in any dynamic languages. > > Anything that causes a different code path to be executed can do this. 
Well, any expression causes a different code path to be executed than any different expression, or what would be the point? But how is this relevant here? Is there an example where 1.2d would lead to "changing this up here gives this weird error somewhere else" that doesn't apply just as well to spam.eggs (or that's relevant or likely to come up or whatever in the case of 1.2d but not in the case of spam.eggs)? Otherwise, you're just presenting an argument against dynamic languages--or maybe even against programming languages full stop (after all, the same kinds of things can happen in Haskell or C++, they just often happen at compile time, so you get to debug the same "weird error" earlier). From stephen at xemacs.org Thu Jun 4 10:33:16 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 04 Jun 2015 17:33:16 +0900 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 29 In-Reply-To: References: Message-ID: <87eglrub3n.fsf@uwakimon.sk.tsukuba.ac.jp> u8y7541 The Awesome Person writes: > Yeah, either Decimal becomes default or 13/10 is a fraction. If > Decimal becomes default, we could have Decimal(13 / 10) = > Decimal(13) / Decimal(10). We would have "expected" results. I gather you haven't read anybody's replies, because, no, you don't get expected results with Decimal: Decimal can still violate all the invariants that binary floats can. Binary has better approximation properties if you care about the *degree* of inexactness rather than the *frequency*[1] of inexactness. Fractions can quickly become very inefficient. Of course you could try approximate fractions with fixed slash or floating slash calculations[2] to get bounds on the complexity of "simple" arithmetic, but then you're back in a world with approximations. Footnotes: [1] In some sense. One important sense is "how often humans would care or even notice", which is likely to be even less frequent than how often inexactness is introduced, for Decimal. But that varies by human. [2] It's in Knuth, Seminumerical Algorithms IIRC. From steve at pearwood.info Thu Jun 4 14:08:36 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 4 Jun 2015 22:08:36 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> Message-ID: <20150604120834.GC1325@ando.pearwood.info> On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote: > On Jun 2, 2015, at 19:52, Steven D'Aprano wrote: [...] > > But, really, your proposal is in no way, shape or form syntax for > > *literals*, > > It's a syntax for things that are somewhat like `2`, more like `-2`, > even more like `(2,)`, but still not exactly the same as even that. Not really. It's a syntax for something that is not very close to *any* of those examples. Unlike all of those example, it is a syntax for calling a function at runtime. Let's take (-2, 1+3j) as an example. As you point out in another post, Python may constant-fold it, but isn't required to. Python 3.3 compiles it to a single constant: LOAD_CONST 6 ((-2, (1+3j))) but Python 1.5 compiles it to a series of byte-code operations: LOAD_CONST 0 (2) UNARY_NEGATIVE LOAD_CONST 1 (1) LOAD_CONST 2 (3j) BINARY_ADD BUILD_TUPLE 2 But that's just implementation detail. 
Whether Python 3.3 or 1.5, both expressions have something in common: the *operation* is immutable (I don't mean the object itself); there is nothing you can do, from pure python code, to make the literal (-2, 1+3j) something other than a two-tuple consisting of -2 and 1+3j. You can shadow int, complex and tuple, and it won't make a lick of difference. For lack of a better term, I'm going to call this a "static operation" (as opposed to dynamic operations like called len(x), which can be shadowed or monkey-patched). I don't wish to debate the definition of "literal", as that may be very difficult. For example, is 2+3j actually a literal, or an expression containing only literals? If a literal, how about 2*3**4/5 for that matter? As soon as Python compilers start doing compile-time constant folding, the boundary between literals and constant expressions becomes fuzzy. But that boundary is actually not very interesting. What is interesting is that every literal shares at least the property that I refer to above, that you cannot redefine the result of that literal at runtime by shadowing or monkey-patching. Coming from that perspective, a literal *defined* at runtime as you suggest is a contradiction in terms. I don't care so much if the actual operation that evaluates the literal happens at runtime, so long as it is static in the above sense. If it's dynamic, then it's not a literal, it's just a function call with ugly syntax. > If > you don't like using the word "literal" for that, you can come up with > a different word. I called it a "literal" because "user-defined > literals" is what people were asking for when they asked for `2.3d`, If you asked for a turkey and cheese sandwich on rye bread, and I said "Well, I haven't got any turkey, or rye, but I can give you a slice of cheese on white bread and we'll just call it a turkey and cheese rye sandwich", you probably wouldn't be impressed :-) > A literal is a notation for expressing some value that means what it > says in a sufficiently simple way. I don't think that works. "Sufficiently simple" is a problematic concept. If "123_d" is sufficiently simply, surely "d(123)" is equally simple? It's only one character more, and it's a much more familiar and conventional syntax. Especially since *_d ends up calling a function, which might as well be called d(). And if it is called d, why not a more_meaningful_name() instead? I would hope that the length of the function name is not the defining characteristic of "sufficiently simple"? (Consider 123_somereallylongbutmeaningfulnamehere.) I don't wish to argue about other languages, but I think for Python, the important characteristic of "literals" is that they are static, as above, not "simple". An expression with nested containers isn't necessarily simple: {0: [1, 2, {3, 4, (5, 6)}]} # add arbitrary levels of complexity nor is it necessarily constructed as a compile-time constant, but it is static in the above sense. [...] > > Otherwise, we might as well say that > > > > from fractions import Fraction > > Fraction(2) > > > > is a literal, in which case I can say your proposal is unnecessary as we > > already have user-specified literals in Python. > > In C++, a constructor expression like Fraction(2) may be evaluable at > compile time, and may evaluate to something that's constant at both > compile time and runtime, and yet it's still not a literal. Why? 
> Because their rule for what counts as "sufficiently simple" includes > constexpr postfix user-literal operators, but not constexpr function > or constructor calls. What is the logic for that rule? If it is just an arbitrary decision that "literals cannot include parentheses" then I equally arbitrarily dismiss that rule and say "of course they can, the C++ standard not withstanding, and the fact that Fraction(2) is a constant evaluated at compile time is proof of that fact". In any case, this is Python, and arguing over definitions from C++ is not productive. Our understanding of what makes a literal can be informed by other languages, but cannot be defined by other languages -- if for no other reason that other languages may not all agree on what is and isn't a literal. -- Steve From drekin at gmail.com Thu Jun 4 14:52:10 2015 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Thu, 4 Jun 2015 14:52:10 +0200 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <556f0f5b.c4c9c20a.331a.ffff99f7@mx.google.com> Message-ID: Thank you very much for a detailed explanation. Regards, Drekin On Wed, Jun 3, 2015 at 10:17 PM, Andrew Barnert wrote: > On Jun 3, 2015, at 07:29, drekin at gmail.com wrote: > > > > Stephen J. Turnbull writes: > > > >> Nick Coghlan writes: > >> > >>> the main concern I have with [a FloatLiteral that carries the > >>> original repr around] is that we'd be trading the status quo for a > >>> situation where "Decimal(1.3)" and "Decimal(13/10)" gave different > >>> answers. > >> > >> Yeah, and that kills the deal for me. Either Decimal is the default > >> representation for non-integers, or this is a no-go. And that isn't > >> going to happen. > > > > What if also 13/10 yielded a fraction? > > That was raised near the start of the thread. In fact, I think the initial > proposal was that 13/10 evaluated to Fraction(13, 10) and 1.2 evaluated to > something like Fraction(12, 10). > > > Anyway, what are the objections to integer division returning a > fraction? They are coerced to floats when mixed with them. > > As mentioned earlier in the thread, the language that inspired Python, > ABC, used exactly this design: computations were kept as exact rationals > until you mixed them with floats or called irrational functions like root. > So it's not likely Guido didn't think of this possibility; he deliberately > chose not to do things this way. He even wrote about this a few years ago; > search for "integer division" on his Python-history blog. > > So, what are the problems? > > When you stay with exact rationals through a long series of computations, > the result can grow to be huge in memory, and processing time. (I'm > ignoring the fact that CPython doesn't even have a fast fraction > implementation, because one could be added easily. It's still going to be > orders of magnitude slower to add two fractions with gigantic denominators > than to add the equivalent floats or decimals.) > > Plus, it's not always obvious when you've lost exactness. For example, > exponentiation between rationals is exact only if the power simplifies to a > whole fraction (and hasn't itself become a float somewhere along the way). > Since the fractions module doesn't have IEEE-style flags for > inexactness/rounding, it's harder to notice when this happens. > > Except in very trivial cases, the repr would be much less human-readable > and -debuggable, not more. (Or do you find 1728829813 / 2317409 easier to > understand than 7460.181958816937?) 
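The blow-up is easy to reproduce with the stdlib fractions module. The update rule below is an arbitrary choice (a logistic-map-style iteration picked purely because it looks harmless), not anything taken from ABC:

    from fractions import Fraction

    x = Fraction(1, 3)
    for _ in range(12):
        # an ordinary-looking update; nothing in the source hints that the
        # exact representation is about to explode in size
        x = 4 * x * (1 - x)

    print(float(x))                 # a perfectly tame-looking number
    print(len(str(x.denominator)))  # the exact denominator is now ~2000 digits long

Twelve innocuous iterations, and printing the "real" value as a fraction is already hopeless.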
> > Fractions and Decimals can't be mixed or interconverted directly. > > There are definitely cases where a rational type is the right thing to use > (it wouldn't be in the stdlib otherwise), but I think they're less common > than the cases where a floating-point type (whether binary or decimal) is > the right thing to use. (And even many cases where you think you want > rationals, what you actually want is SymPy-style symbolic > computation--which can give you exact results for things with roots or sins > or whatever as long as they cancel out in the end.) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Jun 4 15:06:12 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Jun 2015 14:06:12 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: <20150604120834.GC1325@ando.pearwood.info> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On 4 June 2015 at 13:08, Steven D'Aprano wrote: > I don't wish to argue about other languages, but I think for Python, the > important characteristic of "literals" is that they are static, as > above, not "simple". An expression with nested containers isn't > necessarily simple: > > {0: [1, 2, {3, 4, (5, 6)}]} # add arbitrary levels of complexity > > nor is it necessarily constructed as a compile-time constant, but it is > static in the above sense. I think that the main reason that people keep asking for things like 1.2d in place of D('1.2') is basically that the use of a string literal, for some reason "feels different". It's not a technical issue, nor is it one of compile time constants or static values - it's simply about not wanting to *think* of the process as passing a string literal to a function. They want "a syntax for a decimal" rather than "a means of getting a decimal from a string" because that's how they think of what they are doing. People aren't asking for decimal literals because they don't know that they can do D('1.2'). They want to avoid the quotes because they don't "feel right", that's all. That's why the common question is "why doesn't D(1.2) do what I expect?" rather than "how do I include a decimal constant in my program?" "Literal" syntax is about taking a chunk of the source code as a string, and converting it into a runtime object. For built in types the syntax is known to the lexer and the compiler knows how to create the runtime constants (that applies as much to Python as to C or any other language). The fundamental question here is whether there is a Pythonic way of extending that to user-defined forms. That would have to be handled at runtime, so the *syntax* would need to be immutable, but the *semantics* could be defined in terms of runtime, without violating the spirit of the request. Such a syntax could be used for lots of things - regular expressions are a common type that gets dedicated syntax (Javascript, Perl). As a straw man how about a new syntax (this won't work as written, because it'll clash with the "<" operator, but the basic idea works): LITERAL_CALL = PRIMARY "<" <any source character except a right angle bracket>* ">" which is a new option for PRIMARY alongside CALL. This translates directly into PRIMARY(str) where str is a string composed of the source characters within <...>. Decimal "literals" would then be from decimal import Decimal as D x = D<1.2> Code objects could be compile. 
Regular expressions could be from re import compile as RE regex = RE As you can see the potential for line noise and unreadable code is there, but regular expressions always have that problem :-) Also, this proposal gives a "literal syntax" that works with existing features, rather than being a specialised add-on. Maybe that's a benefit (or maybe it's over-generalisation). Paul From ncoghlan at gmail.com Thu Jun 4 15:48:43 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 4 Jun 2015 23:48:43 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On 4 June 2015 at 23:06, Paul Moore wrote: > As a straw man how about a new syntax (this won't work as written, > because it'll clash with the "<" operator, but the basic idea works): > > LITERAL_CALL = PRIMARY "<" angle bracket>* ">" The main idea I've had for compile time metaprogramming that I figured I might be able to persuade Guido not to hate is: python_ast, names2cells, unbound_names = !(this_is_an_arbitrary_python_expression) As suggested by the assignment target names, the default behaviour would be to compile the expression to a Python AST, and then at runtime provide some relevant information about the name bindings referenced from it. (I haven't even attempted to implement this, although I've suggested it to some of the SciPy folks as an idea they might want to explore to make R style lazy evaluation easier) By using the prefix+delimiters notation, it would become possible to later have variants that were similarly transparent to the compiler, but *called* a suitably registered callable at compile time to do the conversion to runtime Python objects. For example: !sh(shell command) !format(format string with implicit interpolation) !sql(SQL query) So for custom numeric types, you could register: d = !decimal(1.2) r = !rational(22/7) This isn't an idea I'm likely to have time to pursue myself any time soon (if ever), but I think it addresses the key concern with syntax customisation by ensuring that customisations have *names*, and that they're clearly distinguished from normal code. Cheers, Nick. From p.f.moore at gmail.com Thu Jun 4 16:25:58 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 4 Jun 2015 15:25:58 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On 4 June 2015 at 14:48, Nick Coghlan wrote: > On 4 June 2015 at 23:06, Paul Moore wrote: >> As a straw man how about a new syntax (this won't work as written, >> because it'll clash with the "<" operator, but the basic idea works): >> >> LITERAL_CALL = PRIMARY "<" > angle bracket>* ">" > > The main idea I've had for compile time metaprogramming that I figured > I might be able to persuade Guido not to hate is: > > python_ast, names2cells, unbound_names = > !(this_is_an_arbitrary_python_expression) The fundamental difference between this proposal and mine is (I think) that you're assuming an arbitrary Python expression in there (which is parsed), whereas I'm proposing an *unparsed* string. For example, your suggestion of !decimal(1.2) would presumably pass to the "decimal" function, an AST consisting of a literal float node for 1.2. 
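That loss is easy to see from plain Python today; by the time either a call or an AST sees 1.2, the binary rounding has already happened (output details vary slightly by version):

    import ast
    from decimal import Decimal

    print(Decimal(1.2))    # 1.1999999999999999555... (the nearest binary float)
    print(Decimal('1.2'))  # 1.2 -- only the string still carries the intended digits

    # the parsed AST for decimal(1.2) already holds a float object as the
    # argument, not the two source characters "1" and "2"
    print(ast.dump(ast.parse("decimal(1.2)", mode="eval")))

In both cases the constructor would be handed a float, not the original text.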
Which has the same issues as anything else that parses 1.2 before the decimal constructor gets its hands on it - you've already lost the original that the people wanting decimal literals need access to. And I don't think your shell script example works - something like !sh(echo $PATH) would be a syntax error, surely? My proposal is specifically about allowing access to the *unevaluated* source string, to allow the runtime function to take control of the parsing. We have various functions already that take string representations and parse them to objects (Decimal, re.compile, compile...) - all I'm suggesting is a lighter-weight syntax than ("...") for "call with a string value". It's very hard to justify this, as it doesn't add any new functionality, and it doesn't add that much brevity. But it seems to me that it *does* add a strong measure of "doing what people expect" - something that's hard to quantify, but once you go looking for examples, it's applicable to a *lot* of longstanding requests. The more I look, the more uses I can think of (e.g., Windows paths via pathlib - Path). The main issue I see with my proposal (other than "Guido could well hate it" :-)) is that it has no answer to the fact that you can't include the closing delimiter in the string - as soon as you try to work around that, the syntax starts to lose its elegant simplicity *very* fast. (Raw strings have similar problems - the rules on backslashes in raw strings are clumsy at best). Like you, though, I don't have time to work on this, so it's just an idea if anyone else wants to pick up on it. Paul From steve at pearwood.info Thu Jun 4 17:11:39 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Jun 2015 01:11:39 +1000 Subject: [Python-ideas] Python Float Update In-Reply-To: References: Message-ID: <20150604151138.GA20701@ando.pearwood.info> On Wed, Jun 03, 2015 at 03:18:51PM -0700, u8y7541 The Awesome Person wrote: > > I'm going to show a few examples of how Decimals violate the fundamental > > laws of mathematics just as floats do. > > Decimal is also uses sign and mantissa, except it's Base 10. I think > Decimal should use numerators and denominators, because they are more > accurate. That's why even Decimal defies the laws of mathematics. The decimal module is an implementation of the decimal floating point arithmetic based on the General Decimal Arithmetic Specification: http://speleotrove.com/decimal/decarith.html and IEEE standard 854-1987: www.cs.berkeley.edu/~ejr/projects/754/private/drafts/854-1987/dir.html The decimal module is not free to do whatever we want. It can only do what is specified by those standards. If you want to modify the decimal module to behave as you suggest, you are free to copy the module's source code and modify it. (It is open source, like all of Python.) This would be an interesting experiment for somebody to do. -- Steve From mertz at gnosis.cx Thu Jun 4 17:21:09 2015 From: mertz at gnosis.cx (David Mertz) Date: Thu, 4 Jun 2015 08:21:09 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: <20150604151138.GA20701@ando.pearwood.info> References: <20150604151138.GA20701@ando.pearwood.info> Message-ID: On Jun 4, 2015 8:12 AM, "Steven D'Aprano" wrote: > The decimal module is not free to do whatever we want. It can only do > what is specified by those standards. That's not quite true. The class decimal.Decimal must obey those standards. We could easily add decimal32/64/128 types to the module for those different objects (and I think we probably should). 
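The numeric parameters of those three interchange formats can already be approximated today with decimal.Context, which gives the right precision and exponent range even though it provides neither the compact storage nor a distinct type. A sketch, with the IEEE 754-2008 parameters filled in by hand:

    from decimal import Context, Decimal

    # precision and exponent limits of the decimal32/64/128 interchange formats
    decimal32  = Context(prec=7,  Emin=-95,   Emax=96)
    decimal64  = Context(prec=16, Emin=-383,  Emax=384)
    decimal128 = Context(prec=34, Emin=-6143, Emax=6144)

    print(decimal64.divide(Decimal(1), Decimal(3)))   # 0.3333333333333333 (16 digits)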
For that matter, it wouldn't violate the standards to add decimal.PythonDecimal with some other behaviors. But I can't really think of any desirable behaviors that are mathematically possible to put there. There isn't going to be a decimal.NeverSurprisingDecimal class in there. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 4 21:14:38 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 12:14:38 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: <20150604120834.GC1325@ando.pearwood.info> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> On Jun 4, 2015, at 05:08, Steven D'Aprano wrote: > >> On Wed, Jun 03, 2015 at 12:43:00PM -0700, Andrew Barnert wrote: >> On Jun 2, 2015, at 19:52, Steven D'Aprano wrote: > [...] >>> But, really, your proposal is in no way, shape or form syntax for >>> *literals*, >> >> It's a syntax for things that are somewhat like `2`, more like `-2`, >> even more like `(2,)`, but still not exactly the same as even that. > > Not really. It's a syntax for something that is not very close to *any* > of those examples. Unlike all of those example, it is a syntax for > calling a function at runtime. > > Let's take (-2, 1+3j) as an example. As you point out in another post, > Python may constant-fold it, but isn't required to. Python 3.3 compiles > it to a single constant: > > LOAD_CONST 6 ((-2, (1+3j))) > > > but Python 1.5 compiles it to a series of byte-code operations: > > LOAD_CONST 0 (2) > UNARY_NEGATIVE > LOAD_CONST 1 (1) > LOAD_CONST 2 (3j) > BINARY_ADD > BUILD_TUPLE 2 > > > But that's just implementation detail. Whether Python 3.3 or 1.5, both > expressions have something in common: the *operation* is immutable (I > don't mean the object itself); there is nothing you can do, from pure > python code, to make the literal (-2, 1+3j) something other than a > two-tuple consisting of -2 and 1+3j. You can shadow int, complex and > tuple, and it won't make a lick of difference. For lack of a better > term, I'm going to call this a "static operation" (as opposed to dynamic > operations like called len(x), which can be shadowed or monkey-patched). But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called. As an implementation-specific detail, CPython 3.4 doesn't let you modify the complex type. Python allows this, but doesn't require it, and some other implementations do let you modify it. So, if it's important to your code that 1+3j is a "static operation", then your code is non-portable at best. But once again, I suspect that the reason you haven't thought about this is that you've never written any code that actually cares what is or isn't a static operation. It's a typical "consenting adults" case. > I don't wish to debate the definition of "literal", as that may be very > difficult. For example, is 2+3j actually a literal, or an expression > containing only literals? If a literal, how about 2*3**4/5 for that > matter? As soon as Python compilers start doing compile-time constant > folding, the boundary between literals and constant expressions becomes > fuzzy. But that boundary is actually not very interesting. 
What is > interesting is that every literal shares at least the property that I > refer to above, that you cannot redefine the result of that literal at > runtime by shadowing or monkey-patching. What you're arguing here, and for the rest of the message, can be summarized in one sentence: the difference between user-defined literals and implementation-defined literals is that the former are user-defined. To which I have no real answer. > >> If >> you don't like using the word "literal" for that, you can come up with >> a different word. I called it a "literal" because "user-defined >> literals" is what people were asking for when they asked for `2.3d`, > > If you asked for a turkey and cheese sandwich on rye bread, and I said > "Well, I haven't got any turkey, or rye, but I can give you a slice of > cheese on white bread and we'll just call it a turkey and cheese rye > sandwich", you probably wouldn't be impressed :-) But if I asked for a turkey and cheese hoagie, and you said I have turkey and cheese and a roll, but that doesn't count as a hoagie by my definition so you can't have it, I'd say just put the turkey and cheese on the roll and call it whatever you want to call it. If people are asking for user-defined literals like 2.3d, and your argument is not that we can't or shouldn't do it, but that the term "user-defined literal" is contradictory, then the answer is the same: just call it something different. I don't know how else to put this. I already said, in two different ways, that if you want to call it something different that's fine. You replied by saying you don't want to argue about the definition of literals, followed by multiple paragraphs arguing about the definition of literals. >> A literal is a notation for expressing some value that means what it >> says in a sufficiently simple way. > > I don't think that works. "Sufficiently simple" is a problematic > concept. If "123_d" is sufficiently simply, surely "d(123)" is equally > simple? It's only one character more, and it's a much more familiar > and conventional syntax. If you're talking about APL or J, the number of characters might be a relevant measure of simplicity. But in the vast majority of languages, including Python, it has very little relevance. Of course "simple" inherently a vague concept, and it will be different in different languages and contexts. But it's still one of the most important concepts. That's why language design is an art, and why we have a Zen of Python and not an Assembly Manual of Python. Trying to reduce it to something the wc program can measure means reducing it to the point of meaninglessness. Let's give a different example. I could claim that currying makes higher-order expressions simpler. You could rightly point out that it makes the simplest function calls less simple. If we disagree on those points, or on the relative importance of them, we might draw up a bunch of examples to look at the human readability and writability or computer parsability of different expressions, in the context of idiomatic code in the language we were designing. If the rest of the language were a lot like Haskell, we'd probably agree that curried functions were simpler; if it were a lot like Python, we'd probably agree on the reverse. But at no point would the fact that f(1,2) is one character shorter than f(1)(2) come into the discussion. 
The closest we'd reasonably get might a discussion of the fact that the parens feel "big" and "get in the way" of reading the "more important" parts of the expression, or encourage the reader to naturally partition up the expression in a way that isn't appropriate to the intended meaning, or other such things. (See the "grit on Tim's monitor" appeal.) But those are still vague and subjective things. There's no objective measure to appeal to. Otherwise, every language proposal, Guido would just run the objective simplicity measurement program and it would say yes or no. >> In C++, a constructor expression like Fraction(2) may be evaluable at >> compile time, and may evaluate to something that's constant at both >> compile time and runtime, and yet it's still not a literal. Why? >> Because their rule for what counts as "sufficiently simple" includes >> constexpr postfix user-literal operators, but not constexpr function >> or constructor calls. > > What is the logic for that rule? In the case of C++, a committee actually sat down and hammered out a rigorous definition that codified the intuitive sense they were going for; if you want to read it, you can. But that isn't going to apply to anything but C++. And if you want to argue about it, the place to do so is the C++17 ISO committee. Just declaring that the C++ standard definition of literals doesn't define what you want to call literals doesn't really accomplish anything. From abarnert at yahoo.com Thu Jun 4 21:49:49 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 12:49:49 -0700 Subject: [Python-ideas] Python Float Update In-Reply-To: References: <20150604151138.GA20701@ando.pearwood.info> Message-ID: On Jun 4, 2015, at 08:21, David Mertz wrote: > On Jun 4, 2015 8:12 AM, "Steven D'Aprano" wrote: > > The decimal module is not free to do whatever we want. It can only do > > what is specified by those standards. > > That's not quite true. The class decimal.Decimal must obey those standards. We could easily add decimal32/64/128 types to the module for those different objects (and I think we probably should). > If we add decimal32/64/128 types, presumably they'd act like the types of the same names as specified in the same standards. Otherwise, that would be very surprising. Also, I'm not entirely sure we want to add those types to decimal in the stdlib. It would be a lot less work to implement them in terms of an existing C implementation (maybe using the native types if the exist, Intel's library if they don't). But I don't think that's necessarily desirable for the stdlib. Is this a big enough win to justify CPython being written in C11 instead of C90 (or, worse, sometimes one and sometimes the other, or a combination of the two), or to add a dependency on a library that isn't preinstalled on most systems and takes longer to build than all of current CPython 3.4? For a PyPI library, none of those things matter (in fact, a PyPI library could use C++ on platforms where the native types from the C++ TR exist but the C ones don't, or use Cython or CFFI or Boost::Python instead of native code, etc.), and the stdlib's decimal docs could just point to that PyPI library. > For that matter, it wouldn't violate the standards to add decimal.PythonDecimal with some other behaviors. But I can't really think of any desirable behaviors that are mathematically possible to put there. > I think it would still make sense to put it somewhere else. 
A module that declares that its behavior corresponds to a standard that adds a little bit of standard-irrelevant behavior is fine. For example, a conversion from Fraction to Decimal that worked "in the spirit of the standards" and documented that it wasn't specified by either standard. But adding a whole new class as complex as Decimal means half the module is now standard-irrelevant, which seems a lot more potentially confusing to me. > There isn't going to be a decimal.NeverSurprisingDecimal class in there. > Start with your favorite axioms for the integers and your favorite construction of the reals, then negate the successor axiom. Now you have a fully-consistent, never-surprising, easily-implementable real number type that follows all the usual mathematical laws over its entire set of values, {0}. :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jun 4 21:49:06 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Jun 2015 12:49:06 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas < python-ideas at python.org> wrote: > But this isn't actually true. That BINARY_ADD opcode looks up the addition > method at runtime and calls it. And that means that if you monkeypatch > complex.__radd__, your method will get called. > Wrong. You can't moneypatch complex.__radd__. That's a feature of the language. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Thu Jun 4 22:18:56 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 13:18:56 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: On Jun 4, 2015, at 12:49, Guido van Rossum wrote: > >> On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas wrote: >> But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called. > > Wrong. You can't moneypatch complex.__radd__. That's a feature of the language. I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it". And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.) 
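For reference, this is what the restriction looks like in current CPython; attributes of built-in types simply can't be rebound, while an ordinary user-defined subclass allows it (other implementations may behave differently, which is the point being made above):

    try:
        complex.__radd__ = lambda self, other: 42
    except TypeError as e:
        print(e)   # CPython refuses: built-in types are immutable

    class MyComplex(complex):
        pass

    # on a user-defined class the same assignment is allowed, and the
    # operator machinery picks it up
    MyComplex.__radd__ = lambda self, other: 42
    print(1 + MyComplex(2))   # 42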
-------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jun 4 23:05:29 2015 From: guido at python.org (Guido van Rossum) Date: Thu, 4 Jun 2015 14:05:29 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: OK, you can attribute that to lousy docs. The intention is that builtin types are immutable. On Thu, Jun 4, 2015 at 1:18 PM, Andrew Barnert wrote: > On Jun 4, 2015, at 12:49, Guido van Rossum wrote: > > On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas < > python-ideas at python.org> wrote: > >> But this isn't actually true. That BINARY_ADD opcode looks up the >> addition method at runtime and calls it. And that means that if you >> monkeypatch complex.__radd__, your method will get called. >> > > Wrong. You can't moneypatch complex.__radd__. That's a feature of the > language. > > > I may well have missed it, but I went looking through the Built-in Types > library documentation, the Data Model and other chapters of the language > reference documentation, and every relevant PEP I could think of, and I > can't find anything that says this is true. > > The best I can find is the rationale section for PEP 3119 saying "there > are good reasons to keep the built-in types immutable", which is why PEP > 3141 was changed to not require mutating the built-in types. But "there are > good reasons to allow implementations to forbid it" isn't the same thing as > "all implementations must forbid it". > > And at least some implementations do allow it, like Brython and one of the > two embedded pythons. (And the rationale in PEP 3119 doesn't apply to > them--Brython doesn't share built-in types between different Python > interpreters in different browser windows, even if they're in the same > address space.) > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Jun 4 23:23:41 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 5 Jun 2015 07:23:41 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: On Fri, Jun 5, 2015 at 6:18 AM, Andrew Barnert via Python-ideas wrote: > The best I can find is the rationale section for PEP 3119 saying "there are > good reasons to keep the built-in types immutable", which is why PEP 3141 > was changed to not require mutating the built-in types. But "there are good > reasons to allow implementations to forbid it" isn't the same thing as "all > implementations must forbid it". > > And at least some implementations do allow it, like Brython and one of the > two embedded pythons. (And the rationale in PEP 3119 doesn't apply to > them--Brython doesn't share built-in types between different Python > interpreters in different browser windows, even if they're in the same > address space.) Huh. Does that imply that Brython has to construct a brand-new integer object for absolutely every operation and constant, in case someone monkeypatched something? 
Once integers (and other built-in types) lose their immutability, they become distinguishable: x = 2 monkey_patch(x) y = 2 In CPython (and, I think, in the Python spec), the two 2s in x and y will be utterly indistinguishable, like fermions. CPython goes further and uses the exact same object for both 2s, *because it can*. Is there something you can do inside monkey_patch() that will "mark" one of those 2s such that it's somehow different (add an attribute, change a dunder method, etc)? And does Brython guarantee that id(x)!=id(y) because of that? ChrisA From random832 at fastmail.us Thu Jun 4 23:34:53 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 04 Jun 2015 17:34:53 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: <1433453693.1961890.287185857.27F097A2@webmail.messagingengine.com> On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote: > Huh. Does that imply that Brython has to construct a brand-new integer > object for absolutely every operation and constant, in case someone > monkeypatched something? Once integers (and other built-in types) lose > their immutability, they become distinguishable: > > x = 2 > monkey_patch(x) > y = 2 Er, we're talking about monkey-patching the int *class* (well, the complex class, but the same idea applies), not an individual int object. From rosuav at gmail.com Thu Jun 4 23:45:39 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 5 Jun 2015 07:45:39 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: <1433453693.1961890.287185857.27F097A2@webmail.messagingengine.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <1433453693.1961890.287185857.27F097A2@webmail.messagingengine.com> Message-ID: On Fri, Jun 5, 2015 at 7:34 AM, wrote: > On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote: >> Huh. Does that imply that Brython has to construct a brand-new integer >> object for absolutely every operation and constant, in case someone >> monkeypatched something? Once integers (and other built-in types) lose >> their immutability, they become distinguishable: >> >> x = 2 >> monkey_patch(x) >> y = 2 > > Er, we're talking about monkey-patching the int *class* (well, the > complex class, but the same idea applies), not an individual int object. Ah okay. Even so, it would be very surprising if "1+2" could evaluate to anything other than 3. ChrisA From yselivanov.ml at gmail.com Fri Jun 5 00:13:50 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 04 Jun 2015 18:13:50 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: <5570CD9E.4030403@gmail.com> On 2015-06-04 5:23 PM, Chris Angelico wrote: > Huh. Does that imply that Brython has to construct a brand-new integer > object for absolutely every operation and constant, in case someone > monkeypatched something? FWIW, numbers (as well as strings) are immutable in JavaScript. And there is Object.freeze to make things immutable where you need that. 
Yury From ncoghlan at gmail.com Fri Jun 5 00:31:57 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Jun 2015 08:31:57 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On 5 Jun 2015 00:25, "Paul Moore" wrote: > > On 4 June 2015 at 14:48, Nick Coghlan wrote: > > On 4 June 2015 at 23:06, Paul Moore wrote: > >> As a straw man how about a new syntax (this won't work as written, > >> because it'll clash with the "<" operator, but the basic idea works): > >> > >> LITERAL_CALL = PRIMARY "<" >> angle bracket>* ">" > > > > The main idea I've had for compile time metaprogramming that I figured > > I might be able to persuade Guido not to hate is: > > > > python_ast, names2cells, unbound_names = > > !(this_is_an_arbitrary_python_expression) > > The fundamental difference between this proposal and mine is (I think) > that you're assuming an arbitrary Python expression in there (which is > parsed), whereas I'm proposing an *unparsed* string. No, when you supplied a custom parser, the parser would have access to the raw string (as well as the name -> cell mapping for the current scope). The "quoted AST parser" would just be the default one. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Fri Jun 5 00:40:11 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 4 Jun 2015 18:40:11 -0400 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> Message-ID: On Wed, Jun 3, 2015 at 9:01 PM, Guido van Rossum wrote: > But there was a big issue that we didn't anticipate. During the course of > a simple program it was quite common for calculations to slow down > dramatically, because numbers with ever-larger numerators and denominators > were being computed (and rational arithmetic quickly slows down as those > get bigger). The problem of unlimited growth can be solved by rounding, but the result is in many ways worse that floating point numbers. One obvious problem is that unlike binary floating point where all bit patterns represent different numbers, only about 60% of fractions with limited numerators and denominators represent unique values. The rest are reducible by dividing the numerator and denominator by the GCD. Furthermore, the fractions with limited numerators are distributed very unevenly on the number line. This problem is present in binary floats as well: floats between 1 and 2 are twice as dense as floats between 2 and 4, but with fractions it is much worse. Since a/b - c/d = (ad-bc)/(bd), a fraction nearest to a/b is at a distance of 1/(bd) from it. So if the denominators are limited by D (|b| < D and |d| < D), for small b's the nearest fraction to a/b is at distance ~ 1/D, but if b ~ D, it is at a distance of 1/D^2. For example, if we limit denominators to 10 decimal digits, the gaps between fractions can vary from ~ 10^(-10) to ~ 10^(-20) even if the fractions are of similar magnitude - say between 1 and 2. These two problems rule out the use of fractions as a general purpose number. -------------- next part -------------- An HTML attachment was scrubbed... 
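Both effects are easy to check with the fractions module; the nearest neighbour of a/b among fractions with denominator at most D is roughly 1/(b*D) away, so the spacing varies enormously even between numbers of similar size. A small sketch with an arbitrarily chosen bound:

    from fractions import Fraction

    D = 10**6   # arbitrary bound on the denominators

    # |a/b - c/d| = |a*d - b*c| / (b*d) >= 1/(b*d), so with d limited to D:
    print(float(Fraction(1, 2 * D)))        # ~5e-07  gap next to 1/2 (small b)
    print(float(Fraction(1, 999983 * D)))   # ~1e-12  gap next to a fraction with b close to D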
URL: From abarnert at yahoo.com Fri Jun 5 00:43:53 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 15:43:53 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <1433453693.1961890.287185857.27F097A2@webmail.messagingengine.com> Message-ID: On Jun 4, 2015, at 14:45, Chris Angelico wrote: > >> On Fri, Jun 5, 2015 at 7:34 AM, wrote: >>> On Thu, Jun 4, 2015, at 17:23, Chris Angelico wrote: >>> Huh. Does that imply that Brython has to construct a brand-new integer >>> object for absolutely every operation and constant, in case someone >>> monkeypatched something? Once integers (and other built-in types) lose >>> their immutability, they become distinguishable: >>> >>> x = 2 >>> monkey_patch(x) >>> y = 2 >> >> Er, we're talking about monkey-patching the int *class* (well, the >> complex class, but the same idea applies), not an individual int object. > > Ah okay. Even so, it would be very surprising if "1+2" could evaluate > to anything other than 3. It's surprising that int('3') could evaluate to 4, or that print(1+2) could print 4, or that adding today and a 1-day timedelta could give you a date in 1918, or that accessing sys.stdout could play a trumpet sound and then read a 300MB file over the network, but there's nothing in the language stopping you from shadowing or replacing or monkeypatching any of those things, there's just your own common sense, and your trust in the common sense of other people working on the code with you. And, getting this back on point: That also means there would be nothing stopping you from accidentally or maliciously redefining literal_d to play a trumpet sound and then read a 300MB file over the network instead of giving you a Decimal value, but that's not a problem the language has to solve, any more than it's a problem that you can replace int or print or sys.__getattr__. The fact that people might overuse user-defined literals (e.g., I think using it for units, like the _ms suffix that C++'s timing library uses, is a bad idea), that's potentially a real problem. The fact that people might stupidly or maliciously interfere with some-other-user's-defined literals is not. Yes, you can surprise people that way, but Python already gives you a lot of much easier ways to surprise people. Python doesn't have a secure loader or enforced privates and constants or anything of the sort; it's designed to be used by consenting adults, and that works everywhere else, so why wouldn't it work here? From random832 at fastmail.us Fri Jun 5 01:02:18 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 04 Jun 2015 19:02:18 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: <5570CD9E.4030403@gmail.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <5570CD9E.4030403@gmail.com> Message-ID: <1433458938.339885.287241337.7676811C@webmail.messagingengine.com> On Thu, Jun 4, 2015, at 18:13, Yury Selivanov wrote: > FWIW, numbers (as well as strings) are immutable in JavaScript. numbers and strings are, but Numbers and Strings aren't. Remember, in Javascript, the former aren't objects. 
From abarnert at yahoo.com Fri Jun 5 01:03:16 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 16:03:16 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> On Jun 4, 2015, at 06:48, Nick Coghlan wrote: > >> On 4 June 2015 at 23:06, Paul Moore wrote: >> As a straw man how about a new syntax (this won't work as written, >> because it'll clash with the "<" operator, but the basic idea works): >> >> LITERAL_CALL = PRIMARY "<" > angle bracket>* ">" > > The main idea I've had for compile time metaprogramming that I figured > I might be able to persuade Guido not to hate is: > > python_ast, names2cells, unbound_names = > !(this_is_an_arbitrary_python_expression) > > As suggested by the assignment target names, the default behaviour > would be to compile the expression to a Python AST, and then at > runtime provide some relevant information about the name bindings > referenced from it. (I haven't even attempted to implement this, > although I've suggested it to some of the SciPy folks as an idea they > might want to explore to make R style lazy evaluation easier) > > By using the prefix+delimiters notation, it would become possible to > later have variants that were similarly transparent to the compiler, > but *called* a suitably registered callable at compile time to do the > conversion to runtime Python objects. For example: > > !sh(shell command) > !format(format string with implicit interpolation) > !sql(SQL query) > > So for custom numeric types, you could register: > > d = !decimal(1.2) > r = !rational(22/7) But what would that get you? If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value? Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)? > This isn't an idea I'm likely to have time to pursue myself any time > soon (if ever), but I think it addresses the key concern with syntax > customisation by ensuring that customisations have *names*, and that > they're clearly distinguished from normal code. > > Cheers, > Nick. 
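The marshalling question is concrete: co_consts is serialized with the marshal module whenever a .pyc file is written, and marshal only understands the built-in types, so an arbitrary object such as a Decimal cannot be stored there today. Easy to check:

    import marshal
    from decimal import Decimal

    code = compile("x = 1.2", "<test>", "exec")
    marshal.dumps(code)                   # fine: floats and code objects marshal

    try:
        marshal.dumps(Decimal("1.2"))
    except ValueError as e:
        print(e)                          # unmarshallable object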
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From abarnert at yahoo.com Fri Jun 5 01:20:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 16:20:34 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> Message-ID: <7DDDD203-8B93-4691-9524-B6191C2EBD25@yahoo.com> On Jun 4, 2015, at 14:05, Guido van Rossum wrote: > > OK, you can attribute that to lousy docs. The intention is that builtin types are immutable. I can go file bugs against those other implementations, but first, what's the rationale? The ABC PEP, the numbers PEP discussion, and the type/class unification tutorial all use the same reason: In CPython, different interpreters in the same memory space (as with mod_python) share the same built-in types. From the numbers discussion, it sounds like this was the only reason to reject the idea of just patching float.__bases__. But most other Python implementations don't have process-wide globals like that to worry about; patching int in one interpreter can't possibly affect any other interpreter. "Because CPython can't do it, nobody else should do it, to keep code portable" might be a good enough rationale for something this fundamental, but if that's not the one you're thinking of, I don't want to put those words in your mouth. >> On Thu, Jun 4, 2015 at 1:18 PM, Andrew Barnert wrote: >>> On Jun 4, 2015, at 12:49, Guido van Rossum wrote: >>> >>>> On Thu, Jun 4, 2015 at 12:14 PM, Andrew Barnert via Python-ideas wrote: >>>> But this isn't actually true. That BINARY_ADD opcode looks up the addition method at runtime and calls it. And that means that if you monkeypatch complex.__radd__, your method will get called. >>> >>> Wrong. You can't moneypatch complex.__radd__. That's a feature of the language. >> >> I may well have missed it, but I went looking through the Built-in Types library documentation, the Data Model and other chapters of the language reference documentation, and every relevant PEP I could think of, and I can't find anything that says this is true. >> >> The best I can find is the rationale section for PEP 3119 saying "there are good reasons to keep the built-in types immutable", which is why PEP 3141 was changed to not require mutating the built-in types. But "there are good reasons to allow implementations to forbid it" isn't the same thing as "all implementations must forbid it". >> >> And at least some implementations do allow it, like Brython and one of the two embedded pythons. (And the rationale in PEP 3119 doesn't apply to them--Brython doesn't share built-in types between different Python interpreters in different browser windows, even if they're in the same address space.) > > > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Fri Jun 5 01:44:37 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 05 Jun 2015 11:44:37 +1200 Subject: [Python-ideas] [Python Ideas] Python Float Update In-Reply-To: <556FA48B.3050909@mrabarnett.plus.com> References: <6D7BBDBE-2AF3-46F3-B1EC-BF8D8D5BA002@yahoo.com> <556FA48B.3050909@mrabarnett.plus.com> Message-ID: <5570E2E5.3020500@canterbury.ac.nz> MRAB wrote: > On 2015-06-04 02:01, Guido van Rossum wrote: > >> From 1982 till 1886 I participated in the implementation of ABC > > Was that when the time machine was first used? :-) Must have been a really big project if you had to give yourself nearly 100 years of development time! -- Greg From yselivanov.ml at gmail.com Fri Jun 5 04:11:17 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 04 Jun 2015 22:11:17 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: <1433458938.339885.287241337.7676811C@webmail.messagingengine.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <5570CD9E.4030403@gmail.com> <1433458938.339885.287241337.7676811C@webmail.messagingengine.com> Message-ID: <55710545.2030300@gmail.com> On 2015-06-04 7:02 PM, random832 at fastmail.us wrote: > On Thu, Jun 4, 2015, at 18:13, Yury Selivanov wrote: >> >FWIW, numbers (as well as strings) are immutable in JavaScript. > numbers and strings are, but Numbers and Strings aren't. Remember, in > Javascript, the former aren't objects. I know. Although you can't mutate the inner-value of Number or String objects, you can only attach properties. Yury From surya.subbarao1 at gmail.com Fri Jun 5 04:12:31 2015 From: surya.subbarao1 at gmail.com (u8y7541 The Awesome Person) Date: Thu, 4 Jun 2015 19:12:31 -0700 Subject: [Python-ideas] Python Float Update Message-ID: > But there was a big issue that we didn't anticipate. During the course of a > simple program it was quite common for calculations to slow down > dramatically, because numbers with ever-larger numerators and denominators > were being computed (and rational arithmetic quickly slows down as those > get bigger). So e.g. you might be computing your taxes with a precision of > a million digits -- only to be rounding them down to dollars for display. (Quote by Guido van Rossum) > Decimal can still violate all the > invariants that binary floats can. Binary has better approximation > properties if you care about the *degree* of inexactness rather than > the *frequency*[1] of inexactness. Fractions can quickly become very > inefficient. (Quote by Stephen J. Turnbull) I have a solution. To add, we convert to decimal and then add, then change back to fraction. If we have hard-to-represent decimals like 1 / 3, we can add with numerators. Hybrid additions. This could really speed up things... -- -Surya Subbarao From guettliml at thomas-guettler.de Fri Jun 5 07:36:59 2015 From: guettliml at thomas-guettler.de (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=) Date: Fri, 05 Jun 2015 07:36:59 +0200 Subject: [Python-ideas] Better Type Hinting Message-ID: <5571357B.4080100@thomas-guettler.de> It would be great to have better type hinting in IDEs. My usecase: logger = logging.getLogger(__name__) try: ... except FooException, exc: logger.warn('...') I remember there was a way to show the traceback via logger.warn(). I could use my favorite search engine, but a short cut via the IDE would be much easier. 
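For reference, the half-remembered feature is most likely the exc_info keyword argument accepted by all the logger methods; passing exc_info=True attaches the currently handled exception and its traceback to the record (logger.exception does the same at ERROR level):

    import logging

    logging.basicConfig()
    logger = logging.getLogger(__name__)

    try:
        1 / 0
    except ZeroDivisionError:
        logger.warning('something went wrong', exc_info=True)   # message plus traceback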
How can the IDE know what kind of duck "logger" is? Many IDEs parse the docstrings, but a lot of code does not provide it. How can this be improved? Regards, Thomas G?ttler PS: I don't mention the name of my IDE intentionally :-) It does not matter for this question. From stefan_ml at behnel.de Fri Jun 5 07:59:58 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 05 Jun 2015 07:59:58 +0200 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <5571357B.4080100@thomas-guettler.de> References: <5571357B.4080100@thomas-guettler.de> Message-ID: Thomas G?ttler schrieb am 05.06.2015 um 07:36: > It would be great to have better type hinting in IDEs. Sounds more like a topic for python-list than python-ideas. > My usecase: > > logger = logging.getLogger(__name__) > try: > ... > except FooException, exc: > logger.warn('...') > > > I remember there was a way to show the traceback via logger.warn(). > > I could use my favorite search engine, but a short cut via the IDE would > be much easier. > > How can the IDE know what kind of duck "logger" is? > > Many IDEs parse the docstrings, but a lot of code does not provide it. > > How can this be improved? > > PS: I don't mention the name of my IDE intentionally :-) > It does not matter for this question. Yes it does. It sounds like you want to use an IDE instead that supports the above. Or install a plugin for the one you're using that improves its capabilities for type introspection. There are a couple of IDE plugins that embed jedi, for example. Stefan From abarnert at yahoo.com Fri Jun 5 08:01:14 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 4 Jun 2015 23:01:14 -0700 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <5571357B.4080100@thomas-guettler.de> References: <5571357B.4080100@thomas-guettler.de> Message-ID: <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> On Jun 4, 2015, at 22:36, Thomas G?ttler wrote: > > It would be great to have better type hinting in IDEs. Is PEP 484 not sufficient for this purpose? Of course you'll have to wait until 3.5 or use an external backport (I think the goal is for MyPy to include stub files for 2.7/3.3/3.4, and/or for them to be published separately on PyPI?), and even longer for every library you depend on to get on board. And of course your favorite IDE has to actually do something with these type hints, and integrate an inference engine (whether that means running something like MyPy in the background or implementing something themselves). But I don't think there's anything Python itself can do to speed any of that up. And meanwhile, I suppose it's possible that the PEP 484 design will turn out to be insufficient, but there's no way we're going to know that until the IDEs try to use it and fail. > My usecase: > > logger = logging.getLogger(__name__) > try: > ... > except FooException, exc: > logger.warn('...') Is there a reason you're using syntax that's deprecate since Python 2.6 and doesn't work in 3.x? Any proposal for the future of the Python language isn't going to help you if you're still using 2.5. 
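None of the compile-time machinery sketched above exists, but the registration half of the idea can be imitated at runtime to show the intended shape: parsers are looked up by name and handed the raw source text (plus, in the real proposal, information about the surrounding namespace). Everything below, the registry, the names and the helper functions, is invented purely for illustration:

    from decimal import Decimal
    from fractions import Fraction

    _parsers = {}

    def register_parser(name):
        # illustrative stand-in for whatever compile-time registration
        # the real proposal would need
        def deco(func):
            _parsers[name] = func
            return func
        return deco

    @register_parser('decimal')
    def _parse_decimal(text):
        return Decimal(text)

    @register_parser('rational')
    def _parse_rational(text):
        num, _, den = text.partition('/')
        return Fraction(int(num), int(den or 1))

    def bang(name, text):
        # runtime stand-in for the proposed  !name(text)  syntax
        return _parsers[name](text)

    print(bang('decimal', '1.2'))     # Decimal('1.2')
    print(bang('rational', '22/7'))   # Fraction(22, 7)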
From me+python at ixokai.io Fri Jun 5 08:19:30 2015 From: me+python at ixokai.io (Stephen Hansen) Date: Thu, 04 Jun 2015 23:19:30 -0700 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> Message-ID: <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: > On Jun 4, 2015, at 22:36, Thomas G?ttler > wrote: > > > > It would be great to have better type hinting in IDEs. > > Is PEP 484 not sufficient for this purpose? It's really not. For one thing, PEP 484 isn't going to result in the standard library being hinted all up (though I assume someone may make stubs). But really, the specific issue that the OP is running into is because of the signature of logging.warn -- msg, *args, **kwargs. These kinds of signatures are very useful in certain circumstances but they are also completely opaque. They're intentionally taking "anything" and passing it off to another function. PEP 484 doesn't say anything about the realities of logging.warn, as all the work is being done in the private _log where we can examine and learn that it takes two optional keyword parameters named "exc_info" and "extra", or what those mean or what valid values are for them. All my preferred IDE tells me is, "msg, *args, **kwargs", which leaves me befuddled if I don't remember the signature or have the docs in hand. If it were to display the docstring too, I'd know "exc_info" is a valid keyword argument that does something useful, but I'd still have no idea about "extra" (and I actually have no idea what extra does and am not looking it up right now on purpose). I don't really think that this is a problem for Python the language, but maybe the style guide: don't use *args or **kwargs unless you either document the details of what those should be, ooor, maybe include a @functools.passes (fictional device) that in some fashion documents "look at this other function for the things I'm passing along blindly). The problem the OP is demonstrating is really completely out of scope for what PEP 484 is addressing, I think. It has little to do with type hinting and more to do with, IMHO, "should the stdlib provide more introspectable signatures" (which then IDE's could use). -- Stephen Hansen m e @ i x o k a i . 
i o From ncoghlan at gmail.com Fri Jun 5 09:06:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Jun 2015 17:06:09 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: On 5 June 2015 at 09:03, Andrew Barnert wrote: > On Jun 4, 2015, at 06:48, Nick Coghlan wrote: >> >>> On 4 June 2015 at 23:06, Paul Moore wrote: >>> As a straw man how about a new syntax (this won't work as written, >>> because it'll clash with the "<" operator, but the basic idea works): >>> >>> LITERAL_CALL = PRIMARY "<" >> angle bracket>* ">" >> >> The main idea I've had for compile time metaprogramming that I figured >> I might be able to persuade Guido not to hate is: >> >> python_ast, names2cells, unbound_names = >> !(this_is_an_arbitrary_python_expression) >> >> As suggested by the assignment target names, the default behaviour >> would be to compile the expression to a Python AST, and then at >> runtime provide some relevant information about the name bindings >> referenced from it. (I haven't even attempted to implement this, >> although I've suggested it to some of the SciPy folks as an idea they >> might want to explore to make R style lazy evaluation easier) >> >> By using the prefix+delimiters notation, it would become possible to >> later have variants that were similarly transparent to the compiler, >> but *called* a suitably registered callable at compile time to do the >> conversion to runtime Python objects. For example: >> >> !sh(shell command) >> !format(format string with implicit interpolation) >> !sql(SQL query) >> >> So for custom numeric types, you could register: >> >> d = !decimal(1.2) >> r = !rational(22/7) > > But what would that get you? > > If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? > > Also, what's the point of it being compile-time? Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value? > > Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)? The larger idea (again, keeping in mind I haven't actually fully thought through how to implement this) is to give the parsers access to the surrounding namespace, which means that the compiler needs to be made aware of any *actual* name references, and the *way* names are referenced would be parser dependent (shell variables, format string interpolation, SQL interpolation, etc). 
So, for example: print(!format(The {item} cost {amount} {units})) Would roughly translate to: print("The {item} cost {amount} {units}".format(item=item, amount=amount, units=units)) It seemed relevant in this context, as a compile time AST transformation would let folks define their own pseudo-literals. Since marshal wouldn't know how to handle them, the AST produced at compile time would still need to be for a runtime constructor call rather than for a value to be stored in co_consts. These cases: d = !decimal(1.2) r = !rational(22/7) Might simply translate directly to the following as the runtime code: d = decimal.Decimal("1.2") r = fractions.Fraction(22, 7) With the difference being that the validity of the passed in string would be checked at compile time rather than at runtime, so you could only use it for literal values, not to construct values from variables. As far as registration goes, yes, there'd need to be a way to hook the compiler to notify it of the existence of these compile time AST generation functions. Dave Malcolm's patch to allow parts of the compiler to be written in Python rather than C (https://bugs.python.org/issue10399 ) might be an interest place to start on that front. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jun 5 09:10:56 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 5 Jun 2015 17:10:56 +1000 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> Message-ID: On 5 June 2015 at 16:19, Stephen Hansen wrote: > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: >> On Jun 4, 2015, at 22:36, Thomas G?ttler >> wrote: >> > >> > It would be great to have better type hinting in IDEs. >> >> Is PEP 484 not sufficient for this purpose? > > It's really not. > > For one thing, PEP 484 isn't going to result in the standard library > being hinted all up (though I assume someone may make stubs). Doing exactly that is a core part of the PEP 484 effort, since it's needed to assist in Python 2 -> 3 migrations: https://github.com/JukkaL/typeshed One of the advantages of that is that more specific signatures can be added to stdlib stubs and benefit IDEs in existing Python versions, rather than having to wait for more explicit signatures in future Python versions. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guettliml at thomas-guettler.de Fri Jun 5 09:21:33 2015 From: guettliml at thomas-guettler.de (=?windows-1252?Q?Thomas_G=FCttler?=) Date: Fri, 05 Jun 2015 09:21:33 +0200 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> Message-ID: <55714DFD.3090100@thomas-guettler.de> Am 05.06.2015 um 08:19 schrieb Stephen Hansen: > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: >> On Jun 4, 2015, at 22:36, Thomas G?ttler >> wrote: >>> >>> It would be great to have better type hinting in IDEs. >> >> Is PEP 484 not sufficient for this purpose? > > It's really not. 
> > For one thing, PEP 484 isn't going to result in the standard library > being hinted all up (though I assume someone may make stubs). But > really, the specific issue that the OP is running into is because of the > signature of logging.warn -- msg, *args, **kwargs. I am using logger.warn() not logging.warn(). The question is: How to know which kind of duck "logger" is? "logger" was created by "logging.getLogger(__name__)" It is not the question how to implement better guessing in the IDE. The basics needs to be solved. Everything else is "toilet paper programming" (Ah, smell inside, ... let's write an wrapper ...) Regards, Thomas G?ttler From cory at lukasa.co.uk Fri Jun 5 09:36:18 2015 From: cory at lukasa.co.uk (Cory Benfield) Date: Fri, 5 Jun 2015 08:36:18 +0100 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <55714DFD.3090100@thomas-guettler.de> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> <55714DFD.3090100@thomas-guettler.de> Message-ID: <28B1805F-C378-419B-8458-E7DF20A41AA4@lukasa.co.uk> > On 5 Jun 2015, at 08:21, Thomas G?ttler wrote: > > I am using logger.warn() not logging.warn(). > > The question is: How to know which kind of duck "logger" is? > > "logger" was created by "logging.getLogger(__name__)" > > It is not the question how to implement better guessing in the IDE. > > The basics needs to be solved. Everything else is "toilet paper programming" > (Ah, smell inside, ... let's write an wrapper ...) This question is unanswerable unless you actually execute the code at runtime under the exact same conditions as you expect to encounter it. Because Python allows for monkey patching at runtime by any other code running in the process you can make no assumptions about what kind of duck that will be. Even without monkey patching you can?t know, because someone may have adjusted sys.path ahead of time, causing you to import an entirely unexpected module called ?logging?. In certain specialised cases, if you limit yourself to special rules, you *might* be able to statically assert the type of this object, but in the general case it simply cannot be done. So the only way to improve this is to implement better guessing in the IDE. Hence the improvements that were proposed to you. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From me+python at ixokai.io Fri Jun 5 09:41:37 2015 From: me+python at ixokai.io (Stephen Hansen) Date: Fri, 05 Jun 2015 00:41:37 -0700 Subject: [Python-ideas] Better Type Hinting In-Reply-To: References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> Message-ID: <1433490097.1269715.287483745.6968D820@webmail.messagingengine.com> On Fri, Jun 5, 2015, at 12:10 AM, Nick Coghlan wrote: > On 5 June 2015 at 16:19, Stephen Hansen wrote: > > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: > >> On Jun 4, 2015, at 22:36, Thomas G?ttler > >> wrote: > >> > > >> > It would be great to have better type hinting in IDEs. > >> > >> Is PEP 484 not sufficient for this purpose? > > > > It's really not. > > > > For one thing, PEP 484 isn't going to result in the standard library > > being hinted all up (though I assume someone may make stubs). 
> > Doing exactly that is a core part of the PEP 484 effort, since it's > needed to assist in Python 2 -> 3 migrations: > https://github.com/JukkaL/typeshed How so? Nothing in PEP 484 addresses signatures which take *args and **kwargs arguments. Please, correct me if I'm wrong, but my understanding is you can uselessly specify types for a Mapping, but you can't specify what actual keys are valid in that mapping. That's the problem with logging's specification, its functions take "anything" and pass on "anything", in their signature... In reality they take up two keyword arguments -- exc_info and extra. Args really is "anything" as its just formatted against the string. Unless I'm missing something, PEP484 only allows defining *types* of specific arguments -- but this isn't about types. This is about what arguments are valid (and then, after you have that bit of knowing, what types come next). When used with API's that take *args and **kwargs, I don't see how PEP484 is useful at all. I'm not arguing against PEP484. but it has nothing at all to do with the specific problem mentioned here. Dynamic API's that take "any args" and "any kwargs" are opaque things it doesn't tell anything about. Logger.warn (and, logging.warn) is one such API. On Fri, Jun 5, 2015, at 12:21 AM, Thomas G?ttler wrote: > > Am 05.06.2015 um 08:19 schrieb Stephen Hansen: > > On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: > >> On Jun 4, 2015, at 22:36, Thomas G?ttler > >> wrote: > >>> > >>> It would be great to have better type hinting in IDEs. > >> > >> Is PEP 484 not sufficient for this purpose? > > > > It's really not. > > > > For one thing, PEP 484 isn't going to result in the standard library > > being hinted all up (though I assume someone may make stubs). But > > really, the specific issue that the OP is running into is because of the > > signature of logging.warn -- msg, *args, **kwargs. > > I am using logger.warn() not logging.warn(). Same difference. Logging.warn is just a thin wrapper around the root logger's warn(). They still have the same completely opaque signature, of msg, *args, **kwargs that it passes along, which is why your IDE can't report a useful signature that tells you that exc_info=True is what you want. --S From abarnert at yahoo.com Fri Jun 5 10:38:32 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 01:38:32 -0700 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <55714DFD.3090100@thomas-guettler.de> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> <55714DFD.3090100@thomas-guettler.de> Message-ID: On Jun 5, 2015, at 00:21, Thomas G?ttler wrote: > >> Am 05.06.2015 um 08:19 schrieb Stephen Hansen: >>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: >>> On Jun 4, 2015, at 22:36, Thomas G?ttler >>> wrote: >>>> >>>> It would be great to have better type hinting in IDEs. >>> >>> Is PEP 484 not sufficient for this purpose? >> >> It's really not. >> >> For one thing, PEP 484 isn't going to result in the standard library >> being hinted all up (though I assume someone may make stubs). But >> really, the specific issue that the OP is running into is because of the >> signature of logging.warn -- msg, *args, **kwargs. > > I am using logger.warn() not logging.warn(). > > The question is: How to know which kind of duck "logger" is? That is _exactly_ what PEP 484 addresses. 
If `logging.getLogger` is annotated or stubbed to specify that it returns a `logging.Logger` (which it will be), then a static type checker (whether MyPy or a competing checker or custom code in the IDE) can trivially infer that `logger` is a `logging.Logger`. If you needed to annotate exactly which subclass of `Logger` was returned (unlikely, but not impossible--maybe you conditionally do a `logging.set_logger_class`, and _you_ know what the type is going to be even though a static analyzer can't infer it, and your subclass has a different API than the base class), then you can use a variable type comment. As Stephen Hansen argues, that may still not solve all of your problems. But it definitely does solve the "how to know which kind of duck" problem you're asking about. > "logger" was created by "logging.getLogger(__name__)" > > It is not the question how to implement better guessing in the IDE. > > The basics needs to be solved. Everything else is "toilet paper programming" Have you read PEP 484? What part of the basics do you think it's not solving? Because it sounds an awful lot like you're just demanding that someone write something exactly like PEP 484. From abarnert at yahoo.com Fri Jun 5 10:47:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 01:47:43 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: <244352B9-ADBB-4665-B930-9C3205BEBA57@yahoo.com> On Jun 5, 2015, at 00:06, Nick Coghlan wrote: > >> On 5 June 2015 at 09:03, Andrew Barnert wrote: >>> On Jun 4, 2015, at 06:48, Nick Coghlan wrote: >>> >>>> On 4 June 2015 at 23:06, Paul Moore wrote: >>>> As a straw man how about a new syntax (this won't work as written, >>>> because it'll clash with the "<" operator, but the basic idea works): >>>> >>>> LITERAL_CALL = PRIMARY "<" >>> angle bracket>* ">" >>> >>> The main idea I've had for compile time metaprogramming that I figured >>> I might be able to persuade Guido not to hate is: >>> >>> python_ast, names2cells, unbound_names = >>> !(this_is_an_arbitrary_python_expression) >>> >>> As suggested by the assignment target names, the default behaviour >>> would be to compile the expression to a Python AST, and then at >>> runtime provide some relevant information about the name bindings >>> referenced from it. (I haven't even attempted to implement this, >>> although I've suggested it to some of the SciPy folks as an idea they >>> might want to explore to make R style lazy evaluation easier) >>> >>> By using the prefix+delimiters notation, it would become possible to >>> later have variants that were similarly transparent to the compiler, >>> but *called* a suitably registered callable at compile time to do the >>> conversion to runtime Python objects. For example: >>> >>> !sh(shell command) >>> !format(format string with implicit interpolation) >>> !sql(SQL query) >>> >>> So for custom numeric types, you could register: >>> >>> d = !decimal(1.2) >>> r = !rational(22/7) >> >> But what would that get you? >> >> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? >> >> Also, what's the point of it being compile-time? 
Unless there's some way to write arbitrary code that operates at compile time (like Lisp special forms, or C++ constexpr functions), what code is going to care about the difference between a compile-time decimal value and a run-time decimal value? >> >> Also, where and how do you define sh, decimal, sql, etc.? I'm having a hard time seeing how you have any different options than my proposal does. You could have a function named bang_decimal that's looked up normally, or some way to register_bang_function('decimal', my_decimal_parser), or any of the other options mentioned in this thread, but what's the difference (other than there being a default "no-name" function that does an AST parse and name binding, which doesn't really seem related to any of the non-default examples)? > > The larger idea (again, keeping in mind I haven't actually fully > thought through how to implement this) is to give the parsers access > to the surrounding namespace, which means that the compiler needs to > be made aware of any *actual* name references, and the *way* names are > referenced would be parser dependent (shell variables, format string > interpolation, SQL interpolation, etc). > > So, for example: > > print(!format(The {item} cost {amount} {units})) > > Would roughly translate to: > > print("The {item} cost {amount} {units}".format(item=item, > amount=amount, units=units)) > > It seemed relevant in this context, as a compile time AST > transformation would let folks define their own pseudo-literals. Since > marshal wouldn't know how to handle them, the AST produced at compile > time would still need to be for a runtime constructor call rather than > for a value to be stored in co_consts. These cases: > > d = !decimal(1.2) > r = !rational(22/7) > > Might simply translate directly to the following as the runtime code: > > d = decimal.Decimal("1.2") > r = fractions.Fraction(22, 7) > > With the difference being that the validity of the passed in string > would be checked at compile time rather than at runtime, so you could > only use it for literal values, not to construct values from > variables. Note that, as discussed earlier in this thread, it is far easier to accidentally shadow `decimal` than something like `literal_decimal` or `bang_parser_decimal`, so there's a cost to doing this half-way at compile time, not just a benefit. Also, a registry is definitely more "magical" than an explicit import: something some other module imported that isn't even visible in this module has changed the way this module is run, and even compiled. Of course that's true for import hooks as well, but I think in the case of import hooks there's really no avoiding the magic; in this case, there is. Obviously explicit vs. implicit isn't the only factor in usability/readability, so it's possible it would be better anyway, but I'm not sure it is. At any rate, although you haven't shown how you expect these functions to be implemented, I think this proposal ends up being roughly equivalent to mine. Sure, the `bang_parser_decimal` function can compile the source to an AST and look up names in some way, but `literal_decimal` can do that too. And presumably whatever helper functions you were imagining to make that easier could still be written. So it's ultimately just bikeshedding the syntax, and whether you use a registry vs. normal lookup. > As far as registration goes, yes, there'd need to be a way to hook the > compiler to notify it of the existence of these compile time AST > generation functions. 
Dave Malcolm's patch to allow parts of the > compiler to be written in Python rather than C > (https://bugs.python.org/issue10399 ) might be an interest place to > start on that front. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Fri Jun 5 11:29:43 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 02:29:43 -0700 Subject: [Python-ideas] Hooking between lexer and parser Message-ID: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> Compiling a module has four steps: * bytes->str (based on encoding declaration or default) * str->token stream * token stream->AST * AST->bytecode You can very easily hook at every point in that process except the token stream. There _is_ a workaround: re-encode the text to bytes, wrap it in a BytesIO, call tokenize, munge the token stream, call untokenize, re-decode back to text, then pass that to compile or ast.parse. But, besides being a bit verbose and painful, that means your line and column numbers get screwed up. So, while its fine for a quick&dirty toy like my user-literal-hack, it's not something you'd want to do in a real import hook for use in real code. This could be solved by just changing ast.parse to accept an iterable of tokens or tuples as well as a string, and likewise for compile. That isn't exactly a trivial change, because under the covers the _ast module is written in C, partly auto-generated, and expects as input a CST, which is itself created from a different tokenizer written in C with an similar but different API (since C doesn't have iterators). And adding a PyTokenizer_FromIterable or something seems like it might raise some fun bootstrapping issues that I haven't thought through yet. But I think it ought to be doable without having to reimplement the whole parser in pure Python. And I think it would be worth doing. While we're at it, a few other (much smaller) changes would be nice: * Allow tokenize to take a text file instead of making it take a binary file and repeat the encoding detection. * Allow tokenize to take a file instead of its readline method. * Allow tokenize to take a str/bytes instead of requiring a file. * Add flags to compile to stop at any stage (decoded text, tokens, AST, or bytecode) instead of just the last two. (The funny thing is that the C tokenizer actually already does support strings and bytes and file objects.) I realize that doing all of these changes would mean that compile can now get an iterable and not know whether it's a file or a token stream until it tries to iterate it. So maybe that isn't the best API; maybe it's better to explicitly call tokenize, then ast.parse, then compile instead of calling compile repeatedly with different flags. From stefan at bytereef.org Fri Jun 5 12:52:24 2015 From: stefan at bytereef.org (s.krah) Date: Fri, 05 Jun 2015 10:52:24 +0000 Subject: [Python-ideas] Decimal facts Message-ID: <14dc359af40.128389f9f35569.7813901213748999215@bytereef.org> > Also, I'm not entirely sure we want to add those types to decimal in the stdlib. It would be a lot less work to implement them in terms of an existing C implementation (maybe using the native types if the exist, Intel's library if they don't). But I don't think that's necessarily desirable for the stdlib. Wrong. The specification we're following is almost IEEE-2008. Mike Cowlishaw evolved the spec while he was on the IEEE commitee. The Intel library is very slow for decimal128, and last time I looked it did not promise correct rounding! 
That fact was deeply buried in the docs. As an aside, I'm not sure how serious the "Float Update" thread was, given that the OP tried to sneak the Grothendieck prime past us. Stefan Krah -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Jun 5 14:09:20 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 5 Jun 2015 13:09:20 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On 4 June 2015 at 23:31, Nick Coghlan wrote: >> The fundamental difference between this proposal and mine is (I think) >> that you're assuming an arbitrary Python expression in there (which is >> parsed), whereas I'm proposing an *unparsed* string. > > No, when you supplied a custom parser, the parser would have access to the > raw string (as well as the name -> cell mapping for the current scope). > > The "quoted AST parser" would just be the default one. Ah, I see now what you meant. Apologies, I'd not fully understood what you were proposing. In which case yes, your proposal is strictly more powerful than mine. You still have the same problem as me, that what's inside !xxx(...) cannot contain a ")" character. (Or maybe can't contain an unmatched ")", or an unescaped ")", depending on what restrictions you feel like putting on the form of the unparsed expression...) But I think that's fundamental to any form of syntax embedding, so it's not exactly a showstopper. Paul From p.f.moore at gmail.com Fri Jun 5 14:18:43 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 5 Jun 2015 13:18:43 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: On 5 June 2015 at 00:03, Andrew Barnert wrote: > If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? Well, Python bytecode has no way of holding any form of constant Decimal value, so if that's what you want you need a change to the bytecode (and hence the interperter). I'm not sure how that qualifies as "user-defined". We seem to be talking at cross purposes here. The questions you're asking are ones I would direct at you (assuming it's you that's after a compile-time value, I'm completely lost as to who is arguing for what any more :-() My position is that "compile-time" user-defined literals don't make sense in Python, what people actually want is probably more along the lines of "better syntax for writing constant values of user-defined types". Oh, and just as a point of reference see http://en.cppreference.com/w/cpp/language/user_literal - C++ user defined literals translate into a *runtime* function call. So even static languages don't work the way you suggest in the comment above. 
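To spell out the motivating case in today's syntax (nothing new here, just the contrast any nicer spelling would have to preserve):

    from decimal import Decimal

    Decimal(1.2)    # Decimal('1.1999999999999999555910790149937383830547332763671875')
    Decimal('1.2')  # Decimal('1.2') -- what a hypothetical 1.2d would be shorthand for

The float literal has already been rounded to binary before Decimal ever sees it, which is why the string form (and hence any new literal-ish syntax) has to capture the source text.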
Paul From random832 at fastmail.us Fri Jun 5 14:28:46 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 05 Jun 2015 08:28:46 -0400 Subject: [Python-ideas] User-defined literals In-Reply-To: <55710545.2030300@gmail.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <5570CD9E.4030403@gmail.com> <1433458938.339885.287241337.7676811C@webmail.messagingengine.com> <55710545.2030300@gmail.com> Message-ID: <1433507326.2184943.287667025.18BCFAF0@webmail.messagingengine.com> On Thu, Jun 4, 2015, at 22:11, Yury Selivanov wrote: > I know. Although you can't mutate the inner-value of Number or > String objects, you can only attach properties. You can shadow valueOf, which gets you close enough for many purposes. From encukou at gmail.com Fri Jun 5 14:30:06 2015 From: encukou at gmail.com (Petr Viktorin) Date: Fri, 5 Jun 2015 14:30:06 +0200 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> Message-ID: On Fri, Jun 5, 2015 at 2:09 PM, Paul Moore wrote: > On 4 June 2015 at 23:31, Nick Coghlan wrote: >>> The fundamental difference between this proposal and mine is (I think) >>> that you're assuming an arbitrary Python expression in there (which is >>> parsed), whereas I'm proposing an *unparsed* string. >> >> No, when you supplied a custom parser, the parser would have access to the >> raw string (as well as the name -> cell mapping for the current scope). >> >> The "quoted AST parser" would just be the default one. > > Ah, I see now what you meant. Apologies, I'd not fully understood what > you were proposing. In which case yes, your proposal is strictly more > powerful than mine. > > You still have the same problem as me, that what's inside !xxx(...) > cannot contain a ")" character. (Or maybe can't contain an unmatched > ")", or an unescaped ")", depending on what restrictions you feel like > putting on the form of the unparsed expression...) But I think that's > fundamental to any form of syntax embedding, so it's not exactly a > showstopper. Parsing consumes tokens. The tokenizer already tracks parentheses (for ignoring indentation between them), so umatched parens would throw off the tokenizer itself. It'd be reasonable to require !macros to only contain valid Python tokens, and have matched parentheses tokens (i.e. ignoring parens in comments/string literals.) From stefan_ml at behnel.de Fri Jun 5 15:11:47 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 05 Jun 2015 15:11:47 +0200 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <28B1805F-C378-419B-8458-E7DF20A41AA4@lukasa.co.uk> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> <55714DFD.3090100@thomas-guettler.de> <28B1805F-C378-419B-8458-E7DF20A41AA4@lukasa.co.uk> Message-ID: Cory Benfield schrieb am 05.06.2015 um 09:36: >> On 5 Jun 2015, at 08:21, Thomas G?ttler wrote: >> >> I am using logger.warn() not logging.warn(). >> >> The question is: How to know which kind of duck "logger" is? >> >> "logger" was created by "logging.getLogger(__name__)" >> >> It is not the question how to implement better guessing in the IDE. >> >> The basics needs to be solved. 
Everything else is "toilet paper >> programming" (Ah, smell inside, ... let's write an wrapper ...) > > This question is unanswerable unless you actually execute the code at > runtime under the exact same conditions as you expect to encounter it. That doesn't mean that it's impossible to find enough type information to make an IDE present something helpful to a user. In all interesting cases, the object returned by logging.getLogger() will be a logger instance with a well-known interface, and tools can just know that. Tools like Jedi and PyCharm show that this is definitely possible. Stefan From guettliml at thomas-guettler.de Fri Jun 5 15:32:36 2015 From: guettliml at thomas-guettler.de (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=) Date: Fri, 05 Jun 2015 15:32:36 +0200 Subject: [Python-ideas] Better Type Hinting In-Reply-To: References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> <55714DFD.3090100@thomas-guettler.de> Message-ID: <5571A4F4.2050704@thomas-guettler.de> Am 05.06.2015 um 10:38 schrieb Andrew Barnert: > On Jun 5, 2015, at 00:21, Thomas G?ttler wrote: >> >>> Am 05.06.2015 um 08:19 schrieb Stephen Hansen: >>>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: >>>> On Jun 4, 2015, at 22:36, Thomas G?ttler >>>> wrote: >>>>> >>>>> It would be great to have better type hinting in IDEs. >>>> >>>> Is PEP 484 not sufficient for this purpose? >>> >>> It's really not. >>> >>> For one thing, PEP 484 isn't going to result in the standard library >>> being hinted all up (though I assume someone may make stubs). But >>> really, the specific issue that the OP is running into is because of the >>> signature of logging.warn -- msg, *args, **kwargs. >> >> I am using logger.warn() not logging.warn(). >> >> The question is: How to know which kind of duck "logger" is? > > That is _exactly_ what PEP 484 addresses. Now I read it and it does exactly what I was looking for. > If `logging.getLogger` is annotated or stubbed to specify that it returns a `logging.Logger` (which it will be), > then a static type checker (whether MyPy or a competing checker or custom code in the IDE) > can trivially infer that `logger` is a `logging.Logger`. Unfortunately we still use Python2.7, but maybe it is time for change ... Just one thing left: > **If** `logging.getLogger` is annotated .... What is the policy of the standard library? Will there be type hints for methods like logging.getLogger() in the standard library in the future? Since it is quite easy to add them, will patches be accepted? 
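(A side note on the Python 2.7 concern: PEP 484 also defines a comment form that works in 2.7 source, so a checker or IDE can be told the type today without new syntax, e.g.

    import logging
    logger = logging.getLogger(__name__)  # type: logging.Logger

That is an explicit annotation rather than inference, of course; stubs are what would make it unnecessary.)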
Regards, Thomas G?ttler From ncoghlan at gmail.com Fri Jun 5 16:53:27 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Jun 2015 00:53:27 +1000 Subject: [Python-ideas] Better Type Hinting In-Reply-To: <5571A4F4.2050704@thomas-guettler.de> References: <5571357B.4080100@thomas-guettler.de> <300DF472-8A14-4841-A506-E4E2E4F47FC7@yahoo.com> <1433485170.1246772.287450105.441D9729@webmail.messagingengine.com> <55714DFD.3090100@thomas-guettler.de> <5571A4F4.2050704@thomas-guettler.de> Message-ID: On 5 Jun 2015 23:34, "Thomas G?ttler" wrote: > > > > Am 05.06.2015 um 10:38 schrieb Andrew Barnert: >> >> On Jun 5, 2015, at 00:21, Thomas G?ttler wrote: >>> >>> >>>> Am 05.06.2015 um 08:19 schrieb Stephen Hansen: >>>>> >>>>> On Thu, Jun 4, 2015, at 11:01 PM, Andrew Barnert via Python-ideas wrote: >>>>> On Jun 4, 2015, at 22:36, Thomas G?ttler >>>>> wrote: >>>>>> >>>>>> >>>>>> It would be great to have better type hinting in IDEs. >>>>> >>>>> >>>>> Is PEP 484 not sufficient for this purpose? >>>> >>>> >>>> It's really not. >>>> >>>> For one thing, PEP 484 isn't going to result in the standard library >>>> being hinted all up (though I assume someone may make stubs). But >>>> really, the specific issue that the OP is running into is because of the >>>> signature of logging.warn -- msg, *args, **kwargs. >>> >>> >>> I am using logger.warn() not logging.warn(). >>> >>> The question is: How to know which kind of duck "logger" is? >> >> >> That is _exactly_ what PEP 484 addresses. > > > Now I read it and it does exactly what I was looking for. >> If `logging.getLogger` is annotated or stubbed to specify that it returns a `logging.Logger` (which it will be), > > > then a static type checker (whether MyPy or a competing checker or custom code in the IDE) > > can trivially infer that `logger` is a `logging.Logger`. > > Unfortunately we still use Python2.7, but maybe it is time for change ... The typeshed project provides stubs for both Python 2 & 3: https :// github.com / JukkaL / typeshed Type hinting your own code where appropriate would be easier in Python 3 (since you can use inline type hints) > Just one thing left: > > > **If** `logging.getLogger` is annotated .... > > What is the policy of the standard library? Will there be type hints > for methods like logging.getLogger() in the standard library in the future? The standard library won't be getting native annotations any time soon, but the typeshed annotations are expected to fill the gap. > Since it is quite easy to add them, will patches be accepted? Contributions to the typeshed stubs would be preferable. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Fri Jun 5 17:45:52 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 08:45:52 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: On Jun 5, 2015, at 05:18, Paul Moore wrote: > >> On 5 June 2015 at 00:03, Andrew Barnert wrote: >> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? > > Well, Python bytecode has no way of holding any form of constant > Decimal value, so if that's what you want you need a change to the > bytecode (and hence the interperter). 
I'm not sure how that qualifies > as "user-defined". That's the point I was making. Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't. > We seem to be talking at cross purposes here. The questions you're > asking are ones I would direct at you (assuming it's you that's after > a compile-time value, I'm completely lost as to who is arguing for > what any more :-() My position is that "compile-time" user-defined > literals don't make sense in Python, what people actually want is > probably more along the lines of "better syntax for writing constant > values of user-defined types". Be careful of that word "constant". Python doesn't really have a distinction between constant and non-constant values. There are values of immutable and mutable types, and there are read-only attributes and members of immutable collections, but there's no such thing as a constant list value or a non-constant decimal value. So people can't be asking to create constant decimal values when they ask for literal decimal values. So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing. Notice that the C++ committee didn't start out by trying to define "literal" so they could define "user-defined literal"; they started with a vague notion that 1.2d could be a literal in the same sense that 0x1F is, came up with a proposal for that, hashed out that proposal through a series of revisions, translated the proposal into standardese, and then pointed at it and defined "literal" in terms of that. They could have instead decided "You know what, we don't like the term 'literal' for this after all" and called it something different in the final standard, and it still would have served the same needs, and I'm fine if people want to take that tack with Python. A name isn't meaningless, but it's not the most important part of the meaning; the semantics of the feature and the idiomatic uses of it are what matter. > Oh, and just as a point of reference see > http://en.cppreference.com/w/cpp/language/user_literal - C++ user > defined literals translate into a *runtime* function call. No, if you define the operator constexpr, and it returns a value constructed with a constexpr constructor, 1.2d is a compile-time value that can be used in further compile-time computation. That's the point I made earlier in the thread: the notion of "compile-time value" only really makes sense if you have a notion of "compile-time computation"; otherwise, it's irrelevant to any (non-reflective) computation. Therefore, the fact that my proposal leaves that part out of the C++ feature doesn't matter. (Of course Python doesn't quite have _no_ compile-time computation; it has optional constant folding. 
But if you try to build on top of that without biting the bullet and just declaring the whole language accessible at compile time, you end up with the mess that was C++03, where compile-time code is slow, clumsy, and completely different from runtime code, which is a large part of why we have C++11, and also why we have D and various other languages. I don't think Python should add _anything_ new at compile time. You can always simulate compile time with import time, where the full language is available, so there's no compelling reason to make the same mistake C++ did.) From p.f.moore at gmail.com Fri Jun 5 17:55:40 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 5 Jun 2015 16:55:40 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: On 5 June 2015 at 16:45, Andrew Barnert wrote: > So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing. OK, my apologies, we're basically agreeing violently, then. IMO, people typically *actually* want a nicer syntax for Decimal values known at source-code-writing time. They probably don't actually really think much about whether the value could be affected by monkeypatching, or runtime changes, because they won't actually do that in practice. So just documenting a clear, sane and suitably Pythonic behaviour should be fine in practice (it won't stop the bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is likely to be sufficiently clear, sane and Pythonic, even if it isn't actually a "literal" in any real sense. That's certainly true for me - I'd be happy with a syntax that worked like that. Paul. From abarnert at yahoo.com Fri Jun 5 18:13:37 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 09:13:37 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: <73B1651C-1B69-423E-AB89-3FD05B04B8D7@yahoo.com> On Jun 5, 2015, at 08:55, Paul Moore wrote: > >> On 5 June 2015 at 16:45, Andrew Barnert wrote: >> So, what does "literal" mean, if it's neither the same thing as "compile-time" nor the same thing as "constant" but just happens to overlap those perfectly in the simplest cases? Well, I think the sense in which these things should "act like literals" is intuitively obvious, but very hard to nail down precisely. Hence the intentionally vague "sufficiently simple" definition I gave. But it doesn't _need_ to be nailed down precisely, because a proposal can be precise, and you can then check it against the cases people intuitively want, and see if they do the right thing. > > OK, my apologies, we're basically agreeing violently, then. 
> > IMO, people typically *actually* want a nicer syntax for Decimal > values known at source-code-writing time. They probably don't actually > really think much about whether the value could be affected by > monkeypatching, or runtime changes, because they won't actually do > that in practice. So just documenting a clear, sane and suitably > Pythonic behaviour should be fine in practice (it won't stop the > bikeshedding of course :-)) And "it's the same as Decimal('1.2')" is > likely to be sufficiently clear, sane and Pythonic, even if it isn't > actually a "literal" in any real sense. That's certainly true for me - > I'd be happy with a syntax that worked like that. Thank you; I think you've just stated exactly my rationale in one paragraph better than all my longer attempts. :) Well, I think it actually _is_ a literal in some useful sense, but I don't see much point in arguing about that. As long as the syntax and semantics are useful, and the name is something I can remember well enough to search for and tell other people about, I'm happy. Anyway, the important question for me is whether people want this for any other type than Decimal (or, really, for decimal64, but unfortunately they don't have that option). That's why I created a hacky implementation, so anyone who thinks they have a good use case for fractions or a custom string type* or whatever can play with it and see if the code actually reads well to themselves and others. If it really is only Decimal that people want, we're better off with something specific rather than general. (* My existing hack doesn't actually handle strings. Once I realized I'd left that out, I was hoping someone would bring it up, so I'd know someone was actually playing with it, at which point I can add it in a one-liner change. But apparently none of the people who downloaded it has actually tried it beyond running the included tests on 1.2d...) From p.f.moore at gmail.com Fri Jun 5 19:42:03 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 5 Jun 2015 18:42:03 +0100 Subject: [Python-ideas] User-defined literals In-Reply-To: <73B1651C-1B69-423E-AB89-3FD05B04B8D7@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> <73B1651C-1B69-423E-AB89-3FD05B04B8D7@yahoo.com> Message-ID: On 5 June 2015 at 17:13, Andrew Barnert wrote: > Anyway, the important question for me is whether people want this for any other type than Decimal Personally, I don'tuse decimals enough to care. But I like Nick's generalised version, and I can easily imagine using that for a number of things: unevaluated code objects or SQL snippets, for example. I'd like to be able to use it as a regex literal, as well, but I don't think it lends itself to that (I suspect a bare regex would choke the Python lexer far too much). But yes, the big question is whether it would be used sufficiently to justify the work. And of course, it'd be Python 3.6+ only, so people doing single-source code supporting older versions wouldn't be able to use it for some time anyway. That's a high bar for *any* new syntax, though, not specific to this. 
Paul From mistersheik at gmail.com Fri Jun 5 22:38:27 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 5 Jun 2015 13:38:27 -0700 (PDT) Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> Message-ID: <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Actually CPython has another step between the AST and the bytecode, which validates the AST to block out trees that violate various rules that were not easily incorporated into the LL(1) grammar. This means that when you want to change parsing, you have to change: the grammar, the AST library, the validation library, and Python's exposed parsing module. Modern parsers do not separate the grammar from tokenizing, parsing, and validation. All of these are done in one place, which not only simplifies changes to the grammar, but also protects you from possible inconsistencies. It was really hard for me when I was making changes to the parser to keep my conception of these four things synchronized. So in my opinion, if you're going to modernize the parsing, then put it all together into one simple library that deals with all of it. It seems like what you're suggesting would add complexity, whereas a merged solution would simplify the code. If it's hard to write a fast parser, then consider writing a parser generator in Python that generates the C code you want. Best, Neil On Friday, June 5, 2015 at 5:30:23 AM UTC-4, Andrew Barnert via Python-ideas wrote: > > Compiling a module has four steps: > > * bytes->str (based on encoding declaration or default) > * str->token stream > * token stream->AST > * AST->bytecode > > You can very easily hook at every point in that process except the token > stream. > > There _is_ a workaround: re-encode the text to bytes, wrap it in a > BytesIO, call tokenize, munge the token stream, call untokenize, re-decode > back to text, then pass that to compile or ast.parse. But, besides being a > bit verbose and painful, that means your line and column numbers get > screwed up. So, while its fine for a quick&dirty toy like my > user-literal-hack, it's not something you'd want to do in a real import > hook for use in real code. > > This could be solved by just changing ast.parse to accept an iterable of > tokens or tuples as well as a string, and likewise for compile. > > That isn't exactly a trivial change, because under the covers the _ast > module is written in C, partly auto-generated, and expects as input a CST, > which is itself created from a different tokenizer written in C with an > similar but different API (since C doesn't have iterators). And adding a > PyTokenizer_FromIterable or something seems like it might raise some fun > bootstrapping issues that I haven't thought through yet. But I think it > ought to be doable without having to reimplement the whole parser in pure > Python. And I think it would be worth doing. > > While we're at it, a few other (much smaller) changes would be nice: > > * Allow tokenize to take a text file instead of making it take a binary > file and repeat the encoding detection. > * Allow tokenize to take a file instead of its readline method. > * Allow tokenize to take a str/bytes instead of requiring a file. > * Add flags to compile to stop at any stage (decoded text, tokens, AST, > or bytecode) instead of just the last two. > > (The funny thing is that the C tokenizer actually already does support > strings and bytes and file objects.) 
> > I realize that doing all of these changes would mean that compile can now > get an iterable and not know whether it's a file or a token stream until it > tries to iterate it. So maybe that isn't the best API; maybe it's better to > explicitly call tokenize, then ast.parse, then compile instead of calling > compile repeatedly with different flags. > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From luciano at ramalho.org Sat Jun 6 00:55:24 2015 From: luciano at ramalho.org (Luciano Ramalho) Date: Fri, 5 Jun 2015 19:55:24 -0300 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar wrote: > Modern parsers do not separate the grammar from tokenizing, parsing, and > validation. All of these are done in one place, which not only simplifies > changes to the grammar, but also protects you from possible inconsistencies. Hi, Neil, thanks for that! Having studied only ancient parsers, I'd love to learn new ones. Can you please post references to modern parsing? Actual parsers, books, papers, anything you may find valuable. I have I hunch you're talking about PEG parsers, but maybe something else, or besides? Thanks! Best, Luciano -- Luciano Ramalho | Author of Fluent Python (O'Reilly, 2015) | http://shop.oreilly.com/product/0636920032519.do | Professor em: http://python.pro.br | Twitter: @ramalhoorg From abarnert at yahoo.com Sat Jun 6 00:58:07 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 5 Jun 2015 15:58:07 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <58B125C2-CE07-4AAB-B618-6F6EEBBD0E14@yahoo.com> On Jun 5, 2015, at 13:38, Neil Girdhar wrote: > > Actually CPython has another step between the AST and the bytecode, which validates the AST to block out trees that violate various rules that were not easily incorporated into the LL(1) grammar. Yes, and it also builds CST nodes before building the AST, and there's a step after AST validation and before bytecode generation where the symbol table for the scope is built and LEG rules are applies. But none of those are things that seem particularly useful to hook. Maybe that's just a failure of imagination, but I've never wanted to do it. Hooking the token stream, on the other hand, has pretty obvious uses. For example, in the user-defined literal thread, Paul Moore suggested that for Nick Coghlan's "compile-time expression" idea, requiring valid Python syntax would be way too restrictive, but requiring valid Python tokens is probably OK, and it automatically solves the quoting problem, and it would usually be easier than parsing text. I think he's right about all three parts of that, but unfortunately you can't implement it that way in an import hook because an import hook can't get access to the token stream. 
And of course my hack for simulating user-defined literals relies on a workaround to fake hooking the token stream; it would be a whole lot harder without that, while it would be a little easier and a whole lot cleaner if I could just hook the token stream. > This means that when you want to change parsing, you have to change: the grammar, the AST library, the validation library, and Python's exposed parsing module. And the code generation and other post-validation steps (unless you're just trying to create a do-nothing construct--which can be useful to give you something new to, e.g., feed to MacroPy, but it's not something you're going to check in to core, or use in your own production code). So yes, changing the grammar is painful. Which is just one more reason that being able to hack on Python without having to hack on Python is very useful. And as of 3.4, all of the pieces are there to do that, and dead-easy to use, and robust enough for production code--as long as the level you want to hack on is source text, AST, or bytecode, not token stream. > Modern parsers do not separate the grammar from tokenizing, parsing, and validation. All of these are done in one place, which not only simplifies changes to the grammar, but also protects you from possible inconsistencies. It was really hard for me when I was making changes to the parser to keep my conception of these four things synchronized. > > So in my opinion, if you're going to modernize the parsing, then put it all together into one simple library that deals with all of it. It seems like what you're suggesting would add complexity, whereas a merged solution would simplify the code. Rewriting the entire parsing mechanism from scratch might simplify things, but it also means rewriting the entire parsing mechanism from scratch. I'm sure you could implement a GLR parser generator that takes a complete declarative grammar and generates something that goes right from source code to a SAX- or iterparse-style pre-validated AST, and that would be a really cool thing. But besides being a lot of work, it would also be a huge amount of risk. You'd almost certainly end up with new bugs, new places where syntax errors are harder to diagnose, new places where compiling is slower than it used to be, etc. Also, Python is defined as having a separate lexical analysis phase, and a module named tokenize that does the same thing as this phase, and tests that test it, and so on. So, you can't just throw all that out and remain backward compatible. Meanwhile, adding functions to create a token state struct out of a Python iterable, drive it, and expose that functionality to Python is a lot less work, very unlikely to have any effect on the existing default mechanism (if you don't hook the token stream, the existing code runs the same as today, except for an if check inside the next-token function), and much easier to make exactly compatible with existing behavior even when you do hook the token stream (if creating and driving the token state works, all the other code is the same as it ever was). And it has no backward compat implications. > If it's hard to write a fast parser, then consider writing a parser generator in Python that generates the C code you want. It's not that it's hard to write a _fast_ parser in Python, but that it's hard to write a parser that does the exact same thing as Python's own parser (and even more so one that does the exact same thing as whatever version of Python you're running under's parser). 
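For anyone who wants to see the shape of that workaround, here is a minimal sketch (the no-op transform stands in for whatever token munging a real hook would do, and the mangled line/column numbers are exactly the problem described above):

    import ast
    import io
    import tokenize

    def transform(tokens):
        # Identity transform; a real hook would inspect and rewrite tokens here.
        for tok in tokens:
            yield tok

    def parse_with_token_hook(source, filename='<string>'):
        # text -> bytes -> token stream -> munged stream -> text -> AST
        readline = io.BytesIO(source.encode('utf-8')).readline
        tokens = tokenize.tokenize(readline)
        new_source = tokenize.untokenize(transform(tokens)).decode('utf-8')
        return ast.parse(new_source, filename)

An import hook can then compile() the returned AST as usual; what it can't do is get at the token stream without this encode/tokenize/untokenize/decode round trip.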
From ncoghlan at gmail.com Sat Jun 6 01:31:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Jun 2015 09:31:09 +1000 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <1652CE64-D5EF-433E-9215-F7F82DF4490C@yahoo.com> Message-ID: On 6 Jun 2015 01:45, "Andrew Barnert" wrote: > > On Jun 5, 2015, at 05:18, Paul Moore wrote: > > > >> On 5 June 2015 at 00:03, Andrew Barnert wrote: > >> If it's meant to be a "compile-time decimal value"... What kind of value is that? What ends up in your co_consts? An instance of decimal.Decimal? How does that get marshaled? > > > > Well, Python bytecode has no way of holding any form of constant > > Decimal value, so if that's what you want you need a change to the > > bytecode (and hence the interperter). I'm not sure how that qualifies > > as "user-defined". > > That's the point I was making. Nick proposed this syntax in reply to a message where I said that being a compile-time value is both irrelevant and impossible, so I thought he was claiming that this syntax somehow solved that problem where mine didn't. I was mainly replying to Paul's angle bracket syntax proposal, not specifically to anything you proposed. The problem I have with your original suggestion is purely syntactic - I don't *want* user-defined syntax to look like language-defined syntax, because it makes it too hard for folks to know where to look things up, and I especially don't want a suffix like "j" to mean "this is a complex literal" while "k" means "this is a different way of spelling a normal function call that accepts a single string argument". I didn't say anything about my preferred syntactic idea *only* being usable for a compile time construct, I just only consider it *interesting* if there's a compile time AST transformation component, as that lets the hook parse a string and break it down into its component parts to make it transparent to the compiler, including giving it the ability to influence the compiler's symbol table construction pass. That extra power above and beyond a normal function call is what would give the construct its rationale for requesting new syntax - it would be a genuinely new capability to integrate "not Python code" with the Python compilation toolchain, rather than an alternate spelling for existing features. I've also been pondering the idea of how you'd notify the compiler of such hooks, since I agree you'd want them declared inline in the module that used them. For that, I think the idea of a "bang import" construct might work, where a module level line of the form "from x import !y" would not only be a normal runtime import of "y", but also allow "!y(implicitly quoted input)" as a compile time construct. There'd still be some tricky questions to resolve from a pragmatic perspective, as you'd likely need a way for the bang import to make additional runtime data available to the rendered AST produced by the bang calls, without polluting the module global namespace, but it might suffice to pass in a cell reference that is then populated at runtime by the bang import step. > > We seem to be talking at cross purposes here. 
The questions you're > > asking are ones I would direct at you (assuming it's you that's after > > a compile-time value, I'm completely lost as to who is arguing for > > what any more :-() That confusion is likely at least partly my fault - while this thread provided the name, the bang call concept is one I've been pondering in various forms (most coherently with some of the folks at SciPy last year) since the last time we discussed switch statements (and the related "once" statement), and it goes far beyond just defining pseudo-literals. I brought it up here, because *as a side-effect*, it would provide pseudo-literals by way of compile time constructs that didn't have any variable references in the generated AST (other than constructor references). > (Of course Python doesn't quite have _no_ compile-time computation; it has optional constant folding. But if you try to build on top of that without biting the bullet and just declaring the whole language accessible at compile time, you end up with the mess that was C++03, where compile-time code is slow, clumsy, and completely different from runtime code, which is a large part of why we have C++11, and also why we have D and various other languages. I don't think Python should add _anything_ new at compile time. You can always simulate compile time with import time, where the full language is available, so there's no compelling reason to make the same mistake C++ did.) Updated with the bang import idea to complement the bang calls, my vague notion would actually involve adding two pieces: * a compile time hook that lets you influence both the symbol table pass and the AST generation pass (bang import & bang call working together) * an import time hook that lets you reliably provide required data (like references to type constructors and other functions) to the AST generated in step 1 (probably through bang import populating a cell made available to the corresponding bang call invocations) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Jun 5 22:40:04 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 5 Jun 2015 13:40:04 -0700 (PDT) Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <9d5656d9-65fb-4bdf-812a-98cbcdf10bee@googlegroups.com> While we're at it, we can also fix "(1if 0 else 2)" :) On Friday, June 5, 2015 at 4:38:27 PM UTC-4, Neil Girdhar wrote: > > Actually CPython has another step between the AST and the bytecode, which > validates the AST to block out trees that violate various rules that were > not easily incorporated into the LL(1) grammar. This means that when you > want to change parsing, you have to change: the grammar, the AST library, > the validation library, and Python's exposed parsing module. > > Modern parsers do not separate the grammar from tokenizing, parsing, and > validation. All of these are done in one place, which not only simplifies > changes to the grammar, but also protects you from possible > inconsistencies. It was really hard for me when I was making changes to > the parser to keep my conception of these four things synchronized. > > So in my opinion, if you're going to modernize the parsing, then put it > all together into one simple library that deals with all of it. 
It seems > like what you're suggesting would add complexity, whereas a merged solution > would simplify the code. If it's hard to write a fast parser, then > consider writing a parser generator in Python that generates the C code you > want. > > Best, > > Neil > > On Friday, June 5, 2015 at 5:30:23 AM UTC-4, Andrew Barnert via > Python-ideas wrote: >> >> Compiling a module has four steps: >> >> * bytes->str (based on encoding declaration or default) >> * str->token stream >> * token stream->AST >> * AST->bytecode >> >> You can very easily hook at every point in that process except the token >> stream. >> >> There _is_ a workaround: re-encode the text to bytes, wrap it in a >> BytesIO, call tokenize, munge the token stream, call untokenize, re-decode >> back to text, then pass that to compile or ast.parse. But, besides being a >> bit verbose and painful, that means your line and column numbers get >> screwed up. So, while its fine for a quick&dirty toy like my >> user-literal-hack, it's not something you'd want to do in a real import >> hook for use in real code. >> >> This could be solved by just changing ast.parse to accept an iterable of >> tokens or tuples as well as a string, and likewise for compile. >> >> That isn't exactly a trivial change, because under the covers the _ast >> module is written in C, partly auto-generated, and expects as input a CST, >> which is itself created from a different tokenizer written in C with an >> similar but different API (since C doesn't have iterators). And adding a >> PyTokenizer_FromIterable or something seems like it might raise some fun >> bootstrapping issues that I haven't thought through yet. But I think it >> ought to be doable without having to reimplement the whole parser in pure >> Python. And I think it would be worth doing. >> >> While we're at it, a few other (much smaller) changes would be nice: >> >> * Allow tokenize to take a text file instead of making it take a binary >> file and repeat the encoding detection. >> * Allow tokenize to take a file instead of its readline method. >> * Allow tokenize to take a str/bytes instead of requiring a file. >> * Add flags to compile to stop at any stage (decoded text, tokens, AST, >> or bytecode) instead of just the last two. >> >> (The funny thing is that the C tokenizer actually already does support >> strings and bytes and file objects.) >> >> I realize that doing all of these changes would mean that compile can now >> get an iterable and not know whether it's a file or a token stream until it >> tries to iterate it. So maybe that isn't the best API; maybe it's better to >> explicitly call tokenize, then ast.parse, then compile instead of calling >> compile repeatedly with different flags. >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... 
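(For reference, two of the four stages in the quoted list are already reachable through compile() in stock CPython, which is what "instead of just the last two" refers to; a quick sketch:)

    import ast

    source = "spam = eggs + 1\n"

    tree = compile(source, "<demo>", "exec", ast.PyCF_ONLY_AST)   # stop at the AST
    code = compile(source, "<demo>", "exec")                      # go on to bytecode
    print(type(tree), type(code))   # roughly: <class '_ast.Module'> <class 'code'>

    # There is no flag that stops at (or starts from) the decoded text or the
    # token stream, which is the gap being discussed.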
URL: From guido at python.org Sat Jun 6 02:28:32 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Jun 2015 17:28:32 -0700 Subject: [Python-ideas] User-defined literals In-Reply-To: <7DDDD203-8B93-4691-9524-B6191C2EBD25@yahoo.com> References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <7DDDD203-8B93-4691-9524-B6191C2EBD25@yahoo.com> Message-ID: On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert wrote: > On Jun 4, 2015, at 14:05, Guido van Rossum wrote: > > OK, you can attribute that to lousy docs. The intention is that builtin > types are immutable. > > > I can go file bugs against those other implementations, but first, what's > the rationale? > > The ABC PEP, the numbers PEP discussion, and the type/class unification > tutorial all use the same reason: In CPython, different interpreters in the > same memory space (as with mod_python) share the same built-in types. From > the numbers discussion, it sounds like this was the only reason to reject > the idea of just patching float.__bases__. > > But most other Python implementations don't have process-wide globals like > that to worry about; patching int in one interpreter can't possibly affect > any other interpreter. > > "Because CPython can't do it, nobody else should do it, to keep code > portable" might be a good enough rationale for something this fundamental, > but if that's not the one you're thinking of, I don't want to put those > words in your mouth. > Why do you need a better rationale? The builtins are shared between all modules in a way that other things aren't. Nothing good can come from officially recognizing the ability to monkey-patch the builtin types -- it would just lead to paranoia amongst library developers. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sat Jun 6 02:50:56 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 05 Jun 2015 19:50:56 -0500 Subject: [Python-ideas] User-defined literals In-Reply-To: References: <20150603025206.GA1325@ando.pearwood.info> <204B3069-240B-4DF1-B5EF-93176859505C@yahoo.com> <20150604120834.GC1325@ando.pearwood.info> <064B3A93-88F4-44E3-B7B4-62909AF24A93@yahoo.com> <7DDDD203-8B93-4691-9524-B6191C2EBD25@yahoo.com> Message-ID: <885999E5-6331-4555-ACA4-EE9A40A4F782@gmail.com> On June 5, 2015 7:28:32 PM CDT, Guido van Rossum wrote: >On Thu, Jun 4, 2015 at 4:20 PM, Andrew Barnert >wrote: > >> On Jun 4, 2015, at 14:05, Guido van Rossum wrote: >> >> OK, you can attribute that to lousy docs. The intention is that >builtin >> types are immutable. >> >> >> I can go file bugs against those other implementations, but first, >what's >> the rationale? >> >> The ABC PEP, the numbers PEP discussion, and the type/class >unification >> tutorial all use the same reason: In CPython, different interpreters >in the >> same memory space (as with mod_python) share the same built-in types. >From >> the numbers discussion, it sounds like this was the only reason to >reject >> the idea of just patching float.__bases__. >> >> But most other Python implementations don't have process-wide globals >like >> that to worry about; patching int in one interpreter can't possibly >affect >> any other interpreter. 
>> >> "Because CPython can't do it, nobody else should do it, to keep code >> portable" might be a good enough rationale for something this >fundamental, >> but if that's not the one you're thinking of, I don't want to put >those >> words in your mouth. >> > >Why do you need a better rationale? > >The builtins are shared between all modules in a way that other things >aren't. Nothing good can come from officially recognizing the ability >to >monkey-patch the builtin types -- it would just lead to paranoia >amongst >library developers. Like javascript:void hacks to avoid undefined being re-defined. -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From mistersheik at gmail.com Sat Jun 6 04:08:03 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 5 Jun 2015 22:08:03 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <58B125C2-CE07-4AAB-B618-6F6EEBBD0E14@yahoo.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <58B125C2-CE07-4AAB-B618-6F6EEBBD0E14@yahoo.com> Message-ID: On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert wrote: > On Jun 5, 2015, at 13:38, Neil Girdhar wrote: > > > > Actually CPython has another step between the AST and the bytecode, > which validates the AST to block out trees that violate various rules that > were not easily incorporated into the LL(1) grammar. > > Yes, and it also builds CST nodes before building the AST, and there's a > step after AST validation and before bytecode generation where the symbol > table for the scope is built and LEG rules are applies. But none of those > are things that seem particularly useful to hook. Maybe that's just a > failure of imagination, but I've never wanted to do it. > > Hooking the token stream, on the other hand, has pretty obvious uses. For > example, in the user-defined literal thread, Paul Moore suggested that for > Nick Coghlan's "compile-time expression" idea, requiring valid Python > syntax would be way too restrictive, but requiring valid Python tokens is > probably OK, and it automatically solves the quoting problem, and it would > usually be easier than parsing text. I think he's right about all three > parts of that, but unfortunately you can't implement it that way in an > import hook because an import hook can't get access to the token stream. > > And of course my hack for simulating user-defined literals relies on a > workaround to fake hooking the token stream; it would be a whole lot harder > without that, while it would be a little easier and a whole lot cleaner if > I could just hook the token stream. > Yes, I think I understand your motivation. Can you help me understand the what the hook you would write would look like? > > > This means that when you want to change parsing, you have to change: the > grammar, the AST library, the validation library, and Python's exposed > parsing module. > > And the code generation and other post-validation steps (unless you're > just trying to create a do-nothing construct--which can be useful to give > you something new to, e.g., feed to MacroPy, but it's not something you're > going to check in to core, or use in your own production code). > > So yes, changing the grammar is painful. Which is just one more reason > that being able to hack on Python without having to hack on Python is very > useful. 
And as of 3.4, all of the pieces are there to do that, and > dead-easy to use, and robust enough for production code--as long as the > level you want to hack on is source text, AST, or bytecode, not token > stream. > Yes. > > > Modern parsers do not separate the grammar from tokenizing, parsing, and > validation. All of these are done in one place, which not only simplifies > changes to the grammar, but also protects you from possible > inconsistencies. It was really hard for me when I was making changes to > the parser to keep my conception of these four things synchronized. > > > > So in my opinion, if you're going to modernize the parsing, then put it > all together into one simple library that deals with all of it. It seems > like what you're suggesting would add complexity, whereas a merged solution > would simplify the code. > > Rewriting the entire parsing mechanism from scratch might simplify things, > but it also means rewriting the entire parsing mechanism from scratch. I'm > sure you could implement a GLR parser generator that takes a complete > declarative grammar and generates something that goes right from source > code to a SAX- or iterparse-style pre-validated AST, and that would be a > really cool thing. But besides being a lot of work, it would also be a huge > amount of risk. You'd almost certainly end up with new bugs, new places > where syntax errors are harder to diagnose, new places where compiling is > slower than it used to be, etc. > Also, Python is defined as hacking a separate lexical analysis phase, and > a module named tokenize that does the same thing as this phase, and tests > that test it, and so on. So, you can't just throw all that out and remain > backward compatible. > I don't see why that is. The lexical "phase" would just become new parsing rules, and so it would be supplanted by the parser. Then you wouldn't need to add special hooks for lexing. You would merely have hooks for parsing. > > Meanwhile, adding functions to create a token state struct out of a Python > iterable, drive it, and expose that functionality to Python is a lot less > work, very unlikely to have any effect on the existing default mechanism > (if you don't hook the token stream, the existing code runs the same as > today, except for an if check inside the next-token function), and much > easier to make exactly compatible with existing behavior even when you do > hook the token stream (if creating and driving the token state works, all > the other code is the same as it ever was). And it has no backward compat > implications. > > > If it's hard to write a fast parser, then consider writing a parser > generator in Python that generates the C code you want. > > It's not that it's hard to write a _fast_ parser in Python, but that it's > hard to write a parser that does the exact same thing as Python's own > parser (and even more so one that does the exact same thing as whatever > version of Python you're running under's parser). I think it's worth exploring having the whole parser in one place, rather than repeating the same structures in at least four places. With every change to the Python grammar, you pay for this forced repetition. -------------- next part -------------- An HTML attachment was scrubbed... 
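(To make the "lexical phase becomes ordinary parsing rules" idea above concrete, here is a tiny scannerless sketch -- not any particular library, and certainly not CPython's parser -- in which the token-level rule and the phrase-level rule are written the same way:)

    import re

    def number(text, i):
        # "Lexical" rule: a run of digits, expressed as just another rule.
        m = re.match(r'\d+', text[i:])
        return (int(m.group()), i + m.end()) if m else None

    def sum_expr(text, i):
        # "Phrase" rule: number ('+' number)*
        result = number(text, i)
        if result is None:
            return None
        total, i = result
        while i < len(text) and text[i] == '+':
            result = number(text, i + 1)
            if result is None:
                return None
            value, i = result
            total += value
        return total, i

    print(sum_expr("1+22+333", 0))   # (356, 8)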
URL: From mistersheik at gmail.com Sat Jun 6 04:21:08 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 5 Jun 2015 22:21:08 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: Back in the day, I remember Lex and Yacc, then came Flex and Bison, and then ANTLR, which unified lexing and parsing under one common language. In general, I like the idea of putting everything together. I think that because of Python's separation of lexing and parsing, it accepts weird text like "(1if 0else 2)", which is crazy. Here's what I think I want in a parser: Along with the grammar, you also give it code that it can execute as it matches each symbol in a rule. In Python for example, as it matches each argument passed to a function, it would keep track of the count of *args, **kwargs, and keyword arguments, and regular arguments, and then raise a syntax error if it encounters anything out of order. Right now that check is done in validate.c, which is really annoying. I want to specify the lexical rules in the same way that I specify the parsing rules. And I think (after Andrew elucidates what he means by hooks) I want the parsing hooks to be the same thing as lexing hooks, and I agree with him that hooking into the lexer is useful. I want the parser module to be automatically-generated from the grammar if that's possible (I think it is). Typically each grammar rule is implemented using a class. I want the code generation to be a method on that class. This makes changing the AST easy. For example, it was suggested that we might change the grammar to include a starstar_expr node. This should be an easy change, but because of the way every node validates its children, which it expects to have a certain tree structure, it would be a big task with almost no payoff. There's also a question of which parsing algorithm you use. I wish I knew more about the state-of-art parsers. I was interested because I wanted to use Python to parse my LaTeX files. I got the impression that https://en.wikipedia.org/wiki/Earley_parser were state of the art, but I'm not sure. I'm curious what other people will contribute to this discussion as I think having no great parsing library is a huge hole in Python. Having one would definitely allow me to write better utilities using Python. On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho wrote: > On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar > wrote: > > Modern parsers do not separate the grammar from tokenizing, parsing, and > > validation. All of these are done in one place, which not only > simplifies > > changes to the grammar, but also protects you from possible > inconsistencies. > > Hi, Neil, thanks for that! > > Having studied only ancient parsers, I'd love to learn new ones. Can > you please post references to modern parsing? Actual parsers, books, > papers, anything you may find valuable. > > I have I hunch you're talking about PEG parsers, but maybe something > else, or besides? > > Thanks! > > Best, > > Luciano > > -- > Luciano Ramalho > | Author of Fluent Python (O'Reilly, 2015) > | http://shop.oreilly.com/product/0636920032519.do > | Professor em: http://python.pro.br > | Twitter: @ramalhoorg > -------------- next part -------------- An HTML attachment was scrubbed... 
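(The "(1if 0else 2)" oddity is easy to see from Python itself: the tokenizer ends the number at the first non-digit and then reads the keyword as a separate NAME token, so the parser never notices the missing space. Roughly:)

    import io
    import tokenize

    for tok in tokenize.generate_tokens(io.StringIO("(1if 0else 2)\n").readline):
        print(tokenize.tok_name[tok.type], repr(tok.string))
    # OP '(' / NUMBER '1' / NAME 'if' / NUMBER '0' / NAME 'else' /
    # NUMBER '2' / OP ')' / NEWLINE '\n' / ENDMARKER ''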
URL: From rymg19 at gmail.com Sat Jun 6 04:55:20 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 05 Jun 2015 21:55:20 -0500 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> IMO, lexer and parser separation is sometimes great. It also makes hand-written parsers much simpler. "Modern" parsing with no lexer and EBNF can sometimes be slower than the classics, especially if one is using an ultra-fast lexer generator such as re2c. On June 5, 2015 9:21:08 PM CDT, Neil Girdhar wrote: >Back in the day, I remember Lex and Yacc, then came Flex and Bison, and >then ANTLR, which unified lexing and parsing under one common language. > In >general, I like the idea of putting everything together. I think that >because of Python's separation of lexing and parsing, it accepts weird >text >like "(1if 0else 2)", which is crazy. > >Here's what I think I want in a parser: > >Along with the grammar, you also give it code that it can execute as it >matches each symbol in a rule. In Python for example, as it matches >each >argument passed to a function, it would keep track of the count of >*args, >**kwargs, and keyword arguments, and regular arguments, and then raise >a >syntax error if it encounters anything out of order. Right now that >check >is done in validate.c, which is really annoying. > >I want to specify the lexical rules in the same way that I specify the >parsing rules. And I think (after Andrew elucidates what he means by >hooks) I want the parsing hooks to be the same thing as lexing hooks, >and I >agree with him that hooking into the lexer is useful. > >I want the parser module to be automatically-generated from the grammar >if >that's possible (I think it is). > >Typically each grammar rule is implemented using a class. I want the >code >generation to be a method on that class. This makes changing the AST >easy. For example, it was suggested that we might change the grammar >to >include a starstar_expr node. This should be an easy change, but >because >of the way every node validates its children, which it expects to have >a >certain tree structure, it would be a big task with almost no payoff. > >There's also a question of which parsing algorithm you use. I wish I >knew >more about the state-of-art parsers. I was interested because I wanted >to >use Python to parse my LaTeX files. I got the impression that >https://en.wikipedia.org/wiki/Earley_parser were state of the art, but >I'm >not sure. > >I'm curious what other people will contribute to this discussion as I >think >having no great parsing library is a huge hole in Python. Having one >would >definitely allow me to write better utilities using Python. > > >On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho >wrote: > >> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar >> wrote: >> > Modern parsers do not separate the grammar from tokenizing, >parsing, and >> > validation. All of these are done in one place, which not only >> simplifies >> > changes to the grammar, but also protects you from possible >> inconsistencies. >> >> Hi, Neil, thanks for that! >> >> Having studied only ancient parsers, I'd love to learn new ones. Can >> you please post references to modern parsing? Actual parsers, books, >> papers, anything you may find valuable. >> >> I have I hunch you're talking about PEG parsers, but maybe something >> else, or besides? >> >> Thanks! 
>> >> Best, >> >> Luciano >> >> -- >> Luciano Ramalho >> | Author of Fluent Python (O'Reilly, 2015) >> | http://shop.oreilly.com/product/0636920032519.do >> | Professor em: http://python.pro.br >> | Twitter: @ramalhoorg >> > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Jun 6 04:57:54 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 5 Jun 2015 22:57:54 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> Message-ID: I don't see why it makes anything simpler. Your lexing rules just live alongside your parsing rules. And I also don't see why it has to be faster to do the lexing in a separate part of the code. Wouldn't the parser generator realize that that some of the rules don't use the stack and so they would end up just as fast as any lexer? On Fri, Jun 5, 2015 at 10:55 PM, Ryan Gonzalez wrote: > IMO, lexer and parser separation is sometimes great. It also makes > hand-written parsers much simpler. > > "Modern" parsing with no lexer and EBNF can sometimes be slower than the > classics, especially if one is using an ultra-fast lexer generator such as > re2c. > > > On June 5, 2015 9:21:08 PM CDT, Neil Girdhar > wrote: > >> Back in the day, I remember Lex and Yacc, then came Flex and Bison, and >> then ANTLR, which unified lexing and parsing under one common language. In >> general, I like the idea of putting everything together. I think that >> because of Python's separation of lexing and parsing, it accepts weird text >> like "(1if 0else 2)", which is crazy. >> >> Here's what I think I want in a parser: >> >> Along with the grammar, you also give it code that it can execute as it >> matches each symbol in a rule. In Python for example, as it matches each >> argument passed to a function, it would keep track of the count of *args, >> **kwargs, and keyword arguments, and regular arguments, and then raise a >> syntax error if it encounters anything out of order. Right now that check >> is done in validate.c, which is really annoying. >> >> I want to specify the lexical rules in the same way that I specify the >> parsing rules. And I think (after Andrew elucidates what he means by >> hooks) I want the parsing hooks to be the same thing as lexing hooks, and I >> agree with him that hooking into the lexer is useful. >> >> I want the parser module to be automatically-generated from the grammar >> if that's possible (I think it is). >> >> Typically each grammar rule is implemented using a class. I want the >> code generation to be a method on that class. This makes changing the AST >> easy. For example, it was suggested that we might change the grammar to >> include a starstar_expr node. This should be an easy change, but because >> of the way every node validates its children, which it expects to have a >> certain tree structure, it would be a big task with almost no payoff. 
>> >> There's also a question of which parsing algorithm you use. I wish I >> knew more about the state-of-art parsers. I was interested because I >> wanted to use Python to parse my LaTeX files. I got the impression that >> https://en.wikipedia.org/wiki/Earley_parser were state of the art, but >> I'm not sure. >> >> I'm curious what other people will contribute to this discussion as I >> think having no great parsing library is a huge hole in Python. Having one >> would definitely allow me to write better utilities using Python. >> >> >> On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho >> wrote: >> >>> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar >>> wrote: >>> > Modern parsers do not separate the grammar from tokenizing, parsing, >>> and >>> > validation. All of these are done in one place, which not only >>> simplifies >>> > changes to the grammar, but also protects you from possible >>> inconsistencies. >>> >>> Hi, Neil, thanks for that! >>> >>> Having studied only ancient parsers, I'd love to learn new ones. Can >>> you please post references to modern parsing? Actual parsers, books, >>> papers, anything you may find valuable. >>> >>> I have I hunch you're talking about PEG parsers, but maybe something >>> else, or besides? >>> >>> Thanks! >>> >>> Best, >>> >>> Luciano >>> >>> -- >>> Luciano Ramalho >>> | Author of Fluent Python (O'Reilly, 2015) >>> | http://shop.oreilly.com/product/0636920032519.do >>> | Professor em: http://python.pro.br >>> | Twitter: @ramalhoorg >>> >> >> ------------------------------ >> >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Sat Jun 6 05:24:25 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Fri, 05 Jun 2015 23:24:25 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <1433561065.1468319.288247105.251AAED4@webmail.messagingengine.com> On Fri, Jun 5, 2015, at 22:21, Neil Girdhar wrote: > Back in the day, I remember Lex and Yacc, then came Flex and Bison, and > then ANTLR, which unified lexing and parsing under one common language. > In > general, I like the idea of putting everything together. I think that > because of Python's separation of lexing and parsing, it accepts weird > text > like "(1if 0else 2)", which is crazy. I don't think this really has anything to do with separation of lexing and parsing. C rejects this (where "this" is "integer followed by arbitrary alphabetic token") purely due to the lexing stage (specifically, 1if or 0else would be a single "preprocessor number" token, with no valid meaning. Of course, this has its own quirks, for example 0xE+1 is invalid in C.) From guido at python.org Sat Jun 6 06:27:14 2015 From: guido at python.org (Guido van Rossum) Date: Fri, 5 Jun 2015 21:27:14 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> Message-ID: On Fri, Jun 5, 2015 at 7:57 PM, Neil Girdhar wrote: > I don't see why it makes anything simpler. 
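(The kind of thing the "classic" split buys, independent of raw speed: a throwaway lexing pass can be a couple of lines of plain re, and the hand-written parser that consumes it never has to think about characters. A toy sketch, not tied to any of the generators mentioned above:)

    import re

    TOKEN_RE = re.compile(r'\s*(?:(\d+)|(.))')

    def lex(text):
        for number, op in TOKEN_RE.findall(text):
            yield ('NUMBER', int(number)) if number else ('OP', op)

    print(list(lex("1 + 22 * 3")))
    # [('NUMBER', 1), ('OP', '+'), ('NUMBER', 22), ('OP', '*'), ('NUMBER', 3)]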
Your lexing rules just live > alongside your parsing rules. And I also don't see why it has to be faster > to do the lexing in a separate part of the code. Wouldn't the parser > generator realize that that some of the rules don't use the stack and so > they would end up just as fast as any lexer? > You're putting a lot of faith in "modern" parsers. I don't know if PLY qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies the lexer and parser. However I don't think it would be much better for a language the size of Python. We are using PLY at Dropbox to parse a medium-sized DSL, and while at the beginning it was convenient to have the entire language definition in one place, there were a fair number of subtle bugs in the earlier stages of the project due to the mixing of lexing and parsing. In order to get this right it seems you actually have to *think* about the lexing and parsing stages differently, and combining them in one tool doesn't actually help you to think more clearly. Also, this approach doesn't really do much for the later stages -- you can easily construct a parse tree but it's a fairly direct representation of the grammar rules, and it offers no help in managing a symbol table or generating code. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jun 6 07:04:00 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Jun 2015 15:04:00 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 6 June 2015 at 15:00, Nick Coghlan wrote: > On 6 June 2015 at 12:21, Neil Girdhar wrote: >> I'm curious what other people will contribute to this discussion as I think >> having no great parsing library is a huge hole in Python. Having one would >> definitely allow me to write better utilities using Python. > > The design of *Python's* grammar is deliberately restricted to being > parsable with an LL(1) parser. There are a great many static analysis > and syntax highlighting tools that are able to take advantage of that > simplicity because they only care about the syntax, not the full > semantics. > > Anyone actually doing their *own* parsing of something else *in* > Python, would be better advised to reach for PLY > (https://pypi.python.org/pypi/ply ). PLY is the parser underlying > https://pypi.python.org/pypi/pycparser, and hence the highly regarded > CFFI library, https://pypi.python.org/pypi/cffi For the later stages of the pipeline (i.e. AST -> code generation), CPython now uses Eli Bendersky's asdl_parser: https://github.com/eliben/asdl_parser More background on that: http://eli.thegreenplace.net/2014/06/04/using-asdl-to-describe-asts-in-compilers Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jun 6 07:00:03 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Jun 2015 15:00:03 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 6 June 2015 at 12:21, Neil Girdhar wrote: > I'm curious what other people will contribute to this discussion as I think > having no great parsing library is a huge hole in Python. Having one would > definitely allow me to write better utilities using Python. 
The design of *Python's* grammar is deliberately restricted to being parsable with an LL(1) parser. There are a great many static analysis and syntax highlighting tools that are able to take advantage of that simplicity because they only care about the syntax, not the full semantics. Anyone actually doing their *own* parsing of something else *in* Python, would be better advised to reach for PLY (https://pypi.python.org/pypi/ply ). PLY is the parser underlying https://pypi.python.org/pypi/pycparser, and hence the highly regarded CFFI library, https://pypi.python.org/pypi/cffi Other notable parsing alternatives folks may want to look at include https://pypi.python.org/pypi/lrparsing and http://pythonhosted.org/pyparsing/ (both of which allow you to use Python code to define your grammar, rather than having to learn a formal grammar notation). Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Sat Jun 6 07:29:21 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 01:29:21 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan wrote: > On 6 June 2015 at 12:21, Neil Girdhar wrote: > > I'm curious what other people will contribute to this discussion as I > think > > having no great parsing library is a huge hole in Python. Having one > would > > definitely allow me to write better utilities using Python. > > The design of *Python's* grammar is deliberately restricted to being > parsable with an LL(1) parser. There are a great many static analysis > and syntax highlighting tools that are able to take advantage of that > simplicity because they only care about the syntax, not the full > semantics. > Given the validation that happens, it's not actually LL(1) though. It's mostly LL(1) with some syntax errors that are raised for various illegal constructs. Anyway, no one is suggesting changing the grammar. > Anyone actually doing their *own* parsing of something else *in* > Python, would be better advised to reach for PLY > (https://pypi.python.org/pypi/ply ). PLY is the parser underlying > https://pypi.python.org/pypi/pycparser, and hence the highly regarded > CFFI library, https://pypi.python.org/pypi/cffi > > Other notable parsing alternatives folks may want to look at include > https://pypi.python.org/pypi/lrparsing and > http://pythonhosted.org/pyparsing/ (both of which allow you to use > Python code to define your grammar, rather than having to learn a > formal grammar notation). > > I looked at ply and pyparsing, but it was impossible to simply parse LaTeX because I couldn't explain to suck up the right number of arguments given the name of the function. When it sees a function, it learns how many arguments that function needs. When it sees a function call \a{1}{2}{3}, if "\a" takes 2 arguments, then it should only suck up 1 and 2 as arguments, and leave 3 as a regular text token. In other words, I should be able to tell the parser what to expect in code that lives on the rule edges. The parsing tools you listed work really well until you need to do something like (1) the validation step that happens in Python, or (2) figuring out exactly where the syntax error is (line and column number) or (3) ensuring that whitespace separates some tokens even when it's not required to disambiguate different parse trees. 
I got the impression that they wanted to make these languages simple for the simple cases, but they were made too simple and don't allow you to do everything in one simple pass. Best, Neil > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sat Jun 6 07:30:58 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 6 Jun 2015 05:30:58 +0000 (UTC) Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: Message-ID: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue. On Friday, June 5, 2015 7:08 PM, Neil Girdhar wrote: On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert wrote: > >On Jun 5, 2015, at 13:38, Neil Girdhar wrote: >>Hooking the token stream, on the other hand, has pretty obvious uses. For example, in the user-defined literal thread, Paul Moore suggested that for Nick Coghlan's "compile-time expression" idea, requiring valid Python syntax would be way too restrictive, but requiring valid Python tokens is probably OK, and it automatically solves the quoting problem, and it would usually be easier than parsing text. I think he's right about all three parts of that, but unfortunately you can't implement it that way in an import hook because an import hook can't get access to the token stream. >> >>And of course my hack for simulating user-defined literals relies on a workaround to fake hooking the token stream; it would be a whole lot harder without that, while it would be a little easier and a whole lot cleaner if I could just hook the token stream. > >Yes, I think I understand your motivation. Can you help me understand the what the hook you would write would look like? That's a fair question. First, let's look at the relevant parts of an AST hook that transforms every float literal into a Decimal constructor call: class FloatNodeWrapper(ast.NodeTransformer): def visit_Num(self, node): if isinstance(node.n, float): return ast.Call(func=ast.Name(id='Decimal', ctx=ast.Load()), args=[ast.Str(s=repr(node.n))], keywords=[]) return node # ... def source_to_code(self, data, path, *, _optimize=-1): source = importlib.decode_source(data) tree = ast.parse(source) tree = FloatNodeWrapper().visit(tree) ast.fix_missing_locations(tree) return compile(tree, path, 'exec', dont_inherit=True, optimize=_optimize) Now, here's what I'd like to write for a token hook that does the same thing at the token level: def retokenize_float(tokens): for num, val, *loc in tokens: if num == tokenize.NUMBER and ('.' in val or 'e' in val or 'E' in val): yield tokenize.NAME, 'Decimal', *loc yield tokenize.OP, '(', *loc yield tokenize.STRING, repr(val), *loc yield tokenize.OP, ')', *loc else: yield num, val, *loc # ... 
def source_to_code(self, data, path, *, _optimize=-1): source = importlib.decode_source(data) tokens = tokenize.tokenize(source) tokens = retokenize(tokens) return compile(tokens, path, 'exec', dont_inherit=True, optimize=_optimize) Of course I don't want to do the same thing, I want to do something that you can't do at the AST level?see my user literal hack for an example. But this shows the parallels and differences between the two. If you want more background, see http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html (which I wrote to explain to someone else how floatliteralhack works). Of course I'm not presenting this as an ideal design if I were starting Python from scratch, but as the best design given what Python is already producing and consuming (a stream of tokens that's guaranteed to be equivalent to what you get out of the tokenize module). >>Also, Python is defined as hacking a separate lexical analysis phase, and a module named tokenize that does the same thing as this phase, and tests that test it, and so on. So, you can't just throw all that out and remain backward compatible. > >I don't see why that is. The lexical "phase" would just become new parsing rules, and so it would be supplanted by the parser. Then you wouldn't need to add special hooks for lexing. You would merely have hooks for parsing. Callbacks from a tree builder and using a user-modifiable grammar are clearly not backward compatible with ast.NodeTransformer. They're a completely different way of doing things. Is it a better way? Maybe. Plenty of people are using OMeta/JS every day. Of course plenty of people are cursing the fact that OMeta/JS sometimes generates exponential-time backtracking and it's never clear which part of your production rules are to blame, or the fact that you can't get a useful error message out of it, etc. And I'm pretty sure you could design something with most of the strengths of OMeta without its weaknesses (just using a standard packrat PEG parser instead of an extended PEG parser seems like it would turn most of the exponential productions into explicit errors in the grammar?). Or you could go the opposite way and use GLR and bottom-up callbacks instead of PEG and top-down. Something like that would be a great idea for a research project. But it's probably not a great idea for a proposal to change the core of CPython, at least until someone does that research project. >>It's not that it's hard to write a _fast_ parser in Python, but that it's hard to write a parser that does the exact same thing as Python's own parser (and even more so one that does the exact same thing as whatever version of Python you're running under's parser). > >I think it's worth exploring having the whole parser in one place, rather than repeating the same structures in at least four places. With every change to the Python grammar, you pay for this forced repetition. > The repetition is really a different issue. A different implementation of the same basic design Python already has could make it so you only have to write explicit code for the 3 places the CST->AST node doesn't follow the same rules as everywhere else and the dozen or so places where the AST has to be post-validated, instead of having to write explicit code for both sides of every single node type. And that kind of cleanup could be done without breaking backward compatibility, because the interfaces on each side of the code would be unchanged. 
But that's also a lot less fun of a change than writing a whole new parser, so I wouldn't be surprised if nobody ever did it? From mistersheik at gmail.com Sat Jun 6 07:50:28 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 01:50:28 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert wrote: > First, I think your idea is almost completely tangential to mine. Yes, if > you completely replaced both the interface and the implementation of the > parser, you could do just about anything you wanted. But assuming nobody is > going to completely replace the way Python does parsing today, I think it's > still useful to add the one missing useful hook to the existing system. But > let's continue. > > On Friday, June 5, 2015 7:08 PM, Neil Girdhar > wrote: > On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert wrote: > > > >On Jun 5, 2015, at 13:38, Neil Girdhar wrote: > > > >>Hooking the token stream, on the other hand, has pretty obvious uses. > For example, in the user-defined literal thread, Paul Moore suggested that > for Nick Coghlan's "compile-time expression" idea, requiring valid Python > syntax would be way too restrictive, but requiring valid Python tokens is > probably OK, and it automatically solves the quoting problem, and it would > usually be easier than parsing text. I think he's right about all three > parts of that, but unfortunately you can't implement it that way in an > import hook because an import hook can't get access to the token stream. > >> > >>And of course my hack for simulating user-defined literals relies on a > workaround to fake hooking the token stream; it would be a whole lot harder > without that, while it would be a little easier and a whole lot cleaner if > I could just hook the token stream. > > > > >Yes, I think I understand your motivation. Can you help me understand > the what the hook you would write would look like? > > That's a fair question. > > First, let's look at the relevant parts of an AST hook that transforms > every float literal into a Decimal constructor call: > > class FloatNodeWrapper(ast.NodeTransformer): > def visit_Num(self, node): > if isinstance(node.n, float): > return ast.Call(func=ast.Name(id='Decimal', > ctx=ast.Load()), > args=[ast.Str(s=repr(node.n))], > keywords=[]) > return node > > > # ... > > def source_to_code(self, data, path, *, _optimize=-1): > source = importlib.decode_source(data) > tree = ast.parse(source) > tree = FloatNodeWrapper().visit(tree) > ast.fix_missing_locations(tree) > return compile(tree, path, 'exec', dont_inherit=True, > optimize=_optimize) > > > Now, here's what I'd like to write for a token hook that does the same > thing at the token level: > > def retokenize_float(tokens): > for num, val, *loc in tokens: > > if num == tokenize.NUMBER and ('.' in val or 'e' in val or 'E' > in val): > yield tokenize.NAME, 'Decimal', *loc > yield tokenize.OP, '(', *loc > > yield tokenize.STRING, repr(val), *loc > > yield tokenize.OP, ')', *loc > > else: > yield num, val, *loc > > # ... 
> > def source_to_code(self, data, path, *, _optimize=-1): > source = importlib.decode_source(data) > tokens = tokenize.tokenize(source) > tokens = retokenize(tokens) > return compile(tokens, path, 'exec', dont_inherit=True, > optimize=_optimize) > > > Of course I don't want to do the same thing, I want to do something that > you can't do at the AST level?see my user literal hack for an example. But > this shows the parallels and differences between the two. If you want more > background, see > > http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html > (which I wrote to explain to someone else how floatliteralhack works). > Yes. I want to point that if the lexer rules were alongside the parser, they would be generating ast nodes ? so the hook for calling Decimal for all floating point tokens would be doable in the same way as your AST hook. For the new tokens that you want, the ideal solution I think is to modify the python parsing grammar before it parses the text. > Of course I'm not presenting this as an ideal design if I were starting > Python from scratch, but as the best design given what Python is already > producing and consuming (a stream of tokens that's guaranteed to be > equivalent to what you get out of the tokenize module). > This is like saying "I want to change some things, but not other things". I want the best long-term solution, whatever that is. (I don't know what it is.) In the long run, moving towards the best solution tends to be the least total work. Specifically, if lexing hooks are implemented differently than parsing hooks, then that change is probably going to be backed out and replaced with the "ideal" ? eventually ? maybe in five years or ten or twenty. And when it's removed, there's deprecation periods and upgrade pains. At least let's explore what is the "ideal" solution? > > >>Also, Python is defined as hacking a separate lexical analysis phase, > and a module named tokenize that does the same thing as this phase, and > tests that test it, and so on. So, you can't just throw all that out and > remain backward compatible. > > > > >I don't see why that is. The lexical "phase" would just become new > parsing rules, and so it would be supplanted by the parser. Then you > wouldn't need to add special hooks for lexing. You would merely have hooks > for parsing. > > Callbacks from a tree builder and using a user-modifiable grammar are > clearly not backward compatible with ast.NodeTransformer. They're a > completely different way of doing things. > > Is it a better way? Maybe. Plenty of people are using OMeta/JS every day. > Of course plenty of people are cursing the fact that OMeta/JS sometimes > generates exponential-time backtracking and it's never clear which part of > your production rules are to blame, or the fact that you can't get a useful > error message out of it, etc. > I don't know about OMeta, but the Earley parsing algorithm is worst-cast cubic time "quadratic time for unambiguous grammars, and linear time for almost all LR(k) grammars". > > And I'm pretty sure you could design something with most of the strengths > of OMeta without its weaknesses (just using a standard packrat PEG parser > instead of an extended PEG parser seems like it would turn most of the > exponential productions into explicit errors in the grammar?). Or you could > go the opposite way and use GLR and bottom-up callbacks instead of PEG and > top-down. Something like that would be a great idea for a research project. 
> But it's probably not a great idea for a proposal to change the core of > CPython, at least until someone does that research project. > Yes, totally agree with you. So if it were me doing this work, I would put my energy in the research project to write an amazing parser in Python. And then I would try to convince the Python team to use that. I guess we don't disagree at all. > >>It's not that it's hard to write a _fast_ parser in Python, but that > it's hard to write a parser that does the exact same thing as Python's own > parser (and even more so one that does the exact same thing as whatever > version of Python you're running under's parser). > > > >I think it's worth exploring having the whole parser in one place, rather > than repeating the same structures in at least four places. With every > change to the Python grammar, you pay for this forced repetition. > > > > The repetition is really a different issue. A different implementation of > the same basic design Python already has could make it so you only have to > write explicit code for the 3 places the CST->AST node doesn't follow the > same rules as everywhere else and the dozen or so places where the AST has > to be post-validated, instead of having to write explicit code for both > sides of every single node type. And that kind of cleanup could be done > without breaking backward compatibility, because the interfaces on each > side of the code would be unchanged. But that's also a lot less fun of a > change than writing a whole new parser, so I wouldn't be surprised if > nobody ever did it? > Cool, I didn't know it was even possible. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Jun 6 08:04:51 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 02:04:51 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> Message-ID: On Sat, Jun 6, 2015 at 12:27 AM, Guido van Rossum wrote: > On Fri, Jun 5, 2015 at 7:57 PM, Neil Girdhar > wrote: > >> I don't see why it makes anything simpler. Your lexing rules just live >> alongside your parsing rules. And I also don't see why it has to be faster >> to do the lexing in a separate part of the code. Wouldn't the parser >> generator realize that that some of the rules don't use the stack and so >> they would end up just as fast as any lexer? >> > > You're putting a lot of faith in "modern" parsers. I don't know if PLY > qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies > the lexer and parser. However I don't think it would be much better for a > language the size of Python. > I agree with you. I think the problem might be that the parser that I'm dreaming doesn't exist for Python. In another message, I wrote what I wanted: ? Along with the grammar, you also give it code that it can execute as it matches each symbol in a rule. In Python for example, as it matches each argument passed to a function, it would keep track of the count of *args, **kwargs, and keyword arguments, and regular arguments, and then raise a syntax error if it encounters anything out of order. Right now that check is done in validate.c, which is really annoying. I want to specify the lexical rules in the same way that I specify the parsing rules. 
And I think (after Andrew elucidates what he means by hooks) I want the parsing hooks to be the same thing as lexing hooks, and I agree with him that hooking into the lexer is useful. I want the parser module to be automatically-generated from the grammar if that's possible (I think it is). Typically each grammar rule is implemented using a class. I want the code generation to be a method on that class. This makes changing the AST easy. For example, it was suggested that we might change the grammar to include a starstar_expr node. This should be an easy change, but because of the way every node validates its children, which it expects to have a certain tree structure, it would be a big task with almost no payoff. ? I don't think this is possible with Ply. > We are using PLY at Dropbox to parse a medium-sized DSL, and while at the > beginning it was convenient to have the entire language definition in one > place, there were a fair number of subtle bugs in the earlier stages of the > project due to the mixing of lexing and parsing. In order to get this right > it seems you actually have to *think* about the lexing and parsing stages > differently, and combining them in one tool doesn't actually help you to > think more clearly. > That's interesting. I can understand wanting to separate them mentally, but two problems with separating at a fundamental programmatic level are: (1) you may want to change a lexical token like number to ? in some cases ? be LL(1) for who knows what reason; or (2) you would have to implement lexical hooks differently than parsing hooks. In some of Andrew's code below, the tokenize hook loos so different than the parser hook, and I think that's unfortunate. > > Also, this approach doesn't really do much for the later stages -- you can > easily construct a parse tree but it's a fairly direct representation of > the grammar rules, and it offers no help in managing a symbol table or > generating code. > It would be nice to generate the code in methods on the classes that implement the grammar rules. This would allow you to use memos that were filled in as you were parsing and validating to generate code. > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jun 6 08:21:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 6 Jun 2015 16:21:14 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 6 June 2015 at 15:30, Andrew Barnert via Python-ideas wrote: > The repetition is really a different issue. A different implementation of the same basic design Python already has could make it so you only have to write explicit code for the 3 places the CST->AST node doesn't follow the same rules as everywhere else and the dozen or so places where the AST has to be post-validated, instead of having to write explicit code for both sides of every single node type. And that kind of cleanup could be done without breaking backward compatibility, because the interfaces on each side of the code would be unchanged. But that's also a lot less fun of a change than writing a whole new parser, so I wouldn't be surprised if nobody ever did it? 
Eugene Toder had a decent go at introducing more autogeneration into the code generation code a few years ago as part of building out an AST level optimiser: http://bugs.python.org/issue11549 The basic concepts Eugene introduced still seem sound to me, there'd just be some work in bringing the patches up to date to target 3.6. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Sat Jun 6 09:17:47 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 6 Jun 2015 00:17:47 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: > On Jun 5, 2015, at 22:50, Neil Girdhar wrote: > >> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert wrote: >> First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue. >> >> On Friday, June 5, 2015 7:08 PM, Neil Girdhar wrote: >> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert wrote: >> > >> If you want more background, see >> http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html (which I wrote to explain to someone else how floatliteralhack works). > > Yes. I want to point that if the lexer rules were alongside the parser, they would be generating ast nodes ? so the hook for calling Decimal for all floating point tokens would be doable in the same way as your AST hook. No. The way Python currently exposes things, the AST hook runs on an already-generated AST and transforms it into another one, to hand off to the code generator. That means it can only be used to handle things that parse as legal Python syntax (unless you replace the entire parser). What I want is a way to similarly take an already-generated token stream and transform it into another one, to hand off to the parser. That will allow it to be used to handle things that lex as legal Python tokens but don't parse as legal Python syntax, like what Paul suggested. Merging lexing into parsing not only doesn't give me that, it makes that impossible. > For the new tokens that you want, the ideal solution I think is to modify the python parsing grammar before it parses the text. But I don't want any new tokens. I just want to change the way existing tokens are interpreted. Just as with an AST hook like PyMacro, I don't want any new nodes, I just want to change the way existing nodes are interpreted. >> Of course I'm not presenting this as an ideal design if I were starting Python from scratch, but as the best design given what Python is already producing and consuming (a stream of tokens that's guaranteed to be equivalent to what you get out of the tokenize module). > > This is like saying "I want to change some things, but not other things". That's exactly what I'm saying. In particular, I want to change as few things as possible, to get what I want, without breaking stuff that already demonstrably works and has worked for decades. > I don't know about OMeta, but the Earley parsing algorithm is worst-cast cubic time "quadratic time for unambiguous grammars, and linear time for almost all LR(k) grammars". I don't know why you'd want to use Earley for parsing a programming language. 
IIRC, it was the first algorithm that could handle rampant ambiguity in polynomial time, but that isn't relevant to parsing programming languages (especially one like Python, which was explicitly designed to be simple to parse), and it isn't relevant to natural languages if you're not still in the 1960s, except in learning the theory and history of parsing. GLR does much better in almost-unambiguous/almost-deterministic languages; CYK can be easily extended with weights (which propagate sensibly, so you can use them for a final judgment, or to heuristically prune alternatives as you go); Valiant is easier to reason about mathematically; etc. And that's just among the parsers in the same basic family as Earley. Also, the point of OMeta is that it's not just a parsing algorithm, it's a complete system that's been designed and built and is being used to write DSLs, macros, and other language extensions in real-life code in languages like JavaScript and C#. So you don't have to imagine what kind of interface you could present or what it might be like to use it in practice, you can use it and find out. And I think it's in the same basic direction as the kind of interface you want for Python's parser. >> And I'm pretty sure you could design something with most of the strengths of OMeta without its weaknesses (just using a standard packrat PEG parser instead of an extended PEG parser seems like it would turn most of the exponential productions into explicit errors in the grammar?). Or you could go the opposite way and use GLR and bottom-up callbacks instead of PEG and top-down. Something like that would be a great idea for a research project. But it's probably not a great idea for a proposal to change the core of CPython, at least until someone does that research project. > > Yes, totally agree with you. So if it were me doing this work, I would put my energy in the research project to write an amazing parser in Python. And then I would try to convince the Python team to use that. I guess we don't disagree at all. Well, I think we disagree about the value of our time, and about the cost of disruptive changes. If I have a relatively low-work, almost completely non-disruptive way to definitely get everything I actually need, and a high-work, hugely-disruptive way to probably get what I actually need and also probably get a whole bunch of other useful stuff that I might be able to sell everyone else on if I also did a lot of additional work, that seems like a no-brainer to me. In fact, even if I wanted to write an amazing parser library for Python (and I kind of do, but I don't know if I have the time), I still don't think I'd want to suggest it as a replacement for the parser in CPython. Writing all the backward-compat adapters and porting the Python parser over with all its quirks intact and building the tests to prove that it's performance and error handling were strictly better and so on wouldn't be nearly as much fun as other things I could do with it. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From stefan at bytereef.org  Sat Jun  6 15:36:28 2015
From: stefan at bytereef.org (s.krah)
Date: Sat, 06 Jun 2015 13:36:28 +0000
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: 
References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com>
 <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com>
Message-ID: <14dc9066f21.fb02a929181155.2652394160569890758@bytereef.org>

Neil Girdhar <mistersheik at gmail.com> wrote:
> Along with the grammar, you also give it code that it can execute as it
> matches each symbol in a rule. In Python for example, as it matches each
> argument passed to a function, it would keep track of the count of *args,
> **kwargs, and keyword arguments, and regular arguments, and then raise a
> syntax error if it encounters anything out of order. Right now that check
> is done in validate.c, which is really annoying.

Agreed. For 3.4 it was possible to encode these particular semantics into
the grammar itself, but it would no longer be LL(1).

If I understood correctly, you wanted to handle lexing and parsing
together. How would the INDENT/DEDENT tokens be generated?

For my private ast generator, I did the opposite: I wanted to formalize
the token preprocessing step, so I have:

lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the
ast directly)

It isn't slower than what is in Python right now and you can hook into the
token stream at any place.


Stefan Krah

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guettliml at thomas-guettler.de  Sat Jun  6 17:28:32 2015
From: guettliml at thomas-guettler.de (Thomas Güttler)
Date: Sat, 06 Jun 2015 17:28:32 +0200
Subject: [Python-ideas] Next steps to get type hinting become reality?
Message-ID: <557311A0.20905@thomas-guettler.de>

Based on the thread "Better type hinting" here are new questions:

We have: PEP 484, Python2, the standard library and a dream called "type
hinting".

Is it possible to get type hinting for the standard library of Python2?

If not, how to get type hinting for the standard library of Python3?

What can students with some spare time do, to improve the current
situation?

Regards,
  Thomas Güttler

-- 
http://www.thomas-guettler.de/

From mistersheik at gmail.com  Sat Jun  6 18:18:49 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sat, 6 Jun 2015 12:18:49 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: <14dc9066f21.fb02a929181155.2652394160569890758@bytereef.org>
References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com>
 <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com>
 <14dc9066f21.fb02a929181155.2652394160569890758@bytereef.org>
Message-ID: 

Maybe if every production has a link to its parent, then the spaces after
a newline followed by statement reduce to indentation followed by
statement, which reduces to indent or dedent or nothing followed by
statement based on the parent's indentation level?  In other words the
parent (a file_input e.g.) has active control of the grammar of its
children?

On Sat, Jun 6, 2015 at 9:36 AM, s.krah wrote:

>
>
> *Neil Girdhar >* wrote:
> > Along with the grammar, you also give it code that it can execute as it
> matches each symbol in a rule. In Python for example, as it matches each
> argument passed to a function, it would keep track of the count of *args,
> **kwargs, and keyword arguments, and regular arguments, and then raise a
> syntax error if it encounters anything out of order. Right now that check
> is done in validate.c, which is really annoying.
>
> Agreed.
For 3.4 it was possible to encode these particular semantics into > the grammar > itself, but it would no longer be LL(1). > > If I understood correctly, you wanted to handle lexing and parsing > together. How > would the INDENT/DEDENT tokens be generated? > > For my private ast generator, I did the opposite: I wanted to formalize > the token > preprocessing step, so I have: > > lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the > ast directly) > > > It isn't slower than what is in Python right now and you can hook into the > token stream > at any place. > > > > Stefan Krah > > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/WTFHSUbfU20/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/WTFHSUbfU20/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Jun 6 18:23:03 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 12:23:03 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert wrote: > On Jun 5, 2015, at 22:50, Neil Girdhar wrote: > > On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert wrote: > >> First, I think your idea is almost completely tangential to mine. Yes, if >> you completely replaced both the interface and the implementation of the >> parser, you could do just about anything you wanted. But assuming nobody is >> going to completely replace the way Python does parsing today, I think it's >> still useful to add the one missing useful hook to the existing system. But >> let's continue. >> >> On Friday, June 5, 2015 7:08 PM, Neil Girdhar >> wrote: >> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert >> wrote: >> > >> If you want more background, see >> >> http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html >> (which I wrote to explain to someone else how floatliteralhack works). >> > > Yes. I want to point that if the lexer rules were alongside the parser, > they would be generating ast nodes ? so the hook for calling Decimal for > all floating point tokens would be doable in the same way as your AST hook. > > > No. The way Python currently exposes things, the AST hook runs on an > already-generated AST and transforms it into another one, to hand off to > the code generator. That means it can only be used to handle things that > parse as legal Python syntax (unless you replace the entire parser). 
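For anyone following along, the existing AST-level hook is just an
ast.NodeTransformer pass over the tree. A minimal sketch of the
float-to-Decimal example (my own toy version, not Andrew's actual
floatliteralhack, and it assumes Decimal is available in the namespace the
transformed code runs in) looks something like this:

    import ast
    from decimal import Decimal

    class FloatToDecimal(ast.NodeTransformer):
        """Rewrite every float literal into a Decimal('...') call."""
        def visit_Num(self, node):
            if isinstance(node.n, float):
                # Hand Decimal the literal's text, so no binary rounding
                # is baked in before the conversion happens.
                call = ast.parse('Decimal(%r)' % repr(node.n), mode='eval').body
                return ast.copy_location(call, node)
            return node

    tree = ast.parse('x = 0.1 + 0.2')
    tree = ast.fix_missing_locations(FloatToDecimal().visit(tree))
    ns = {'Decimal': Decimal}
    exec(compile(tree, '<example>', 'exec'), ns)
    print(ns['x'])  # -> 0.3 exactly, not 0.30000000000000004

The point is that this only works because 'x = 0.1 + 0.2' already parses as
legal Python.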
> > What I want is a way to similarly take an already-generated token stream > and transform it into another one, to hand off to the parser. That will > allow it to be used to handle things that lex as legal Python tokens but > don't parse as legal Python syntax, like what Paul suggested. Merging > lexing into parsing not only doesn't give me that, it makes that impossible. > Yes, and I what I was suggesting is for the lexer to return AST nodes, so it would be fine to process those nodes in the same way. > > For the new tokens that you want, the ideal solution I think is to modify > the python parsing grammar before it parses the text. > > > But I don't want any new tokens. I just want to change the way existing > tokens are interpreted. > > Just as with an AST hook like PyMacro, I don't want any new nodes, I just > want to change the way existing nodes are interpreted. > > Yes, I see *how* you're trying to solve your problem, but my preference is to have one kind of hook rather than two kinds by unifying lexing and parsing. I think that's more elegant. > Of course I'm not presenting this as an ideal design if I were starting >> Python from scratch, but as the best design given what Python is already >> producing and consuming (a stream of tokens that's guaranteed to be >> equivalent to what you get out of the tokenize module). >> > > This is like saying "I want to change some things, but not other things". > > > That's exactly what I'm saying. In particular, I want to change as few > things as possible, to get what I want, without breaking stuff that already > demonstrably works and has worked for decades. > > I don't know about OMeta, but the Earley parsing algorithm is worst-cast > cubic time "quadratic time for unambiguous grammars, and linear time for > almost all LR(k) grammars". > > > I don't know why you'd want to use Earley for parsing a programming > language. IIRC, it was the first algorithm that could handle rampant > ambiguity in polynomial time, but that isn't relevant to parsing > programming languages (especially one like Python, which was explicitly > designed to be simple to parse), and it isn't relevant to natural languages > if you're not still in the 1960s, except in learning the theory and history > of parsing. GLR does much better in almost-unambiguous/almost-deterministic > languages; CYK can be easily extended with weights (which propagate > sensibly, so you can use them for a final judgment, or to heuristically > prune alternatives as you go); Valiant is easier to reason about > mathematically; etc. And that's just among the parsers in the same basic > family as Earley. > I suggested Earley to mitigate this fear of "exponential backtracking" since that won't happen in Earley. > > Also, the point of OMeta is that it's not just a parsing algorithm, it's a > complete system that's been designed and built and is being used to write > DSLs, macros, and other language extensions in real-life code in languages > like JavaScript and C#. So you don't have to imagine what kind of interface > you could present or what it might be like to use it in practice, you can > use it and find out. And I think it's in the same basic direction as the > kind of interface you want for Python's parser. 
> > And I'm pretty sure you could design something with most of the strengths >> of OMeta without its weaknesses (just using a standard packrat PEG parser >> instead of an extended PEG parser seems like it would turn most of the >> exponential productions into explicit errors in the grammar?). Or you could >> go the opposite way and use GLR and bottom-up callbacks instead of PEG and >> top-down. Something like that would be a great idea for a research project. >> But it's probably not a great idea for a proposal to change the core of >> CPython, at least until someone does that research project. >> > > Yes, totally agree with you. So if it were me doing this work, I would > put my energy in the research project to write an amazing parser in Python. > And then I would try to convince the Python team to use that. I guess > we don't disagree at all. > > > Well, I think we disagree about the value of our time, and about the cost > of disruptive changes. > > If I have a relatively low-work, almost completely non-disruptive way to > definitely get everything I actually need, and a high-work, > hugely-disruptive way to probably get what I actually need and also > probably get a whole bunch of other useful stuff that I might be able to > sell everyone else on if I also did a lot of additional work, that seems > like a no-brainer to me. > > In fact, even if I wanted to write an amazing parser library for Python > (and I kind of do, but I don't know if I have the time), I still don't > think I'd want to suggest it as a replacement for the parser in CPython. > Writing all the backward-compat adapters and porting the Python parser over > with all its quirks intact and building the tests to prove that it's > performance and error handling were strictly better and so on wouldn't be > nearly as much fun as other things I could do with it. > If you ever decide to write that amazing parser library for Python and want any help please feel free to let me know. Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jun 6 19:13:59 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Jun 2015 10:13:59 -0700 Subject: [Python-ideas] Next steps to get type hinting become reality? In-Reply-To: <557311A0.20905@thomas-guettler.de> References: <557311A0.20905@thomas-guettler.de> Message-ID: The plan is to have volunteers produce stubs for the stdlib, and to contribute those to typeshed: https://github.com/JukkaL/typeshed (that's a shared resource and will eventually be transferred to the PSF, i.e. https://github.com/python/) If you want to use type annotations in Python 2, there's this hack: https://github.com/JukkaL/mypy/tree/master/mypy/codec On Sat, Jun 6, 2015 at 8:28 AM, Thomas G?ttler wrote: > Based on the thead "Better type hinting" here are new questions: > > We have: PEP 484, Python2, the standard library and a dream called "type > hinting". > > Is it possible to get type hinting for the standard library of Python2? > > If not, how to get type hinting for the standard library of Python3? > > What can students with some spare time do, to improve the current > situation? 
> > Regards, > Thomas G?ttler > > -- > http://www.thomas-guettler.de/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sat Jun 6 19:52:31 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 06 Jun 2015 12:52:31 -0500 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <6129082A-3B36-45C1-A479-00EC0BA1FF3A@gmail.com> On June 6, 2015 12:29:21 AM CDT, Neil Girdhar wrote: >On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan >wrote: > >> On 6 June 2015 at 12:21, Neil Girdhar wrote: >> > I'm curious what other people will contribute to this discussion as >I >> think >> > having no great parsing library is a huge hole in Python. Having >one >> would >> > definitely allow me to write better utilities using Python. >> >> The design of *Python's* grammar is deliberately restricted to being >> parsable with an LL(1) parser. There are a great many static analysis >> and syntax highlighting tools that are able to take advantage of that >> simplicity because they only care about the syntax, not the full >> semantics. >> > >Given the validation that happens, it's not actually LL(1) though. >It's >mostly LL(1) with some syntax errors that are raised for various >illegal >constructs. > >Anyway, no one is suggesting changing the grammar. > > >> Anyone actually doing their *own* parsing of something else *in* >> Python, would be better advised to reach for PLY >> (https://pypi.python.org/pypi/ply ). PLY is the parser underlying >> https://pypi.python.org/pypi/pycparser, and hence the highly regarded >> CFFI library, https://pypi.python.org/pypi/cffi >> >> Other notable parsing alternatives folks may want to look at include >> https://pypi.python.org/pypi/lrparsing and >> http://pythonhosted.org/pyparsing/ (both of which allow you to use >> Python code to define your grammar, rather than having to learn a >> formal grammar notation). >> >> >I looked at ply and pyparsing, but it was impossible to simply parse >LaTeX >because I couldn't explain to suck up the right number of arguments >given >the name of the function. When it sees a function, it learns how many >arguments that function needs. When it sees a function call >\a{1}{2}{3}, >if "\a" takes 2 arguments, then it should only suck up 1 and 2 as >arguments, and leave 3 as a regular text token. In other words, I >should be >able to tell the parser what to expect in code that lives on the rule >edges. Can't you just hack it into the lexer? When the slash is detected, the lexer can treat the following identifier as a function, look up the number of required arguments, and push it onto some sort of stack. Whenever a left bracket is encountered and another argument is needed by the TOS, it returns a special argument opener token. > >The parsing tools you listed work really well until you need to do >something like (1) the validation step that happens in Python, or (2) >figuring out exactly where the syntax error is (line and column number) >or >(3) ensuring that whitespace separates some tokens even when it's not >required to disambiguate different parse trees. 
I got the impression >that >they wanted to make these languages simple for the simple cases, but >they >were made too simple and don't allow you to do everything in one simple >pass. > >Best, > >Neil > > >> Regards, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From mistersheik at gmail.com Sat Jun 6 20:27:14 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 14:27:14 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <6129082A-3B36-45C1-A479-00EC0BA1FF3A@gmail.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <6129082A-3B36-45C1-A479-00EC0BA1FF3A@gmail.com> Message-ID: Right. On Sat, Jun 6, 2015 at 1:52 PM, Ryan Gonzalez wrote: > > > On June 6, 2015 12:29:21 AM CDT, Neil Girdhar > wrote: > >On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan > >wrote: > > > >> On 6 June 2015 at 12:21, Neil Girdhar wrote: > >> > I'm curious what other people will contribute to this discussion as > >I > >> think > >> > having no great parsing library is a huge hole in Python. Having > >one > >> would > >> > definitely allow me to write better utilities using Python. > >> > >> The design of *Python's* grammar is deliberately restricted to being > >> parsable with an LL(1) parser. There are a great many static analysis > >> and syntax highlighting tools that are able to take advantage of that > >> simplicity because they only care about the syntax, not the full > >> semantics. > >> > > > >Given the validation that happens, it's not actually LL(1) though. > >It's > >mostly LL(1) with some syntax errors that are raised for various > >illegal > >constructs. > > > >Anyway, no one is suggesting changing the grammar. > > > > > >> Anyone actually doing their *own* parsing of something else *in* > >> Python, would be better advised to reach for PLY > >> (https://pypi.python.org/pypi/ply ). PLY is the parser underlying > >> https://pypi.python.org/pypi/pycparser, and hence the highly regarded > >> CFFI library, https://pypi.python.org/pypi/cffi > >> > >> Other notable parsing alternatives folks may want to look at include > >> https://pypi.python.org/pypi/lrparsing and > >> http://pythonhosted.org/pyparsing/ (both of which allow you to use > >> Python code to define your grammar, rather than having to learn a > >> formal grammar notation). > >> > >> > >I looked at ply and pyparsing, but it was impossible to simply parse > >LaTeX > >because I couldn't explain to suck up the right number of arguments > >given > >the name of the function. When it sees a function, it learns how many > >arguments that function needs. When it sees a function call > >\a{1}{2}{3}, > >if "\a" takes 2 arguments, then it should only suck up 1 and 2 as > >arguments, and leave 3 as a regular text token. In other words, I > >should be > >able to tell the parser what to expect in code that lives on the rule > >edges. > > Can't you just hack it into the lexer? When the slash is detected, the > lexer can treat the following identifier as a function, look up the number > of required arguments, and push it onto some sort of stack. 
Whenever a left > bracket is encountered and another argument is needed by the TOS, it > returns a special argument opener token. > Your solution is right, but I would implement it in the parser since I want that kind of generic functionality of dynamic grammar rules to be available everywhere. > > > > >The parsing tools you listed work really well until you need to do > >something like (1) the validation step that happens in Python, or (2) > >figuring out exactly where the syntax error is (line and column number) > >or > >(3) ensuring that whitespace separates some tokens even when it's not > >required to disambiguate different parse trees. I got the impression > >that > >they wanted to make these languages simple for the simple cases, but > >they > >were made too simple and don't allow you to do everything in one simple > >pass. > > > >Best, > > > >Neil > > > > > >> Regards, > >> Nick. > >> > >> -- > >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > >> > > > > > >------------------------------------------------------------------------ > > > >_______________________________________________ > >Python-ideas mailing list > >Python-ideas at python.org > >https://mail.python.org/mailman/listinfo/python-ideas > >Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rymg19 at gmail.com Sat Jun 6 20:31:46 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sat, 06 Jun 2015 13:31:46 -0500 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <6129082A-3B36-45C1-A479-00EC0BA1FF3A@gmail.com> Message-ID: On June 6, 2015 1:27:14 PM CDT, Neil Girdhar wrote: >Right. > >On Sat, Jun 6, 2015 at 1:52 PM, Ryan Gonzalez wrote: > >> >> >> On June 6, 2015 12:29:21 AM CDT, Neil Girdhar >> wrote: >> >On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan >> >wrote: >> > >> >> On 6 June 2015 at 12:21, Neil Girdhar >wrote: >> >> > I'm curious what other people will contribute to this discussion >as >> >I >> >> think >> >> > having no great parsing library is a huge hole in Python. >Having >> >one >> >> would >> >> > definitely allow me to write better utilities using Python. >> >> >> >> The design of *Python's* grammar is deliberately restricted to >being >> >> parsable with an LL(1) parser. There are a great many static >analysis >> >> and syntax highlighting tools that are able to take advantage of >that >> >> simplicity because they only care about the syntax, not the full >> >> semantics. >> >> >> > >> >Given the validation that happens, it's not actually LL(1) though. >> >It's >> >mostly LL(1) with some syntax errors that are raised for various >> >illegal >> >constructs. >> > >> >Anyway, no one is suggesting changing the grammar. >> > >> > >> >> Anyone actually doing their *own* parsing of something else *in* >> >> Python, would be better advised to reach for PLY >> >> (https://pypi.python.org/pypi/ply ). 
PLY is the parser underlying >> >> https://pypi.python.org/pypi/pycparser, and hence the highly >regarded >> >> CFFI library, https://pypi.python.org/pypi/cffi >> >> >> >> Other notable parsing alternatives folks may want to look at >include >> >> https://pypi.python.org/pypi/lrparsing and >> >> http://pythonhosted.org/pyparsing/ (both of which allow you to use >> >> Python code to define your grammar, rather than having to learn a >> >> formal grammar notation). >> >> >> >> >> >I looked at ply and pyparsing, but it was impossible to simply parse >> >LaTeX >> >because I couldn't explain to suck up the right number of arguments >> >given >> >the name of the function. When it sees a function, it learns how >many >> >arguments that function needs. When it sees a function call >> >\a{1}{2}{3}, >> >if "\a" takes 2 arguments, then it should only suck up 1 and 2 as >> >arguments, and leave 3 as a regular text token. In other words, I >> >should be >> >able to tell the parser what to expect in code that lives on the >rule >> >edges. >> >> Can't you just hack it into the lexer? When the slash is detected, >the >> lexer can treat the following identifier as a function, look up the >number >> of required arguments, and push it onto some sort of stack. Whenever >a left >> bracket is encountered and another argument is needed by the TOS, it >> returns a special argument opener token. >> > >Your solution is right, but I would implement it in the parser since I >want >that kind of generic functionality of dynamic grammar rules to be >available >everywhere. > Unless the parsing library doesn't support that. Like PLY. I believe pycparser also uses the lexer to manage type names. > >> >> > >> >The parsing tools you listed work really well until you need to do >> >something like (1) the validation step that happens in Python, or >(2) >> >figuring out exactly where the syntax error is (line and column >number) >> >or >> >(3) ensuring that whitespace separates some tokens even when it's >not >> >required to disambiguate different parse trees. I got the >impression >> >that >> >they wanted to make these languages simple for the simple cases, but >> >they >> >were made too simple and don't allow you to do everything in one >simple >> >pass. >> > >> >Best, >> > >> >Neil >> > >> > >> >> Regards, >> >> Nick. >> >> >> >> -- >> >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> >> > >> > >> >>------------------------------------------------------------------------ >> > >> >_______________________________________________ >> >Python-ideas mailing list >> >Python-ideas at python.org >> >https://mail.python.org/mailman/listinfo/python-ideas >> >Code of Conduct: http://python.org/psf/codeofconduct/ >> >> -- >> Sent from my Android device with K-9 Mail. Please excuse my brevity. >> -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From mistersheik at gmail.com Sat Jun 6 20:44:38 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 6 Jun 2015 14:44:38 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <14dc9066f21.fb02a929181155.2652394160569890758@bytereef.org> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <14dc9066f21.fb02a929181155.2652394160569890758@bytereef.org> Message-ID: Ryan: I'm trying to figure out how the parsing library should be done ? not trying to work around other designs. Stefan: maybe this is a better answer to your question. 
So thinking about this more, this is how I think it should be done: Each grammar rule is expressed as an Iterable. class FileInput: def __init__(self): self.indent_level = None def match(self): while True: matched = yield Disjunction( '\n', [Whitespace(self.indent_level, indent=False), Statement()]) if matched == '\n': break yield EndOfFile() class Suite: def __init__(self, indent_level): self.indent_level = indent_level def match(self): yield Disjunction( SimpleStatement(), ['\n', Whitespace(self.indent_level, indent=True), Repeat(Statement)]) # dedent is not required because the next statement knows its indent # level. On Sat, Jun 6, 2015 at 9:36 AM, s.krah wrote: > > > *Neil Girdhar >* wrote: > > Along with the grammar, you also give it code that it can execute as it > matches each symbol in a rule. In Python for example, as it matches each > argument passed to a function, it would keep track of the count of *args, > **kwargs, and keyword arguments, and regular arguments, and then raise a > syntax error if it encounters anything out of order. Right now that check > is done in validate.c, which is really annoying. > > Agreed. For 3.4 it was possible to encode these particular semantics into > the grammar > itself, but it would no longer be LL(1). > > If I understood correctly, you wanted to handle lexing and parsing > together. How > would the INDENT/DEDENT tokens be generated? > > For my private ast generator, I did the opposite: I wanted to formalize > the token > preprocessing step, so I have: > > lexer -> parser1 (generates INDENT/DEDENT) -> parser2 (generates the > ast directly) > > > It isn't slower than what is in Python right now and you can hook into the > token stream > at any place. > > > > Stefan Krah > > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/WTFHSUbfU20/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/WTFHSUbfU20/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sat Jun 6 21:02:50 2015 From: mertz at gnosis.cx (David Mertz) Date: Sat, 6 Jun 2015 12:02:50 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <93565846-227F-4C13-9895-05C2EDCE4299@gmail.com> Message-ID: On Fri, Jun 5, 2015 at 9:27 PM, Guido van Rossum wrote: > You're putting a lot of faith in "modern" parsers. I don't know if PLY > qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies > the lexer and parser. 
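For anyone who hasn't used it, a PLY module puts the token rules and the
grammar rules side by side, but they still feed two separate objects.
Roughly like this (a from-memory sketch of a toy calculator, so take the
details loosely):

    import ply.lex as lex
    import ply.yacc as yacc

    # --- lexer half ---
    tokens = ('NUMBER', 'PLUS')

    t_PLUS = r'\+'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        raise SyntaxError('bad character %r' % t.value[0])

    # --- parser half ---
    def p_expr_plus(p):
        'expr : expr PLUS NUMBER'
        p[0] = p[1] + p[3]

    def p_expr_number(p):
        'expr : NUMBER'
        p[0] = p[1]

    def p_error(p):
        raise SyntaxError('parse error at %r' % (p,))

    lexer = lex.lex()
    parser = yacc.yacc(write_tables=False, debug=False)
    print(parser.parse('1 + 2 + 3', lexer=lexer))  # prints 6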
However I don't think it would be much better for a > language the size of Python. > PLY doesn't really "unify" the lexer and parser; it just provides both of them in the same Python package (and uses somewhat similar syntax and conventions for each). I wrote a project at my last consulting position to process a fairly complex DSL (used for code generation to several targets, Python, C++, Verilog, etc.). I like PLY, and decided to use that tool; but after a short while I gave up on the parser part of it, and only used the lexing, leaving parsing to "hand rolled" code. I'm sure I *could* have managed to shoehorn in the entire EBNF stuff into the parsing component of PLY. But for my own purpose, I found it more important to do various simplifications and modifications of the token stream before generating the data structures that defined the eventual output parameters. So in this respect, what I did is something like a simpler version of Python's compilation pipeline. Actually, what I did was probably terrible practice for parsing purists, but felt to me like the best "practicality beats purity" approach. There were these finite number of constructs in the DSL, and I would simply scan through the token stream, in several passes, trying to identify a particular construct, then pulling it out into the relevant data structure type, and just marking those tokens as "used". Other passes would look for other constructs, and in some cases I'd need to resolve a reference to one kind of construct that wasn't generated until a later pass in a "unification" step. There was a bit of duct tape and bailing wire involved in all of this, but it actually seemed to keep the code as simple as possible by isolating the code to generate each type of construct. None of which is actually relevant to what Python should do in its parsing, just a little bit of rambling thoughts. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guettliml at thomas-guettler.de Sat Jun 6 22:22:15 2015 From: guettliml at thomas-guettler.de (=?UTF-8?B?VGhvbWFzIEfDvHR0bGVy?=) Date: Sat, 06 Jun 2015 22:22:15 +0200 Subject: [Python-ideas] Next steps to get type hinting become reality? In-Reply-To: References: <557311A0.20905@thomas-guettler.de> Message-ID: <55735677.7060909@thomas-guettler.de> Am 06.06.2015 um 19:13 schrieb Guido van Rossum: > The plan is to have volunteers produce stubs for the stdlib, and to > contribute those to typeshed: https://github.com/JukkaL/typeshed (that's a > shared resource and will eventually be transferred to the PSF, i.e. > https://github.com/python/) > > If you want to use type annotations in Python 2, there's this hack: > https://github.com/JukkaL/mypy/tree/master/mypy/codec typeshed is referenced in the PEP. But something like your above answer to my question is missing in the PEP 484. Why not add a new chapter to the PEP with explains this roadmap? Should I open an issue for pep 484? 
Regards, Thomas G?ttler -- http://www.thomas-guettler.de/ From abarnert at yahoo.com Sun Jun 7 00:52:22 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 6 Jun 2015 15:52:22 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Jun 6, 2015, at 09:23, Neil Girdhar wrote: > >> On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert wrote: >>> On Jun 5, 2015, at 22:50, Neil Girdhar wrote: >>> >>>> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert wrote: >>>> First, I think your idea is almost completely tangential to mine. Yes, if you completely replaced both the interface and the implementation of the parser, you could do just about anything you wanted. But assuming nobody is going to completely replace the way Python does parsing today, I think it's still useful to add the one missing useful hook to the existing system. But let's continue. >>>> >>>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar wrote: >>>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert wrote: >>>> > >>>> If you want more background, see >>>> http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html (which I wrote to explain to someone else how floatliteralhack works). >>> >>> Yes. I want to point that if the lexer rules were alongside the parser, they would be generating ast nodes ? so the hook for calling Decimal for all floating point tokens would be doable in the same way as your AST hook. >> >> No. The way Python currently exposes things, the AST hook runs on an already-generated AST and transforms it into another one, to hand off to the code generator. That means it can only be used to handle things that parse as legal Python syntax (unless you replace the entire parser). >> >> What I want is a way to similarly take an already-generated token stream and transform it into another one, to hand off to the parser. That will allow it to be used to handle things that lex as legal Python tokens but don't parse as legal Python syntax, like what Paul suggested. Merging lexing into parsing not only doesn't give me that, it makes that impossible. > > Yes, and I what I was suggesting is for the lexer to return AST nodes, so it would be fine to process those nodes in the same way. Seriously? Tokens don't form a tree, they form a list. Yes, every linked list is just a degenerate tree, so you could have every "node" just include the next one as a child. But why? Do you want to then the input text into a tree of character nodes? Python has all kinds of really good tools for dealing with iterables; why take away those tools and force me to work with a more complicated abstraction that Python doesn't have any tools for dealing with? In the case of the user-defined literal hack, for example, I can use the adjacent-pairs recipe from itertools and my transformation becomes trivial. I did it more explicitly in the hack I uploaded, using a generator function with a for statement, just to make it blindingly obvious what's happening. But if I had to deal with a tree, I'd either have to write explicit lookahead or store some state explicitly on the tree or the visitor. That isn't exactly _hard_, but it's certainly _harder_, and for no benefit. Also, if we got my change, I could write code that cleanly hooks parsing in 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can at least use it, and all of the relevant and complicated code would be shared between the two versions. 
With your change, I'd have to write code that was completely different for 3.6+ than what I could backport, meaning I'd have to write, debug, and maintain two completely different implementations. And again, for no benefit. And finally, once again: we already have a token stream as part of the process, we already expose every other interesting step in the process, exposing the token stream as it exists today in a way that fits into everything else as it exists today is clearly the easiest and least disruptive thing to do. Sometimes it's worth doing harder or more disruptive things because they provide more benefit, but you haven't yet shown any benefit. You asked me for examples, and I provided them. Why don't you try writing a couple of actual examples--user literals, the LINQ-ish example from MacroPy, whatever--using your proposed design to show us how they could be simpler, or more elegant, or open up further possibilities. Or come up with an example of something your design could do that the existing one (even with my small proposed change) can't. >>> For the new tokens that you want, the ideal solution I think is to modify the python parsing grammar before it parses the text. >> >> But I don't want any new tokens. I just want to change the way existing tokens are interpreted. >> >> Just as with an AST hook like PyMacro, I don't want any new nodes, I just want to change the way existing nodes are interpreted. > > Yes, I see *how* you're trying to solve your problem, but my preference is to have one kind of hook rather than two kinds by unifying lexing and parsing. I think that's more elegant. I'm trying to find a way to interpret this that makes sense. I think you're suggesting that we should throw out the idea of letting users write and install simple post-processing hooks in Python, because that will force us to find a way to instead make the entire parsing process user-customizable at runtime, which will force users to come up with "more elegant" solutions involving changing the grammar instead of post-processing it macro-style. If so, I think that's a very bad idea. Decades of practice in both Python and many other languages (especially those with built-in macro facilities) shows that post-processing at the relevant level is generally simple and elegant. Even if we had a fully-runtime-customizable parser, something like OMeta but "closing the loop" and implementing the language in the programmable metalanguage, many things are still simpler and more elegant written post-processing style (as used by existing import hooks, including MacroPy, and in other languages going all the way back to Lisp), and there's a much lower barrier to learning them, and there's much less risk of breaking the compiler/interpreter being used to run your hook in the first place. And, even if none of that were true, and your new and improved system really were simpler in every case, and you had actually built it rather than just envisioning it, there's still backward compatibility to think of. Do you really want to break working, documented functionality that people have written things like MacroPy on top of, even if forcing them to redesign and rewrite everything from scratch would force them to come up with a "more elegant" solution? 
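To keep the comparison concrete: the whole token-level version of the
float-to-Decimal example is only a dozen lines against today's tokenize
module. This is a rough sketch of mine (not the exact code from the blog
post), yielding 2-tuples so that untokenize stays in its position-free
"compat" mode; a real import hook would also have to inject the Decimal
import and slot this between tokenize and compile:

    import io
    import token
    import tokenize

    def decimal_retokenize(tokens):
        # Re-interpret NUMBER tokens that look like float literals;
        # everything else passes through untouched.
        for tok in tokens:
            s = tok.string.lower()
            if (tok.type == token.NUMBER
                    and ('.' in s or 'e' in s)
                    and not s.startswith('0x')):
                yield (token.NAME, 'Decimal')
                yield (token.OP, '(')
                yield (token.STRING, repr(tok.string))
                yield (token.OP, ')')
            else:
                yield (tok.type, tok.string)

    source = b'x = 0.1 + 0.2\n'
    stream = tokenize.tokenize(io.BytesIO(source).readline)
    print(tokenize.untokenize(decimal_retokenize(stream)).decode('utf-8'))
    # prints something like: x =Decimal ('0.1')+Decimal ('0.2')
    # (the spacing is ugly, but it compiles to exactly what you'd want)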
And finally, the added flexibility of such a system is a cost as well as a benefit--the fact that Arc makes it as easy as possible to "rewrite the language into one that makes writing your application trivial" also means that one Arc programmer can't understand another's code until putting in a lot of effort to learn his idiosyncratic language. >>> I don't know about OMeta, but the Earley parsing algorithm is worst-cast cubic time "quadratic time for unambiguous grammars, and linear time for almost all LR(k) grammars". >> >> I don't know why you'd want to use Earley for parsing a programming language. IIRC, it was the first algorithm that could handle rampant ambiguity in polynomial time, but that isn't relevant to parsing programming languages (especially one like Python, which was explicitly designed to be simple to parse), and it isn't relevant to natural languages if you're not still in the 1960s, except in learning the theory and history of parsing. GLR does much better in almost-unambiguous/almost-deterministic languages; CYK can be easily extended with weights (which propagate sensibly, so you can use them for a final judgment, or to heuristically prune alternatives as you go); Valiant is easier to reason about mathematically; etc. And that's just among the parsers in the same basic family as Earley. > > I suggested Earley to mitigate this fear of "exponential backtracking" since that won't happen in Earley. I already explained that using standard PEG with a packrat parser instead of extended PEG with an OMeta-style parser gives you linear time. Why do you think telling me about a decades-older cubic-time algorithm designed for parsing natural languages that's a direct ancestor to two other algorithms I also already mentioned is going to be helpful? Do you not understand the advantages of PEG or GLR over Earley? -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jun 7 01:04:19 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Jun 2015 16:04:19 -0700 Subject: [Python-ideas] Next steps to get type hinting become reality? In-Reply-To: <55735677.7060909@thomas-guettler.de> References: <557311A0.20905@thomas-guettler.de> <55735677.7060909@thomas-guettler.de> Message-ID: If you really think this should be added to the PEP please submit a pull request. Don't mention the PY2 thing though (it's unofficial). On Jun 6, 2015 1:22 PM, "Thomas G?ttler" wrote: > Am 06.06.2015 um 19:13 schrieb Guido van Rossum: > > The plan is to have volunteers produce stubs for the stdlib, and to > > contribute those to typeshed: https://github.com/JukkaL/typeshed > (that's a > > shared resource and will eventually be transferred to the PSF, i.e. > > https://github.com/python/) > > > > If you want to use type annotations in Python 2, there's this hack: > > https://github.com/JukkaL/mypy/tree/master/mypy/codec > > typeshed is referenced in the PEP. But something like your above answer to > my question > is missing in the PEP 484. > > Why not add a new chapter to the PEP with explains this roadmap? > > Should I open an issue for pep 484? > > > Regards, > Thomas G?ttler > > > -- > http://www.thomas-guettler.de/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cgbeutler at gmail.com Sun Jun 7 05:03:38 2015 From: cgbeutler at gmail.com (Cory Beutler) Date: Sat, 6 Jun 2015 21:03:38 -0600 Subject: [Python-ideas] If branch merging Message-ID: I recently(1 year ago) realized that 'if', 'elif', and 'else' provide easy branching for code, but there is no easy way to merge branches of code back together. One fix for this would be introduction of two new keywords: 'also' and 'alif' (also if) Here are the definitions of if-chain keywords with the addition of 'also' and 'alif': *if *- execute code if this condition is met *else *- execute code if no previous condition is met *elif *- execute code if no previous condition is met and this condition is met *also *- execute code if any previous condition is met *alif *- execute code if any previous condition is met and this condition is met This would simplify some logic expressions by allowing the merging of branched code. *Examples of use:* *Duplicate code in if-chains may be reduced:* # Old Version if a == b: print ('a == b') foo() # <-- duplicate code elif b == c: print ('b == c') foo() # <-- duplicate code elif c == d: print ('c == d') foo() # <-- duplicate code # New Version if a == b: print ('a == b') elif b == c: print ('b == c') elif c == d: print ('c == d') also: foo() # <-- No longer duplicated *Many nested 'if' statements could now be a more linear style:* # Old Version if a == b: print ('a == b') if b == c: print ('b == c') print ('end if') # New Version if a == b: print ('a == b') alif b == c: print ('b == c') also: print ('end if') These two examples are the most common ways this will help code. I have been writing code samples using these keywords and have found that it simplifies many other things as well. It does take a bit of getting used to, though. I have found that is is best to use 'also' and 'alif' sparingly, as overuse can make some code less flexible and more confusing. *Selective Branch merging:* One limitation of the 'also' and 'alif' keywords is the restriction to the "all of the above" checking. What I mean by that is that there is no way to pick and choose which branches to merge back together. When using 'also' and 'alif' you are catching all previous if-branches. One easy way to solve this would be to allow for named branching. The most simple way to do this is to save the conditions of each branch into a variable with a name. Here is an example of merging only select branches together: # Old Version if a == b: print ('a == b') elif a == c: print ('a == c') elif a == d: print ('a == d') if (a == b) or (a == d): print ('a == b and a == d') # New Version using 'as' keyword if a == b as aisb: print ('a == b') elif a == c: print ('a == c') elif a == d as aisd: print ('a == d') alif aisd or aisb: print ('a == b and a == d') NOTE: In the old version, it may be necessary to save off the boolean expression beforehand if the variables being used are changed in the first set of conditions. This would not be required in the new version, making the total code written an even larger gap. I realize that using the 'as' keyword may not be the best. Using 'as' seems to suggest that it will only be used in the following block. An alternative to using the 'as' keyword could be assigning the 'if' to a variable like so: aisb = if a == b: This looks a bit gross to me, though. If think of a better one, I would love to see it. *Speed:* The next logical question to ask is that of speed. Will this slow down my code at all? I happily submit to you that it shouldn't. 
If done right, this may even speed things up by a cycle or two (yeah. I know. That is not enough to fuss over, but I view it as a side benefit.) When just using 'also' and 'alif', there should be no difference in speed. The same number of jumps and checks should be done as before. Naming branches may add an additional assignment operation, but this should be more than made up for by not having to calculate the condition more than once. There may be a few cases where this would be slower, but those can be optimized to result in old-style code. *Implementation:* I am currently learning how the python parser and lexer work in hopes of making a custom version containing these features. Because of my lack of knowledge here, I cannot say how it should be implemented in python specifically. Here is how each if code block should work in basic english: on True: 1. Execute code block 2. Jump to next 'alif', 'also', or end of if-chain on False: 1. jump to next 'elif', 'else', or end of if-chain Note: 'also' blocks can be useful in more places than just the end of a chain: if a == b: print ('a == b') elif a == c: print ('a == c') also: print ('a == b == c') else: print ('a != b and a != c') It could also be useful to have more than one 'also' in an if chain. In contrast, having an 'else' before the end is not so useful. For example, placing one before an also makes the also a little pointless: if a == b: print ('a == b') elif a == c: print ('a == c') else: print ('a != b and a != c') also: print ('This will always execute') It would therefore be good to still require that 'else' be the last item in a chain. *The End* Thank you for humoring my idea. I am new to this mailing list, so sorry if this seems out of line or something. I am currently working with ansi C to try to get something similar into the 'C' language, but it could be 20 years before anything comes of that. The gears move ofly slow over there. Anyway, I look forward to hearing your feedback. -Cory Beutler -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Sun Jun 7 05:40:55 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sat, 06 Jun 2015 23:40:55 -0400 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> On Sat, Jun 6, 2015, at 23:03, Cory Beutler wrote: > # New Version using 'as' keyword > if a == b as aisb: > print ('a == b') > I realize that using the 'as' keyword may not be the best. Using 'as' > seems > to suggest that it will only be used in the following block. An > alternative > to using the 'as' keyword could be assigning the 'if' to a variable like > so: > aisb = if a == b: > This looks a bit gross to me, though. If think of a better one, I would > love to see it. Well you could always go with if aisb = a == b. I'm not sure there is a convincing reason to allow your case (assign within an if statement) that doesn't also work just as well as a general argument for assignment expressions. From ben+python at benfinney.id.au Sun Jun 7 06:17:36 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 07 Jun 2015 14:17:36 +1000 Subject: [Python-ideas] If branch merging References: Message-ID: <85h9qki23j.fsf@benfinney.id.au> Cory Beutler writes: > This would simplify some logic expressions by allowing the merging of > branched code. I don't think you've made the case for that assertion. 
Your description is clear, but I can't see how real code would be simplified. I would like to see some real examples of code that is using existing syntax, and your proposed syntax, so the merits can be discussed in context. Can you provide some real-world code examples, that you believe would be improved by this change? -- \ ?All my life I've had one dream: to achieve my many goals.? | `\ ?Homer, _The Simpsons_ | _o__) | Ben Finney From abarnert at yahoo.com Sun Jun 7 06:29:35 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sat, 6 Jun 2015 21:29:35 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: <74F53743-9F29-464E-BB99-F16EB91CA691@yahoo.com> On Jun 6, 2015, at 20:03, Cory Beutler wrote: > > I recently(1 year ago) realized that 'if', 'elif', and 'else' provide easy branching for code, but there is no easy way to merge branches of code back together. One fix for this would be introduction of two new keywords: 'also' and 'alif' (also if) Can you provide a realistic use case for when you'd want this, instead of just a toy example where you check meaningless variables for equality? Because in practice, almost every time I've wanted complicated elif chains, deeply nested ifs, or anything else like it, it's been easy to either refactor the code into a function, replace the conditionals with a dict, or both. That isn't _always_ true, but I'm having a hard time coming up with an example where I really do need complicated elif chains and your new syntax would help. > on True: > 1. Execute code block > 2. Jump to next 'alif', 'also', or end of if-chain > on False: > 1. jump to next 'elif', 'else', or end of if-chain > > Note: 'also' blocks can be useful in more places than just the end of a chain: > if a == b: > print ('a == b') > elif a == c: > print ('a == c') > also: > print ('a == b == c') > else: > print ('a != b and a != c') This example seems to do the wrong thing. If a=b=2 and c=3, or a=c=3 and b=2, you're going to print "a == b == c" even though that isn't true. This implies that maybe it isn't as easy to think through the logic and keep the conditions in your head as you expected, even in relatively simple cases. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Jun 7 07:19:41 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 7 Jun 2015 15:19:41 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: <20150607051941.GF20701@ando.pearwood.info> On Sat, Jun 06, 2015 at 09:03:38PM -0600, Cory Beutler wrote: [...] > This would simplify some logic expressions by allowing the merging of > branched code. > > *Examples of use:* > *Duplicate code in if-chains may be reduced:* > # Old Version > if a == b: > print ('a == b') > foo() # <-- duplicate code > elif b == c: > print ('b == c') > foo() # <-- duplicate code > elif c == d: > print ('c == d') > foo() # <-- duplicate code if a == b: print('a == b') elif b == c: print('b == c') elif c == d: print('c == d') foo() No new syntax required. > *Many nested 'if' statements could now be a more linear style:* > # Old Version > if a == b: > print ('a == b') > if b == c: > print ('b == c') > print ('end if') What's wrong with that code? Nesting the code like that follows the logic of the code: the b==c test *only* occurs if a==b. > # New Version > if a == b: > print ('a == b') > alif b == c: > print ('b == c') > also: > print ('end if') I consider this significantly worse. 
It isn't clear that the comparison between b and c is only made if a == b, otherwise it is entirely skipped. > These two examples are the most common ways this will help code. I have > been writing code samples using these keywords and have found that it > simplifies many other things as well. It does take a bit of getting used > to, though. I have found that is is best to use 'also' and 'alif' > sparingly, as overuse can make some code less flexible and more confusing. You don't say. Are you aware of any languages with this construct? What are the rules for combining various if...elif...alif...also...else blocks? > *Selective Branch merging:* > One limitation of the 'also' and 'alif' keywords is the restriction to the > "all of the above" checking. What I mean by that is that there is no way to > pick and choose which branches to merge back together. When using 'also' > and 'alif' you are catching all previous if-branches. One easy way to solve > this would be to allow for named branching. The most simple way to do this > is to save the conditions of each branch into a variable with a name. Here > is an example of merging only select branches together: > # Old Version > if a == b: > print ('a == b') > elif a == c: > print ('a == c') > elif a == d: > print ('a == d') > if (a == b) or (a == d): > print ('a == b and a == d') That code is wrong. Was that an intentional error? The final branch prints that a == b == d, but that's not correct, it runs when either a == b or a == d, not just when both are true. Personally, I would write that as: if a == b or a == d: if a == b: print('a == b') else: print('a == d') print('a == b or a == d') elif a == c: print('a == c') You do end up comparing a and b for equality twice, but worrying about that is likely to be premature optimization. It isn't worth adding syntax to the language just for the one time in a million that actually matters. > # New Version using 'as' keyword > if a == b as aisb: > print ('a == b') > elif a == c: > print ('a == c') > elif a == d as aisd: > print ('a == d') > alif aisd or aisb: > print ('a == b and a == d') With this "as" proposal, there's no need for "alif": if a == b as aisb: print('a == b') elif a == c: print('a == c') elif a == d as aisd: print('a == d') if aisd or aisb: print('a == b or a == d') I think this has been proposed before. [...] > I realize that using the 'as' keyword may not be the best. Using 'as' seems > to suggest that it will only be used in the following block. Not necessarily. Consider "import module as spam". [...] > I am currently working with ansi C to > try to get something similar into the 'C' language, but it could be 20 > years before anything comes of that. The gears move ofly slow over there. One advantage of Python is that the language does evolve more quickly, but still, Python is a fairly conservative language. We don't typically add new syntax features unless they solve a problem in an elegant fashion that cannot be solved easily without it. -- Steve From steve at pearwood.info Sun Jun 7 07:25:00 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 7 Jun 2015 15:25:00 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> Message-ID: <20150607052459.GG20701@ando.pearwood.info> On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at fastmail.us wrote: > Well you could always go with if aisb = a == b. 
No, that is a terrible design and a source of many bugs in languages that allow it. if a = expr: ... Oops, I meant to compare a == expr, instead I assigned the result of the expression to a. I'm not convinced that we need to allow name binding in if/elif clauses, but if we do, the Pythonic syntax would be if a == b as aeqb: ... -- Steve From bruce at leban.us Sun Jun 7 07:50:52 2015 From: bruce at leban.us (Bruce Leban) Date: Sat, 6 Jun 2015 22:50:52 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: On Sat, Jun 6, 2015 at 8:03 PM, Cory Beutler wrote: > *also *- execute code if any previous condition is met > > Thank you for humoring my idea. I am new to this mailing list, so sorry if > this seems out of line or something. > Seeing many posts on this list which are repeats of ideas seen many times, it's nice to see a new idea. I think the difficulty of making this work is how often you want something only when *all* of the previous conditions are true yet can't conveniently do it another way (e.g., setting a flag). Your point about writing conditions multiple time is legitimate and happens frequently. Here's an example of something where there is a similar difficulty in writing simple code: if foo.a == 0: ... elif foo.a == 1 and foo.b == 0: ... elif foo.a >= 1 and foo.b >= 0 and foo.c = 0: ... elif ... This is a generic example but I've written code like this many times and there is no simple way to say that all the foo.x values don't need to be computed more than once. Here it is rewritten to avoid recomputation: foo_a = foo.a if foo_a == 0: ... else: foo_b = foo.b if foo_a == 1 and foo_b == 0: ... else: foo_c = foo.c if foo_a >= 1 and foo_b >= 0 and foo_c = 0: ... else: ... Much harder to follow the logic. A simpler example where the same recomputation happens is: x = a and a.b and a.b.c and a.b.c.d which becomes x = a and a.b if x: x = x.c if x: x = x.d Ick. --- Bruce Check out my new puzzle book: http://J.mp/ingToConclusions Get it free here: http://J.mp/ingToConclusionsFree (available on iOS) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jun 7 07:59:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 7 Jun 2015 15:59:13 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas wrote: > Also, if we got my change, I could write code that cleanly hooks parsing in > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can > at least use it, and all of the relevant and complicated code would be > shared between the two versions. With your change, I'd have to write code > that was completely different for 3.6+ than what I could backport, meaning > I'd have to write, debug, and maintain two completely different > implementations. And again, for no benefit. I don't think I've said this explicitly yet, but I'm +1 on the idea of making it easier to "hack the token stream". As Andew has noted, there are two reasons this is an interesting level to work at for certain kinds of modifications: 1. The standard Python tokeniser has already taken care of converting the byte stream into Unicode code points, and the code point stream into tokens (including replacing leading whitespace with the structural INDENT/DEDENT tokens) 2. 
You get to work with a linear stream of tokens, rather than a precomposed tree of AST nodes that you have to traverse and keep consistent If all you're wanting to do is token rewriting, or to push the token stream over a network connection in preference to pushing raw source code or fully compiled bytecode, a bit of refactoring of the existing tokeniser/compiler interface to be less file based and more iterable based could make that easier to work with. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 7 08:15:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 7 Jun 2015 16:15:14 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: On 7 June 2015 at 15:50, Bruce Leban wrote: > This is a generic example but I've written code like this many times and > there is no simple way to say that all the foo.x values don't need to be > computed more than once. This is one of the key powers of JIT compilers like PyPy and Numba - they can detect that a calculation is repeated and avoid repeating it when the compiler knows the input values haven't changed. There is no way any syntax addition can compete for clarity with using existing already clear syntax and speeding its execution up implicitly. > Here it is rewritten to avoid recomputation: > > foo_a = foo.a > if foo_a == 0: > ... > > else: > foo_b = foo.b > if foo_a == 1 and foo_b == 0: > ... > > else: > > foo_c = foo.c > > if foo_a >= 1 and foo_b >= 0 and foo_c = 0: > ... > > else: > ... > > Much harder to follow the logic. It's hard to reason about whether or not logic is difficult to follow when using metasyntactic variables, as they're never self-documenting. There's also the fact that this *specific* example is why having expensive-to-calculate values accessed as attributes without some form of caching is a bad idea - it encourages folks to make their code harder to read too early in the development process. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan at bytereef.org Sun Jun 7 12:20:29 2015 From: stefan at bytereef.org (s.krah) Date: Sun, 07 Jun 2015 10:20:29 +0000 Subject: [Python-ideas] If branch merging In-Reply-To: <20150607052459.GG20701@ando.pearwood.info> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> Message-ID: <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Steven D'Aprano <steve at pearwood.info> wrote: On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at fastmail.us wrote: >> Well you could always go with if aisb = a == b. > No, that is a terrible design and a source of many bugs in languages > that allow it. > if a = expr: ... > Oops, I meant to compare a == expr, instead I assigned the result of the > expression to a. In C I've mistyped this perhaps twice, in which case you get a compiler warning. It's a complete non-issue (and the construct *is* very handy). Stefan Krah _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From random832 at fastmail.us Sun Jun 7 12:30:48 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 06:30:48 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 01:59, Nick Coghlan wrote: > 1. The standard Python tokeniser has already taken care of converting > the byte stream into Unicode code points, and the code point stream > into tokens (including replacing leading whitespace with the > structural INDENT/DEDENT tokens) Remember that balanced brackets are important for this INDENT/DEDENT transformation. What should the parser do with indentation in the presence of a hook that consumes a sequence containing unbalanced or mixed brackets? From abarnert at yahoo.com Sun Jun 7 14:18:04 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 05:18:04 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Message-ID: On Jun 7, 2015, at 03:20, s.krah wrote: > > > > Steven D'Aprano wrote: > On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at fastmail.us wrote: > >> Well you could always go with if aisb = a == b. > > > No, that is a terrible design and a source of many bugs in languages > > that allow it. > > > if a = expr: ... > > > Oops, I meant to compare a == expr, instead I assigned the result of the > > expression to a. > In C I've mistyped this perhaps twice Then you must be an amazing programmer. Or maybe you don't code in C very much. Look through the commit history of any major early C project and you'll find plenty of these errors. Dennis Ritchie made this mistake more than two times just in the Unix source. Why do you think compilers added the warning? If this really were a non-issue that nobody ever faces in real life, no compiler vendor would have bothered to write a warning that will annoy people far more often than it helps. Or, if someone did just to satisfy some rare clumsy user, nobody else would have copied it. in which case you get a compiler warning. Of course you also get the compiler warning when you use this feature _intentionally_, which means it's actually not usable syntax (unless you like to ignore warnings from the compiler, or pepper your code with pragmas). Most compilers let you use some variant on the syntax, typically throwing meaningless extra parentheses around the assignment, to make the warning go away. But this implies that C chose the wrong syntax in the first place. If I were designing a new C-like language, I'd allow declarations, but not assignments, in the if condition (with the variable only live inside the if statement's scope). That would handle what you want 90% of the time, and usually better than the current rule, and would have no chance of confusing an assignment with a comparison, so the compiler warning would go away. But of course this is irrelevant to Python, which doesn't have variable declarations (or sub-function scopes). In Python, I think not allowing assignment in an if condition was the right choice. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Sun Jun 7 14:20:54 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 7 Jun 2015 22:20:54 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Message-ID: On Sun, Jun 7, 2015 at 8:20 PM, s.krah wrote: > Steven D'Aprano wrote: > > On Sat, Jun 06, 2015 at 11:40:55PM -0400, random832 at fastmail.us wrote: >>> Well you could always go with if aisb = a == b. > >> No, that is a terrible design and a source of many bugs in languages >> that allow it. > >> if a = expr: ... > >> Oops, I meant to compare a == expr, instead I assigned the result of the >> expression to a. > > In C I've mistyped this perhaps twice, in which case you get a compiler > warning. > > It's a complete non-issue (and the construct *is* very handy). That's as may be, but Steven's still correct that the Pythonic way to do it would be with "as". In C, assignment is an expression ("=" is an operator that mutates its LHS and yields a value), but in Python, it simply isn't, and making it possible to do assignment in an 'if' condition would be a big change. with expr as name: except expr as name: if expr as name: Three parallel ways to do something and capture it. It makes reasonable sense, if someone can come up with a really compelling use-case. Personally, I'd be more inclined to seek the same thing for a while loop: while get_next_value() as value: # equivalent to while True: value = get_next_value() if not value: break as that's a somewhat more common idiom; but neither is hugely common. ChrisA From stefan at bytereef.org Sun Jun 7 14:55:23 2015 From: stefan at bytereef.org (s.krah) Date: Sun, 07 Jun 2015 12:55:23 +0000 Subject: [Python-ideas] Wg: Re: If branch merging In-Reply-To: References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Message-ID: <14dce1500cf.108c6a52d214760.5036404333476499795@bytereef.org> Andrew Barnert abarnert at yahoo.com wrote: >> In C I've mistyped this perhaps twice > Then you must be an amazing programmer. Or maybe you don't code in C very much. Or maybe I don't pontificate on mailing lists all day long. > Look through the commit history of any major early C project and you'll find plenty of these errors. Look through the commit history of CPython ... Stefan Krah -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Sun Jun 7 15:08:10 2015 From: stefan at bytereef.org (s.krah) Date: Sun, 07 Jun 2015 13:08:10 +0000 Subject: [Python-ideas] Wg: Re: If branch merging In-Reply-To: References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Message-ID: <14dce215ac6.ae8ecc03215231.2331767415048004642@bytereef.org> Chris Angelico<rosuav at gmail.com> wrote: >>> Oops, I meant to compare a == expr, instead I assigned the result of the >>> expression to a. >> >> In C I've mistyped this perhaps twice, in which case you get a compiler >> warning. >> >> It's a complete non-issue (and the construct *is* very handy). > That's as may be, but Steven's still correct that the Pythonic way to > do it would be with "as". 
I agree. I was mainly responding to the claim that it's a "major source of bugs" in languages that allow it. Stefan Krah -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Sun Jun 7 15:05:26 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 06:05:26 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: <4E9B5B1E-4054-4BCF-A6CC-001CF635B7D3@yahoo.com> On Jun 6, 2015, at 22:59, Nick Coghlan wrote: > > On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas > wrote: >> Also, if we got my change, I could write code that cleanly hooks parsing in >> 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people can >> at least use it, and all of the relevant and complicated code would be >> shared between the two versions. With your change, I'd have to write code >> that was completely different for 3.6+ than what I could backport, meaning >> I'd have to write, debug, and maintain two completely different >> implementations. And again, for no benefit. > > I don't think I've said this explicitly yet, but I'm +1 on the idea of > making it easier to "hack the token stream". As Andew has noted, there > are two reasons this is an interesting level to work at for certain > kinds of modifications: > > 1. The standard Python tokeniser has already taken care of converting > the byte stream into Unicode code points, and the code point stream > into tokens (including replacing leading whitespace with the > structural INDENT/DEDENT tokens) Actually, as I discovered while trying to hack in the change this afternoon, the C tokenizer doesn't actually take care of conveying the byte stream. It does take care of detecting the encoding, but what it hands to the parsetok function is still encoded bytes. The Python wrapper does transparently decode for you (in 3.x), but that actually just makes it harder to feed the output back into the parser, because the parser wants encoded bytes. (Also, as I mentioned before, it would be nice if the Python wrapper could just take Unicode in the first place, because the most obvious place to use this is in an import hook, where you can detect and decode the bytes yourself in as single line, and it's easier to just use the string than to encode it to UTF-8 so the tokenizer can detect UTF-8 so either the Python tokenizer wrapper or the C parser can decode it again...). Anyway, this part was at least easy to temporarily work around; the stumbling block that prevented me from finishing a working implementation this afternoon is a bit hairier. The C tokenizer hands the parser the current line (which can actually be multiple lines) and start and end pointers to characters within that line. It also hands it the current token string, but the parser ignores that and just reads from line+start to line+end. The Python tokenizer, on the other hand, gives you line number and (Unicode-based) column numbers for start and end. Converting those to encoded-bytes offsets isn't _that_ hard... but those are offsets into the original (encoded) line, so the parser is going to see the value of the original token rather than the token value(s) you're trying to substitute, which defeats the entire purpose. 
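(For anyone who hasn't seen it, the pure-Python tokenize/untokenize hack I keep mentioning is, at its core, just this sort of thing -- a minimal sketch of the Decimal-literal idea rather than the real code; the function name and the deliberately naive float test are mine:)

import io
import tokenize

def decimalify(source):
    # Rewrite each NUMBER token that looks like a float literal into a
    # Decimal('...') call, then rebuild source text from the token stream.
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and ('.' in tok.string or 'e' in tok.string.lower()):
            out.extend([(tokenize.NAME, 'Decimal'), (tokenize.OP, '('),
                        (tokenize.STRING, repr(tok.string)), (tokenize.OP, ')')])
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

print(decimalify('x = 1.1 + 2.2\n'))

The rewritten text then has to go back through the regular compile() machinery, which is exactly the round trip that handing tokens straight to the parser would remove.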
I was able to implement a hacky workaround using untokenize to fake the current line and provide offsets within that, but that means you get garbage from SyntaxErrors, and all your column numbers--and, worse, all your line numbers, if you add in a multi-line token--are off within the AST and bytecode. (And there may be other problems; those are just the ones I saw immediately when I tried it...) I think what I'm going to try next is to fork the whole parsetok function and write a version that uses the token's string instead of the substring of the line, and start and stop as offsets instead of pointers. I'm still not sure whether the token string and line should be in tok->encoding, UTF-8, UTF-32, or a PyUnicode object, but I'll figure that out as I do it.... Once I get that working for the wrapped-up token iterator, then I can see if I can reunify it with the existing version for the C tokenizer (without any performance penalty, and without breaking pgen). I'd hate to have two copies of that giant function to keep in sync. Meanwhile, I'm not sure what to do about tokens that don't have the optional start/stop/line values. Maybe just not allow them (just because untokenize can handle it doesn't mean ast.parse has to), or maybe just untokenize a fake line (and if any SyntaxErrors are ugly and undebuggable, well, don't skip those values...). The latter might be useful if for some reason you wanted to generate tokens on the fly instead of just munging a stream of tokens from source text you have available. I'm also not sure what to do about a few error cases. For example, if you feed the parser something that isn't iterable, or whose values aren't an iterable of iterables of length 2 to 5 with the right types, that really feels more like a TypeError than a SyntaxError (and that would also be a good way to signal the end user that the bug is in the token stream transformer rather than in the source code...), but raising a TypeError from within the parser requires a bit more refactoring (the tokenizer can't tell you what error to raise, just that the current token is an error along with a tokenizer error code--although I code add an E_NOTOK error code that the parser interprets as "raise a TypeError instead of a SyntaxError"), and I'm not sure whether that would affect any other stuff. Anyway, for the first pass I may just leave it as a SyntaxError, just to get something working. Finally, it might be nice if it were possible to generate a SyntaxError that showed the original source line but also told you that the tokens don't match the source (again, to signal the end user that he should look at what the hook did to his code, not just his code), but I'm not sure how necessary that is, or how easy it will be (it depends on how I end up refactoring parsetok). > If all you're wanting to do is token rewriting, or to push the token > stream over a network connection in preference to pushing raw source > code or fully compiled bytecode I didn't think about that use case at all, but that could be very handy. 
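(A half-formed illustration of that last point: the (type, string) 2-tuple form that untokenize() already accepts is about as wire-friendly as a token stream gets. Sketch only, both function names made up:)

import io
import json
import tokenize

def dump_tokens(source):
    # Reduce the stream to plain (type, string) pairs, which serialise trivially.
    return json.dumps([(tok.type, tok.string)
                       for tok in tokenize.generate_tokens(io.StringIO(source).readline)])

def rebuild_source(payload):
    # On the receiving end, turn the pairs back into compilable source text.
    return tokenize.untokenize(tuple(pair) for pair in json.loads(payload))

print(rebuild_source(dump_tokens('x = 1 + 2\n')))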
From abarnert at yahoo.com Sun Jun 7 15:24:27 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 06:24:27 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> Message-ID: On Jun 7, 2015, at 03:30, random832 at fastmail.us wrote: > >> On Sun, Jun 7, 2015, at 01:59, Nick Coghlan wrote: >> 1. The standard Python tokeniser has already taken care of converting >> the byte stream into Unicode code points, and the code point stream >> into tokens (including replacing leading whitespace with the >> structural INDENT/DEDENT tokens) > > Remember that balanced brackets are important for this INDENT/DEDENT > transformation. What should the parser do with indentation in the > presence of a hook that consumes a sequence containing unbalanced or > mixed brackets? I'm pretty sure that just doing nothing special here means you get a SyntaxError from the parser. Although I probably need more test cases. Anyway, this is one of those cases I mentioned where the SyntaxError can't actually show you what's wrong with the code, because the actual source doesn't have an error in it, only the transformed token stream. But there are easier ways to get that--just replace a `None` with a `with` in the token stream and you get an error that shows you a perfectly valid line, with no indication that a hook has screwed things up for you. I think we can at least detect that the tokens don't match the source line and throw in a note to go look for an installed token-transforming hook. It would be even nicer if we could show what the untokenized line looks like, so the user can see why it's an error. Something like this: File "", line 1 if spam is None: ^ SyntaxError: invalid syntax Tokens do not match input, parsed as if spam is with : Of course in the specific case you mentioned of unbalanced parens swallowing a dedent, the output still wouldn't be useful, but I'm not sure what we could show usefully in that case anyway. From random832 at fastmail.us Sun Jun 7 16:53:01 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 10:53:01 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> Message-ID: <1433688781.2945003.289004449.6EF8B40F@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 09:24, Andrew Barnert wrote: > I'm pretty sure that just doing nothing special here means you get a > SyntaxError from the parser. Although I probably need more test cases. I'm actually talking about what happens if the _untransformed_ stream contains an unbalanced bracket that the hook is supposed to eliminate (and/or supply the balancing one). My mental model of this idea was that the "lexer" generates the entire untransformed (but including indent/dedent magic etc) token sequence, then supplies it to the hook. 
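(To make the "indent/dedent magic" concrete, here is what today's pure-Python tokenizer hands you for a bracketed continuation -- the show() helper is just for illustration:)

import io
import tokenize

def show(source):
    # Dump the token stream a hook would receive under that model.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        print(tokenize.tok_name[tok.exact_type], repr(tok.string))

# Inside brackets the tokenizer emits NL instead of NEWLINE and suppresses
# INDENT/DEDENT, so a hook that adds or removes brackets changes the block
# structure the parser ends up seeing.
show('x = [1,\n       2]\n')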
From random832 at fastmail.us Sun Jun 7 17:04:03 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 11:04:03 -0400 Subject: [Python-ideas] If branch merging In-Reply-To: References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> Message-ID: <1433689443.2946990.289008641.623BEC86@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 08:20, Chris Angelico wrote: > with expr as name: > except expr as name: > if expr as name: > > Three parallel ways to do something and capture it. It makes > reasonable sense The problem is, "with" and "except" create a variable whose scope is limited to the enclosed block. The proposed "if... as..." does not, so it's misleading. If we don't like spelling it as = then invent a new operator, maybe := or something. Maybe even as. My wider point, though, was that there's no argument for the _functionality_ of allowing an assignment of the boolean condition of an if statement that can't be generalized to allowing inline assignment of anything (why not e.g. "if (expr as foo) > 5"? That, unlike the boolean, might even be something that would be useful within the enclosed block.) From rosuav at gmail.com Sun Jun 7 17:29:01 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 8 Jun 2015 01:29:01 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <1433689443.2946990.289008641.623BEC86@webmail.messagingengine.com> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> <1433689443.2946990.289008641.623BEC86@webmail.messagingengine.com> Message-ID: On Mon, Jun 8, 2015 at 1:04 AM, wrote: > > On Sun, Jun 7, 2015, at 08:20, Chris Angelico wrote: > > with expr as name: > > except expr as name: > > if expr as name: > > > > Three parallel ways to do something and capture it. It makes > > reasonable sense > > The problem is, "with" and "except" create a variable whose scope is > limited to the enclosed block. The proposed "if... as..." does not, so > it's misleading. Actually, they don't. A with block tends to create a broad expectation that the object will be used within that block, but it isn't scoped, and sometimes there's "one last use" of something outside of the block - for instance, a Timer context manager which uses __enter__ to start timing, __exit__ to stop timing, and then has attributes "wall" and "cpu" to tell you how much wall time and CPU time were used during that block. There are a lot of context managers that might as well have disappeared at the end of the with block (open files, psycopg2 cursors (but not connections), thread locks, etc), but they are technically still around. The "except" case is a slightly different one. Yes, the name is valid only within that block - but it's not a matter of scope, it's a deliberate unsetting. >>> e = 2.718281828 >>> try: 1/0 ... except Exception as e: pass ... >>> e Traceback (most recent call last): File "", line 1, in NameError: name 'e' is not defined There's no concept of nested/limited scope here, although I'm sure that this particular case could be turned into a subscope without breaking anyone's code (I honestly cannot imagine there being ANY code that depends on the name getting unset!), if Python ever grows support for subscopes that aren't associated with nested functions. 
Comprehensions do actually create their own scopes, but that's actually implemented with a function: >>> e = 2.718281828 >>> [e*2 for e in range(3)] [0, 2, 4] >>> e 2.718281828 >>> dis.dis(lambda: [e*2 for e in range(3)]) 1 0 LOAD_CONST 1 ( at 0x7fa4f1dae5d0, file "", line 1>) 3 LOAD_CONST 2 ('..') 6 MAKE_FUNCTION 0 9 LOAD_GLOBAL 0 (range) 12 LOAD_CONST 3 (3) 15 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 18 GET_ITER 19 CALL_FUNCTION 1 (1 positional, 0 keyword pair) 22 RETURN_VALUE Hence, the most logical way to handle conditions with 'as' clauses is to have them in the same scope. The special case for exceptions is because tracebacks would create refloops with the locals, and catching exceptions is extremely common. Nothing else needs that special case, so everything else can follow the 'with' block model and leave the name bound. ChrisA From ncoghlan at gmail.com Sun Jun 7 17:54:18 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 01:54:18 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> <1433689443.2946990.289008641.623BEC86@webmail.messagingengine.com> Message-ID: On 8 June 2015 at 01:29, Chris Angelico wrote: > There's no concept of nested/limited scope here, although I'm sure > that this particular case could be turned into a subscope without > breaking anyone's code (I honestly cannot imagine there being ANY code > that depends on the name getting unset!), if Python ever grows support > for subscopes that aren't associated with nested functions. The unsetting of bound exceptions is also merely a language quirk introduced to cope with the implicit exception chaining introduced in PEP 3134. The circular reference from the traceback frame back to the bound exception caused a lot of uncollectable cycles that were resolved by automatically dropping the frame's reference to the bound exception. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jun 7 18:12:28 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 02:12:28 +1000 Subject: [Python-ideas] Wg: Re: If branch merging In-Reply-To: <14dce1500cf.108c6a52d214760.5036404333476499795@bytereef.org> References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> <14dce1500cf.108c6a52d214760.5036404333476499795@bytereef.org> Message-ID: On 7 June 2015 at 22:55, s.krah wrote: > Andrew Barnert abarnert at yahoo.com wrote: > >>> In C I've mistyped this perhaps twice > >> Then you must be an amazing programmer. Or maybe you don't code in C very >> much. > > Or maybe I don't pontificate on mailing lists all day long. No need for that, folks. (And Stefan, you're definitely on the first half of Andrew's either/or statement there - not everyone is going to be aware that you wrote cdecimal, redesigned the memoryview implementation, etc). The C/C++ embedded assignment construct *is* a major source of bugs unless you really know what you're doing, which is why the compiler developers eventually relented and introduced a warning for it. It's a construct that trades clarity for brevity, so using it isn't always a clear win from a maintainability perspective. 
It's also heavily reliant on C's behaviour where assignment expression have a truth value that matches that or the RHS of the assignment expression, and the resulting longstanding conventions that have built up to take advantage of that fact. In Python, we've never had those conventions, so the alignment between "truth value you want to test" and "value you want to work with" isn't anywhere near as strong. In particular, testing for None via bool() is considered incorrect, while testing for a NULL pointer in C++ with a truth test is far more common. To get the same kind of utility as C/C++ embedded assignments provide, you really do need arbitrary embedded assignments. Only being able to name the result of an if or while clause would be too limiting to be useful, since you'd still need to find other ways to handle the cases where the value to be tested and the value you want to work with aren't quite the same. That's why this particular idea is oft-discussed-but-never-accepted - there are certain looping and conditional constructs that *do* become more convoluted in its absence, but the benefit of leaving it out is that those more convoluted constructs are needed to handle the general case anyway, so the special case would be an additional thing to learn that wouldn't actually make the language substantially more expressive than it already is. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Sun Jun 7 18:11:21 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 8 Jun 2015 02:11:21 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <1433648455.2978583.288785249.376E7D0A@webmail.messagingengine.com> <20150607052459.GG20701@ando.pearwood.info> <14dcd873161.1111973a8209999.4003977132021429653@bytereef.org> <1433689443.2946990.289008641.623BEC86@webmail.messagingengine.com> Message-ID: On Mon, Jun 8, 2015 at 1:54 AM, Nick Coghlan wrote: > On 8 June 2015 at 01:29, Chris Angelico wrote: >> There's no concept of nested/limited scope here, although I'm sure >> that this particular case could be turned into a subscope without >> breaking anyone's code (I honestly cannot imagine there being ANY code >> that depends on the name getting unset!), if Python ever grows support >> for subscopes that aren't associated with nested functions. > > The unsetting of bound exceptions is also merely a language quirk > introduced to cope with the implicit exception chaining introduced in > PEP 3134. The circular reference from the traceback frame back to the > bound exception caused a lot of uncollectable cycles that were > resolved by automatically dropping the frame's reference to the bound > exception. Right. I know the reason for it, and it's a special case for exceptions because of the traceback. (Though I'm not sure why exception chaining causes this. Was that the first place where the traceback - with its reference to locals - was made a part of the exception object?) If Python had a concept of nested scopes within functions, it'd make equal sense to have the "except X as e:" subscope shadow, rather than overwriting and unsetting, the outer "e". Since neither that nor the list comprehension is implemented with nested scopes, I think it's safe to say that "if cond as e:" wouldn't be either. 
ChrisA From robertc at robertcollins.net Mon Jun 8 00:19:05 2015 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 8 Jun 2015 10:19:05 +1200 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 6 June 2015 at 17:00, Nick Coghlan wrote: > On 6 June 2015 at 12:21, Neil Girdhar wrote: >> I'm curious what other people will contribute to this discussion as I think >> having no great parsing library is a huge hole in Python. Having one would >> definitely allow me to write better utilities using Python. > > The design of *Python's* grammar is deliberately restricted to being > parsable with an LL(1) parser. There are a great many static analysis > and syntax highlighting tools that are able to take advantage of that > simplicity because they only care about the syntax, not the full > semantics. > > Anyone actually doing their *own* parsing of something else *in* > Python, would be better advised to reach for PLY > (https://pypi.python.org/pypi/ply ). PLY is the parser underlying > https://pypi.python.org/pypi/pycparser, and hence the highly regarded > CFFI library, https://pypi.python.org/pypi/cffi > > Other notable parsing alternatives folks may want to look at include > https://pypi.python.org/pypi/lrparsing and > http://pythonhosted.org/pyparsing/ (both of which allow you to use > Python code to define your grammar, rather than having to learn a > formal grammar notation). Let me just pimp https://pypi.python.org/pypi/Parsley here - I have written languages in both Parsely (a simple packaging metadata language) and its predecessor pymeta (in which I wrote pybars - handlebars.js for python) - and both were good implementations of OMeta, IMNSHO. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From greg.ewing at canterbury.ac.nz Mon Jun 8 00:21:57 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Jun 2015 10:21:57 +1200 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <1433688781.2945003.289004449.6EF8B40F@webmail.messagingengine.com> References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> <1433688781.2945003.289004449.6EF8B40F@webmail.messagingengine.com> Message-ID: <5574C405.6060906@canterbury.ac.nz> random832 at fastmail.us wrote: > I'm actually talking about what happens if the _untransformed_ stream > contains an unbalanced bracket that the hook is supposed to eliminate I'm of the opinion that designing an input language to require or allow unmatched brackets is a bad idea. If nothing else, it causes grief for editors that do bracket matching. -- Greg From cgbeutler at gmail.com Mon Jun 8 03:06:42 2015 From: cgbeutler at gmail.com (Cory Beutler) Date: Sun, 7 Jun 2015 19:06:42 -0600 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: Thank you all for your responses. I didn't realize how much support this mailing list had. In response to several responses: It appears I have hit a soft spot with the 'as' keyword. It seems clear to me that inlining an assignment confuses scope. With any inline solution, that confusion will exist. Now, I will say that I do not like 'if aisb = a == b' because of the potential errors, as others have mentioned. 
A language should be written as much for the beginners as the experts, or it will never live very long. Avoiding absentminded mistakes is always good to do. There are many other possible solutions from a comma, as in "if a == b, aisb:", to a custom language addition of a new keyword or operator. Irregardless of how inline assignment is written, the scope issue will still exist. As such, it is more important to decide if it is needed first. The fact that this idea has been brought up before means that it deserves some research. Perhaps I can do some analytics and return with more info on where it could be used and if it will actually provide any speed benefits. Ok, that was a bit of a shotgun response to many remarks. Hopefully it will suffice. Thanks again for all the feedback. I would now like to respond to Steven's response directly: On Sat, Jun 6, 2015 at 11:19 PM, Steven D'Aprano wrote: > On Sat, Jun 06, 2015 at 09:03:38PM -0600, Cory Beutler wrote: > > [...] > > This would simplify some logic expressions by allowing the merging of > > branched code. > > > > *Examples of use:* > > *Duplicate code in if-chains may be reduced:* > > # Old Version > > if a == b: > > print ('a == b') > > foo() # <-- duplicate code > > elif b == c: > > print ('b == c') > > foo() # <-- duplicate code > > elif c == d: > > print ('c == d') > > foo() # <-- duplicate code > > if a == b: > print('a == b') > elif b == c: > print('b == c') > elif c == d: > print('c == d') > foo() > > No new syntax required. > > The functionally is not the same. In your example 'foo' gets called even if none of the conditions are true. The above example only runs 'foo' if it enters one of the if-blocks. This basic layout is useful for various parsing and reading operations. It is nice to check if something fits various conditions, then after the specific handling, add on finishing details in a 'foo'-like context. > > > *Many nested 'if' statements could now be a more linear style:* > > # Old Version > > if a == b: > > print ('a == b') > > if b == c: > > print ('b == c') > > print ('end if') > > What's wrong with that code? Nesting the code like that follows the > logic of the code: the b==c test *only* occurs if a==b. > > > > # New Version > > if a == b: > > print ('a == b') > > alif b == c: > > print ('b == c') > > also: > > print ('end if') > > I consider this significantly worse. It isn't clear that the comparison > between b and c is only made if a == b, otherwise it is entirely > skipped. > > It may only be worse because you are not used to reading it. This type of syntax looks simple once you know how the pieces work. I mean, you know that having multiple if-elif statements will result in only checking conditions until one passes. The 'also' mentality would be the same, but backwards. > *Selective Branch merging:* > > One limitation of the 'also' and 'alif' keywords is the restriction to > the > > "all of the above" checking. What I mean by that is that there is no way > to > > pick and choose which branches to merge back together. When using 'also' > > and 'alif' you are catching all previous if-branches. One easy way to > solve > > this would be to allow for named branching. The most simple way to do > this > > is to save the conditions of each branch into a variable with a name. 
> Here > > is an example of merging only select branches together: > > # Old Version > > if a == b: > > print ('a == b') > > elif a == c: > > print ('a == c') > > elif a == d: > > print ('a == d') > > if (a == b) or (a == d): > > print ('a == b and a == d') > > That code is wrong. Was that an intentional error? The final branch > prints that a == b == d, but that's not correct, it runs when either > a == b or a == d, not just when both are true. > Yeah, that was a mistype. That is why I shouldn't program fake code late at night. It does warm my heart to see that 2 people have corrected my fake code. That means it is easy to learn and understand. > > Personally, I would write that as: > > if a == b or a == d: > if a == b: > print('a == b') > else: > print('a == d') > print('a == b or a == d') > elif a == c: > print('a == c') > > You do end up comparing a and b for equality twice, but worrying about > that is likely to be premature optimization. It isn't worth adding > syntax to the language just for the one time in a million that actually > matters. > > With that rearrangement, you could write it with 'also': if a == b: print('a == b') elif a == d: print('a == d') also: print('a == b or a == d') elif a == c: print('a == c') but that does not demonstrate the selective branch merging. It would not work so well to combine the next two 'elif' statements, as any other 'also' blocks would capture the 'a == b', 'a == d', and first 'also' branches. I guess what I mean to say is that this example is a little too dumbed down. I think you know what I am after, though. If 'a == b' is a heavy duty calculation, it would be nice to be able to store that inline. Thank you, Steven, for your objective view of things. It is has been useful to see an outside perspective. I look forward to your future input. -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Jun 8 03:37:50 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 21:37:50 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <5574C405.6060906@canterbury.ac.nz> References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> <1433673048.3081290.288899641.466DD5BB@webmail.messagingengine.com> <1433688781.2945003.289004449.6EF8B40F@webmail.messagingengine.com> <5574C405.6060906@canterbury.ac.nz> Message-ID: <1433727470.3823479.289296537.7FFA3EE7@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 18:21, Greg Ewing wrote: > random832 at fastmail.us wrote: > > I'm actually talking about what happens if the _untransformed_ stream > > contains an unbalanced bracket that the hook is supposed to eliminate > > I'm of the opinion that designing an input language > to require or allow unmatched brackets is a bad > idea. If nothing else, it causes grief for editors > that do bracket matching. Suppose one of the brackets is quoted somehow, or mismatched bracket pairs such as [this), perhaps including tokens that are not normally considered brackets, are somehow meaningful. From random832 at fastmail.us Mon Jun 8 03:40:00 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 21:40:00 -0400 Subject: [Python-ideas] If branch merging Message-ID: <1433727600.3823666.289298017.740A35C6@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 21:06, Cory Beutler wrote: > Thank you all for your responses. I didn't realize how much support this > mailing list had. 
> > In response to several responses: > > It appears I have hit a soft spot with the 'as' keyword. I don't have an issue with the as keyword, I was just pointing out that it disguises the fact that what you're really asking for seems to be general assignment expressions, since there is no particular rationale to constrain it to the boolean condition of if statements. From random832 at fastmail.us Mon Jun 8 03:42:14 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Sun, 07 Jun 2015 21:42:14 -0400 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: <1433727734.3824336.289300161.549D946E@webmail.messagingengine.com> On Sun, Jun 7, 2015, at 21:06, Cory Beutler wrote: > Thank you all for your responses. I didn't realize how much support this > mailing list had. > > In response to several responses: > > It appears I have hit a soft spot with the 'as' keyword. I don't have an issue with the as keyword, I was just pointing out that it disguises the fact that what you're really asking for seems to be general assignment expressions, since there is no particular rationale to constrain it to the boolean condition of if statements. From ncoghlan at gmail.com Mon Jun 8 04:08:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 12:08:46 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: On 8 Jun 2015 11:07, "Cory Beutler" wrote: > > Thank you all for your responses. I didn't realize how much support this mailing list had. > > In response to several responses: > > It appears I have hit a soft spot with the 'as' keyword. It seems clear to me that inlining an assignment confuses scope. With any inline solution, that confusion will exist. Not really, as we have a number of inline assignment and renaming constructs, and they all use "as" (import, with statements, exception handlers). For loops, function definitions and class definitions also help establish the behaviour of name bindings in compound statement header lines affecting the containing scope rather than only affecting the internal suite. The exception handler case is the odd one out, since that includes an implied "del" whenever execution leaves the cobtained suite. Any form of inline assignment that doesn't use "as NAME" will need a good justification. (It's also worth noting that "as" clauses are specifically for binding to a name, while the LHS of an assignment statement allows attributes, indexing, slicing and tuple unpacking) > Avoiding absentminded mistakes is always good to do. There are many other possible solutions from a comma, as in "if a == b, aisb:", to a custom language addition of a new keyword or operator. Commas are generally out, due to the ambiguity with tuple construction. > Irregardless of how inline assignment is written, the scope issue will still exist. As such, it is more important to decide if it is needed first. The fact that this idea has been brought up before means that it deserves some research. Perhaps I can do some analytics and return with more info on where it could be used and if it will actually provide any speed benefits. In this particular case, the variant that has always seemed most attractive to me in past discussions is a general purpose "named subexpression" construct that's just a normal local name binding operation affecting whatever namespace the expression is executed in. 
In the simple if statement case, it wouldn't be much different from having a separate assignment statement before the if statement, but in a while loop it would be executed on each iteration, in an elif it could make the results of subcalculations available to subsequent elif clauses without additional nesting, and in the conditional expression and comprehension cases it could make part of the condition calculation available to the result calculation. It would certainly be possible for folks to go overboard with such a construct and jam way too much into a single expression for it to be readable, but that's already the case today, and the way to handle it would remain the same: refactoring the relevant code to make it easier for readers to follow and hence maintain. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Jun 8 04:20:06 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 7 Jun 2015 22:20:06 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sat, Jun 6, 2015 at 6:52 PM, Andrew Barnert wrote: > On Jun 6, 2015, at 09:23, Neil Girdhar wrote: > > On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert wrote: > >> On Jun 5, 2015, at 22:50, Neil Girdhar wrote: >> >> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert >> wrote: >> >>> First, I think your idea is almost completely tangential to mine. Yes, >>> if you completely replaced both the interface and the implementation of the >>> parser, you could do just about anything you wanted. But assuming nobody is >>> going to completely replace the way Python does parsing today, I think it's >>> still useful to add the one missing useful hook to the existing system. But >>> let's continue. >>> >>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar >>> wrote: >>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert >>> wrote: >>> > >>> If you want more background, see >>> >>> http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html >>> (which I wrote to explain to someone else how floatliteralhack works). >>> >> >> Yes. I want to point that if the lexer rules were alongside the parser, >> they would be generating ast nodes ? so the hook for calling Decimal for >> all floating point tokens would be doable in the same way as your AST hook. >> >> >> No. The way Python currently exposes things, the AST hook runs on an >> already-generated AST and transforms it into another one, to hand off to >> the code generator. That means it can only be used to handle things that >> parse as legal Python syntax (unless you replace the entire parser). >> >> What I want is a way to similarly take an already-generated token stream >> and transform it into another one, to hand off to the parser. That will >> allow it to be used to handle things that lex as legal Python tokens but >> don't parse as legal Python syntax, like what Paul suggested. Merging >> lexing into parsing not only doesn't give me that, it makes that impossible. >> > > Yes, and I what I was suggesting is for the lexer to return AST nodes, so > it would be fine to process those nodes in the same way. > > > Seriously? > > Tokens don't form a tree, they form a list. Yes, every linked list is just > a degenerate tree, so you could have every "node" just include the next one > as a child. But why? Do you want to then the input text into a tree of > character nodes? 
> When the work done by the lexer is done by the parser, the characters contained in a lexical node will be siblings. Typically, in Python a tree is represented as nodes with iterable children, so the characters would just be a string. > > Python has all kinds of really good tools for dealing with iterables; why > take away those tools and force me to work with a more complicated > abstraction that Python doesn't have any tools for dealing with? > The stream would still be just an iterable. > > In the case of the user-defined literal hack, for example, I can use the > adjacent-pairs recipe from itertools and my transformation becomes trivial. > I did it more explicitly in the hack I uploaded, using a generator function > with a for statement, just to make it blindingly obvious what's happening. > But if I had to deal with a tree, I'd either have to write explicit > lookahead or store some state explicitly on the tree or the visitor. That > isn't exactly _hard_, but it's certainly _harder_, and for no benefit. > > Also, if we got my change, I could write code that cleanly hooks parsing > in 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people > can at least use it, and all of the relevant and complicated code would be > shared between the two versions. With your change, I'd have to write code > that was completely different for 3.6+ than what I could backport, meaning > I'd have to write, debug, and maintain two completely different > implementations. And again, for no benefit. > > And finally, once again: we already have a token stream as part of the > process, we already expose every other interesting step in the process, > exposing the token stream as it exists today in a way that fits into > everything else as it exists today is clearly the easiest and least > disruptive thing to do. Sometimes it's worth doing harder or more > disruptive things because they provide more benefit, but you haven't yet > shown any benefit. > > You would still be able to do all this stuff. > You asked me for examples, and I provided them. Why don't you try writing > a couple of actual examples--user literals, the LINQ-ish example from > MacroPy, whatever--using your proposed design to show us how they could be > simpler, or more elegant, or open up further possibilities. Or come up with > an example of something your design could do that the existing one (even > with my small proposed change) can't. > > If I find time, I'll do that. I will explain my solution in another message. > For the new tokens that you want, the ideal solution I think is to modify >> the python parsing grammar before it parses the text. >> >> >> But I don't want any new tokens. I just want to change the way existing >> tokens are interpreted. >> >> Just as with an AST hook like PyMacro, I don't want any new nodes, I just >> want to change the way existing nodes are interpreted. >> >> > Yes, I see *how* you're trying to solve your problem, but my preference is > to have one kind of hook rather than two kinds by unifying lexing and > parsing. I think that's more elegant. > > > I'm trying to find a way to interpret this that makes sense. 
I think > you're suggesting that we should throw out the idea of letting users write > and install simple post-processing hooks in Python, because that will force > us to find a way to instead make the entire parsing process > user-customizable at runtime, which will force users to come up with "more > elegant" solutions involving changing the grammar instead of > post-processing it macro-style. > > If so, I think that's a very bad idea. Decades of practice in both Python > and many other languages (especially those with built-in macro facilities) > shows that post-processing at the relevant level is generally simple and > elegant. Even if we had a fully-runtime-customizable parser, something like > OMeta but "closing the loop" and implementing the language in the > programmable metalanguage, many things are still simpler and more elegant > written post-processing style (as used by existing import hooks, including > MacroPy, and in other languages going all the way back to Lisp), and > there's a much lower barrier to learning them, and there's much less risk > of breaking the compiler/interpreter being used to run your hook in the > first place. And, even if none of that were true, and your new and improved > system really were simpler in every case, and you had actually built it > rather than just envisioning it, there's still backward compatibility to > think of. Do you really want to break working, documented functionality > that people have written things like MacroPy on top of, even if forcing > them to redesign and rewrite everything from scratch would force them to > come up with a "more elegant" solution? And finally, the added flexibility > of such a system is a cost as well as a benefit--the fact that Arc makes it > as easy as possible to "rewrite the language into one that makes writing > your application trivial" also means that one Arc programmer can't > understand another's code until putting in a lot of effort to learn his > idiosyncratic language. > I understand that you are motivated by a specific problem. However, your solution does not solve the general problem. If you only allow transformations of the token stream, the token set is fixed. Transformations of the token stream also hide ? even in your example ? the fact that you're actually building what is conceptually a subtree. It makes more sense to me to solve the problem in general, once and for all. (1) Make it easy to change the grammar, and (2) make lexing part of the grammar. Now, you don't have to change the grammar to solve some problems. Sometimes, you can just use AST transformers to accomplish what you were doing with a lexical transformer. That's nice because it's one less thing to learn. Sometimes, you need the power of changing the grammar. That is already coming with the popularity of languages like Theano. I really want to transform Python code into Theano, and for that it may be more elegant to change the grammar. > > I don't know about OMeta, but the Earley parsing algorithm is worst-cast >> cubic time "quadratic time for unambiguous grammars, and linear time for >> almost all LR(k) grammars". >> >> >> I don't know why you'd want to use Earley for parsing a programming >> language. 
IIRC, it was the first algorithm that could handle rampant >> ambiguity in polynomial time, but that isn't relevant to parsing >> programming languages (especially one like Python, which was explicitly >> designed to be simple to parse), and it isn't relevant to natural languages >> if you're not still in the 1960s, except in learning the theory and history >> of parsing. GLR does much better in almost-unambiguous/almost-deterministic >> languages; CYK can be easily extended with weights (which propagate >> sensibly, so you can use them for a final judgment, or to heuristically >> prune alternatives as you go); Valiant is easier to reason about >> mathematically; etc. And that's just among the parsers in the same basic >> family as Earley. >> > > I suggested Earley to mitigate this fear of "exponential backtracking" > since that won't happen in Earley. > > > I already explained that using standard PEG with a packrat parser instead > of extended PEG with an OMeta-style parser gives you linear time. Why do > you think telling me about a decades-older cubic-time algorithm designed > for parsing natural languages that's a direct ancestor to two other > algorithms I also already mentioned is going to be helpful? Do you not > understand the advantages of PEG or GLR over Earley? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Jun 8 04:18:30 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Jun 2015 12:18:30 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: <20150608021829.GI20701@ando.pearwood.info> On Sun, Jun 07, 2015 at 07:06:42PM -0600, Cory Beutler wrote: [...] > The functionally is not the same. In your example 'foo' gets called even if > none of the conditions are true. The above example only runs 'foo' if it > enters one of the if-blocks. Ah yes, of course you are correct. > > > *Many nested 'if' statements could now be a more linear style:* > > > # Old Version > > > if a == b: > > > print ('a == b') > > > if b == c: > > > print ('b == c') > > > print ('end if') > > > > What's wrong with that code? Nesting the code like that follows the > > logic of the code: the b==c test *only* occurs if a==b. > > > > > > > # New Version > > > if a == b: > > > print ('a == b') > > > alif b == c: > > > print ('b == c') > > > also: > > > print ('end if') > > > > I consider this significantly worse. It isn't clear that the comparison > > between b and c is only made if a == b, otherwise it is entirely > > skipped. > > > > > It may only be worse because you are not used to reading it. This type of > syntax looks simple once you know how the pieces work. I mean, you know > that having multiple if-elif statements will result in only checking > conditions until one passes. The 'also' mentality would be the same, but > backwards. In the first case, your b==c test only occurs if a==b, which can be easily seen from the structure of the code: if a == b: everything here occurs only when a == b including the b == c test In the second case, there is no hint from the structure: if a == b: ... alif b == c: As you read down the left hand column, you see "if a == b" and you can mentally say "that block only occurs if a == b" and move on. But when you get to the alif block, you have to stop reading forward and go back up to understand whether it runs or not. It's not like elif, which is uneffected by any previous if or elif clauses. 
Each if/elif clause is independent. The test is always made (assuming
execution reaches that line of code at all), and you can decide whether the
block is entered or not by looking at the if/elif line alone:

    ...
    elif some_condition():
        block

Here, nothing above the "elif" line matters. If I reach that line,
some_condition() *must* be evaluated, and the block entered if it evaluates
to a truthy value. It's easy to understand. But:

    ...
    alif some_condition():
        block

I cannot even tell whether some_condition() is called or not. The structure
gives no hint as to whether the alif line is reachable. It looks like it is
at the same semantic level as the distant "if" line somewhere far above it,
but it isn't. Whether it runs or not is dependent on the distant "if" and
"elif" lines above it. By its nature, this cannot be simple, since it
introduces coupling between the alif line you are reading and one or more
distant lines above it, while disguising the structure of the code by
aligning the alif with the if even though it is conceptually part of the if
block.

-- 
Steve

From mistersheik at gmail.com  Mon Jun  8 04:23:59 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Sun, 7 Jun 2015 22:23:59 -0400
Subject: [Python-ideas] Hooking between lexer and parser
In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com>
Message-ID: 

On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan wrote:

> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas wrote:
> > Also, if we got my change, I could write code that cleanly hooks
> > parsing in 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5,
> > so people can at least use it, and all of the relevant and complicated
> > code would be shared between the two versions. With your change, I'd
> > have to write code that was completely different for 3.6+ than what I
> > could backport, meaning I'd have to write, debug, and maintain two
> > completely different implementations. And again, for no benefit.
>
> I don't think I've said this explicitly yet, but I'm +1 on the idea of
> making it easier to "hack the token stream". As Andrew has noted, there
> are two reasons this is an interesting level to work at for certain
> kinds of modifications:
>
> 1. The standard Python tokeniser has already taken care of converting
> the byte stream into Unicode code points, and the code point stream
> into tokens (including replacing leading whitespace with the
> structural INDENT/DEDENT tokens)

I will explain in another message how to replace the indent and dedent
tokens so that the lexer loses most of its "magic" and becomes just like
the parser.

> 2. You get to work with a linear stream of tokens, rather than a
> precomposed tree of AST nodes that you have to traverse and keep
> consistent

The AST nodes would contain within them the linear stream of tokens that
you are free to work with. The AST also encodes the structure of the
tokens, which can be very useful if only to debug how the tokens are being
parsed. You might find yourself, when doing a more complicated lexical
transformation, trying to reverse engineer where the parse tree nodes begin
and end in the token stream. This would be a nightmare. This is the main
problem with trying to process the token stream "blind" to the parse tree.
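For reference, the kind of flat token-stream rewrite being discussed can
already be sketched on top of the stdlib tokenize module, along the lines
of the float-to-Decimal example in the tokenize documentation. This is only
a rough, lightly tested sketch (no error handling, and the spacing produced
by untokenize in this mode is approximate):

    import io
    import tokenize

    def floats_to_decimal(source):
        # Rewrite float literals written with a decimal point as
        # Decimal('...') constructor calls, purely at the token level.
        result = []
        for toknum, tokval, _, _, _ in tokenize.generate_tokens(
                io.StringIO(source).readline):
            if toknum == tokenize.NUMBER and '.' in tokval:
                result.extend([
                    (tokenize.NAME, 'Decimal'),
                    (tokenize.OP, '('),
                    (tokenize.STRING, repr(tokval)),
                    (tokenize.OP, ')'),
                ])
            else:
                result.append((toknum, tokval))
        return tokenize.untokenize(result)

    print(floats_to_decimal('x = 1.5 + 2.75\n'))
    # roughly: x = Decimal ('1.5') + Decimal ('2.75')

The rewritten source still has to be compiled separately (and needs Decimal
imported at run time), which is the "combine the tokens back into source
before feeding them to the compiler" step that comes up later in the
thread.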
> > If all you're wanting to do is token rewriting, or to push the token > stream over a network connection in preference to pushing raw source > code or fully compiled bytecode, a bit of refactoring of the existing > tokeniser/compiler interface to be less file based and more iterable > based could make that easier to work with. > You can still do all that with the tokens included in the parse tree. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Jun 8 04:33:59 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 08 Jun 2015 11:33:59 +0900 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> Cory Beutler writes: > It may only be worse because you are not used to reading it. This > type of syntax looks simple once you know how the pieces work. I > mean, you know that having multiple if-elif statements will result > in only checking conditions until one passes. The 'also' mentality > would be the same, but backwards. And that inversion is what underlies Steven's point, I think. I see your point, *but only if 'elif' goes away*. Currently the "hangindent" formatting of if ... elif ... else signals a series of alternatives, as similar formatting does (as a convention, rather than syntax) in many other languages. This makes scanning either actions or conditions fairly easy; you don't have to actually read the "elif"s to understand the alternative structure. With also and alif, you now have to not only read the keywords, you have to parse the code to determine what conditions are actually in force. This is definitely a readability minus, a big one. It doesn't help that "else" and "also" and "elif" and "alif" are rather visually confusable pairs, but at this point that's a bikeshed painting issue (except that as proponent you might want to paint it a different color for presentation). There's also the "dangling also" issue: I would suppose that also has all the problems of "dangling else", and some new ones besides. For example, since "elif" really is "else if" (not a C-like "case"), it's easy to imagine situations where you'd like to have one also or alif for the first three cases, and one for the next two, etc. Python, being a language for grownups, could always add a convention that you should generally only use also and alif at the end of an if ... elif ... else series or something like that, but I think that would seriously impair the usefulness of these constructs. I'm definitely -1 on the also, alif syntax at this point. On the other hand, having done a lot of C programming in my misspent youth, I do miss anaphoric conditionals, so I too would like to see the possibility of "if cond as var: do_something_with_var" explored. Of course Nick is right that automatic common subexpression elimination (CSE) is the big win, but manual CSE can improve readability. From mistersheik at gmail.com Mon Jun 8 04:37:36 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 7 Jun 2015 22:37:36 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: The best parsing library in Python I could find to my eyes is modgrammar: https://pythonhosted.org/modgrammar/ It's GLR I think. 
The documentation isn't bad and the syntax isn't too bad. The major change that I want to make to it is to replace the grammar class variables with regular instance generator methods, and to replace the components of the grammar return value, which are currently classes, with constructed objects. That way, a whitespace object that represents a block continuation can be constructed to know how much whitespace it must match. Similarly, a "suite" can include a constructed whitespace object that includes extra space. After it's matched, it can be queried for its size, and the grammar generator method can construct whitespace objects with the appropriate size. This eliminates the need for INDENT and DEDENT tokens. This kind of dynamic grammar generation is desirable for all kinds of other language related problems, like the LaTeX one I discussed, and it also allows us to merge all of the validation code into the parsing code, which follows "Don't Repeat Yourself". I think it's a better design. I will try to find time to build a demo of this this week. Ultimately, my problem with "token transformers" is, if I'm understanding correctly, that we want to change Python so that not only will 3.5 have Token transformers, but every Python after that has to support this. This risks constraining the development of the elegant solution. And for what major reason do we even need token transformers so soon? For a toy example on python ideas about automatic Decimal instances? Why can't a user define a one character function "d(x)" to do the conversion everywhere? I prefer to push for the better design even if it means waiting a year. Best, Neil On Sun, Jun 7, 2015 at 6:19 PM, Robert Collins wrote: > On 6 June 2015 at 17:00, Nick Coghlan wrote: > > On 6 June 2015 at 12:21, Neil Girdhar wrote: > >> I'm curious what other people will contribute to this discussion as I > think > >> having no great parsing library is a huge hole in Python. Having one > would > >> definitely allow me to write better utilities using Python. > > > > The design of *Python's* grammar is deliberately restricted to being > > parsable with an LL(1) parser. There are a great many static analysis > > and syntax highlighting tools that are able to take advantage of that > > simplicity because they only care about the syntax, not the full > > semantics. > > > > Anyone actually doing their *own* parsing of something else *in* > > Python, would be better advised to reach for PLY > > (https://pypi.python.org/pypi/ply ). PLY is the parser underlying > > https://pypi.python.org/pypi/pycparser, and hence the highly regarded > > CFFI library, https://pypi.python.org/pypi/cffi > > > > Other notable parsing alternatives folks may want to look at include > > https://pypi.python.org/pypi/lrparsing and > > http://pythonhosted.org/pyparsing/ (both of which allow you to use > > Python code to define your grammar, rather than having to learn a > > formal grammar notation). > > Let me just pimp https://pypi.python.org/pypi/Parsley here - I have > written languages in both Parsely (a simple packaging metadata > language) and its predecessor pymeta (in which I wrote pybars - > handlebars.js for python) - and both were good implementations of > OMeta, IMNSHO. > > -Rob > > -- > Robert Collins > Distinguished Technologist > HP Converged Cloud > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Jun 8 04:42:23 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Mon, 08 Jun 2015 11:42:23 +0900 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> Message-ID: <874mmisyy8.fsf@uwakimon.sk.tsukuba.ac.jp> Nick Coghlan writes: > (It's also worth noting that "as" clauses are specifically for binding to a > name, while the LHS of an assignment statement allows attributes, indexing, > slicing and tuple unpacking) +1 (and the point that it's a *binding*, not an assignment, deserves a lot more than a parenthesized aside). > In this particular case, the variant that has always seemed most attractive > to me in past discussions is a general purpose "named subexpression" > construct that's just a normal local name binding operation affecting > whatever namespace the expression is executed in. Yes, please! From ncoghlan at gmail.com Mon Jun 8 04:42:31 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 12:42:31 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 8 June 2015 at 12:23, Neil Girdhar wrote: > > > On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan wrote: >> >> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas >> wrote: >> > Also, if we got my change, I could write code that cleanly hooks parsing >> > in >> > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people >> > can >> > at least use it, and all of the relevant and complicated code would be >> > shared between the two versions. With your change, I'd have to write >> > code >> > that was completely different for 3.6+ than what I could backport, >> > meaning >> > I'd have to write, debug, and maintain two completely different >> > implementations. And again, for no benefit. >> >> I don't think I've said this explicitly yet, but I'm +1 on the idea of >> making it easier to "hack the token stream". As Andew has noted, there >> are two reasons this is an interesting level to work at for certain >> kinds of modifications: >> >> 1. The standard Python tokeniser has already taken care of converting >> the byte stream into Unicode code points, and the code point stream >> into tokens (including replacing leading whitespace with the >> structural INDENT/DEDENT tokens) > > > I will explain in another message how to replace the indent and dedent > tokens so that the lexer loses most of its "magic" and becomes just like the > parser. I don't dispute that this *can* be done, but what would it let me do that I can't already do today? I addition, how will I be able to continue to do all the things that I can do today with the separate tokenisation step? *Adding* steps to the compilation toolchain is doable (one of the first things I was involved in CPython core development was the introduction of the AST based parser in Python 2.5), but taking them *away* is much harder. You appear to have an idealised version of what a code generation toolchain "should" be, and would like to hammer CPython's code generation pipeline specifically into that mould. That's not the way this works - we don't change the code generator for the sake of it, we change it to solve specific problems with it. Introducing the AST layer solved a problem. Introducing an AST optimisation pass would solve a problem. Making the token stream easier to manipulate would solve a problem. Merging the lexer and the parser doesn't solve any problem that we have. >> 2. 
You get to work with a linear stream of tokens, rather than a >> precomposed tree of AST nodes that you have to traverse and keep >> consistent > > The AST nodes would contain within them the linear stream of tokens that you > are free to work with. The AST also encodes the structure of the tokens, > which can be very useful if only to debug how the tokens are being parsed. > You might find yourself, when doing a more complicated lexical > transformation, trying to reverse engineer where the parse tree nodes begin > and end in the token stream. This would be a nightmare. This is the main > problem with trying to process the token stream "blind" to the parse tree. Anything that cares about the structure to that degree shouldn't be manipulating the token stream - it should be working on the parse tree. >> If all you're wanting to do is token rewriting, or to push the token >> stream over a network connection in preference to pushing raw source >> code or fully compiled bytecode, a bit of refactoring of the existing >> tokeniser/compiler interface to be less file based and more iterable >> based could make that easier to work with. > > You can still do all that with the tokens included in the parse tree. Not as easily, because I have to navigate the parse tree even when I don't care about that structure, rather than being able to just look at the tokens in isolation. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Mon Jun 8 04:45:49 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 8 Jun 2015 12:45:49 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull wrote: > I'm definitely -1 on the also, alif syntax at this point. On the > other hand, having done a lot of C programming in my misspent youth, I > do miss anaphoric conditionals, so I too would like to see the > possibility of "if cond as var: do_something_with_var" explored. Of > course Nick is right that automatic common subexpression elimination > (CSE) is the big win, but manual CSE can improve readability. Part of the trouble with depending on CSE is that Python is so dynamic that you can't depend on things having no side effects... but the more important part, in my opinion, is that duplication is a source code maintenance problem. Bruce suggested this: x = a and a.b and a.b.c and a.b.c.d # which becomes x = a and a.b if x: x = x.c if x: x = x.d and frankly, I'd be more worried about a subsequent edit missing something than I would be about the performance of all the repeated lookups. Of course, Python does have an alternative, and that's to use attribute absence rather than falsiness: try: x = a.b.c.d except AttributeError: x = None But that won't always be an option. And any kind of expression that says "the thing on the left, if it's false, otherwise the thing on the left modified by this operator" is likely to get messy in anything more than trivial cases; it looks great here: x = a?.b?.c?.d but now imagine something more complicated, and it's a lot more messy. 
ChrisA From abarnert at yahoo.com Mon Jun 8 04:44:24 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 19:44:24 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <20150608021829.GI20701@ando.pearwood.info> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <20150608021829.GI20701@ando.pearwood.info> Message-ID: On Jun 7, 2015, at 19:18, Steven D'Aprano wrote: > > As you read down the left hand column, you see "if a == b" and you can > mentally say "that block only occurs if a == b" and move on. But when > you get to the alif block, you have to stop reading forward and go back > up to understand whether it runs or not. Thanks for putting it this way. I knew there was a more fundamental problem, but I couldn't see it until your message. The proposal is closely analogous to trying to define a Boolean predicate in a list GUI instead of a tree. And that means it has the exact same problems that the early MS Office and Visual C++ Find in File dialogs had. Besides the obvious fact that mixing conjunctions and disjunctions without grouping (via nesting) is insufficiently powerful for many real-life predicates (which is exactly why the proposal needs the assignment-like add-on), even in the simple cases where it works, it's not readable (which is why the examples had at least one mistake, and at least one person misread one of the other examples). If your eye has to travel back upwards to the last also, but the alsos are flush against the left with the elifs instead of nested differently, you have to make an effort to parse each clause in your head, which is not true for a flat chain of elifs. At any rate, as two people (I think Stephen and Nick) suggested, the second half of the proposal (the as-like binding) nearly eliminates the need for the first half, and doesn't have the same problem. The biggest problem it has is that you want the same syntax in other places besides if conditions, which is a better problem to have. From mistersheik at gmail.com Mon Jun 8 04:47:16 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 7 Jun 2015 22:47:16 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On Sun, Jun 7, 2015 at 10:42 PM, Nick Coghlan wrote: > On 8 June 2015 at 12:23, Neil Girdhar wrote: > > > > > > On Sun, Jun 7, 2015 at 1:59 AM, Nick Coghlan wrote: > >> > >> On 7 June 2015 at 08:52, Andrew Barnert via Python-ideas > >> wrote: > >> > Also, if we got my change, I could write code that cleanly hooks > parsing > >> > in > >> > 3.6+, but uses the tokenize/untokenize hack for 2.7 and 3.5, so people > >> > can > >> > at least use it, and all of the relevant and complicated code would be > >> > shared between the two versions. With your change, I'd have to write > >> > code > >> > that was completely different for 3.6+ than what I could backport, > >> > meaning > >> > I'd have to write, debug, and maintain two completely different > >> > implementations. And again, for no benefit. > >> > >> I don't think I've said this explicitly yet, but I'm +1 on the idea of > >> making it easier to "hack the token stream". As Andew has noted, there > >> are two reasons this is an interesting level to work at for certain > >> kinds of modifications: > >> > >> 1. 
The standard Python tokeniser has already taken care of converting > >> the byte stream into Unicode code points, and the code point stream > >> into tokens (including replacing leading whitespace with the > >> structural INDENT/DEDENT tokens) > > > > > > I will explain in another message how to replace the indent and dedent > > tokens so that the lexer loses most of its "magic" and becomes just like > the > > parser. > > I don't dispute that this *can* be done, but what would it let me do > that I can't already do today? I addition, how will I be able to > continue to do all the things that I can do today with the separate > tokenisation step? > > *Adding* steps to the compilation toolchain is doable (one of the > first things I was involved in CPython core development was the > introduction of the AST based parser in Python 2.5), but taking them > *away* is much harder. > > You appear to have an idealised version of what a code generation > toolchain "should" be, and would like to hammer CPython's code > generation pipeline specifically into that mould. That's not the way > this works - we don't change the code generator for the sake of it, we > change it to solve specific problems with it. > > Introducing the AST layer solved a problem. Introducing an AST > optimisation pass would solve a problem. Making the token stream > easier to manipulate would solve a problem. > > Merging the lexer and the parser doesn't solve any problem that we have. > You're right. And as usual, Nick, your analysis is spot on. My main concern is that the idealized way of parsing the language is not precluded by any change. Does adding token manipulation promise forwards compatibility? Will a Python 3.9 have to have the same kind of token manipulator exposed. If not, then I'm +1 on token manipulation. :) > > >> 2. You get to work with a linear stream of tokens, rather than a > >> precomposed tree of AST nodes that you have to traverse and keep > >> consistent > > > > The AST nodes would contain within them the linear stream of tokens that > you > > are free to work with. The AST also encodes the structure of the tokens, > > which can be very useful if only to debug how the tokens are being > parsed. > > You might find yourself, when doing a more complicated lexical > > transformation, trying to reverse engineer where the parse tree nodes > begin > > and end in the token stream. This would be a nightmare. This is the > main > > problem with trying to process the token stream "blind" to the parse > tree. > > Anything that cares about the structure to that degree shouldn't be > manipulating the token stream - it should be working on the parse > tree. > > >> If all you're wanting to do is token rewriting, or to push the token > >> stream over a network connection in preference to pushing raw source > >> code or fully compiled bytecode, a bit of refactoring of the existing > >> tokeniser/compiler interface to be less file based and more iterable > >> based could make that easier to work with. > > > > You can still do all that with the tokens included in the parse tree. > > Not as easily, because I have to navigate the parse tree even when I > don't care about that structure, rather than being able to just look > at the tokens in isolation. > I don't think it would be more of a burden than it would prevent bugs by allowing you to ensure that the parse tree structure is what you think it is. It's a matter of intuition I guess. > > Regards, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jun 8 04:52:43 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 12:52:43 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 8 June 2015 at 12:37, Neil Girdhar wrote: > Ultimately, my problem with "token transformers" is, if I'm understanding > correctly, that we want to change Python so that not only will 3.5 have > Token transformers, but every Python after that has to support this. This > risks constraining the development of the elegant solution. And for what > major reason do we even need token transformers so soon? For a toy example > on python ideas about automatic Decimal instances? Why can't a user define > a one character function "d(x)" to do the conversion everywhere? I prefer > to push for the better design even if it means waiting a year. Neil, you're the only one proposing major structural changes to the code generation pipeline, and nobody at all is proposing anything for Python 3.5 (the feature freeze deadline for that has already passed). Andrew is essentially only proposing relatively minor tweaks to the API of the existing tokenizer module to make it more iterable based and less file based (while still preserving the file based APIs). Eugene Toder's and Dave Malcolm's patches from a few years ago make the existing AST -> bytecode section of the toolchain easier to modify and experiment with (and are ideas worth exploring for 3.6 if anyone is willing and able to invest the time to bring them back up to date). However, if you're specifically wanting to work on an "ideal parser API", then the reference interpreter for a 24 year old established language *isn't* the place to do it - the compromises necessitated by the need to align with an extensive existing ecosystem will actively work against your goal for a clean, minimalist structure. That's thus something better developed *independently* of CPython, and then potentially considered at some point in the future when it's better established what concrete benefits it would offer over the status quo for both the core development team and Python's end users. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Mon Jun 8 04:56:55 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 7 Jun 2015 22:56:55 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On Sun, Jun 7, 2015 at 10:52 PM, Nick Coghlan wrote: > On 8 June 2015 at 12:37, Neil Girdhar wrote: > > Ultimately, my problem with "token transformers" is, if I'm understanding > > correctly, that we want to change Python so that not only will 3.5 have > > Token transformers, but every Python after that has to support this. > This > > risks constraining the development of the elegant solution. And for what > > major reason do we even need token transformers so soon? For a toy > example > > on python ideas about automatic Decimal instances? Why can't a user > define > > a one character function "d(x)" to do the conversion everywhere? I > prefer > > to push for the better design even if it means waiting a year. 
> > Neil, you're the only one proposing major structural changes to the > code generation pipeline, and nobody at all is proposing anything for > Python 3.5 (the feature freeze deadline for that has already passed). > > Andrew is essentially only proposing relatively minor tweaks to the > API of the existing tokenizer module to make it more iterable based > and less file based (while still preserving the file based APIs). > Eugene Toder's and Dave Malcolm's patches from a few years ago make > the existing AST -> bytecode section of the toolchain easier to modify > and experiment with (and are ideas worth exploring for 3.6 if anyone > is willing and able to invest the time to bring them back up to date). > > However, if you're specifically wanting to work on an "ideal parser > API", then the reference interpreter for a 24 year old established > language *isn't* the place to do it - the compromises necessitated by > the need to align with an extensive existing ecosystem will actively > work against your goal for a clean, minimalist structure. That's thus > something better developed *independently* of CPython, and then > potentially considered at some point in the future when it's better > established what concrete benefits it would offer over the status quo > for both the core development team and Python's end users. > That's not what I'm doing. All I'm suggesting is that changes to Python that *preclude* the "ideal parser API" be avoided. I'm not trying to make the ideal API happen today. I'm just keeping the path to that rosy future free of obstacles. > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jun 8 05:03:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 13:03:09 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 8 June 2015 at 12:47, Neil Girdhar wrote: > You're right. And as usual, Nick, your analysis is spot on. My main > concern is that the idealized way of parsing the language is not precluded > by any change. Does adding token manipulation promise forwards > compatibility? Will a Python 3.9 have to have the same kind of token > manipulator exposed. If not, then I'm +1 on token manipulation. :) That may have been the heart of the confusion, as token manipulation is *already* a public feature: https://docs.python.org/3/library/tokenize.html The tokenizer module has been a public part of Python for longer than I've been a Pythonista (first documented in 1.5.2 in 1999): https://docs.python.org/release/1.5.2/lib/module-tokenize.html As a result, token stream manipulation is already possible, you just have to combine the tokens back into a byte stream before feeding them to the compiler. Any future Python interpreter would be free to fall back on implementing a token based API that way, if the CPython code generator itself were to gain a native token stream interface. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Mon Jun 8 05:04:07 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sun, 7 Jun 2015 23:04:07 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <696373354.6114584.1433568658605.JavaMail.yahoo@mail.yahoo.com> Message-ID: Okay, well I'm sorry for the trouble then!! 
On Sun, Jun 7, 2015 at 11:03 PM, Nick Coghlan wrote: > On 8 June 2015 at 12:47, Neil Girdhar wrote: > > You're right. And as usual, Nick, your analysis is spot on. My main > > concern is that the idealized way of parsing the language is not > precluded > > by any change. Does adding token manipulation promise forwards > > compatibility? Will a Python 3.9 have to have the same kind of token > > manipulator exposed. If not, then I'm +1 on token manipulation. :) > > That may have been the heart of the confusion, as token manipulation > is *already* a public feature: > https://docs.python.org/3/library/tokenize.html > > The tokenizer module has been a public part of Python for longer than > I've been a Pythonista (first documented in 1.5.2 in 1999): > https://docs.python.org/release/1.5.2/lib/module-tokenize.html > > As a result, token stream manipulation is already possible, you just > have to combine the tokens back into a byte stream before feeding them > to the compiler. Any future Python interpreter would be free to fall > back on implementing a token based API that way, if the CPython code > generator itself were to gain a native token stream interface. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 8 05:05:49 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 20:05:49 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Jun 7, 2015, at 19:45, Chris Angelico wrote: > Part of the trouble with depending on CSE is that Python is so dynamic > that you can't depend on things having no side effects... but the more > important part, in my opinion, is that duplication is a source code > maintenance problem. Bruce suggested this: > > x = a and a.b and a.b.c and a.b.c.d > # which becomes > x = a and a.b > if x: x = x.c > if x: x = x.d > > and frankly, I'd be more worried about a subsequent edit missing > something than I would be about the performance of all the repeated > lookups. Of course, Python does have an alternative, and that's to use > attribute absence rather than falsiness: > > try: x = a.b.c.d > except AttributeError: x = None > > But that won't always be an option. I don't have a link, but one of the Swift development blogs shows a number of good examples where it isn't an option. When deciding whether they wanted SmallTalk-style nil chaining or Python-style AttributeError/LookupError, all the simple cases look just as good both ways. So they went out looking for real-life code in multiple languages to find examples that couldn't be translated to the other style. They found plenty of nil-chaining examples that were clumsy to translate to exceptions, but almost all of the exception examples that were clumsy to translate to nil chaining could be solved if they just had multiple levels of nil. So, if they could find a way to provide something like Haskell's Maybe, but without forcing you to think about monads and pattern matching, that would be better than exceptions. So that's what they did. (I'm not sure it's 100% successful, because there are rare times when you really do want to check for Just Nothing, and by hiding things under the covers they made that difficult... But in simple cases it definitely does work.) 
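For comparison with today's Python, that kind of nil chaining can be
approximated with a small helper; chase() here is a hypothetical utility
written only to make the semantics concrete:

    def chase(obj, *names):
        # Follow a chain of attributes, returning None as soon as any link
        # is missing or is None (roughly what a?.b?.c?.d does elsewhere).
        for name in names:
            if obj is None:
                return None
            obj = getattr(obj, name, None)
        return obj

    class Node:
        pass

    a = Node(); a.b = Node(); a.b.c = Node(); a.b.c.d = 42
    print(chase(a, 'b', 'c', 'd'))   # 42
    print(chase(a, 'b', 'x', 'd'))   # None

As with the chaining operator, a missing attribute and an attribute that is
genuinely None become indistinguishable, which is the same "Just Nothing"
limitation described above.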
Anyway, their language design choice isn't directly relevant here (I assume nobody wants a.b.c.d to be None of a.b is missing, or wants to add a?.b?.c?.d syntax to Python), but the examples probably are. > And any kind of expression that > says "the thing on the left, if it's false, otherwise the thing on the > left modified by this operator" is likely to get messy in anything > more than trivial cases; it looks great here: > > x = a?.b?.c?.d > > but now imagine something more complicated, and it's a lot more messy. It's surprising how often it doesn't get messy in Swift. But when it does, I really miss being able to pattern match Just Nothing, and there's no way around that without two clumsy assignment statements before the conditional (or defining and calling an extra function), which is even worse than the one that Python often needs... From robertc at robertcollins.net Mon Jun 8 05:10:17 2015 From: robertc at robertcollins.net (Robert Collins) Date: Mon, 8 Jun 2015 15:10:17 +1200 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 8 June 2015 at 14:56, Neil Girdhar wrote: > > > On Sun, Jun 7, 2015 at 10:52 PM, Nick Coghlan wrote: >> However, if you're specifically wanting to work on an "ideal parser >> API", then the reference interpreter for a 24 year old established >> language *isn't* the place to do it - the compromises necessitated by >> the need to align with an extensive existing ecosystem will actively >> work against your goal for a clean, minimalist structure. That's thus >> something better developed *independently* of CPython, and then >> potentially considered at some point in the future when it's better >> established what concrete benefits it would offer over the status quo >> for both the core development team and Python's end users. > > > That's not what I'm doing. All I'm suggesting is that changes to Python > that *preclude* the "ideal parser API" be avoided. I'm not trying to make > the ideal API happen today. I'm just keeping the path to that rosy future > free of obstacles. I've used that approach in projects before, and in hindsight I realise that I caused significant disruption doing that. The reason boils down to - without consensus that the rosy future is all of: - the right future - worth doing eventually - more important to reach than solve problems that appear on the way then you end up frustrating folk that have problems now, without actually adding value to anyone: the project gets to choose between a future that [worst case, fails all three tests] might not be right, might not be worth doing, and is less important than actual problems which it is stopping solutions for. In this particular case, given Nick's comments about why we change the guts here, I'd say that 'worth doing eventually' is not in consensus, and I personally think that anything that is 'in the indefinite future' is almost never more important than problems affecting people today, because its a possible future benefit vs a current cost. There's probably an economics theorem to describe that, but I'm not an economist :) Pragmatically, I propose that the existing structure already has significant friction around any move to a unified (but still multi-pass I presume) parser infrastructure, and so adding a small amount of friction for substantial benefits will not substantially impact the future work. 
Concretely: a multi-stage parser with unified language for both lexer and parser should be quite amenable to calling out to a legacy token hook, without onerous impact. Failing that, we can follow the deprecation approach when someone finds we can't do that, and after a reasonable time remove the old hook. But right now, I think the onus is on you to show that a shim wouldn't be possible, rather than refusing to support adding a tokeniser hook because a shim isn't *obviously possible*. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From abarnert at yahoo.com Mon Jun 8 05:18:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 20:18:36 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <0BCB3647-784B-4731-808F-E7666EE23877@yahoo.com> On Jun 7, 2015, at 19:52, Nick Coghlan wrote: > > Andrew is essentially only proposing relatively minor tweaks to the > API of the existing tokenizer module to make it more iterable based > and less file based (while still preserving the file based APIs). And also a patch to the existing ast module to allow it to handle tokenizers from Python as well as from C. The tokenizer tweaks themselves are just to make that easier (and to make using tokenizer a little simpler even if you don't feed it directly to the parser). (It surprised me that the C-level tokenizer actually can take C strings and string objects rather than file objects, but once you think about how the high-level C API stuff like being able to exec a single line must work, it's pretty obvious why that was added...) > Eugene Toder's and Dave Malcolm's patches from a few years ago make > the existing AST -> bytecode section of the toolchain easier to modify > and experiment with (and are ideas worth exploring for 3.6 if anyone > is willing and able to invest the time to bring them back up to date). I got a chance to take a look at this, and, while it seems completely orthogonal to what I'm trying to do, it also seems very cool. If someone got the patches up to date for the trunk and fixed the minor issues involved in the last review (both of which look pretty simple), what are the chances of getting it reviewed for 3.6? (I realize this is probably a better question for the issue tracker or the -dev list than buried in the middle of a barely-relevant -ideas thread, but I'm on my phone here, and you brought it up.:) From ncoghlan at gmail.com Mon Jun 8 05:41:01 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 13:41:01 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 8 June 2015 at 12:45, Chris Angelico wrote: > On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull wrote: >> I'm definitely -1 on the also, alif syntax at this point. On the >> other hand, having done a lot of C programming in my misspent youth, I >> do miss anaphoric conditionals, so I too would like to see the >> possibility of "if cond as var: do_something_with_var" explored. Of >> course Nick is right that automatic common subexpression elimination >> (CSE) is the big win, but manual CSE can improve readability. > > Part of the trouble with depending on CSE is that Python is so dynamic > that you can't depend on things having no side effects... 
but the more > important part, in my opinion, is that duplication is a source code > maintenance problem. Yes, this is the part of the problem definition I agree with, which is why I think named subexpressions are the most attractive alternative presented in the past discussions. Our typical answer is "pull the named subexpression out to a separate assignment statement and give it a name", but there are a range of constructs where that poses a problem. For example: x = a.b if a.b else a.c while a.b: x = a.b [a.b for a in iterable if a.b] Eliminating the duplication with named subexpressions would be straightforward (I'd suggest making the parentheses mandatory for this construct, which would also avoid ambiguity in the with statement and exception handler clause cases): x = b if (a.b as b) else a.c while (a.b as x): ... [b for a in iterable if (a.b as b)] By contrast, eliminating the duplication *today* requires switching to very different structures based on the underlying patterns otherwise hidden behind the syntactic sugar: x = a.b if not x: x = a.c while True: x = a.b if not x: break ... result = [] for a in iterable: b = a.b if b: result.append(b) The main *problem* with named subexpressions (aside from the potential for side effects introduced by deliberately letting the name bindings leak into the surrounding namespace) is that it introduces a redundancy at the single assignment level since an expression statement that names the expression would be equivalent to a simple assignment statement: x = a (a as x) On the other hand, there's a similar existing redundancy between function definitions and binding lambda expressions to a name: f = lambda: None def f(): pass And for that, we just have a PEP 8 style guideline recommending the latter form. Something similar would likely work for saying "only use named subexpressions in cases where using a normal assignment statement instead would require completely restructuring the code". Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 8 05:55:17 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 13:55:17 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <0BCB3647-784B-4731-808F-E7666EE23877@yahoo.com> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <0BCB3647-784B-4731-808F-E7666EE23877@yahoo.com> Message-ID: On 8 June 2015 at 13:18, Andrew Barnert wrote: > On Jun 7, 2015, at 19:52, Nick Coghlan wrote: >> Eugene Toder's and Dave Malcolm's patches from a few years ago make >> the existing AST -> bytecode section of the toolchain easier to modify >> and experiment with (and are ideas worth exploring for 3.6 if anyone >> is willing and able to invest the time to bring them back up to date). > > I got a chance to take a look at this, and, while it seems completely orthogonal to what I'm trying to do, it also seems very cool. If someone got the patches up to date for the trunk and fixed the minor issues involved in the last review (both of which look pretty simple), what are the chances of getting it reviewed for 3.6? 
(I realize this is probably a better question for the issue tracker or the -dev list than buried in the middle of a barely-relevant -ideas thread, but I'm on my phone here, and you brought it up.:) I'm still interested in the underlying ideas, and while it's always possible to get completely surprised by life's twists and turns, I'm optimistic I'd be able to find the time to provide feedback on it myself for 3.6, and hopefully encourage other folks with experience with the compiler internals to review it as well. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 8 06:18:15 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 14:18:15 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 8 June 2015 at 13:10, Robert Collins wrote: > In this particular case, given Nick's comments about why we change the > guts here, I'd say that 'worth doing eventually' is not in consensus, > and I personally think that anything that is 'in the indefinite > future' is almost never more important than problems affecting people > today, because its a possible future benefit vs a current cost. > There's probably an economics theorem to describe that, but I'm not an > economist :) I don't know about economics, but for anyone that hasn't encountered it before, the phrase YAGNI is a good one to know: You Ain't Gonna Need It. ( http://c2.com/cgi/wiki?YouArentGonnaNeedIt ) The way YAGNI applies when deciding *to* do something is when you're faced with the following choice: * Making a particular change solves an immediate problem, but would make another possible change more complex in the future * Not making a change preserves the simplicity of the possible future change, but also doesn't solve the immediate problem Sometimes you'll get lucky and someone will figure out a third path that both addresses the immediate concern *and* leaves your future options open for other changes. More often though, you'll have to decide between these options, and in those cases "YAGNI" argues in favour of heavily discounting the potential increase in difficulty for a change you may never make anyway. Cheers, Nick. P.S. This tension between considering the long term implications of changes without allowing that consideration to block near term progress is what I personally see in the following two lines of the Zen of Python: Now is better than never. Although never is often better than *right* now. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mistersheik at gmail.com Mon Jun 8 06:23:43 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 8 Jun 2015 00:23:43 -0400 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: Yes, but in this case the near term "problem" was as far as I can tell just parsing floats as decimals, which is easily done with a somewhat noisy function call. I don't see why it's important. The way that CPython does parsing is more than just annoying. It's a mess of repetition and tests that try to make sure that all of the phases are synchronized. I don't think that CPython is the future of Python. One day, someone will write a Python interpreter in Python that includes a clean one-pass parser. 
I would prefer to make that as easy to realize as possible. You might think it's far-fetched. I don't think it is. Best, Neil On Mon, Jun 8, 2015 at 12:18 AM, Nick Coghlan wrote: > On 8 June 2015 at 13:10, Robert Collins wrote: > > In this particular case, given Nick's comments about why we change the > > guts here, I'd say that 'worth doing eventually' is not in consensus, > > and I personally think that anything that is 'in the indefinite > > future' is almost never more important than problems affecting people > > today, because its a possible future benefit vs a current cost. > > There's probably an economics theorem to describe that, but I'm not an > > economist :) > > I don't know about economics, but for anyone that hasn't encountered > it before, the phrase YAGNI is a good one to know: You Ain't Gonna > Need It. ( http://c2.com/cgi/wiki?YouArentGonnaNeedIt ) > > The way YAGNI applies when deciding *to* do something is when you're > faced with the following choice: > > * Making a particular change solves an immediate problem, but would > make another possible change more complex in the future > * Not making a change preserves the simplicity of the possible future > change, but also doesn't solve the immediate problem > > Sometimes you'll get lucky and someone will figure out a third path > that both addresses the immediate concern *and* leaves your future > options open for other changes. More often though, you'll have to > decide between these options, and in those cases "YAGNI" argues in > favour of heavily discounting the potential increase in difficulty for > a change you may never make anyway. > > Cheers, > Nick. > > P.S. This tension between considering the long term implications of > changes without allowing that consideration to block near term > progress is what I personally see in the following two lines of the > Zen of Python: > > Now is better than never. > Although never is often better than *right* now. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 8 06:24:36 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 21:24:36 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> On Jun 7, 2015, at 20:41, Nick Coghlan wrote: > >> On 8 June 2015 at 12:45, Chris Angelico wrote: >>> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull wrote: >>> I'm definitely -1 on the also, alif syntax at this point. On the >>> other hand, having done a lot of C programming in my misspent youth, I >>> do miss anaphoric conditionals, so I too would like to see the >>> possibility of "if cond as var: do_something_with_var" explored. Of >>> course Nick is right that automatic common subexpression elimination >>> (CSE) is the big win, but manual CSE can improve readability. >> >> Part of the trouble with depending on CSE is that Python is so dynamic >> that you can't depend on things having no side effects... but the more >> important part, in my opinion, is that duplication is a source code >> maintenance problem. > > Yes, this is the part of the problem definition I agree with, which is > why I think named subexpressions are the most attractive alternative > presented in the past discussions. 
The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic. You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is: > [b for a in iterable if (a.b as b)] That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse? Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better, -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 8 06:27:05 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 21:27:05 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> Message-ID: Sorry, early send... Sent from my iPhone > On Jun 7, 2015, at 21:24, Andrew Barnert via Python-ideas wrote: > >> On Jun 7, 2015, at 20:41, Nick Coghlan wrote: >> >>> On 8 June 2015 at 12:45, Chris Angelico wrote: >>>> On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull wrote: >>>> I'm definitely -1 on the also, alif syntax at this point. On the >>>> other hand, having done a lot of C programming in my misspent youth, I >>>> do miss anaphoric conditionals, so I too would like to see the >>>> possibility of "if cond as var: do_something_with_var" explored. Of >>>> course Nick is right that automatic common subexpression elimination >>>> (CSE) is the big win, but manual CSE can improve readability. >>> >>> Part of the trouble with depending on CSE is that Python is so dynamic >>> that you can't depend on things having no side effects... but the more >>> important part, in my opinion, is that duplication is a source code >>> maintenance problem. >> >> Yes, this is the part of the problem definition I agree with, which is >> why I think named subexpressions are the most attractive alternative >> presented in the past discussions. > > The problem with general named subexpressions is that it inherently means a side effect buried in the middle of an expression. While it's not _impossible_ to do that in Python today (e.g., you can always call a mutating method in a comprehension's if clause or in the third argument to a function), but it's not common or idiomatic. > > You could say this is a consulting-adults issue and you shouldn't use it in cases where it's not deep inside an expression--but those are the actual motivating cases, the ones where just "pull it out into a named assignment" won't work. In fact, one of our three examples is: > >> [b for a in iterable if (a.b as b)] > > That's exactly the kind of place that you'd call non-idiomatic with a mutating method call, so why is a binding not even worse? > > Maybe something more like a let expression, where the binding goes as far left as possible instead of as far right would look better, ... 
but I can't even begin to think of a way to fit that into Python's syntax that isn't horribly ugly and clunky, and "as" already has lots of precedent, so I think that's not worth exploring. -------------- next part -------------- An HTML attachment was scrubbed... URL: From abarnert at yahoo.com Mon Jun 8 06:31:21 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 7 Jun 2015 21:31:21 -0700 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On Jun 7, 2015, at 21:23, Neil Girdhar wrote: > > Yes, but in this case the near term "problem" was as far as I can tell just parsing floats as decimals, which is easily done with a somewhat noisy function call. I don't see why it's important. This isn't the only case anyone's ever wanted. The tokenize module has been there since at least 1.5, and presumably it wasn't added for no good reason, or made to work with 3.x just for fun. And it has an example use in the docs. The only thing that's changed is that, now that postprocessing the AST has become a lot easier and less hacky because of the ast module and the succession of changes to the import process, the fact that tokenize is still clumsy and hacky is more noticeable. From stefan_ml at behnel.de Mon Jun 8 06:43:52 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 08 Jun 2015 06:43:52 +0200 Subject: [Python-ideas] If branch merging In-Reply-To: References: Message-ID: Cory Beutler schrieb am 07.06.2015 um 05:03: > *Examples of use:* > *Duplicate code in if-chains may be reduced:* > # Old Version > if a == b: > print ('a == b') > foo() # <-- duplicate code > elif b == c: > print ('b == c') > foo() # <-- duplicate code > elif c == d: > print ('c == d') > foo() # <-- duplicate code > > # New Version > if a == b: > print ('a == b') > elif b == c: > print ('b == c') > elif c == d: > print ('c == d') > also: > foo() # <-- No longer duplicated I think this is best done by extracting it into a function, e.g. def good_name_needed(): if a == b: print('a == b') elif b == c: print('b == c') elif c == d: print('c == d') else: # nothing to do here return foo() # <-- No longer duplicated Usually, in real code, the call to foo() will have some kind of relation to the previous if-chain anyway, so a good name for the whole function shouldn't be all to difficult to find. Stefan From ncoghlan at gmail.com Mon Jun 8 07:01:06 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 15:01:06 +1000 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: On 8 June 2015 at 14:23, Neil Girdhar wrote: > Yes, but in this case the near term "problem" was as far as I can tell just > parsing floats as decimals, which is easily done with a somewhat noisy > function call. I don't see why it's important. No, the problem to be solved is making it easier for people to "play" with Python's syntax and try out different ideas in a format that can be shared easily. The more people that are able to tinker and play with something, and share the results of their work, the more opportunities there are for good ideas to be had, and shared, eventually building up to the point of becoming a coherent proposal for change. 
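To make that concrete: the "floats as decimals" tweak mentioned above is essentially the decistmt() example that already ships in the tokenize module docs. From memory (so treat the details as approximate), it looks something like this:

    from io import BytesIO
    from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP

    def decistmt(s):
        """Substitute Decimal('...') calls for the float literals in s."""
        result = []
        # tokenize() wants a readline callable that yields bytes
        for toknum, tokval, _, _, _ in tokenize(BytesIO(s.encode('utf-8')).readline):
            if toknum == NUMBER and '.' in tokval:    # a float literal
                result.extend([(NAME, 'Decimal'), (OP, '('),
                               (STRING, repr(tokval)), (OP, ')')])
            else:
                result.append((toknum, tokval))
        return untokenize(result).decode('utf-8')

    >>> from decimal import Decimal
    >>> exec(decistmt("print(0.1 + 0.2)"))
    0.3

All of that is possible today - the point is that it isn't particularly discoverable, and the results aren't easy to package up and share as a coherent, reusable hook.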
The 3.4 dis module included several enhancements to make playing with bytecode easier and more fun: https://docs.python.org/3/whatsnew/3.4.html#dis 3.4 also added the source_to_code() hook in importlib to make it easy to tweak the compilation pass without having to learn all the other intricacies of the import system: https://docs.python.org/3/whatsnew/3.4.html#importlib MacroPy and Hylang are interesting examples of ways to manipulate the AST in order to use the CPython VM without relying solely on the native language syntax, while byteplay and Numba are examples of manipulating things at the bytecode level. > The way that CPython does parsing is more than just annoying. It's a mess > of repetition and tests that try to make sure that all of the phases are > synchronized. I don't think that CPython is the future of Python. One day, > someone will write a Python interpreter in Python that includes a clean > one-pass parser. I would prefer to make that as easy to realize as > possible. You might think it's far-fetched. I don't think it is. While the structure of CPython's code generation toolchain certainly poses high incidental barriers to entry, those barriers are trivial compared to the *inherent* barriers to entry involved in successfully making the case for a change like introducing a matrix multiplication operator or more clearly separating coroutines from generators through the async/await keywords (both matrix multiplication support and async/await landed for 3.5). If someone successfully makes the case for a compelling change to the language specification, then existing core developers are also ready, willing and able to assist in actually making the change to CPython. As a result, making that final step of *implementing* a syntactic change in CPython easier involves changing something that *isn't the bottleneck in the process*, so it would have no meaningful impact on the broader Python community. By contrast, making *more steps* of the existing compilation process easier for pure Python programmers to play with, preferably in an implementation independent way, *does* impact two of the bottlenecks: the implementation of syntactic ideas in executable form, and sharing those experiments with others. Combining that level of syntactic play with PyPy's ability to automatically generate JIT compilers offers an extraordinarily powerful platform for experimentation, with the standardisation process ensuring that accepted experiments also scale down to significantly more constrained environments like MicroPython. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From floyd at floyd.ch Mon Jun 8 09:56:41 2015 From: floyd at floyd.ch (floyd) Date: Mon, 08 Jun 2015 09:56:41 +0200 Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio Message-ID: <55754AB9.1010000@floyd.ch> Hi * I use this python line quite a lot in some projects: if difflib.SequenceMatcher.quick_ratio(None, a, b) >= threshold: I realized that this is performance-wise not optimal, therefore wrote a method that will return much faster in a lot of cases by using the length of "a" and "b" to calculate the upper bound for "threshold": if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold): I'd say we could include it into the stdlib, but maybe it should only be a python code recipe? I would say this is one of the most frequent use cases for difflib, but maybe that's just my biased opinion :) . What's yours? 
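Roughly what I have in mind is the following kind of early exit (a standalone sketch using the same length-based bound that real_quick_ratio() relies on, not the exact patch - quick_ratio_ge is just the name I'm proposing):

    import difflib

    def quick_ratio_ge(a, b, threshold):
        """True if SequenceMatcher(None, a, b).quick_ratio() >= threshold.

        The best ratio the two sequences could possibly reach is
        2.0 * min(len(a), len(b)) / (len(a) + len(b)), so if even that
        upper bound is below the threshold we can answer False without
        counting any characters.
        """
        la, lb = len(a), len(b)
        if la == 0 and lb == 0:
            return 1.0 >= threshold      # two empty sequences match exactly
        if 2.0 * min(la, lb) / (la + lb) < threshold:
            return False                 # the length bound already rules it out
        return difflib.SequenceMatcher(None, a, b).quick_ratio() >= threshold

For the common case where most pairs are nowhere near the threshold, the character counting never happens at all.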
See http://bugs.python.org/issue24384 cheers, floyd From storchaka at gmail.com Mon Jun 8 10:44:38 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 08 Jun 2015 11:44:38 +0300 Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio In-Reply-To: <55754AB9.1010000@floyd.ch> References: <55754AB9.1010000@floyd.ch> Message-ID: On 08.06.15 10:56, floyd wrote: > I use this python line quite a lot in some projects: > > if difflib.SequenceMatcher.quick_ratio(None, a, b) >= threshold: > > I realized that this is performance-wise not optimal, therefore wrote a > method that will return much faster in a lot of cases by using the > length of "a" and "b" to calculate the upper bound for "threshold": > > if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold): > > I'd say we could include it into the stdlib, but maybe it should only be > a python code recipe? > > I would say this is one of the most frequent use cases for difflib, but > maybe that's just my biased opinion :) . What's yours? > > See http://bugs.python.org/issue24384 If such function will be added, I think it needs better name. E.g. difflib.isclose(a, b, threshold). From ncoghlan at gmail.com Mon Jun 8 12:32:44 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 20:32:44 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> Message-ID: On 8 June 2015 at 14:24, Andrew Barnert wrote: > On Jun 7, 2015, at 20:41, Nick Coghlan wrote: > > On 8 June 2015 at 12:45, Chris Angelico wrote: > > On Mon, Jun 8, 2015 at 12:33 PM, Stephen J. Turnbull > wrote: > > I'm definitely -1 on the also, alif syntax at this point. On the > > other hand, having done a lot of C programming in my misspent youth, I > > do miss anaphoric conditionals, so I too would like to see the > > possibility of "if cond as var: do_something_with_var" explored. Of > > course Nick is right that automatic common subexpression elimination > > (CSE) is the big win, but manual CSE can improve readability. > > > Part of the trouble with depending on CSE is that Python is so dynamic > > that you can't depend on things having no side effects... but the more > > important part, in my opinion, is that duplication is a source code > > maintenance problem. > > > Yes, this is the part of the problem definition I agree with, which is > why I think named subexpressions are the most attractive alternative > presented in the past discussions. > > > The problem with general named subexpressions is that it inherently means a > side effect buried in the middle of an expression. While it's not > _impossible_ to do that in Python today (e.g., you can always call a > mutating method in a comprehension's if clause or in the third argument to a > function), but it's not common or idiomatic. > > You could say this is a consulting-adults issue and you shouldn't use it in > cases where it's not deep inside an expression--but those are the actual > motivating cases, the ones where just "pull it out into a named assignment" > won't work. In fact, one of our three examples is: > > [b for a in iterable if (a.b as b)] > > > That's exactly the kind of place that you'd call non-idiomatic with a > mutating method call, so why is a binding not even worse? 
Ah, but that's one of the interesting aspects of the idea: since comprehensions and generator expressions *already* define their own nested scope in Python 3 in order to keep the iteration variable from leaking, their named subexpressions wouldn't leak either :) For if/elif clauses and while loops, the leaking would be a desired feature in order to make the subexpression available for use inside the following suite body. That would leave conditional expressions as the main suggested use case where leaking the named subexpressions might not be desirable. Without any dedicated syntax, the two ways that first come to mind for doing expression local named subexpressions would be: x = (lambda a=a: b if (a.b as b) else a.c)() x = next((b if (a.b as b) else a.c) for a in (a,)) Neither of which would be a particularly attractive option. The other possibility that comes to mind is to ask the question: "What happens when a named subexpression appears as part of an argument list to a function call, or as part of a subscript operation, or as part of a container display?", as in: x = func(b if (a.b as b) else a.c) x = y[b if (a.b as b) else a.c] x = (b if (a.b as b) else a.c), x = [b if (a.b as b) else a.c] x = {b if (a.b as b) else a.c} x = {'k': b if (a.b as b) else a.c} Having *those* subexpressions leak seems highly questionable, so it seems reasonable to suggest that in order for this idea to be workable in practice, there would need to be some form of implicit scoping rule where using a named subexpression turned certain constructs into "scoped subexpressions" that implicitly created a function object and called it, rather than being evaluated inline as normal. (The dual pass structure of the code generator should make this technically feasible - it would be similar to the existing behaviour where the presence of a yield expression changes the way a containing "def" statement is handled) However, that complication is significant enough to make me wonder how feasible the idea really is - yes, it handles simple cases nicely, but figuring out how to keep the side effect implications to a manageable level without making the scoping rules impossibly hard to follow would be a non-trivial challenge. Without attempting to implement it, I'm honestly not sure how hard it would be to introduce more comprehension style implicit scopes to bound the propagation of named subexpression bindings. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Jun 8 13:24:33 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Jun 2015 04:24:33 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> Message-ID: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> On Jun 8, 2015, at 03:32, Nick Coghlan wrote: > On 8 June 2015 at 14:24, Andrew Barnert wrote: >> >> The problem with general named subexpressions is that it inherently means a >> side effect buried in the middle of an expression. While it's not >> _impossible_ to do that in Python today (e.g., you can always call a >> mutating method in a comprehension's if clause or in the third argument to a >> function), but it's not common or idiomatic. 
>> >> You could say this is a consulting-adults issue and you shouldn't use it in >> cases where it's not deep inside an expression--but those are the actual >> motivating cases, the ones where just "pull it out into a named assignment" >> won't work. In fact, one of our three examples is: >> >> [b for a in iterable if (a.b as b)] >> >> >> That's exactly the kind of place that you'd call non-idiomatic with a >> mutating method call, so why is a binding not even worse? > > Ah, but that's one of the interesting aspects of the idea: since > comprehensions and generator expressions *already* define their own > nested scope in Python 3 in order to keep the iteration variable from > leaking, their named subexpressions wouldn't leak either :) > > For if/elif clauses and while loops, the leaking would be a desired > feature in order to make the subexpression available for use inside > the following suite body. Except it would also make the subexpression available for use _after_ the suite body. And it would give you a way to accidentally replace rather than shadow a variable from earlier in the function. So it really is just as bad as any other assignment or other mutation inside a condition. > That would leave conditional expressions as the main suggested use > case where leaking the named subexpressions might not be desirable. > Without any dedicated syntax, the two ways that first come to mind for > doing expression local named subexpressions would be: > > x = (lambda a=a: b if (a.b as b) else a.c)() > x = next((b if (a.b as b) else a.c) for a in (a,)) > > Neither of which would be a particularly attractive option. Especially since if you're willing to introduce an otherwise-unnecessary scope, you don't even need this feature: x = (lambda b: b if b else a.c)(a.b) x = (lambda b=a.b: b if b else a.c)() Or, of course, you can just define a reusable ifelse function somewhere: def defaultify(val, defaultval return val if val else defaultval x = defaultify(a.b, a.c) > The other possibility that comes to mind is to ask the question: "What > happens when a named subexpression appears as part of an argument list > to a function call, or as part of a subscript operation, or as part of > a container display?", as in: > > x = func(b if (a.b as b) else a.c) > x = y[b if (a.b as b) else a.c] > x = (b if (a.b as b) else a.c), > x = [b if (a.b as b) else a.c] > x = {b if (a.b as b) else a.c} > x = {'k': b if (a.b as b) else a.c} > > Having *those* subexpressions leak seems highly questionable, so it > seems reasonable to suggest that in order for this idea to be workable > in practice, there would need to be some form of implicit scoping rule > where using a named subexpression turned certain constructs into > "scoped subexpressions" that implicitly created a function object and > called it, rather than being evaluated inline as normal. Now you really _are_ reinventing let. A let expression like this: x = let b=a.b in (b if b else a.c) ... is effectively just syntactic sugar for the lambda above. And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?) 
As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static. From steve at pearwood.info Mon Jun 8 14:12:28 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 8 Jun 2015 22:12:28 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> Message-ID: <20150608121228.GL20701@ando.pearwood.info> On Mon, Jun 08, 2015 at 04:24:33AM -0700, Andrew Barnert via Python-ideas wrote: [...] > > For if/elif clauses and while loops, the leaking would be a desired > > feature in order to make the subexpression available for use inside > > the following suite body. > > Except it would also make the subexpression available for use _after_ > the suite body. And it would give you a way to accidentally replace > rather than shadow a variable from earlier in the function. So it > really is just as bad as any other assignment or other mutation inside > a condition. I don't know why you think this will be a bad thing. Or rather, even if it is a bad thing, it's the Python Way. Apart from classes and functions themselves, indented blocks are *not* new scopes as they may be in some other languages. They are part of the existing scope, and the issues you raise above are already true today: x = 1 if some_condition(): x = 2 # replaces, rather than shadow, the earlier x y = 3 # y may be available for use after the suite body So I don't see the following as any more of a problem: x = 1 if (some_condition() as x) or (another_condition() as y): ... # x is replaced, and y is available The solution to replacing a variable is, use another name. And if you really care about y escaping from the if-block, just use del y at the end of the block. (I can't imagine why anyone would bother.) [...] > > The other possibility that comes to mind is to ask the question: "What > > happens when a named subexpression appears as part of an argument list > > to a function call, or as part of a subscript operation, or as part of > > a container display?", as in: > > > > x = func(b if (a.b as b) else a.c) > > x = y[b if (a.b as b) else a.c] > > x = (b if (a.b as b) else a.c), > > x = [b if (a.b as b) else a.c] > > x = {b if (a.b as b) else a.c} > > x = {'k': b if (a.b as b) else a.c} > > > > Having *those* subexpressions leak seems highly questionable, I agree with that in regard to the function call. It just feels wrong and icky for a binding to occur inside a function call like that. But I don't think I agree with respect to the rest. 
To answer Andrew's later question: > What does "x[(a.b as b)] = b" mean surely it simply means the same as: b = a.b x[b] = b Now we could apply the same logic to a function call: # func(a.b as b) b = a.b func(b) but I think the reason this feels wrong for function calls is that it looks like the "as b" binding should be inside the function's scope rather than in the caller's scope. (At least that's what it looks like to me.) But that doesn't apply to the others. (At least for me.) But frankly, I think I would prefer to have b escape from the function call than to have to deal with a bunch of obscure, complicated and unintuitive "as" scoping rules. Simplicity and predictability counts for a lot. -- Steve From liik.joonas at gmail.com Mon Jun 8 14:26:35 2015 From: liik.joonas at gmail.com (Joonas Liik) Date: Mon, 8 Jun 2015 15:26:35 +0300 Subject: [Python-ideas] If branch merging In-Reply-To: <20150608121228.GL20701@ando.pearwood.info> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> Message-ID: I just got this funny feeling reading the last few posts. # this f(i+1 as i) # feels a lot like.. f(i++) # but really f(++i) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jun 8 14:26:28 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 22:26:28 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> Message-ID: On 8 June 2015 at 21:24, Andrew Barnert wrote: > > Now you really _are_ reinventing let. A let expression like this: > > x = let b=a.b in (b if b else a.c) > > ... is effectively just syntactic sugar for the lambda above. Sure, I've thought a *lot* about adding let-type syntax - hence PEP's 403 (@in) and 3150 (given) for a couple of variations on statement level local variables. The problem with a let expression is that you still end up having to jumble up the order of things, just as you do with the trick of defining and calling a function, rather than being able to just name the subexpression on first execution and refer back to it by name later rather than repeating the calculation. Thus a let expression doesn't actually help all that much with improving the flow of reading or writing code - you still have the step of pulling the subexpression out and declaring both its name and value first, before proceeding on with the value of the calculation. That's not only annoying when writing, but also increases the cognitive load when reading, since the subexpressions are introduced in a context free fashion. When the named subexpressions are inlined, they work more like the way pronouns in English work: When the (named subexpressions as they) are inlined, they work more like the way pronouns in English work. It's a matter of setting up a subexpression for a subsequent backreference, rather than pulling it out into a truly independent step. > And it's a lot more natural and easy to reason about than letting b escape one step out to the conditional expression but not any farther. (Or to the rest of the complete containing expression? Or the statement? 
What does "x[(a.b as b)] = b" mean, for example? Or "x[(b if (a.b as b) else a.c) + (b if (d.b as b) else d.c)]"? Or "x[(b if (a.b as b) else a.c) + b]"?) Exactly, that's the main problem with named subexpressions - if you let them *always* leak, you get some very confusing consequences, and if you *never* let them leak, than you don't address the if statement and while loop use cases. So to make them work as desired, you have to say they "sometimes" leak, and then define what that means in a comprehensible way. One possible way to do that would be to say that they *never* leak by default (i.e. using a named subexpression always causes the expression containing them to be executed in its own scope), and then introduce some form of special casing into if statements and while loops to implicitly extract named subexpressions. > As a side note, the initial proposal here was to improve performance by not repeating the a.b lookup; I don't think adding an implicit comprehension-like function definition and call will be faster than a getattr except in very uncommon cases. However, I think there are reasonable cases where it's more about correctness than performance (e.g., the real expression you want to avoid evaluating twice is next(spam) or f.readline(), not a.b), so I'm not too concerned there. Also, I'm pretty sure a JIT could effectively inline a function definition plus call more easily than it could CSE an expression that's hard to prove is static. Yes, I'm not particularly interested in speed here - I'm personally interested in maintainability and expressiveness. (That's also why I consider this a very low priority project for me personally, as it's very, very hard to make a programming language easier to use by *adding* concepts to it. You really want to be giving already emergent patterns names and syntactic sugar, since you're then replacing learning a pattern that someone would have eventually had to learn anyway with learning the dedicated syntax). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Jun 8 14:38:58 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Jun 2015 22:38:58 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <20150608121228.GL20701@ando.pearwood.info> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> Message-ID: On 8 June 2015 at 22:12, Steven D'Aprano wrote: [In relation to named subexpressions leaking to the surrounding namespace by default] > I agree with that in regard to the function call. It just feels wrong > and icky for a binding to occur inside a function call like that. But > I don't think I agree with respect to the rest. To answer Andrew's later > question: > >> What does "x[(a.b as b)] = b" mean > > surely it simply means the same as: > > b = a.b > x[b] = b Right, but it reveals the execution order jumping around in a way that is less obvious in the absence of side effects. That is, for side effect free functions, the order of evaluation in: x[a()] = b() doesn't matter. Once side effects are in play, the order matters a lot more. 
> Now we could apply the same logic to a function call: > > # func(a.b as b) > b = a.b > func(b) > > but I think the reason this feels wrong for function calls is that it > looks like the "as b" binding should be inside the function's scope > rather than in the caller's scope. (At least that's what it looks like > to me.) But that doesn't apply to the others. (At least for me.) > > But frankly, I think I would prefer to have b escape from the function > call than to have to deal with a bunch of obscure, complicated and > unintuitive "as" scoping rules. Simplicity and predictability counts for > a lot. Hence the ongoing absence of named subexpressions as a feature - the simple cases look potentially interesting, but without careful consideration, the complex cases would inevitably end up depending on CPython specific quirks in subexpression execution order. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Mon Jun 8 15:21:40 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Jun 2015 06:21:40 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> Message-ID: <4998A780-6D75-4AD6-AEB4-9C8AB37FA3A3@yahoo.com> On Jun 8, 2015, at 05:26, Nick Coghlan wrote: > > The problem with a let expression is that you still end up having to > jumble up the order of things, just as you do with the trick of > defining and calling a function, rather than being able to just name > the subexpression on first execution and refer back to it by name > later rather than repeating the calculation. But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name. He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote. Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal), but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird. Of course the same thing does happen in comprehensions, but (a) those are one of the few things in Python that are intended to read as much like math as like English, and (b) it almost always _is_ the expression rather than the loop variable that's the interesting part of a comprehension; that isn't generally true for a conditional. From abarnert at yahoo.com Mon Jun 8 15:31:34 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 8 Jun 2015 06:31:34 -0700 Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio In-Reply-To: <55754AB9.1010000@floyd.ch> References: <55754AB9.1010000@floyd.ch> Message-ID: If this really is needed as a performance optimization, surely you want to do something faster than loop over dozens of comparisons to decide whether you can skip the actual work? 
I don't know if this is something you can calculate analytically, but if not, you're presumably doing this on zillions of lines, and instead of repeating the loop every time, wouldn't it be better to just do it once and then just check the ratio each time? (You could hide that from the caller by just factoring out the loop to a function _get_ratio_for_threshold and decorating it with @lru_cache. But I don't know if you really need to hide it from the caller.) Also, do the extra checks for 0, 1, and 0.1 and for empty strings actually speed things up in practice? > On Jun 8, 2015, at 00:56, floyd wrote: > > Hi * > > I use this python line quite a lot in some projects: > > if difflib.SequenceMatcher.quick_ratio(None, a, b) >= threshold: > > I realized that this is performance-wise not optimal, therefore wrote a > method that will return much faster in a lot of cases by using the > length of "a" and "b" to calculate the upper bound for "threshold": > > if difflib.SequenceMatcher.quick_ratio_ge(None, a, b, threshold): > > I'd say we could include it into the stdlib, but maybe it should only be > a python code recipe? > > I would say this is one of the most frequent use cases for difflib, but > maybe that's just my biased opinion :) . What's yours? > > See http://bugs.python.org/issue24384 > > cheers, > floyd > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ncoghlan at gmail.com Mon Jun 8 16:11:26 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Jun 2015 00:11:26 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <4998A780-6D75-4AD6-AEB4-9C8AB37FA3A3@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <4998A780-6D75-4AD6-AEB4-9C8AB37FA3A3@yahoo.com> Message-ID: On 8 June 2015 at 23:21, Andrew Barnert wrote: > On Jun 8, 2015, at 05:26, Nick Coghlan wrote: >> >> The problem with a let expression is that you still end up having to >> jumble up the order of things, just as you do with the trick of >> defining and calling a function, rather than being able to just name >> the subexpression on first execution and refer back to it by name >> later rather than repeating the calculation. > > But notice that in two of your three use cases--and, significantly, the ones that are expressions--the place of first execution comes lexically _after_ the reference, so in normal reading order, you're referring _forward_ to it by name. Right, but as you note later, that jumping around in execution order is inherent in the way conditional expressions and comprehensions are constructed, and the named subexpressions track execution order rather than lexical order. 
It's also worth noting that the comprehension case causes the same problem for a let expression that "pull it out to a separate statement" does for while loops: # This works x = a.b if x: # use x # This doesn't x = a.b while x: # use x And similarly: # This could work x = (let b = a.b in (b if b else a.c)) # This can't be made to work x = (let b = a.b in (b for a in iterable if b) By contrast, these would both be possible: x = b if (a.b as b) else a.c x = (b for a in iterable if (a.b as b)) If it's accepted that letting subexpressions of binary, ternary and quaternary expressions refer to each other is a desirable design goal, then a new scope definition expression can't handle that requirement - cross-references require a syntax that can be interleaved with the existing constructs and track their execution flow, rather than a syntax that wraps them in a new larger expression. > He can front a clause without swapping the pronoun and its referent if Nick intends that special emphasis, but otherwise he wouldn't do that in English. That's a valid English sentence, but you have to think for a second to parse it, and then think again to guess what the odd emphasis is supposed to connote. Yeah, I didn't adequately think through the way the out-of-order execution weakened the pronoun-and-back-reference analogy. > Sometimes you actually do want that odd emphasis (it seems like a major point of your given proposal), It's just a consequence of tracking execution order rather than lexical order. The *reason* for needing to track execution order is because it's the only way to handle loops properly (by rebinding the name to a new value on each iteration). It's also possible to get a conditional expression to use a back reference instead of a forward reference by inverting the check: x = a.c if not (a.b as b) else b Or by using the existing "pull the subexpression out to a separate statement" trick: b = a.b x = b if b else a.c You'd never be *forced* to use a forward reference if you felt it made the code less readable. The forward reference would be mandatory in comprehensions, but that's already at least somewhat familiar due to the behaviour of the iteration variable. > but that's not the case here. It's the temporary name "b" that's unimportant, not its definition; the only reason you need the name at all is to avoid evaluating "a.b" twice. So having it come halfway through the expression is a little weird. I'd consider elif clauses, while loops, comprehensions and generator expressions to be the most useful cases - they're all situations where pulling the subexpression out to a preceding assignment statement doesn't work due to the conditional execution of the clause (elif) or the repeated execution (while loops, comprehensions, generator expressions). For other cases, the semantics would need to be clearly *defined* in any real proposal, but I would expect a preceding explicit assignment statement to be clearer most of the time. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From taleinat at gmail.com Mon Jun 8 17:15:29 2015 From: taleinat at gmail.com (Tal Einat) Date: Mon, 8 Jun 2015 18:15:29 +0300 Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio In-Reply-To: References: <55754AB9.1010000@floyd.ch> Message-ID: On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka wrote: > If such function will be added, I think it needs better name. E.g. > difflib.isclose(a, b, threshold). 
Indeed, this is somewhat similar in concept to the recently-added math.isclose() function, and could use a similar signature, i.e.: difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None). However, the real issue here is whether this is important enough to be included in the stdlib. Are there any places in the stdlib where this would be useful? Can anyone other than the OP confirm that they would find having this in the stdlib particularly useful? Why should this be in the stdlib vs. a recipe? - Tal Einat From wolfram.hinderer at googlemail.com Mon Jun 8 17:24:04 2015 From: wolfram.hinderer at googlemail.com (Wolfram Hinderer) Date: Mon, 08 Jun 2015 17:24:04 +0200 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> Message-ID: <5575B394.9070203@googlemail.com> Am 08.06.2015 um 14:38 schrieb Nick Coghlan: > On 8 June 2015 at 22:12, Steven D'Aprano wrote: > > [In relation to named subexpressions leaking to the surrounding > namespace by default] > >>> What does "x[(a.b as b)] = b" mean >> >> surely it simply means the same as: >> >> b = a.b >> x[b] = b > > Right, but it reveals the execution order jumping around in a way that > is less obvious in the absence of side effects. I'm lost. The evaluation order of today (right hand side first) would make "x[(a.b as b)] = b" mean x[a.b] = b b = a.b (assuming looking up a.b has no side effects). Would the introduction of named subexpressions change that, and how? From bussonniermatthias at gmail.com Mon Jun 8 17:39:59 2015 From: bussonniermatthias at gmail.com (Matthias Bussonnier) Date: Mon, 8 Jun 2015 08:39:59 -0700 Subject: [Python-ideas] difflib.SequenceMatcher quick_ratio In-Reply-To: References: <55754AB9.1010000@floyd.ch> Message-ID: <367F99E3-2E16-4D9B-B105-58152DEEA19B@gmail.com> > On Jun 8, 2015, at 08:15, Tal Einat wrote: > > On Mon, Jun 8, 2015 at 11:44 AM, Serhiy Storchaka wrote: > >> If such function will be added, I think it needs better name. E.g. >> difflib.isclose(a, b, threshold). > > Indeed, this is somewhat similar in concept to the recently-added > math.isclose() function, and could use a similar signature, i.e.: > difflib.SequenceMatcher.isclose(a, b, rel_tol=None, abs_tol=None). > > However, the real issue here is whether this is important enough to be > included in the stdlib. One thing I found is that the fact that stdlib try to be smart (human friendly diff) make it horribly slow[1]. > SequenceMatcher tries to compute a "human-friendly diff" between two sequences. If you are only interested in a quick ratio, especially on long sequences, I would suggest using another algorithm which is not worse case scenario in n^3. On some sequences computing the diff was several order of magnitude faster for me[2] (pure python). At the point where quick-ratio was not needed. Note also that SequeceMatcher(None, a, b) might not give the same result/ratio that SequenceMatcher(None, b, a) >>> SequenceMatcher(None,'aba', 'bca').get_matching_blocks() [Match(a=0, b=2, size=1), Match(a=3, b=3, size=0)] # 1 common char >>> SequenceMatcher(None,'bca','aba').get_matching_blocks() [Match(a=0, b=1, size=1), Match(a=2, b=2, size=1), Match(a=3, b=3, size=0)] # 2 common chars. ? M [1] For my application, I don?t really care about having Human Friendly diff, but the actual minimal diff. 
I do understand the need for this algorithm though. [2] Well chosen Benchmark : http://i.imgur.com/cZPwR0H.png -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Jun 8 17:42:34 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 09 Jun 2015 00:42:34 +0900 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> Message-ID: <871thm436d.fsf@uwakimon.sk.tsukuba.ac.jp> Robert Collins writes: > There's probably an economics theorem to describe that, but I'm not an > economist :) I'd like to refuse the troll, but this one is too good to pass up. The book which is the authoritative source on the theorems you're looking for is Nancy Stokey's "The Economics of Inaction". 'Nuff said on this topic? From robertc at robertcollins.net Mon Jun 8 20:35:48 2015 From: robertc at robertcollins.net (Robert Collins) Date: Tue, 9 Jun 2015 06:35:48 +1200 Subject: [Python-ideas] Hooking between lexer and parser In-Reply-To: <871thm436d.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1993A95F-2564-4038-9B0B-E1AE86AC5B54@yahoo.com> <1a57ad87-860d-4044-98f2-11f1d3622a18@googlegroups.com> <871thm436d.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 9 June 2015 at 03:42, Stephen J. Turnbull wrote: > Robert Collins writes: > > > There's probably an economics theorem to describe that, but I'm not an > > economist :) > > I'd like to refuse the troll, but this one is too good to pass up. > > The book which is the authoritative source on the theorems you're > looking for is Nancy Stokey's "The Economics of Inaction". 'Nuff said > on this topic? Thanks; wasn't a troll - fishing perhaps :) I've bought it for kindle and shall have a read. -Rob -- Robert Collins Distinguished Technologist HP Converged Cloud From ncoghlan at gmail.com Mon Jun 8 23:26:09 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Jun 2015 07:26:09 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <5575B394.9070203@googlemail.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> Message-ID: On 9 June 2015 at 01:24, Wolfram Hinderer wrote: > Am 08.06.2015 um 14:38 schrieb Nick Coghlan: >> On 8 June 2015 at 22:12, Steven D'Aprano wrote: >> >> [In relation to named subexpressions leaking to the surrounding >> namespace by default] >> >>>> What does "x[(a.b as b)] = b" mean >>> >>> surely it simply means the same as: >>> >>> b = a.b >>> x[b] = b >> >> Right, but it reveals the execution order jumping around in a way that >> is less obvious in the absence of side effects. > > I'm lost. The evaluation order of today (right hand side first) > would make "x[(a.b as b)] = b" mean > > x[a.b] = b > b = a.b > > (assuming looking up a.b has no side effects). That assumption that the LHS evaluation has no side effects is the one that gets revealed by named subexpressions: >>> def subscript(): ... print("Subscript called") ... return 0 ... >>> def value(): ... print("Value called") ... return 42 ... >>> def target(): ... print("Target called") ... return [None] ... >>> target()[subscript()] = value() Value called Target called Subscript called Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ceridwen.mailing.lists at gmail.com Tue Jun 9 16:29:36 2015 From: ceridwen.mailing.lists at gmail.com (Cara) Date: Tue, 09 Jun 2015 10:29:36 -0400 Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 49 In-Reply-To: References: Message-ID: <1433860176.2951.37.camel@gmail.com> This isn't directed at Andrew in particular but the discussion in general, because it wasn't clear to me from how everyone was using the words that the distinction Andrew mentions was clear. PEGs are a class of grammars analogous to context-free grammars, regular grammars, or (to choose a more obscure example) Boolean grammars ( http://users.utu.fi/aleokh/boolean/ ). PEGs are probably not comparable to CFGs, while Boolean grammars are a strict superset of CFGs. Like other grammars, PEGs admit multiple parsing algorithms. As Andrew says and as far as I know, OMeta uses a top-down recursive-descent algorithm with backtracking for parsing PEGs, which is why it can go exponential on some inputs. Packrat parsing is the algorithm that Bryan Ford published at the same time as he introduced PEGs, and it can parse PEGs in linear time by using memoization instead of backtracking. However, it's not the only algorithm that can parse PEGs, and while I don't know any work on this, it seems plausible that algorithms for parsing general CFGs like GLL or Earley could be adapted to parsing PEGs. Likewise, memoization can be used to avoid backtracking in a top-down recursive-descent parser for CFGs, though it's highly unlikely that any algorithm could achieve linear time for ambiguous CFGs. > > I don't know about OMeta, but the Earley parsing algorithm is > worst-cast cubic time "quadratic time for unambiguous grammars, and > linear time for almost all LR(k) grammars". > > I don't know why you'd want to use Earley for parsing a programming > language. IIRC, it was the first algorithm that could handle rampant > ambiguity in polynomial time, but that isn't relevant to parsing > programming languages (especially one like Python, which was > explicitly designed to be simple to parse), and it isn't relevant to > natural languages if you're not still in the 1960s, except in learning > the theory and history of parsing. GLR does much better in > almost-unambiguous/almost-deterministic languages; CYK can be easily > extended with weights (which propagate sensibly, so you can use them > for a final judgment, or to heuristically prune alternatives as you > go); Valiant is easier to reason about mathematically; etc. And that's > just among the parsers in the same basic family as Earley. Do you have a source for the assertion that Earley is slower than GLR? I've heard many people say this, but I've never seen any formal comparisons made since Masaru Tomita's 1985 book, "Efficient parsing for natural language: A fast algorithm for practical systems." As far as I know, this is emphatically not true for asymptotic complexity. In 1991 Joop Leo published ?A General Context-free Parsing Algorithm Running in Linear Time on Every LR(K) Grammar Without Using Lookahead," a modification to Earley's algorithm that makes it run in linear time on LR-regular grammars. The LR-regular grammars include the LR(k) grammars for all k and are in fact a strict superset of the deterministic grammars. Again, as far as I know, GLR parsers run in linear times on grammars depending on the LR table they're using, which means in most cases LR(1) or something similar. 
There are many variations of the GLR algorithm now, though, so it's possible there's one I don't know about that doesn't have this limitation. As for constant factors, it's possible that GLR is better than Earley. I'm reluctant to assume that Tomita's findings still hold, though, because hardware has changed radically since 1985 and because both the original versions of both GLR and Earley had bugs. While there are now versions of both algorithms that fix the bugs, the fixes may have changed the constant factors. > In fact, even if I wanted to write an amazing parser library for > Python (and I kind of do, but I don't know if I have the time), I > still don't think I'd want to suggest it as a replacement for the > parser in CPython. Writing all the backward-compat adapters and > porting the Python parser over with all its quirks intact and building > the tests to prove that it's performance and error handling were > strictly better and so on wouldn't be nearly as much fun as other > things I could do with it. I'm working on a new parser library for Python at https://github.com/ceridwen/combinators that I intend to have some of the features that have been discussed here (it can be used for scanner-less/lexer-less parsing, if one wants; a general CFG algorithm rather than some subset thereof; linear-time parsing on a reasonable subset of CFGs; integrated semantic actions; embedding in Python rather than having a separate DSL like EBNF) and some others that haven't been. The research alone has been a lot of work, and the implementation no less so. I'd barely call what I have at the moment pre-alpha. I'm exploring an implementation based on a newer general CFG parsing algorithm, GLL ( https://www.royalholloway.ac.uk/computerscience/research/csle/gllparsers.aspx ), though I'd like to compare it to Earley on constant factors. Cara From ethan at stoneleaf.us Tue Jun 9 20:00:57 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 09 Jun 2015 11:00:57 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <20150608021829.GI20701@ando.pearwood.info> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <20150608021829.GI20701@ando.pearwood.info> Message-ID: <557729D9.9000708@stoneleaf.us> On 06/07/2015 07:18 PM, Steven D'Aprano wrote: > It's not like elif, which is uneffected by any previous if or elif > clauses. Each if/elif clause is independent. This is simply not true: each "elif" encountered is only evaluated if all the previous if/elif lines failed, so you have to pay attention to those previous lines to know if execution will even get this far. > The test is always made > (assuming execution reaches that line of code at all), Exactly. 
-- ~Ethan~ From wolfram.hinderer at googlemail.com Tue Jun 9 20:54:00 2015 From: wolfram.hinderer at googlemail.com (Wolfram Hinderer) Date: Tue, 09 Jun 2015 20:54:00 +0200 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> Message-ID: <55773648.8090808@googlemail.com> Am 08.06.2015 um 23:26 schrieb Nick Coghlan: > On 9 June 2015 at 01:24, Wolfram Hinderer > wrote: >> Am 08.06.2015 um 14:38 schrieb Nick Coghlan: >>> On 8 June 2015 at 22:12, Steven D'Aprano wrote: >>> >>> [In relation to named subexpressions leaking to the surrounding >>> namespace by default] >>> >>>>> What does "x[(a.b as b)] = b" mean >>>> surely it simply means the same as: >>>> >>>> b = a.b >>>> x[b] = b >>> Right, but it reveals the execution order jumping around in a way that >>> is less obvious in the absence of side effects. >> I'm lost. The evaluation order of today (right hand side first) >> would make "x[(a.b as b)] = b" mean >> >> x[a.b] = b >> b = a.b >> >> (assuming looking up a.b has no side effects). > That assumption that the LHS evaluation has no side effects is the one > that gets revealed by named subexpressions: > >>>> def subscript(): > ... print("Subscript called") > ... return 0 > ... >>>> def value(): > ... print("Value called") > ... return 42 > ... >>>> def target(): > ... print("Target called") > ... return [None] > ... >>>> target()[subscript()] = value() > Value called > Target called > Subscript called > > Hm, that's my point, isn't it? The evaluation of subscript() happens after the evaluation of value(). The object that the RHS evaluates to (i.e. value()) is determined before subscript() is evaluated. Sideeffects of subscript() may mutate this object, but can't change *which* object is assigned. But if x[(a.b as b)] = b means b = a.b x[b] = b then the evaluation of the LHS *does* change which object is assigned. That's why I asked for clarification. (I mentioned the thing about a.b not having side effects only because in my alternative x[a.b] = b b = a.b a.b is called twice, so it's no exact representation of what is going on either. But it's a lot closer, at least the right object is assigned ;-) ) From ncoghlan at gmail.com Wed Jun 10 01:46:03 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Jun 2015 09:46:03 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <55773648.8090808@googlemail.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> Message-ID: On 10 Jun 2015 05:00, "Wolfram Hinderer" wrote: >> >> > Hm, that's my point, isn't it? > The evaluation of subscript() happens after the evaluation of value(). > The object that the RHS evaluates to (i.e. value()) is determined before subscript() is evaluated. Sideeffects of subscript() may mutate this object, but can't change *which* object is assigned. > But if > > > x[(a.b as b)] = b > > means > > b = a.b > x[b] = b That would be: x[b] = (a.b as b) > then the evaluation of the LHS *does* change which object is assigned. That's why I asked for clarification. 
Execution order wouldn't change, so it would mean the following:

    _temp = b
    b = a.b
    x[b] = _temp

This means you'd get the potentially surprising behaviour where the name binding would still happen even if the subscript assignment fails.

However, if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.

Cheers, Nick.

From abarnert at yahoo.com Wed Jun 10 02:20:38 2015
From: abarnert at yahoo.com (Andrew Barnert)
Date: Tue, 9 Jun 2015 17:20:38 -0700
Subject: [Python-ideas] If branch merging
In-Reply-To:
References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com>
Message-ID: <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com>

On Jun 9, 2015, at 16:46, Nick Coghlan wrote:
>
> However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases.

I don't think there's any consistent way to define "containing expression" that makes any sense for while/if statements. But "containing _statement_", that's easy.

In addition to the function local scope that exists today, add a statement-local scope. Only an as-binding expression creates a new statement-local binding, and it does so in the smallest containing statement (so, e.g., in a while statement's condition, it's the whole while statement, suite and else suite as well as the rest of the condition). These bindings shadow outer as-bindings and function-locals. Assignments inside a statement that as-binds the variable change the statement-local variable, rather than creating a function-local. Two as-bindings within the same statement are treated like an as-binding followed by assignment in the normal (possibly implementation-dependent) evaluation order (which should rarely be relevant, unless you're deliberately writing pathological code).

Of course this is much more complex than Python's current rules. But it's not that hard to reason about. In particular, even in silly cases akin to "x[(a.b as b)] = b" and "x[b] = (a.b as b)", either it does what you'd naively expect or raises an UnboundLocalError; it never uses any outer value of b. And, I think, in all of the cases you actually want people to use, it means what you want it to. It even handles cases where you put multiple as bindings for the same name in different subexpressions of an expression in the same part of a statement.

Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.)
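To make the shape of that transformation concrete, here's a rough, untested sketch of the simplest while-loop case. The names read_chunk, process and _stmt_ are invented for illustration, and the real thing would of course happen during compilation rather than as a source rewrite:

    # Rough sketch only: the hypothetical source
    #
    #     while (read_chunk() as chunk):
    #         total += len(chunk)
    #
    # inside process() would behave roughly like the hidden helper below.

    def read_chunk(_chunks=iter([b"ab", b"cd", b""])):
        return next(_chunks)

    def process():
        total = 0
        def _stmt_():
            nonlocal total            # free variables become nonlocals
            while True:
                chunk = read_chunk()  # the as-binding lives only inside _stmt_
                if not chunk:
                    break
                total += len(chunk)
        _stmt_()
        return total                  # 'chunk' never leaks into process()

    print(process())  # -> 4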
I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa. The question is, is that the behavior you'd intuitively want, or is escaping to the rest of the smallest statement sometimes unacceptable, or are the rules about assignments inside a controlled suite wrong in some case? From rosuav at gmail.com Wed Jun 10 02:54:55 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 10 Jun 2015 10:54:55 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas wrote: > Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa. > I'd actually rather see this implemented the other way around: instead of turning this into a function call, actually have a real concept of nested scoping. Nested functions imply changes to tracebacks and such, which scoping doesn't require. How hard would it be to hack the bytecode compiler to treat two names as distinct despite appearing the same? Example: def f(x): e = 2.718281828 try: return e/x except ZeroDivisionError as e: raise ContrivedCodeException from e Currently, f.__code__.co_varnames is ('x', 'e'), and all the references to e are working with slot 1; imagine if, instead, co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 instead. Then the final act of the except clause would be to unbind its local name e (slot 2), and then any code after the except block would use slot 1 for e, and the original value would "reappear". The only place that would need to "know" about the stack of scopes is the compilation step; everything after that just uses the slots. Is this feasible? ChrisA From abarnert at yahoo.com Wed Jun 10 03:58:15 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 9 Jun 2015 18:58:15 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> On Jun 9, 2015, at 17:54, Chris Angelico wrote: > > On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas > wrote: >> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. 
The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa. > > I'd actually rather see this implemented the other way around: instead > of turning this into a function call, actually have a real concept of > nested scoping. Nested functions imply changes to tracebacks and such, > which scoping doesn't require. > > How hard would it be to hack the bytecode compiler to treat two names > as distinct despite appearing the same? Here's a quick&dirty idea that might work: Basically, just gensyn a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes. The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()? What if the outer e was nonlocal or global? What if either e is referenced by an inner function? What if another statement re-rebinds e inside the first statement? What if you do this inside a class (or at top level)?I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. (And obviously the same if some C extension calls PyFrame_LocalsToFast or equivalent.) But for a real implementation, I'm not even sure what the rules should be, much less how to implement them. (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.) Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true? It might be better to not start off worrying about how to get there from here, and instead first try to design the complete scoping rules for a language that's like Python but with nested scopes, and then identify all the places that it would differ from Python, and then decide which parts of the existing machinery you can hack up and which parts you have to completely replace. (Maybe, for example, would be easier with new bytecodes to replace LOAD_CLOSURE, LOAD_DEREF, MAKE_CLOSURE, etc. than trying to modify the data to make those bytecodes work properly.) > Example: > > def f(x): > e = 2.718281828 > try: > return e/x > except ZeroDivisionError as e: > raise ContrivedCodeException from e > > Currently, f.__code__.co_varnames is ('x', 'e'), and all the > references to e are working with slot 1; imagine if, instead, > co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 > instead. Then the final act of the except clause would be to unbind > its local name e (slot 2), > and then any code after the except block > would use slot 1 for e, and the original value would "reappear". I don't think that "unbind" is a real step that needs to happen. 
The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff). > The only place that would need to "know" about the stack of scopes is > the compilation step; everything after that just uses the slots. Is > this feasible? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Wed Jun 10 05:27:38 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 10 Jun 2015 13:27:38 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> Message-ID: On Wed, Jun 10, 2015 at 11:58 AM, Andrew Barnert wrote: >> How hard would it be to hack the bytecode compiler to treat two names >> as distinct despite appearing the same? > > Here's a quick&dirty idea that might work: Basically, just gensyn a name like .0 for the second e (as is done for comprehensions), compile as normal, then rename the .0 back to e in the code attributes. > That's something like what I was thinking of, yeah. > The problem is how to make this interact with all kinds of other stuff. What if someone calls locals()? Ow, that one I have no idea about. Hmm. That could be majorly problematic; if you call locals() inside the inner scope, and then use that dictionary outside it, you should expect it to work. This would be hard. > What if the outer e was nonlocal or global? The inner e will always get its magic name, and it doesn't matter what the outer e is. That's exactly the same as would happen if there were no shadowing: >>> def f(x): ... global e ... try: 1/x ... except ZeroDivisionError as e: pass ... return e**x ... >>> e=2.718281828 >>> f(3) 20.085536913011932 >>> f(0) Traceback (most recent call last): File "", line 1, in File "", line 5, in f NameError: name 'e' is not defined If x is nonzero, the except clause doesn't happen, and no shadowing happens. With this theory, the same would happen if x is zero - the "as e" would effectively be "as " or whatever the magic name is, and then "e**x" would use the global e. It would have to be an error to use a global or nonlocal statement *inside* the as-governed block: def f(x): try: whatever except Exception as e: global e # SyntaxError I can't imagine that this would be a problem to anyone. The rule is that "as X" makes X into a statement-local name, and that's incompatible with a global declaration. > What if either e is referenced by an inner function? 
I don't know about internals and how hard it'd be, but I would expect that the as-name propagation should continue into the function. A quick check with dis.dis() suggests that CPython uses a LOAD_DEREF/STORE_DEREF bytecode to work with nonlocals, so that one might have to become scope-aware too. (It would be based on definition, not call, so it should be able to be compiled in somehow, but I can't say for sure.) > What if another statement re-rebinds e inside the first statement? As in, something like this? def f(x): e = 2.718 try: 1/0 except Exception as e: e = 1 print(e) The "e = 1" would assign to , because it's in a scope where the local name e translates into that. Any use of that name, whether rebinding or referencing, will use the inner scope. But I would expect this sort of thing to be unusual. > What if you do this inside a class (or at top level)? At top level, it would presumably have to create another global. If you call a function from inside that block, it won't see your semi-local, though I'm not sure what happens if you _define_ a function inside a block like that: with open("spam.log", "a") as logfile: def log(x): logfile.write(x) Given that this example wouldn't work anyway (the file would get closed before the function gets called), and I can't think of any non-trivial examples where you'd actually want this, I can't call what ought to happen. > I think for a quick hack to play with this, you don't have to worry about any of those issues; just say that's illegal, and whatever happens (even a segfault) is your own fault for trying it. But for a real implementation, I'm not even sure what the rules should be, much less how to implement them. > Sure, for a quick-and-dirty. I think some will be illegal long-term too. > (I'm guessing the implementation could either involve having a stack of symbol tables, or tagging things at the AST level while we've still got a tree and using that info in the last step, but I think there's still a problem telling the machinery how to set up closure cells to link inner functions' free variables.) > I have no idea about the CPython internals, but my broad thinking is something like this: You start with an empty stack, and add to it whenever you hit an "as" clause. Whenever you look up a name, you proceed through the stack from newest to oldest; if you find the name, you use the mangled name from that stack entry. Otherwise, you use the same handling as current. > Also, all of this assumes that none of the machinery, even for tracebacks and debugging, cares about the name of the variable, just its index. Is that true? > I'm not entirely sure, but I think that tracebacks etc will start with the index and then look it up. Having duplicate names in co_varnames would allow them to look correct. Can someone confirm? >> Example: >> >> def f(x): >> e = 2.718281828 >> try: >> return e/x >> except ZeroDivisionError as e: >> raise ContrivedCodeException from e >> >> Currently, f.__code__.co_varnames is ('x', 'e'), and all the >> references to e are working with slot 1; imagine if, instead, >> co_varnames were ('x', 'e', 'e') and the last two lines used slot 2 >> instead. Then the final act of the except clause would be to unbind >> its local name e (slot 2), >> and then any code after the except block >> would use slot 1 for e, and the original value would "reappear". > > I don't think that "unbind" is a real step that needs to happen. 
The names have to get mapped to slot numbers at compile time anyway, so if all code outside of the except clause was compiled to LOAD_FAST 1 instead of LOAD_FAST 2, it doesn't matter that slot 2 has the same name. The only thing you need to do is the existing implicit "del e" on slot 2. (If you somehow managed to do another LOAD_FAST 2 after that, it would just be an UnboundLocalError, which is fine. But no code outside the except clause can compile to that anyway, unless there's a bug in your idea of its implementation or someone does some byteplay stuff). > The unbind is there to prevent a reference loop from causing problems. And yes, it's effectively the implicit "del e" on slot 2. ChrisA From abarnert at yahoo.com Wed Jun 10 06:03:28 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Tue, 9 Jun 2015 21:03:28 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> Message-ID: <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> On Jun 9, 2015, at 20:27, Chris Angelico wrote: > > > with open("spam.log", "a") as logfile: > def log(x): > logfile.write(x) > > Given that this example wouldn't work anyway (the file would get > closed before the function gets called), and I can't think of any > non-trivial examples where you'd actually want this, I can't call what > ought to happen. The obvious one is: with open("spam.log", "a") as logfile: def log(x): logfile.write(x) do_lots_of_stuff(logfunc=log) Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement. Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and those cases it's even more obvious why you'd want to create a closure. Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :) But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first. 
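Going back to the closure point for a second, the "use it immediately" pattern I mean is something like this rough sketch (poll and handle are invented names, and the as-binding form is shown only in a comment since it doesn't exist today):

    def poll(_events=iter(["connect", "data", None])):
        return next(_events)

    def handle(event):
        print("handling", event)

    callbacks = []
    event = poll()
    while event:    # hypothetically: while (poll() as event):
        # each callback captures the value bound in this iteration
        callbacks.append(lambda ev=event: handle(ev))
        event = poll()

    for cb in callbacks:
        cb()        # handling connect / handling data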
From rosuav at gmail.com Wed Jun 10 08:17:21 2015 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 10 Jun 2015 16:17:21 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: On Wed, Jun 10, 2015 at 2:03 PM, Andrew Barnert wrote: > On Jun 9, 2015, at 20:27, Chris Angelico wrote: >> >> >> with open("spam.log", "a") as logfile: >> def log(x): >> logfile.write(x) >> >> Given that this example wouldn't work anyway (the file would get >> closed before the function gets called), and I can't think of any >> non-trivial examples where you'd actually want this, I can't call what >> ought to happen. > > The obvious one is: > > with open("spam.log", "a") as logfile: > def log(x): > logfile.write(x) > do_lots_of_stuff(logfunc=log) > > Of course in this case you could just pass logfile.write instead of a function, but more generally, anywhere you create a helper or callback as a closure to use immediately (e.g., in a SAX parser) instead of later (e.g., in a network server or GUI) it makes sense to put a closure inside a with statement. > Sure. In this example, there'd have to be some kind of "thing" that exists as a global, and can be referenced by the log function. That's not too hard; the usage all starts and ends inside the duration of the "as" effect; any other global named "logfile" would simply be unavailable. The confusion would come if you try to span the boundary in some way - when it would be possible to call log(logfile) and have it write to the log file defined by the with block, but have its argument come from outside. At very least, that would want to be strongly discouraged for reasons of readability. > Also, remember that the whole point here is to extend as-binding so it works in if and while conditions, and maybe arbitrary expressions, and those cases it's even more obvious why you'd want to create a closure. > > Anyway, I think I know what all the compiled bytecode and code attributes for that case could look like (although I'd need to think through the edge cases), I'm just not sure if the code that compiles it today will be able to handle things without some rename-and-rename-back hack. I suppose the obvious answer is for someone to just try writing it and see. :) > > But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first. > Okay. I'll start poking around with CPython and see what I can do. I'm reminded of that spectacular slide from David Beazley's talk on CPython and PyPy tinkering, where he has that VW called CPython, and then talks about patches, extensions, PEPs... and python-ideas. https://www.youtube.com/watch?v=l_HBRhcgeuQ at the four minute mark. 
ChrisA From rosuav at gmail.com Wed Jun 10 17:06:26 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Jun 2015 01:06:26 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico wrote: >> But I think your quick&dirty hack may be worth playing with even if it bans this possibility and a few others, and may not be that hard to do if you make that decision, so if I were you I'd try that first. >> > > Okay. I'll start poking around with CPython and see what I can do. Here's a gross, disgusting, brutal hack. It applies only to try/except (but can easily be expanded to other places; it's just a matter of calling one function at top and bottom), and it currently assumes that you're in a function scope (not at top level, not directly in a class; methods are supported). (Should I create a tracker issue? It's not even at proof-of-concept at this point.) Here's how it works: As an 'except' block is entered (at compilation stage), a new subscope is defined. At the end of the except block, after the "e = None; del e" opcodes get added in, the subscope is popped off and disposed of. So long as there is a subscope attached to the current compilation unit, any name lookups will be redirected through it. Finally, when co_varnames is populated, names get de-mangled, thus (possibly) making duplicates in the tuple, but more importantly, getting tracebacks and such looking correct. The subscope is a tiny thing that just says "this name now becomes that mangled name", where the mangled name is the original name dot something (eg mangle "e" and get back "e.0x12345678"); they're stored in a linked list in the current compiler_unit. Currently, locals() basically ignores the magic. If there is no "regular" name to be shadowed, then it correctly picks up the interior one; if there are both forms, I've no idea how it picks which one to put into the dictionary, but it certainly can't logically retain both. The fact that it manages to not crash and burn is, in my opinion, pure luck :) Can compiler_nameop() depend on all names being interned? I have a full-on PyObject_RichCompareBool() to check for name equality; if they're all interned, I could simply do a pointer comparison instead. Next plan: Change compiler_comprehension_generator() to use subscopes rather than a full nested function, and then do performance testing. Currently, this can only have slowed things down. Removing the function call overhead from list comps could give that speed back. ChrisA -------------- next part -------------- A non-text attachment was scrubbed... Name: scope_hack.patch Type: text/x-patch Size: 5783 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: nesting_demo.py Type: text/x-python Size: 813 bytes Desc: not available URL: From rymg19 at gmail.com Wed Jun 10 17:12:19 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 10 Jun 2015 10:12:19 -0500 Subject: [Python-ideas] If branch merging In-Reply-To: References: <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: Maybe it's just me, but nesting_demo.py has several junk characters at the end (^@). On June 10, 2015 10:06:26 AM CDT, Chris Angelico wrote: >On Wed, Jun 10, 2015 at 4:17 PM, Chris Angelico >wrote: >>> But I think your quick&dirty hack may be worth playing with even if >it bans this possibility and a few others, and may not be that hard to >do if you make that decision, so if I were you I'd try that first. >>> >> >> Okay. I'll start poking around with CPython and see what I can do. > >Here's a gross, disgusting, brutal hack. It applies only to try/except >(but can easily be expanded to other places; it's just a matter of >calling one function at top and bottom), and it currently assumes that >you're in a function scope (not at top level, not directly in a class; >methods are supported). > >(Should I create a tracker issue? It's not even at proof-of-concept at >this point.) > >Here's how it works: As an 'except' block is entered (at compilation >stage), a new subscope is defined. At the end of the except block, >after the "e = None; del e" opcodes get added in, the subscope is >popped off and disposed of. So long as there is a subscope attached to >the current compilation unit, any name lookups will be redirected >through it. Finally, when co_varnames is populated, names get >de-mangled, thus (possibly) making duplicates in the tuple, but more >importantly, getting tracebacks and such looking correct. > >The subscope is a tiny thing that just says "this name now becomes >that mangled name", where the mangled name is the original name dot >something (eg mangle "e" and get back "e.0x12345678"); they're stored >in a linked list in the current compiler_unit. > >Currently, locals() basically ignores the magic. If there is no >"regular" name to be shadowed, then it correctly picks up the interior >one; if there are both forms, I've no idea how it picks which one to >put into the dictionary, but it certainly can't logically retain both. >The fact that it manages to not crash and burn is, in my opinion, pure >luck :) > >Can compiler_nameop() depend on all names being interned? I have a >full-on PyObject_RichCompareBool() to check for name equality; if >they're all interned, I could simply do a pointer comparison instead. > >Next plan: Change compiler_comprehension_generator() to use subscopes >rather than a full nested function, and then do performance testing. >Currently, this can only have slowed things down. Removing the >function call overhead from list comps could give that speed back. > >ChrisA > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
-------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Jun 10 17:15:19 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Jun 2015 01:15:19 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: On Thu, Jun 11, 2015 at 1:12 AM, Ryan Gonzalez wrote: > Maybe it's just me, but nesting_demo.py has several junk characters at the > end (^@). > Hmm, I just redownloaded it, and it appears correct. The end of the file has some triple-quoted strings, the last one ends with three double quote characters and then a newline, then that's it. But maybe that's Gmail being too smart and just giving me back what I sent. ChrisA From joejev at gmail.com Wed Jun 10 17:33:08 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 11:33:08 -0400 Subject: [Python-ideas] slice.literal notation Message-ID: I was told in the thread that it might be a good idea to bring this up on python discussions. Here is a link to the proposed patch and some existing comments: http://bugs.python.org/issue24379 I often find that when working with pandas and numpy I want to store slice objects in variables to pass around and re-use; however, the syntax for constructing a slice literal outside of an indexer is very different from the syntax used inside of a subscript. This patch proposes the following change: slice.literal This would be a singleton instance of a class that looks like: class sliceliteral(object): def __getitem__(self, key): return key The basic idea is to provide an alternative constructor to 'slice' that uses the subscript syntax. This allows people to write more understandable code. Consider the following examples: reverse = slice(None, None, -1) reverse = slice.literal[::-1] all_rows_first_col = slice(None), slice(0) all_rows_first_col = slice.literal[:, 0] first_row_all_cols_but_last = slice(0), slice(None, -1) first_row_all_cols_but_last = slice.literal[0, :-1] Again, this is not intended to make the code shorter, instead, it is designed to make it more clear what the slice object your are constructing looks like. Another feature of the new `literal` object is that it is not limited to just the creation of `slice` instances; instead, it is designed to mix slices and other types together. For example: >>> slice.literal[0] 0 >>> slice.literal[0, 1] (0, 1) >>> slice.literal[0, 1:] (0, slice(1, None, None) >>> slice.literal[:, ..., ::-1] (slice(None, None, None), Ellipsis, slice(None, None, -1) These examples show that sometimes the subscript notation is much more clear that the non-subscript notation. I believe that while this is trivial, it is very convinient to have on the slice type itself so that it is quickly available. This also prevents everyone from rolling their own version that is accesible in different ways (think Py_RETURN_NONE). Another reason that chose this aproach is that it requires no change to the syntax to support. There is a second change proposed here and that is to 'slice.__repr__'. This change makes the repr of a slice object match the new literal syntax to make it easier to read. 
>>> slice.literal[:] slice.literal[:] >>> slice.literal[1:] slice.literal[1:] >>> slice.literal[1:-1] slice.literal[1:-1] >>> slice.literal[:-1] slice.literal[:-1] >>> slice.literal[::-1] slice.literal[::-1] This change actually affects old behaviour so I am going to upload it as a seperate patch. I understand that the change to repr much be less desirable than the addition of 'slice.literal' -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Wed Jun 10 18:01:49 2015 From: taleinat at gmail.com (Tal Einat) Date: Wed, 10 Jun 2015 19:01:49 +0300 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 6:33 PM, Joseph Jevnik wrote: > I was told in the thread that it might be a good idea to bring this up on > python discussions. Here is a link to the proposed patch and some existing > comments: http://bugs.python.org/issue24379 > > I often find that when working with pandas and numpy I want to store slice > objects in variables to pass around and re-use; however, the syntax for > constructing a slice literal outside of an indexer is very different from > the syntax used inside of a subscript. This patch proposes the following > change: > > slice.literal > > This would be a singleton instance of a class that looks like: > > class sliceliteral(object): > def __getitem__(self, key): > return key > > > The basic idea is to provide an alternative constructor to 'slice' that uses > the subscript syntax. This allows people to write more understandable code. > > Consider the following examples: > > reverse = slice(None, None, -1) > reverse = slice.literal[::-1] > > all_rows_first_col = slice(None), slice(0) > all_rows_first_col = slice.literal[:, 0] > > first_row_all_cols_but_last = slice(0), slice(None, -1) > first_row_all_cols_but_last = slice.literal[0, :-1] > > > Again, this is not intended to make the code shorter, instead, it is > designed to make it more clear what the slice object your are constructing > looks like. > > Another feature of the new `literal` object is that it is not limited to > just the creation of `slice` instances; instead, it is designed to mix > slices and other types together. For example: > >>>> slice.literal[0] > 0 >>>> slice.literal[0, 1] > (0, 1) >>>> slice.literal[0, 1:] > (0, slice(1, None, None) >>>> slice.literal[:, ..., ::-1] > (slice(None, None, None), Ellipsis, slice(None, None, -1) > > These examples show that sometimes the subscript notation is much more clear > that the non-subscript notation. > I believe that while this is trivial, it is very convinient to have on the > slice type itself so that it is quickly available. This also prevents > everyone from rolling their own version that is accesible in different ways > (think Py_RETURN_NONE). > Another reason that chose this aproach is that it requires no change to the > syntax to support. In regard with the first suggestion, this has already been mentioned on the tracker but is important enough to repeat here: This already exists in NumPy as IndexExpression, used via numpy.S_ or numpy.index_exp. 
For details, see: http://docs.scipy.org/doc/numpy/reference/generated/numpy.s_.html - Tal Einat From random832 at fastmail.us Wed Jun 10 18:03:06 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 10 Jun 2015 12:03:06 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: <1433952186.561727.292000441.15D5ECC2@webmail.messagingengine.com> On Wed, Jun 10, 2015, at 11:33, Joseph Jevnik wrote: > ... What about slice[...]? From joejev at gmail.com Wed Jun 10 18:10:47 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 12:10:47 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: <1433952186.561727.292000441.15D5ECC2@webmail.messagingengine.com> References: <1433952186.561727.292000441.15D5ECC2@webmail.messagingengine.com> Message-ID: I considered `slice[...]` however, this will change some existing behaviour. This would mean we need to put a metaclass on slice, and then `type(slice) is type` would no longer be true. Also, with 3.5's typing work, we are overloading the meaning of indexing a type object. Adding the slice.literal does not break anything or conflict with any syntax. On Wed, Jun 10, 2015 at 12:03 PM, wrote: > On Wed, Jun 10, 2015, at 11:33, Joseph Jevnik wrote: > > ... > > What about slice[...]? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jun 10 18:20:45 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Jun 2015 18:20:45 +0200 Subject: [Python-ideas] slice.literal notation References: Message-ID: <20150610182045.01eb4ee7@fsol> On Wed, 10 Jun 2015 19:01:49 +0300 Tal Einat wrote: > > > >>>> slice.literal[0] > > 0 > >>>> slice.literal[0, 1] > > (0, 1) > >>>> slice.literal[0, 1:] > > (0, slice(1, None, None) > >>>> slice.literal[:, ..., ::-1] > > (slice(None, None, None), Ellipsis, slice(None, None, -1) > > > > These examples show that sometimes the subscript notation is much more clear > > that the non-subscript notation. Agreed. > > I believe that while this is trivial, it is very convinient to have on the > > slice type itself so that it is quickly available. This also prevents > > everyone from rolling their own version that is accesible in different ways > > (think Py_RETURN_NONE). > > Another reason that chose this aproach is that it requires no change to the > > syntax to support. > > In regard with the first suggestion, this has already been mentioned > on the tracker but is important enough to repeat here: This already > exists in NumPy as IndexExpression, used via numpy.S_ or > numpy.index_exp. Probably, but it looks useful to enough to integrate the standard library. Another possible place for it would be the ast module. Regards Antoine. From joejev at gmail.com Wed Jun 10 18:23:32 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 12:23:32 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: <20150610182045.01eb4ee7@fsol> References: <20150610182045.01eb4ee7@fsol> Message-ID: I am not sure if this makes sense in the ast module only because it does not generate _ast.Slice objects and instead returns the keys. 
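To make the distinction concrete, here's a rough sketch (not the proposed patch itself) of what the helper receives versus what the ast module produces:

    import ast

    class _Literal(object):
        def __getitem__(self, key):
            return key        # the interpreter hands us fully built key objects

    literal = _Literal()

    print(literal[1:10:2])    # slice(1, 10, 2) -- a real slice object
    print(literal[:, 0])      # (slice(None, None, None), 0)

    tree = ast.parse("x[1:10:2]", mode="eval")
    print(type(tree.body.slice))   # an ast.Slice node, not a slice object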
On Wed, Jun 10, 2015 at 12:20 PM, Antoine Pitrou wrote: > On Wed, 10 Jun 2015 19:01:49 +0300 > Tal Einat wrote: > > > > > >>>> slice.literal[0] > > > 0 > > >>>> slice.literal[0, 1] > > > (0, 1) > > >>>> slice.literal[0, 1:] > > > (0, slice(1, None, None) > > >>>> slice.literal[:, ..., ::-1] > > > (slice(None, None, None), Ellipsis, slice(None, None, -1) > > > > > > These examples show that sometimes the subscript notation is much more > clear > > > that the non-subscript notation. > > Agreed. > > > > I believe that while this is trivial, it is very convinient to have on > the > > > slice type itself so that it is quickly available. This also prevents > > > everyone from rolling their own version that is accesible in different > ways > > > (think Py_RETURN_NONE). > > > Another reason that chose this aproach is that it requires no change > to the > > > syntax to support. > > > > In regard with the first suggestion, this has already been mentioned > > on the tracker but is important enough to repeat here: This already > > exists in NumPy as IndexExpression, used via numpy.S_ or > > numpy.index_exp. > > Probably, but it looks useful to enough to integrate the standard > library. > Another possible place for it would be the ast module. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jun 10 18:26:42 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Jun 2015 18:26:42 +0200 Subject: [Python-ideas] slice.literal notation References: <20150610182045.01eb4ee7@fsol> Message-ID: <20150610182642.27fdf6a5@fsol> On Wed, 10 Jun 2015 12:23:32 -0400 Joseph Jevnik wrote: > I am not sure if this makes sense in the ast module only because it does > not generate _ast.Slice objects and instead returns the keys. There's already ast.literal_eval() there, so that was why I thought it could be related. Thought at literal_eval() *compiles* its input, which slice.literal wouldn't, so the relationship is quite distant... Regards Antoine. From mertz at gnosis.cx Wed Jun 10 21:05:38 2015 From: mertz at gnosis.cx (David Mertz) Date: Wed, 10 Jun 2015 12:05:38 -0700 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: +1 This is an elegant improvement that doesn't affect backward compatibility. Obviously, the difference between the spelling 'sliceliteral[::-1]' and 'slice.literal[::-1]' isn't that big, but having it attached to the slice type itself rather than a user class feels more natural. On Wed, Jun 10, 2015 at 8:33 AM, Joseph Jevnik wrote: > I was told in the thread that it might be a good idea to bring this up on > python discussions. Here is a link to the proposed patch and some existing > comments: http://bugs.python.org/issue24379 > > I often find that when working with pandas and numpy I want to store slice > objects in variables to pass around and re-use; however, the syntax for > constructing a slice literal outside of an indexer is very different from > the syntax used inside of a subscript. 
This patch proposes the following > change: > > slice.literal > > This would be a singleton instance of a class that looks like: > > class sliceliteral(object): > def __getitem__(self, key): > return key > > > The basic idea is to provide an alternative constructor to 'slice' that > uses the subscript syntax. This allows people to write more understandable > code. > > Consider the following examples: > > reverse = slice(None, None, -1) > reverse = slice.literal[::-1] > > all_rows_first_col = slice(None), slice(0) > all_rows_first_col = slice.literal[:, 0] > > first_row_all_cols_but_last = slice(0), slice(None, -1) > first_row_all_cols_but_last = slice.literal[0, :-1] > > > Again, this is not intended to make the code shorter, instead, it is > designed to make it more clear what the slice object your are constructing > looks like. > > Another feature of the new `literal` object is that it is not limited to > just the creation of `slice` instances; instead, it is designed to mix > slices and other types together. For example: > > >>> slice.literal[0] > 0 > >>> slice.literal[0, 1] > (0, 1) > >>> slice.literal[0, 1:] > (0, slice(1, None, None) > >>> slice.literal[:, ..., ::-1] > (slice(None, None, None), Ellipsis, slice(None, None, -1) > > These examples show that sometimes the subscript notation is much more > clear that the non-subscript notation. > I believe that while this is trivial, it is very convinient to have on the > slice type itself so that it is quickly available. This also prevents > everyone from rolling their own version that is accesible in different ways > (think Py_RETURN_NONE). > Another reason that chose this aproach is that it requires no change to > the syntax to support. > > There is a second change proposed here and that is to 'slice.__repr__'. > This change makes the repr of a slice object match the new literal syntax > to make it easier to read. > > >>> slice.literal[:] > slice.literal[:] > >>> slice.literal[1:] > slice.literal[1:] > >>> slice.literal[1:-1] > slice.literal[1:-1] > >>> slice.literal[:-1] > slice.literal[:-1] > >>> slice.literal[::-1] > slice.literal[::-1] > > This change actually affects old behaviour so I am going to upload it as a > seperate patch. I understand that the change to repr much be less desirable > than the addition of 'slice.literal' > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 10 21:16:42 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Jun 2015 12:16:42 -0700 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: <55788D1A.4050303@stoneleaf.us> On 06/10/2015 08:33 AM, Joseph Jevnik wrote: > The basic idea is to provide an alternative constructor to 'slice' that uses > the subscript syntax. This allows people to write more understandable code. +1 > There is a second change proposed here and that is to 'slice.__repr__'. 
This > change makes the repr of a slice object match the new literal syntax to make > it easier to read. -1 Having the old repr makes it possible to see what the equivalent slice() spelling is. -- ~Ethan~ From joejev at gmail.com Wed Jun 10 21:18:24 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 15:18:24 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: <55788D1A.4050303@stoneleaf.us> References: <55788D1A.4050303@stoneleaf.us> Message-ID: Ethan, I am also not 100% on the new repr, I just wanted to propose this change. In the issue, I have separated that change into it's own patch to make it easier to apply the slice.literal without the repr update. On Wed, Jun 10, 2015 at 3:16 PM, Ethan Furman wrote: > On 06/10/2015 08:33 AM, Joseph Jevnik wrote: > > The basic idea is to provide an alternative constructor to 'slice' that >> uses >> the subscript syntax. This allows people to write more understandable >> code. >> > > +1 > > > There is a second change proposed here and that is to 'slice.__repr__'. >> This >> change makes the repr of a slice object match the new literal syntax to >> make >> it easier to read. >> > > -1 > > Having the old repr makes it possible to see what the equivalent slice() > spelling is. > > -- > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Wed Jun 10 22:21:29 2015 From: taleinat at gmail.com (Tal Einat) Date: Wed, 10 Jun 2015 23:21:29 +0300 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Wed, Jun 10, 2015 at 10:05 PM, David Mertz wrote: > > On Wed, Jun 10, 2015 at 8:33 AM, Joseph Jevnik wrote: >> >> I was told in the thread that it might be a good idea to bring this up on >> python discussions. Here is a link to the proposed patch and some existing >> comments: http://bugs.python.org/issue24379 >> >> I often find that when working with pandas and numpy I want to store slice >> objects in variables to pass around and re-use; however, the syntax for >> constructing a slice literal outside of an indexer is very different from >> the syntax used inside of a subscript. This patch proposes the following >> change: >> >> slice.literal >> >> This would be a singleton instance of a class that looks like: >> >> class sliceliteral(object): >> def __getitem__(self, key): >> return key >> >> >> The basic idea is to provide an alternative constructor to 'slice' that >> uses the subscript syntax. This allows people to write more understandable >> code. >> >> Consider the following examples: >> >> reverse = slice(None, None, -1) >> reverse = slice.literal[::-1] >> >> all_rows_first_col = slice(None), slice(0) >> all_rows_first_col = slice.literal[:, 0] >> >> first_row_all_cols_but_last = slice(0), slice(None, -1) >> first_row_all_cols_but_last = slice.literal[0, :-1] >> >> >> Again, this is not intended to make the code shorter, instead, it is >> designed to make it more clear what the slice object your are constructing >> looks like. >> >> Another feature of the new `literal` object is that it is not limited to >> just the creation of `slice` instances; instead, it is designed to mix >> slices and other types together. 
For example: >> >> >>> slice.literal[0] >> 0 >> >>> slice.literal[0, 1] >> (0, 1) >> >>> slice.literal[0, 1:] >> (0, slice(1, None, None) >> >>> slice.literal[:, ..., ::-1] >> (slice(None, None, None), Ellipsis, slice(None, None, -1) >> >> These examples show that sometimes the subscript notation is much more >> clear that the non-subscript notation. >> I believe that while this is trivial, it is very convinient to have on the >> slice type itself so that it is quickly available. This also prevents >> everyone from rolling their own version that is accesible in different ways >> (think Py_RETURN_NONE). >> Another reason that chose this aproach is that it requires no change to >> the syntax to support. > > +1 > > This is an elegant improvement that doesn't affect backward compatibility. > Obviously, the difference between the spelling 'sliceliteral[::-1]' and > 'slice.literal[::-1]' isn't that big, but having it attached to the slice > type itself rather than a user class feels more natural. I dislike adding this to the slice class since many use cases don't result in a slice at all. For example: [0] -> int [...] -> Ellipsis [0:1, 2:3] -> 2-tuple of slice object I like NumPy's name of IndexExpression, perhaps we can stick to that? As for where it would reside, some possibilities are: * the operator module * as part of the collections.abc.Sequence abstract base class * the types module * builtins - Tal From random832 at fastmail.us Wed Jun 10 22:26:52 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 10 Jun 2015 16:26:52 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: <55788D1A.4050303@stoneleaf.us> References: <55788D1A.4050303@stoneleaf.us> Message-ID: <1433968012.620935.292234769.42B13457@webmail.messagingengine.com> On Wed, Jun 10, 2015, at 15:16, Ethan Furman wrote: > On 06/10/2015 08:33 AM, Joseph Jevnik wrote: > > There is a second change proposed here and that is to 'slice.__repr__'. This > > change makes the repr of a slice object match the new literal syntax to make > > it easier to read. > > -1 > > Having the old repr makes it possible to see what the equivalent slice() > spelling is. How about a separate method, slice.as_index_syntax() that just returns '::-1'? From greg.ewing at canterbury.ac.nz Thu Jun 11 00:16:55 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Jun 2015 10:16:55 +1200 Subject: [Python-ideas] If branch merging In-Reply-To: References: <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: <5578B757.2000002@canterbury.ac.nz> Chris Angelico wrote: > How hard would it be to hack the bytecode compiler to treat two names > as distinct despite appearing the same? Back when list comprehensions were changed to not leak the variable, it was apparently considered too hard to be worth the effort, since we ended up with the nested function implementation. 
-- Greg From greg.ewing at canterbury.ac.nz Thu Jun 11 00:29:01 2015 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Jun 2015 10:29:01 +1200 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: <5578BA2D.4030707@canterbury.ac.nz> David Mertz wrote: > This > patch proposes the following change: > > slice.literal It would be even nicer if the slice class itself implemented the [] syntax: myslice = slice[1:2] -- Greg From joejev at gmail.com Thu Jun 11 00:30:22 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 18:30:22 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: <5578BA2D.4030707@canterbury.ac.nz> References: <5578BA2D.4030707@canterbury.ac.nz> Message-ID: Do you think that the dropping '.literal' is worth the change in behaviour? On Wed, Jun 10, 2015 at 6:29 PM, Greg Ewing wrote: > David Mertz wrote: > > This >> patch proposes the following change: >> >> slice.literal >> > > It would be even nicer if the slice class itself > implemented the [] syntax: > > myslice = slice[1:2] > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Thu Jun 11 01:38:55 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Jun 2015 09:38:55 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <5578B757.2000002@canterbury.ac.nz> References: <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <5578B757.2000002@canterbury.ac.nz> Message-ID: On Thu, Jun 11, 2015 at 8:16 AM, Greg Ewing wrote: > Chris Angelico wrote: >> >> How hard would it be to hack the bytecode compiler to treat two names >> as distinct despite appearing the same? > > > Back when list comprehensions were changed to not leak > the variable, it was apparently considered too hard to > be worth the effort, since we ended up with the nested > function implementation. Yeah. I now have a brutal hack that does exactly that, so I'm fully expecting someone to point out "Uhh, this isn't going to work because...". ChrisA From tjreedy at udel.edu Thu Jun 11 01:43:13 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 10 Jun 2015 19:43:13 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On 6/10/2015 11:33 AM, Joseph Jevnik wrote: > I often find that when working with pandas and numpy I want to store > slice objects in variables to pass around and re-use; however, the > syntax for constructing a slice literal outside of an indexer is very > different from the syntax used inside of a subscript. This patch > proposes the following change: > > slice.literal > > This would be a singleton instance of a class that looks like: > > class sliceliteral(object): > def __getitem__(self, key): > return key Alternate constructors are implemented as class methods. class slice: ... @classmethod def literal(cls, key): if isinstance(key, cls): return key else: else raise ValueError('slice literal mush be slice') They are typically names fromxyz or from_xyz. 
Tal Einat pointed out that not all keys are slices > [0] -> int > [...] -> Ellipsis > [0:1, 2:3] -> 2-tuple of slice object I think the first two cases should value errors. The third might be debated, but if allowed, this would not be a slice constructor. -- Terry Jan Reedy From joejev at gmail.com Thu Jun 11 01:45:34 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 10 Jun 2015 19:45:34 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: We cannot use a class method here because slice.literal(:) is a syntax error. On Wed, Jun 10, 2015 at 7:43 PM, Terry Reedy wrote: > On 6/10/2015 11:33 AM, Joseph Jevnik wrote: > > I often find that when working with pandas and numpy I want to store >> slice objects in variables to pass around and re-use; however, the >> syntax for constructing a slice literal outside of an indexer is very >> different from the syntax used inside of a subscript. This patch >> proposes the following change: >> >> slice.literal >> >> This would be a singleton instance of a class that looks like: >> >> class sliceliteral(object): >> def __getitem__(self, key): >> return key >> > > Alternate constructors are implemented as class methods. > > class slice: > ... > @classmethod > def literal(cls, key): > if isinstance(key, cls): > return key > else: > else raise ValueError('slice literal mush be slice') > > They are typically names fromxyz or from_xyz. > > Tal Einat pointed out that not all keys are slices > > > [0] -> int > > [...] -> Ellipsis > > [0:1, 2:3] -> 2-tuple of slice object > > I think the first two cases should value errors. The third might be > debated, but if allowed, this would not be a slice constructor. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jun 11 02:54:46 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Jun 2015 10:54:46 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: On 10 June 2015 at 10:20, Andrew Barnert wrote: > On Jun 9, 2015, at 16:46, Nick Coghlan wrote: >> >> However if name bindings *didn't* leak out of their containing expression by default, and while/if/elif code generation instead gained machinery to retrieve the name bindings for any named subexpressions in the condition, that would eliminate most of the potentially bizarre edge cases. > > I don't think here's any consistent way to define "containing expression" that makes any sense for while/if statements. Sure there is. The gist of the basic "no leak" behaviour could be something like: 1. 
Any expression containing a named subexpression would automatically be converted to a lambda expression that is defined and called inline (expressions that already implicitly define their own scope, specifically comprehensions and generation expressions, would terminate the search for the "containing expression" node and allow this step to be skipped). 2. Any name references from within the expression that are not references to named subexpressions or comprehension iteration variables would be converted to parameter names for the implicitly defined lambda expression, and thus resolved in the containing scope rather than the nested scope. In that basic mode, the only thing made available from the implicitly created scope would be the result of the lambda expression. Something like: x = (250 as a)*a + b would be equivalent to: x= (lambda b: ((250 as a)*a + b))(b) if/elif/while clauses would define the behaviour of their conditional expressions slightly differently: for those, the values of any named subexpressions would also be passed back out, allowing them to be bound appropriately in the outer scope (requiring compatibility with class and module namespaces means it wouldn't be possible to use cell references here). Whether there should be a separate "bindlocal" statement for lifting named subexpressions out of an expression and binding them all locally would be an interesting question - I can't think of a good *use case* for that, but it would be a good hook for explaining the difference between the default behaviour of named subexpressions and the variant used in if/elif/while conditional expressions. > But "containing _statement_", that's easy. No, it's not, because statements already contain name binding operations that persist beyond the scope of the statement. In addition to actual assignment statements, there are also for loops, with statements, class definitions and function definitions. Having the presence of a named subexpression magically change the scope of the statement level name binding operations wouldn't be acceptable, and having some name bindings propagate but not others gets very tricky in the general case. (PEP's 403 and 3150 go into some of the complexities) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 11 03:16:00 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Jun 2015 11:16:00 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: On 10 June 2015 at 10:54, Chris Angelico wrote: > On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas > wrote: >> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa. 
>> > > I'd actually rather see this implemented the other way around: instead > of turning this into a function call, actually have a real concept of > nested scoping. Nested functions imply changes to tracebacks and such, > which scoping doesn't require. > > How hard would it be to hack the bytecode compiler to treat two names > as distinct despite appearing the same? I tried to do this when working with Georg Brandl to implement the Python 3 change to hide the iteration variable in comprehensions and generator expressions, and I eventually gave up and used an implicit local function definition: https://mail.python.org/pipermail/python-3000/2007-March/006017.html This earlier post from just before we started working on that covers some of the approaches I tried, as well as noting why this problem is much harder than it might first seem: https://mail.python.org/pipermail/python-3000/2006-December/005207.html One of the other benefits that I don't believe came up in either of those threads is that using real frames for implicit scoping means that *other tools* already know how to cope with it - pdb, gdb, inspect, dis, traceback, etc, are all able to deal with what's going on. If you introduce a new *kind* of scope, rather than just implicitly using another level of our *existing* scoping rules, then there's a whole constellation of tools (including other interpreter implementations) that will need adjusting to model an entirely new semantic concept, rather than another instance of an existing concept. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 11 03:30:37 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Jun 2015 11:30:37 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> Message-ID: On 11 June 2015 at 11:16, Nick Coghlan wrote: > On 10 June 2015 at 10:54, Chris Angelico wrote: >> On Wed, Jun 10, 2015 at 10:20 AM, Andrew Barnert via Python-ideas >> wrote: >>> Now, for implementation: any statement that contains an as expression anywhere is compiled to a function definition and a call to that function. The only trick is that any free variables have to be compiled as nonlocals in the inner function and as captured locals in the real function. (This trick doesn't have to apply to lambdas or comprehensions, because they can't have assignment statements inside them, but a while statement can.) I believe this scales to nested statements with as-bindings, and to as-bindings inside explicit local functions and vice-versa. >>> >> >> I'd actually rather see this implemented the other way around: instead >> of turning this into a function call, actually have a real concept of >> nested scoping. Nested functions imply changes to tracebacks and such, >> which scoping doesn't require. >> >> How hard would it be to hack the bytecode compiler to treat two names >> as distinct despite appearing the same? 
> > I tried to do this when working with Georg Brandl to implement the > Python 3 change to hide the iteration variable in comprehensions and > generator expressions, and I eventually gave up and used an implicit > local function definition: > https://mail.python.org/pipermail/python-3000/2007-March/006017.html Re-reading that post, I found this: https://mail.python.org/pipermail/python-3000/2007-March/006085.html I don't think anyone has yet tried speeding up simple function level cases at the peephole optimiser stage of the code generation pipeline (at module and class level, the nested function is already often a speed increase due to the use of optimised local variable access in the implicitly created function scope). However, I'm not sure our pattern matching is really up to the task of detecting this at bytecode generation time - doing something about in a JIT-compiled runtime like PyPy, Numba or Pyston might be more feasible. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 11 03:54:03 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 11 Jun 2015 11:54:03 +1000 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On 11 June 2015 at 06:21, Tal Einat wrote: > On Wed, Jun 10, 2015 at 10:05 PM, David Mertz wrote: >> This is an elegant improvement that doesn't affect backward compatibility. >> Obviously, the difference between the spelling 'sliceliteral[::-1]' and >> 'slice.literal[::-1]' isn't that big, but having it attached to the slice >> type itself rather than a user class feels more natural. > > I dislike adding this to the slice class since many use cases don't > result in a slice at all. For example: > > [0] -> int > [...] -> Ellipsis > [0:1, 2:3] -> 2-tuple of slice object > > I like NumPy's name of IndexExpression, perhaps we can stick to that? > > As for where it would reside, some possibilities are: > > * the operator module > * as part of the collections.abc.Sequence abstract base class > * the types module > * builtins I'm with Tal here - I like the concept, don't like the spelling because it may return things other than slice objects. While the formal name of the operation denoted by trailing square brackets "[]" is "subscript" (with indexing and slicing being only two of its several use cases), the actual *protocol* involved in implementing that operation is getitem/setitem/delitem, so using the formal name would count as "non-obvious" in my view. Accordingly, I'd suggest putting this in under the name "operator.itemkey" (no underscore because the operator module traditionally omits them). zero = operator.itemkey[0] ellipsis = operator.itemkey[...] reverse = slice(None, None, -1) reverse = operator.itemkey[::-1] all_rows_first_col = slice(None), slice(0) all_rows_first_col = operator.itemkey[:, 0] first_row_all_cols_but_last = slice(0), slice(None, -1) first_row_all_cols_but_last = operator.itemkey[0, :-1] Documentation would say that indexing into this object produces the result of the key transformation step of getitem/setitem/delitem Cheers, Nick. 
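Nothing in that sketch needs interpreter support: a few lines of user code give the same behaviour today. The name used below is purely illustrative; neither operator.itemkey nor operator.subscript exists in the operator module, they are only the names being proposed in this thread.

    # A stand-in for the proposed helper: indexing it simply hands back the
    # key object that the subscript syntax builds (an int, a slice, Ellipsis,
    # or a tuple of these).
    class _Subscript(object):
        def __getitem__(self, key):
            return key

    subscript = _Subscript()

    reverse = subscript[::-1]                  # slice(None, None, -1)
    all_rows_first_col = subscript[:, 0]       # (slice(None, None, None), 0)
    first_row_no_last_col = subscript[0, :-1]  # (0, slice(None, -1, None))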
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From abarnert at yahoo.com Thu Jun 11 10:38:24 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jun 2015 01:38:24 -0700 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Jun 10, 2015, at 18:54, Nick Coghlan wrote: > > I'm with Tal here - I like the concept, don't like the spelling > because it may return things other than slice objects. > > While the formal name of the operation denoted by trailing square > brackets "[]" is "subscript" (with indexing and slicing being only two > of its several use cases), the actual *protocol* involved in > implementing that operation is getitem/setitem/delitem, so using the > formal name would count as "non-obvious" in my view. > > Accordingly, I'd suggest putting this in under the name > "operator.itemkey" (no underscore because the operator module > traditionally omits them). That name seems a little odd. Normally by "key", you mean the thing you subscript a mapping with, as opposed to an index, the thing you subscript a sequence with (either specifically an integer, or the broader sense of an integer, a slice, an ellipsis, or a tuple of indices recursively). (Of course you _can_ use this with a mapping key, but then it just returns the same key you passed in, which isn't very useful, except in allowing generic code that doesn't know whether it has a key or an index and wants to pass it on to a mapping or sequence, which obviously isn't the main use here.) "itemindex" avoids the main problem with "itemkey", but it still shares the secondary problem of burying the fact that this is about slices (and tuples of plain indices and slices), not just (or even primarily) plain indices. I agree with you that "subscript" isn't a very good name either. I guess "lookup" is another possibility, and it parallels "LookupError" being the common base class of "IndexError" and "KeyError", but that sounds even less meaningful than "subscript" to me. So, I don't have a good name to offer. One last thing: Would it be worth adding bracket syntax to itemgetter, to make it easier to create slicing functions? (That wouldn't remove the need for this function, or vice versa, but since we're in operator and adding a thing that gets "called" with brackets...) > zero = operator.itemkey[0] > > ellipsis = operator.itemkey[...] > > reverse = slice(None, None, -1) > reverse = operator.itemkey[::-1] > > all_rows_first_col = slice(None), slice(0) > all_rows_first_col = operator.itemkey[:, 0] > > first_row_all_cols_but_last = slice(0), slice(None, -1) > first_row_all_cols_but_last = operator.itemkey[0, :-1] > > Documentation would say that indexing into this object produces the > result of the key transformation step of getitem/setitem/delitem > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rosuav at gmail.com Thu Jun 11 12:56:35 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 11 Jun 2015 20:56:35 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: References: <69053CA1-5779-4617-BF5C-C8B692ADBD5A@yahoo.com> <877frfrkrs.fsf@uwakimon.sk.tsukuba.ac.jp> <52A4CDB1-7B8D-4DF6-B370-472BC695930A@yahoo.com> <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: On Thu, Jun 11, 2015 at 1:06 AM, Chris Angelico wrote: > Next plan: Change compiler_comprehension_generator() to use subscopes > rather than a full nested function, and then do performance testing. > Currently, this can only have slowed things down. Removing the > function call overhead from list comps could give that speed back. > Or maybe the next plan is to hack in a "while cond as name:" handler. It works! And the name is bound only within the scope of the while block and any else block (so when you get a falsey result, you can see precisely _what_ falsey result it was). The surprising part, in my opinion, is that this actually appears to work outside a function. The demangling doesn't, but the original mangling does. It doesn't play ideally with locals() or globals(); the former appears to take the first one that it sees, and ignore the others (though I wouldn't promise that; certainly it takes exactly one local of any given name. With globals(), you get the mangled name: while input("Spam? ") as spam: print(globals()) break Spam? yes {... 'spam.0x7f2080260228': 'yes'...} With brand new syntax like "while cond as name:", it won't break anything to use a mangled name, but this is a backward-incompatible change as regards exception handling and globals(). Still, it's a fun hack. Aside from being a fun exercise for me, building a Volkswagen Helicopter, is this at all useful to anybody? ChrisA From taleinat at gmail.com Thu Jun 11 14:57:02 2015 From: taleinat at gmail.com (Tal Einat) Date: Thu, 11 Jun 2015 15:57:02 +0300 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Thu, Jun 11, 2015 at 11:38 AM, Andrew Barnert wrote: > On Jun 10, 2015, at 18:54, Nick Coghlan wrote: >> >> I'm with Tal here - I like the concept, don't like the spelling >> because it may return things other than slice objects. >> >> While the formal name of the operation denoted by trailing square >> brackets "[]" is "subscript" (with indexing and slicing being only two >> of its several use cases), the actual *protocol* involved in >> implementing that operation is getitem/setitem/delitem, so using the >> formal name would count as "non-obvious" in my view. >> >> Accordingly, I'd suggest putting this in under the name >> "operator.itemkey" (no underscore because the operator module >> traditionally omits them). > > That name seems a little odd. 
Normally by "key", you mean the thing you subscript a mapping with, as opposed to an index, the thing you subscript a sequence with (either specifically an integer, or the broader sense of an integer, a slice, an ellipsis, or a tuple of indices recursively). > > (Of course you _can_ use this with a mapping key, but then it just returns the same key you passed in, which isn't very useful, except in allowing generic code that doesn't know whether it has a key or an index and wants to pass it on to a mapping or sequence, which obviously isn't the main use here.) > > "itemindex" avoids the main problem with "itemkey", but it still shares the secondary problem of burying the fact that this is about slices (and tuples of plain indices and slices), not just (or even primarily) plain indices. > > I agree with you that "subscript" isn't a very good name either. > > I guess "lookup" is another possibility, and it parallels "LookupError" being the common base class of "IndexError" and "KeyError", but that sounds even less meaningful than "subscript" to me. > > So, I don't have a good name to offer. > > One last thing: Would it be worth adding bracket syntax to itemgetter, to make it easier to create slicing functions? (That wouldn't remove the need for this function, or vice versa, but since we're in operator and adding a thing that gets "called" with brackets...) I actually think "subscript" is quite good a name. It makes the explicit distinction between subscripts, indexes and slices. As for itemgetter, with X (placeholder for name we choose), you would just do itemgetter(X[::-1]), so I don't see a need to change itemgetter. - Tal From ram at rachum.com Thu Jun 11 19:23:57 2015 From: ram at rachum.com (Ram Rachum) Date: Thu, 11 Jun 2015 20:23:57 +0300 Subject: [Python-ideas] Making -m work for scripts Message-ID: Hi, What do you think about making `python -m whatever` work also for installed scripts and not just for modules? I need this now because I've installed pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in contrast to the `nosetests` of the system Python.) It's sometimes a mess to find where Linux installed the scripts related with each version of Python. But if I could do `pypy -m nosetests` then I'd have a magic solution. What do you think? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Jun 11 19:37:30 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Jun 2015 19:37:30 +0200 Subject: [Python-ideas] Making -m work for scripts References: Message-ID: <20150611193730.141204b8@fsol> On Thu, 11 Jun 2015 20:23:57 +0300 Ram Rachum wrote: > Hi, > > What do you think about making `python -m whatever` work also for installed > scripts and not just for modules? I need this now because I've installed > pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in > contrast to the `nosetests` of the system Python.) It's sometimes a mess to > find where Linux installed the scripts related with each version of Python. > But if I could do `pypy -m nosetests` then I'd have a magic solution. What > do you think? How would Python know where to find the script? Note: if "python -m nose" doesn't already work, it is probably a nice feature request for nose. Regards Antoine. 
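A package that wants to support both invocation styles usually routes its console-script entry point and a __main__ module through the same function, so that the installed script, `python -m pkg` and `pypy -m pkg` all do the same thing. The sketch below uses made-up names (mypkg, mypkg.cli.main, mytool); the setuptools entry-point syntax itself is standard.

    # setup.py (fragment): declare the console script
    from setuptools import setup, find_packages

    setup(
        name="mypkg",
        packages=find_packages(),
        entry_points={
            "console_scripts": ["mytool = mypkg.cli:main"],
        },
    )

    # mypkg/__main__.py: lets any interpreter run the same code as "mytool"
    # via "python -m mypkg", without knowing where the script was installed.
    import sys
    from mypkg.cli import main

    sys.exit(main())

With both pieces in place, installing the package gives a `mytool` script on the PATH, while `some-python -m mypkg` works for whichever interpreter has the package installed.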
From ram at rachum.com Thu Jun 11 19:39:50 2015 From: ram at rachum.com (Ram Rachum) Date: Thu, 11 Jun 2015 20:39:50 +0300 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: <20150611193730.141204b8@fsol> References: <20150611193730.141204b8@fsol> Message-ID: I know little about package management, but I assume that the information on where scripts are installed exists somewhere? Some central place that Python might be able to access? On Thu, Jun 11, 2015 at 8:37 PM, Antoine Pitrou wrote: > On Thu, 11 Jun 2015 20:23:57 +0300 > Ram Rachum wrote: > > Hi, > > > > What do you think about making `python -m whatever` work also for > installed > > scripts and not just for modules? I need this now because I've installed > > pypy on Linux, and I'm not sure how to run the `nosetests` of PyPy (in > > contrast to the `nosetests` of the system Python.) It's sometimes a mess > to > > find where Linux installed the scripts related with each version of > Python. > > But if I could do `pypy -m nosetests` then I'd have a magic solution. > What > > do you think? > > How would Python know where to find the script? > Note: if "python -m nose" doesn't already work, it is probably a nice > feature request for nose. > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Thu Jun 11 21:10:11 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 11 Jun 2015 12:10:11 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> Message-ID: <5579DD13.7040607@stoneleaf.us> On 06/11/2015 03:56 AM, Chris Angelico wrote: > while input("Spam? ") as spam: > print(globals()) > break > > Spam? yes > {... 'spam.0x7f2080260228': 'yes'...} Having names not leak from listcomps and genexps is a good thing. Having names not leak from try/execpt blocks is a necessary thing. Having names not leak from if/else or while is confusing and irritating: there is no scope there, and at least 'while' should be similar to 'for' which also does a name binding and does /not/ unset it at the end. > Aside from being a fun exercise for me, building a Volkswagen > Helicopter, is this at all useful to anybody? I would find the 'as NAME' portion very useful as long as it wasn't shadowing nor unset. -- ~Ethan~ From rosuav at gmail.com Thu Jun 11 21:19:29 2015 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Jun 2015 05:19:29 +1000 Subject: [Python-ideas] If branch merging In-Reply-To: <5579DD13.7040607@stoneleaf.us> References: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> Message-ID: On Fri, Jun 12, 2015 at 5:10 AM, Ethan Furman wrote: > On 06/11/2015 03:56 AM, Chris Angelico wrote: > >> while input("Spam? 
") as spam: >> print(globals()) >> break >> >> Spam? yes >> {... 'spam.0x7f2080260228': 'yes'...} > > > Having names not leak from listcomps and genexps is a good thing. > > Having names not leak from try/execpt blocks is a necessary thing. > > Having names not leak from if/else or while is confusing and irritating: > there is no scope there, and at least 'while' should be similar to 'for' > which also does a name binding and does /not/ unset it at the end. > > >> Aside from being a fun exercise for me, building a Volkswagen >> Helicopter, is this at all useful to anybody? > > > I would find the 'as NAME' portion very useful as long as it wasn't > shadowing nor unset. > Sure. Removing the scoping from the "while cond as target" rule is simple. Just delete a couple of lines of code (one at the top, one at the bottom), and it'll do a simple name binding. On the subject of try/except unbinding, though, there's a surprising thing in the code: the last action in an except clause is to assign None to the name, and *then* del it: try: suite except Something as e: try: except_block finally: e = None del e Why set it to None just before delling it? It's clearly no accident, so it must have a reason for existing. (With CPython sources, it's always safest to assume intelligent design.) ChrisA From abarnert at yahoo.com Thu Jun 11 22:31:54 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jun 2015 13:31:54 -0700 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: References: Message-ID: On Jun 11, 2015, at 10:23, Ram Rachum wrote: > > What do you think about making `python -m whatever` work also for installed scripts and not just for modules? The current packaging system already makes it easy for packages to make everything easy for you. Assuming nosetests is already using setuptools to create console script entry points automatically, all they have to do is make the top-level code in the module/__main__.py in the package call the same function specified as the entry point, and `pypy -m nosetests` now does the same thing as `/path/to/pypy/scripts/nosetests` no matter how the user has configured things. That's probably a one-line change. (If they're not using entry points, they made need to first factor out a `main` function to be put somewhere that can be used by both, but that still isn't very hard, and is worth doing just so they can switch to using entry points anyway.) Many packages already do this. The obvious problem is that some don't, and as an end-user your only option is to file enhancement requests with those that don't. I don't think there's any obvious way to make your idea work. Python doesn't keep track of what got installed where. (I believe it could import distutils and determine the default install location for new scripts, but even that doesn't really help. Especially since you can end up with multiple Python installations all sharing an install location like /usr/local/bin unless one of those installations does something to avoid it--for example, on a Mac, if you use Apple's pre-installed 2.7 and also a default install from python.org, they'll both install new scripts there.) Of course the pseudo-database of installed scripts could be migrated from pip to Python core, or pip could grow its own script runner that you use in place of Python, or there are probably other radical changes you could make that would enable this. 
But if we're going to do something big, I'd prefer to just make it easier to specify a custom script suffix in distutils.cfg and encourage distros/users/third-party Pythons/etc. to use that if they want to make multiple Pythons easy to use, instead of using different directories. Then you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs. `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly because that's what I do manually; I have a wrapper around pip that symlinks any installed scripts into /usr/local/bin with a suffix that depends on the wrapper's name, and symlink a new name for each Python I install. I'm not sure if anyone else would like that.) From ron3200 at gmail.com Fri Jun 12 01:23:03 2015 From: ron3200 at gmail.com (Ron Adam) Date: Thu, 11 Jun 2015 19:23:03 -0400 Subject: [Python-ideas] If branch merging In-Reply-To: <5579DD13.7040607@stoneleaf.us> References: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> Message-ID: On 06/11/2015 03:10 PM, Ethan Furman wrote: > On 06/11/2015 03:56 AM, Chris Angelico wrote: > >> while input("Spam? ") as spam: >> print(globals()) >> break >> >> Spam? yes >> {... 'spam.0x7f2080260228': 'yes'...} > > Having names not leak from listcomps and genexps is a good thing. In a way this makes sense because you can think of them as a type of function literal. > Having names not leak from if/else or while is confusing and irritating: > there is no scope there, and at least 'while' should be similar to 'for' > which also does a name binding and does /not/ unset it at the end. Having a group of statement share a set of values is fairly easy to think about. Having them share some values at some times, and not others at other times is not so easy to think about. I also get the feeling the solution is more complex than the problem. Ummm... to clarify that. The inconvenience of not having the solution to the apparent problem, is less of a problem than the possible problems I think might arise with the solution. It's Kind of like parsing that sentence, Ron From ethan at stoneleaf.us Fri Jun 12 02:12:32 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 11 Jun 2015 17:12:32 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> Message-ID: <557A23F0.1040409@stoneleaf.us> On 06/11/2015 04:23 PM, Ron Adam wrote: > > > On 06/11/2015 03:10 PM, Ethan Furman wrote: >> On 06/11/2015 03:56 AM, Chris Angelico wrote: >> >>> while input("Spam? ") as spam: >>> print(globals()) >>> break >>> >>> Spam? yes >>> {... 'spam.0x7f2080260228': 'yes'...} >> >> Having names not leak from listcomps and genexps is a good thing. > > In a way this makes sense because you can think of them as a type of function literal. > > > >> Having names not leak from if/else or while is confusing and irritating: >> there is no scope there, and at least 'while' should be similar to 'for' >> which also does a name binding and does /not/ unset it at the end. 
> > Having a group of statement share a set of values is fairly easy to think about. But that is not how Python works. When you bind a name, that name stays until the scope is left (with one notable exception). > Having them share some values at some times, and not others at other times is not so > easy to think about. Which is why I would not have the psuedo-scope on any of them. The only place where that currently happens is in a try/except clause, and that should remain the only exception. -- ~Ethan~ From abarnert at yahoo.com Fri Jun 12 04:21:29 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Thu, 11 Jun 2015 19:21:29 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <557A23F0.1040409@stoneleaf.us> References: <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <557A23F0.1040409@stoneleaf.us> Message-ID: On Jun 11, 2015, at 17:12, Ethan Furman wrote: > >> On 06/11/2015 04:23 PM, Ron Adam wrote: >> >> >>> On 06/11/2015 03:10 PM, Ethan Furman wrote: >>>> On 06/11/2015 03:56 AM, Chris Angelico wrote: >>>> >>>> while input("Spam? ") as spam: >>>> print(globals()) >>>> break >>>> >>>> Spam? yes >>>> {... 'spam.0x7f2080260228': 'yes'...} >>> >>> Having names not leak from listcomps and genexps is a good thing. >> >> In a way this makes sense because you can think of them as a type of function literal. >> >> >> >>> Having names not leak from if/else or while is confusing and irritating: >>> there is no scope there, and at least 'while' should be similar to 'for' >>> which also does a name binding and does /not/ unset it at the end. >> >> Having a group of statement share a set of values is fairly easy to think about. > > But that is not how Python works. When you bind a name, that name stays until the scope is left (with one notable exception). What Nick was proposing was to explicitly change the way Python works. And what Chris hacked up was (part of) what Nick proposed. So you're just pointing out that this change to the way Python works would be a change to the way Python works. Well, of course it would. The question is whether it would be a good change. Nick's point was that they tried a similar change to implement comprehensions without needing to "fake it" with a hidden function, and it makes the implementation far too complex, so it doesn't even matter if it's a well-designed and desirable change. Of course it's also possible that it's not a desirable change (e.g., the current scoping rules are simple enough to keep things straight in your head while reading any function that isn't already too long to be a function, but more complex rules wouldn't be), or that it's possible desirable but not as designed (e.g., I still think Nick's idea of binding within the expression or the statement in a somewhat complex way is more confusing than just binding within the statement). But Chris's attempt to show that the implementation problems might be resolvable, and/or to give people a hack they can play with instead of having to guess, is still a reasonable response to Nick's point. 
I agree with your implied point that a language with two kinds of locality, one nested by block and the other function-wide, is probably not as good a design as one with only the first kind (like C) or only the second (like Python), and that's even more true in a language with closures or implicit declarations (both of which Python has), so I think any design is going to be a mess (definitely including my own straw-man design, and Nick's, and what Chris's hack implements). But it's certainly a _possible_ design, and there's nothing about Python 3.5 that means it would be impossible or backward-incompatible (as opposed to just a bad idea) to have such a design for Python 3.6. From ncoghlan at gmail.com Fri Jun 12 08:53:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Jun 2015 16:53:42 +1000 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On 11 June 2015 at 22:57, Tal Einat wrote: > > I actually think "subscript" is quite good a name. It makes the > explicit distinction between subscripts, indexes and slices. Yeah, I've warmed to it myself: zero = operator.subscript[0] ellipsis = operator.subscript[...] reverse = slice(None, None, -1) reverse = operator.subscript[::-1] all_rows_first_col = slice(None), slice(0) all_rows_first_col = operator.subscript[:, 0] first_row_all_cols_but_last = slice(0), slice(None, -1) first_row_all_cols_but_last = operator.subscript[0, :-1] I realised the essential problem with using "item" in the name is that the "item" in the method names refers to the *result*, not to the input. Since the unifying term for the different kinds of input is indeed "subscript" (covering indices, slices, multi-dimensional slices, key lookups, content addressable data structures, etc), it makes sense to just use it rather than inventing something new. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Fri Jun 12 08:55:52 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Jun 2015 15:55:52 +0900 Subject: [Python-ideas] If branch merging In-Reply-To: <5579DD13.7040607@stoneleaf.us> References: <60A27B69-FCA0-4DCC-B515-8D4CE0CE8520@yahoo.com> <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> Message-ID: <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > I would find the 'as NAME' portion very useful as long as it wasn't > shadowing nor unset. I don't understand the "not shadowing" requirement. If you're not going to create a new scope, then from foo import * if expr as val: use(val) bar(val) might very well shadow foo.val and break the invocation of bar. Is use of the identifier "val" in this context an error? Or what? From ncoghlan at gmail.com Fri Jun 12 09:13:21 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 12 Jun 2015 17:13:21 +1000 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: References: Message-ID: On 12 June 2015 at 06:31, Andrew Barnert via Python-ideas wrote: > But if we're going to do something big, I'd prefer to just make it easier to specify a custom script suffix in distutils.cfg and encourage distros/users/third-party Pythons/etc. to use that if they want to make multiple Pythons easy to use, instead of using different directories. 
Then you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs. `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly because that's what I do manually; I have a wrapper around pip that symlinks any installed scripts into /usr/local/bin with a suffix that depends on the wrapper's name, and symlink a new name for each Python I install. I'm not sure if anyone else would like that.) Armin Ronacher's pipsi should already make it possible to do: pypy -m pip install pipsi pypy -m pipsi install nose In theory, that should give you a nosetests in ~/.local/bin that runs in a PyPy virtualenv (I haven't actually tried it though). "Should the default Python on Linux be a per-user configuration setting rather than a system wide symlink?" was also a topic that came up at this year's language summit (see https://lwn.net/Articles/640296/, especially the straw poll results at the end ), so it's likely a proposal along those lines will happen at some point during the 3.6 development cycle. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Fri Jun 12 09:41:55 2015 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 12 Jun 2015 03:41:55 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On 6/12/2015 2:53 AM, Nick Coghlan wrote: > On 11 June 2015 at 22:57, Tal Einat wrote: >> >> I actually think "subscript" is quite good a name. It makes the >> explicit distinction between subscripts, indexes and slices. > > Yeah, I've warmed to it myself: > > zero = operator.subscript[0] > > ellipsis = operator.subscript[...] > > reverse = slice(None, None, -1) > reverse = operator.subscript[::-1] > > all_rows_first_col = slice(None), slice(0) > all_rows_first_col = operator.subscript[:, 0] > > first_row_all_cols_but_last = slice(0), slice(None, -1) > first_row_all_cols_but_last = operator.subscript[0, :-1] > > I realised the essential problem with using "item" in the name is that > the "item" in the method names refers to the *result*, not to the > input. Since the unifying term for the different kinds of input is > indeed "subscript" (covering indices, slices, multi-dimensional > slices, key lookups, content addressable data structures, etc), it > makes sense to just use it rather than inventing something new. If the feature is added, this looks pretty good to me. -- Terry Jan Reedy From ram at rachum.com Fri Jun 12 10:40:26 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 12 Jun 2015 11:40:26 +0300 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: References: Message-ID: Thanks! On Fri, Jun 12, 2015 at 10:13 AM, Nick Coghlan wrote: > On 12 June 2015 at 06:31, Andrew Barnert via Python-ideas > wrote: > > But if we're going to do something big, I'd prefer to just make it > easier to specify a custom script suffix in distutils.cfg and encourage > distros/users/third-party Pythons/etc. to use that if they want to make > multiple Pythons easy to use, instead of using different directories. Then > you'd just run `nosetests_pypy3` vs. `nosetests3` or `nosetests_jy2` vs. > `nosetests2` or `nosetests_cust3.4` vs. `nosetests3.4` or whatever. (Mainly > because that's what I do manually; I have a wrapper around pip that > symlinks any installed scripts into /usr/local/bin with a suffix that > depends on the wrapper's name, and symlink a new name for each Python I > install. I'm not sure if anyone else would like that.) 
> > Armin Ronacher's pipsi should already make it possible to do: > > pypy -m pip install pipsi > pypy -m pipsi install nose > > In theory, that should give you a nosetests in ~/.local/bin that runs > in a PyPy virtualenv (I haven't actually tried it though). > > "Should the default Python on Linux be a per-user configuration > setting rather than a system wide symlink?" was also a topic that came > up at this year's language summit (see > https://lwn.net/Articles/640296/, especially the straw poll results at > the end ), so it's likely a proposal along those lines will happen at > some point during the 3.6 development cycle. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Fri Jun 12 13:03:56 2015 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 12 Jun 2015 07:03:56 -0400 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: References: Message-ID: <557ABC9C.9080606@nedbatchelder.com> On 6/11/15 1:23 PM, Ram Rachum wrote: > Hi, > > What do you think about making `python -m whatever` work also for > installed scripts and not just for modules? I need this now because > I've installed pypy on Linux, and I'm not sure how to run the > `nosetests` of PyPy (in contrast to the `nosetests` of the system > Python.) It's sometimes a mess to find where Linux installed the > scripts related with each version of Python. But if I could do `pypy > -m nosetests` then I'd have a magic solution. What do you think? > This works today: $ python -m nose --Ned. From ram at rachum.com Fri Jun 12 13:06:17 2015 From: ram at rachum.com (Ram Rachum) Date: Fri, 12 Jun 2015 14:06:17 +0300 Subject: [Python-ideas] Making -m work for scripts In-Reply-To: <557ABC9C.9080606@nedbatchelder.com> References: <557ABC9C.9080606@nedbatchelder.com> Message-ID: Thanks Ned! On Fri, Jun 12, 2015 at 2:03 PM, Ned Batchelder wrote: > On 6/11/15 1:23 PM, Ram Rachum wrote: > >> Hi, >> >> What do you think about making `python -m whatever` work also for >> installed scripts and not just for modules? I need this now because I've >> installed pypy on Linux, and I'm not sure how to run the `nosetests` of >> PyPy (in contrast to the `nosetests` of the system Python.) It's sometimes >> a mess to find where Linux installed the scripts related with each version >> of Python. But if I could do `pypy -m nosetests` then I'd have a magic >> solution. What do you think? >> >> > This works today: > > $ python -m nose > > --Ned. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From taleinat at gmail.com Fri Jun 12 15:27:35 2015 From: taleinat at gmail.com (Tal Einat) Date: Fri, 12 Jun 2015 16:27:35 +0300 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Fri, Jun 12, 2015 at 10:41 AM, Terry Reedy wrote: > On 6/12/2015 2:53 AM, Nick Coghlan wrote: >> >> On 11 June 2015 at 22:57, Tal Einat wrote: >>> >>> >>> I actually think "subscript" is quite good a name. It makes the >>> explicit distinction between subscripts, indexes and slices. >> >> >> Yeah, I've warmed to it myself: >> >> zero = operator.subscript[0] >> >> ellipsis = operator.subscript[...] 
>> >> reverse = slice(None, None, -1) >> reverse = operator.subscript[::-1] >> >> all_rows_first_col = slice(None), slice(0) >> all_rows_first_col = operator.subscript[:, 0] >> >> first_row_all_cols_but_last = slice(0), slice(None, -1) >> first_row_all_cols_but_last = operator.subscript[0, :-1] >> >> I realised the essential problem with using "item" in the name is that >> the "item" in the method names refers to the *result*, not to the >> input. Since the unifying term for the different kinds of input is >> indeed "subscript" (covering indices, slices, multi-dimensional >> slices, key lookups, content addressable data structures, etc), it >> makes sense to just use it rather than inventing something new. > > > If the feature is added, this looks pretty good to me. It looks good to me as well. +1 for adding this as described and naming it operator.subscript. - Tal Einat From ethan at stoneleaf.us Fri Jun 12 16:14:05 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Jun 2015 07:14:05 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <557AE92D.8010006@stoneleaf.us> On 06/11/2015 11:55 PM, Stephen J. Turnbull wrote: > Ethan Furman writes: > >> I would find the 'as NAME' portion very useful as long as it wasn't >> shadowing nor unset. > > I don't understand the "not shadowing" requirement. If you're not > going to create a new scope, then > > from foo import * > > if expr as val: > use(val) > > bar(val) > > might very well shadow foo.val and break the invocation of bar. Is > use of the identifier "val" in this context an error? Or what? Likewise: for val in some_iterator: use(val) bar(val) will shadow foo.val and break bar; yet for loops do not create their own scopes. with open('somefile') as val: stuff = val.read() bar(val) will also shadow foo.val and break bar, yet with contexts do not create their own scopes. And let's not forget: val = some_func() bar(val) Again -- no micro-scope, and foo.val is shadowed. -- ~Ethan~ From techtonik at gmail.com Fri Jun 12 15:14:47 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 12 Jun 2015 16:14:47 +0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <20150602224420.GA9022@phdru.name> References: <20150602224420.GA9022@phdru.name> Message-ID: I would gladly use forum interface like http://www.discourse.org/ if there was any linked in message footers. Bottom posting is not automated in Gmail and takes a lot of energy and dumb repeated keypresses to complete. On Wed, Jun 3, 2015 at 1:44 AM, Oleg Broytman wrote: > On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person wrote: >> What do you mean by replying inine? > > https://en.wikipedia.org/wiki/Posting_style > > A: Because it messes up the order in which people normally read text. > Q: Why is top-posting such a bad thing? > A: Top-posting. > Q: What is the most annoying thing in e-mail? > >> On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert wrote: >> > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person >> > wrote: >> > >> > I think you're right. I was also considering ... "editing" my Python >> > distribution. 
If they didn't implement my suggestion for correcting floats, >> > at least they can fix this, instead of making people hack Python for good >> > results! >> > >> > >> > If you're going to reply to digests, please learn how to reply inline >> > instead of top-posting (and how to trim out all the irrelevant stuff). It's >> > next to impossible to tell which part of which of the messages you're >> > replying to even in simple cases like this one, with only 4 messages in the >> > digest. > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- anatoly t. From techtonik at gmail.com Fri Jun 12 15:34:03 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Fri, 12 Jun 2015 16:34:03 +0300 Subject: [Python-ideas] Using semantic web (RDF triples) to link replacements for modules in standard library Message-ID: I failed to bring any attention to https://bitbucket.org/techtonik/python-stdlib to crowdsource collecting data about modules in stdlib, and was even warned not to distract people in @python.org lists to work on it, so I thought about a different way to decentralize the work on standard library. We have the data about top Python modules that need a redesign http://www.google.com/moderator/#15/e=28d4&t=28d4.40 but no way to propose alternatives for each module. So, I propose to use wikidata for collecting that info: https://www.wikidata.org/wiki/Wikidata:Tours 1. define concept for Python stdlib 2. define concept of module for Python stdlib 3. add a property "replacement of' 4. provide a web interface for viewing that info There is no interested entity in sponsoring the development of model and workflow in my region, and the job contracts that are available to me right are unlikely to allow me to work on this stuff, so I hope that this idea will take some traction from people interested to put semantic web to some good use, and won't be affected by "dead by origin" curse. -- anatoly t. From rymg19 at gmail.com Fri Jun 12 17:02:16 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Fri, 12 Jun 2015 10:02:16 -0500 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> Message-ID: <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik wrote: >I would gladly use forum interface like http://www.discourse.org/ >if there was any linked in message footers. Bottom posting is not >automated in Gmail and takes a lot of energy and dumb repeated >keypresses to complete. Not really. I just did it right now. > >On Wed, Jun 3, 2015 at 1:44 AM, Oleg Broytman wrote: >> On Tue, Jun 02, 2015 at 03:28:33PM -0700, u8y7541 The Awesome Person > wrote: >>> What do you mean by replying inine? >> >> https://en.wikipedia.org/wiki/Posting_style >> >> A: Because it messes up the order in which people normally read text. >> Q: Why is top-posting such a bad thing? >> A: Top-posting. >> Q: What is the most annoying thing in e-mail? >> >>> On Mon, Jun 1, 2015 at 10:22 PM, Andrew Barnert >wrote: >>> > On Jun 1, 2015, at 20:41, u8y7541 The Awesome Person >>> > wrote: >>> > >>> > I think you're right. I was also considering ... "editing" my >Python >>> > distribution. 
If they didn't implement my suggestion for >correcting floats, >>> > at least they can fix this, instead of making people hack Python >for good >>> > results! >>> > >>> > >>> > If you're going to reply to digests, please learn how to reply >inline >>> > instead of top-posting (and how to trim out all the irrelevant >stuff). It's >>> > next to impossible to tell which part of which of the messages >you're >>> > replying to even in simple cases like this one, with only 4 >messages in the >>> > digest. >> >> Oleg. >> -- >> Oleg Broytman http://phdru.name/ >phd at phdru.name >> Programmers don't die, they just GOSUB without RETURN. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From stephen at xemacs.org Fri Jun 12 17:14:14 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 13 Jun 2015 00:14:14 +0900 Subject: [Python-ideas] If branch merging In-Reply-To: <557AE92D.8010006@stoneleaf.us> References: <20150608121228.GL20701@ando.pearwood.info> <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> <557AE92D.8010006@stoneleaf.us> Message-ID: <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Likewise: > > for val in some_iterator: > use(val) > > bar(val) > > will shadow foo.val Yes, I understand that. What I don't understand is your statement that you would like "if expr as val:" if it *doesn't* shadow. From abarnert at yahoo.com Fri Jun 12 19:09:11 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jun 2015 10:09:11 -0700 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> Message-ID: <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> On Jun 12, 2015, at 08:02, Ryan Gonzalez wrote: > >> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik wrote: >> I would gladly use forum interface like http://www.discourse.org/ >> if there was any linked in message footers. Bottom posting is not >> automated in Gmail and takes a lot of energy and dumb repeated >> keypresses to complete. > > Not really. I just did it right now. And if that's too much work, there are Greasemonkey scripts to do it for you. (And, unlike Yahoo, Gmail doesn't seem to go out of their way to break user scripts every two weeks.) But really, in non-trivial cases, you usually want your reply interleaved with parts of the original; just jumping to the very end of the message (past the signature and list footer) and replying to the whole thing in bulk isn't much better than top-posting. And, while some MUAs do have tools to help with that better than Gmail's, there's really no way to automate it; you have to put the effort into selecting the parts you want to keep or the parts you want to remove and putting your cursor after each one. (If you're not willing to do that, and instead assume anyone who needs the context can reassemble it themselves with the help of an MUA with proper threading support--which is pretty much all of them today--then why quote at all?) 
From ethan at stoneleaf.us Fri Jun 12 19:21:39 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Jun 2015 10:21:39 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> References: <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> <557AE92D.8010006@stoneleaf.us> <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <557B1523.5050804@stoneleaf.us> On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote: > Ethan Furman writes: > >> Likewise: >> >> for val in some_iterator: >> use(val) >> >> bar(val) >> >> will shadow foo.val > > Yes, I understand that. What I don't understand is your statement > that you would like "if expr as val:" if it *doesn't* shadow. Ah, I think I see your point. My use of the word "shadow" was in relation to the micro-scope and the previously existing name being shadowed and then un-shadowed when the micro-scope was destroyed. If we are at module-level (not class nor function) then there should be no shadowing, but a rebinding of the name. Even try/except blocks don't "shadow", but rebind and then delete the name used to catch the exception. -- ~Ethan~ From ron3200 at gmail.com Fri Jun 12 20:25:20 2015 From: ron3200 at gmail.com (Ron Adam) Date: Fri, 12 Jun 2015 14:25:20 -0400 Subject: [Python-ideas] If branch merging In-Reply-To: <557B1523.5050804@stoneleaf.us> References: <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> <557AE92D.8010006@stoneleaf.us> <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> <557B1523.5050804@stoneleaf.us> Message-ID: On 06/12/2015 01:21 PM, Ethan Furman wrote: > On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote: >> Ethan Furman writes: >> >>> Likewise: >>> >>> for val in some_iterator: >>> use(val) >>> >>> bar(val) >>> >>> will shadow foo.val >> >> Yes, I understand that. What I don't understand is your statement >> that you would like "if expr as val:" if it *doesn't* shadow. > > Ah, I think I see your point. My use of the word "shadow" was in relation > to the micro-scope and the previously existing name being shadowed and then > un-shadowed when the micro-scope was destroyed. If we are at module-level > (not class nor function) then there should be no shadowing, but a rebinding > of the name. Even try/except blocks don't "shadow", but rebind and then > delete the name used to catch the exception. The problem can be turned around/over. Instead of specifying a name to be shadowed, the names to be shared can be specified. Then it translates to function with specified nonlocals. a = 1 # will be shared b = 2 # will be shadowed def do_loop_with_shared_items(): nonlocal a # a is a shared value. for b in some_iterator: a = use(b) do_loop_with_shared_items() print(a) # changed by loop print(b) # print 2. Not changed by loop That might be expressed as... a = 1 b = 2 with nonlocal a: # a is shared for b in some_iterator: a = use(b) # other values (b) are local to block. print(a) # changed by loop print(b) # prints 2. Not changed by loop And with this, the "as" modifier isn't needed, just don't list the item as a nonlocal. 
with nonlocal: a = foo.bar # a as foo.bar in this block scope only. ,,, This has the advantage of not complicating other statements and keeps the concept in a separate mental box. I like this better, but am still -0.5. I'd need to see some examples where it would be "worth it". It still feels like a solution looking for a problem to me. Cheers, Ron From abarnert at yahoo.com Fri Jun 12 20:41:28 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Fri, 12 Jun 2015 11:41:28 -0700 Subject: [Python-ideas] If branch merging In-Reply-To: References: <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> <557AE92D.8010006@stoneleaf.us> <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> <557B1523.5050804@stoneleaf.us> Message-ID: <2C5269F6-D990-459F-BFCE-E6FAD85E7A9C@yahoo.com> On Jun 12, 2015, at 11:25, Ron Adam wrote: > >> On 06/12/2015 01:21 PM, Ethan Furman wrote: >>> On 06/12/2015 08:14 AM, Stephen J. Turnbull wrote: >>> Ethan Furman writes: >>> >>>> Likewise: >>>> >>>> for val in some_iterator: >>>> use(val) >>>> >>>> bar(val) >>>> >>>> will shadow foo.val >>> >>> Yes, I understand that. What I don't understand is your statement >>> that you would like "if expr as val:" if it *doesn't* shadow. >> >> Ah, I think I see your point. My use of the word "shadow" was in relation >> to the micro-scope and the previously existing name being shadowed and then >> un-shadowed when the micro-scope was destroyed. If we are at module-level >> (not class nor function) then there should be no shadowing, but a rebinding >> of the name. Even try/except blocks don't "shadow", but rebind and then >> delete the name used to catch the exception. > > The problem can be turned around/over. Instead of specifying a name to be shadowed, the names to be shared can be specified. Then it translates to function with specified nonlocals. I really like making it explicit. I'm not sure about the turning-it-around bit. That means inside a with-nonlocal block, things don't work the same as in any another block, and that won't be at all obvious. But even without that, the idea works; to make something nested-local, you write: with local b: for b in some_iterator: a = use(b) That leaves function-local as the default, and defines statement-local in a way that's as similar as possible to the other alternatives, environment-nonlocal and global; the only real difference is that it has a suite, which is pretty much implicit in the fact that it's defining something as local to the suite. Either way seems better than the quasi-magic scoping (both my version and Nick's took a couple paragraphs to explain...) caused by as expressions and/or clauses. And that's in addition to the advantages you suggested of not complicating the syntax and keeping separate concepts separate. > I like this better, but am still -0.5. I'd need to see some examples where it would be "worth it". It still feels like a solution looking for a problem to me. Agreed. I think everyone (including myself) has put thought into this just because it's an interesting puzzle, not necessarily because the language needs it... 
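For concreteness, the behaviour this subthread keeps circling around can be checked with plain current Python (no new syntax involved): `for`, `with` and `except` targets all bind names in the enclosing function or module scope, and only the `except` clause deletes its target again on exit. A minimal sketch, with the name `val` chosen purely for illustration:

    import io

    val = "module level"

    for val in range(3):            # loop target rebinds the module-level name
        pass
    print(val)                      # -> 2; no "micro-scope" is unwound afterwards

    with io.StringIO("data") as val:
        pass
    print(val.closed)               # -> True; "val" still refers to the closed buffer

    try:
        raise ValueError("boom")
    except ValueError as val:       # except target is rebound...
        pass
    # print(val)                    # ...then deleted on exit: this would raise NameError

This status quo is what the proposed "if expr as val:" clause and Ron's "with nonlocal" block would either preserve or change, depending on whether a new statement-local scope is introduced.
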
From taleinat at gmail.com Fri Jun 12 22:59:04 2015 From: taleinat at gmail.com (Tal Einat) Date: Fri, 12 Jun 2015 23:59:04 +0300 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: On Fri, Jun 12, 2015 at 11:45 PM, Joseph Jevnik wrote: > I can update my patch to move it to the operator module Please do. Further discussion should take place on the issue tracker. From joejev at gmail.com Fri Jun 12 22:45:33 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Fri, 12 Jun 2015 16:45:33 -0400 Subject: [Python-ideas] slice.literal notation In-Reply-To: References: Message-ID: I can update my patch to move it to the operator module On Jun 12, 2015 9:28 AM, "Tal Einat" wrote: > On Fri, Jun 12, 2015 at 10:41 AM, Terry Reedy wrote: > > On 6/12/2015 2:53 AM, Nick Coghlan wrote: > >> > >> On 11 June 2015 at 22:57, Tal Einat wrote: > >>> > >>> > >>> I actually think "subscript" is quite good a name. It makes the > >>> explicit distinction between subscripts, indexes and slices. > >> > >> > >> Yeah, I've warmed to it myself: > >> > >> zero = operator.subscript[0] > >> > >> ellipsis = operator.subscript[...] > >> > >> reverse = slice(None, None, -1) > >> reverse = operator.subscript[::-1] > >> > >> all_rows_first_col = slice(None), slice(0) > >> all_rows_first_col = operator.subscript[:, 0] > >> > >> first_row_all_cols_but_last = slice(0), slice(None, -1) > >> first_row_all_cols_but_last = operator.subscript[0, :-1] > >> > >> I realised the essential problem with using "item" in the name is that > >> the "item" in the method names refers to the *result*, not to the > >> input. Since the unifying term for the different kinds of input is > >> indeed "subscript" (covering indices, slices, multi-dimensional > >> slices, key lookups, content addressable data structures, etc), it > >> makes sense to just use it rather than inventing something new. > > > > > > If the feature is added, this looks pretty good to me. > > It looks good to me as well. > > +1 for adding this as described and naming it operator.subscript. > > - Tal Einat > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Jun 13 01:50:31 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Jun 2015 09:50:31 +1000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> Message-ID: On Fri, Jun 12, 2015 at 11:14 PM, anatoly techtonik wrote: > I would gladly use forum interface like http://www.discourse.org/ > if there was any linked in message footers. Bottom posting is not > automated in Gmail and takes a lot of energy and dumb repeated > keypresses to complete. Small trick: If Gmail has all the quoted text buried behind a "click to expand" button, you don't have to grab the mouse - just press Ctrl-A to select all, and it'll expand the quoted section into actual text. Works in Chrome, not tested recently in Firefox but should work there too. Can't speak for other browsers. ChrisA From stephen at xemacs.org Sat Jun 13 03:28:58 2015 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 13 Jun 2015 10:28:58 +0900 Subject: [Python-ideas] If branch merging In-Reply-To: <557B1523.5050804@stoneleaf.us> References: <5575B394.9070203@googlemail.com> <55773648.8090808@googlemail.com> <46F63115-A335-42DC-81F2-C0ED8360D00C@yahoo.com> <844F36A4-C031-4B60-8F73-A4EAE3449A5D@yahoo.com> <8CFEE808-CBB6-49F8-BBCB-09AD485B04AD@yahoo.com> <5579DD13.7040607@stoneleaf.us> <87y4jpifev.fsf@uwakimon.sk.tsukuba.ac.jp> <557AE92D.8010006@stoneleaf.us> <87oaklhsc9.fsf@uwakimon.sk.tsukuba.ac.jp> <557B1523.5050804@stoneleaf.us> Message-ID: <87mw04ieg5.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > > Yes, I understand that. What I don't understand is your statement > > that you would like "if expr as val:" if it *doesn't* shadow. > > Ah, I think I see your point. My use of the word "shadow" was in > relation to the micro-scope and the previously existing name being > shadowed and then un-shadowed when the micro-scope was destroyed. I see. Your use of "shadow" implies later "unshadowing", which can only happen with scope. Mine doesn't, I just associate "shadow" with rebinding. I think your usage is more accurate. Especially in Python, which has a much flatter (and more formalized) use of scopes than, say, Lisp. Thank you for your explanation, it helped (me, anyway). From stephen at xemacs.org Sat Jun 13 03:53:23 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 13 Jun 2015 10:53:23 +0900 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> Andrew Barnert via Python-ideas writes: > But really, in non-trivial cases, you usually want your reply > interleaved with parts of the original; just jumping to the very > end of the message (past the signature and list footer) and > replying to the whole thing in bulk isn't much better than > top-posting. Wrong. :-) *Bottom*-posting is *much* worse. For better or worse, top-posting is here to stay. It doesn't work very well in forums like this one, but it's not too bad if you do it the way Guido does (which of course is one of the reasons we can't get rid of it). The basic rules: 1. Don't top-post, and you're done. If you "must" top-post, then continue. 2. Only top-post short comments that make sense with minimal context (typically the subject should be enough context). If it's not going to be short, don't top-post (no exceptions -- if you've got time and the equipment to write a long post, you also are not inconvenienced by using the interlinear style). If it requires specific context presented accurately, don't top-post (no exceptions, as your top post will certainly be misunderstood and generate long threads of explaining what you thought didn't need explanation, wasting copious amounts of everybody's time and attention, and probably burying your contribution in the process). 3. Indicate in your text that you top-posted -- preferably with a sincere apology. If you're so self-centered that sincerity is impossible, an insincere apology is recommended (otherwise you'll probably end up in a few killfiles for being Beavis's friend). 
From breamoreboy at yahoo.co.uk Sat Jun 13 10:39:53 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 13 Jun 2015 09:39:53 +0100 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: On 22/05/2015 02:18, Ben Hoyt wrote: > Hi Python Ideas folks, > [snipped to death] > > -Ben You might find this interesting https://github.com/Psycojoker/baron The introduction states "Baron is a Full Syntax Tree (FST) library for Python. By opposition to an AST which drops some syntax information in the process of its creation (like empty lines, comments, formatting), a FST keeps everything and guarantees the operation fst_to_code(code_to_fst(source_code)) == source_code.". -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From guido at python.org Sat Jun 13 11:09:17 2015 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Jun 2015 02:09:17 -0700 Subject: [Python-ideas] Enabling access to the AST for Python code In-Reply-To: References: Message-ID: On Sat, Jun 13, 2015 at 1:39 AM, Mark Lawrence wrote: > > You might find this interesting https://github.com/Psycojoker/baron > > The introduction states "Baron is a Full Syntax Tree (FST) library for > Python. By opposition to an AST which drops some syntax information in the > process of its creation (like empty lines, comments, formatting), a FST > keeps everything and guarantees the operation > fst_to_code(code_to_fst(source_code)) == source_code.". > There's one like this in the stdlib too! It's in lib2to3 and even preserves comments and whitespace. It's used as the basis for the 2to3 fixers. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Sat Jun 13 21:54:37 2015 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 13 Jun 2015 19:54:37 +0000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com>, <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: FWIW, from my phone the previous email is completely ineditable and I can't even delete it. Hence my signature, putting the blame firmly where it belongs ;) That said, we basically always use top posting (with highlighting - a perk of formatting in emails) at work so I'm fairly used to it. The way that I read the mailing lists (oldest to newest by thread) also works better with top posting, since I'm permanently trying to catch up rather than carry on a conversation. The problems then are people who don't snip (though people who do snip are also problems... can't win here) and especially those who reply to one part in the middle of an epic and don't sign off, leaving me scrolling right to the end to figure out if they have anything to say. Cheers, Steve Top-posted from my Windows Phone ________________________________ From: Stephen J. Turnbull Sent: ?6/?12/?2015 18:53 To: Andrew Barnert Cc: python-ideas Subject: Re: [Python-ideas] Meta: Email netiquette Andrew Barnert via Python-ideas writes: > But really, in non-trivial cases, you usually want your reply > interleaved with parts of the original; just jumping to the very > end of the message (past the signature and list footer) and > replying to the whole thing in bulk isn't much better than > top-posting. Wrong. 
:-) *Bottom*-posting is *much* worse. For better or worse, top-posting is here to stay. It doesn't work very well in forums like this one, but it's not too bad if you do it the way Guido does (which of course is one of the reasons we can't get rid of it). The basic rules: 1. Don't top-post, and you're done. If you "must" top-post, then continue. 2. Only top-post short comments that make sense with minimal context (typically the subject should be enough context). If it's not going to be short, don't top-post (no exceptions -- if you've got time and the equipment to write a long post, you also are not inconvenienced by using the interlinear style). If it requires specific context presented accurately, don't top-post (no exceptions, as your top post will certainly be misunderstood and generate long threads of explaining what you thought didn't need explanation, wasting copious amounts of everybody's time and attention, and probably burying your contribution in the process). 3. Indicate in your text that you top-posted -- preferably with a sincere apology. If you're so self-centered that sincerity is impossible, an insincere apology is recommended (otherwise you'll probably end up in a few killfiles for being Beavis's friend). _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Sat Jun 13 22:05:51 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 14 Jun 2015 06:05:51 +1000 Subject: [Python-ideas] Meta: Email netiquette References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <85si9vgyqo.fsf@benfinney.id.au> Steve Dower writes: > FWIW, from my phone the previous email is completely ineditable and I > can't even delete it. Hence my signature, putting the blame firmly > where it belongs ;) This speaks to a deficiency in the tool. Rather than apologising for bad etiquette, surely the better course is not to use a tool that needs such apology? In other words: if your tool can't compose messages properly, don't apologise for it; instead, stop using that tool for composing messages. -- \ ?A lot of people are afraid of heights. Not me, I'm afraid of | `\ widths.? ?Steven Wright | _o__) | Ben Finney From ncoghlan at gmail.com Sun Jun 14 03:24:31 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Jun 2015 11:24:31 +1000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <85si9vgyqo.fsf@benfinney.id.au> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> <85si9vgyqo.fsf@benfinney.id.au> Message-ID: On 14 Jun 2015 06:06, "Ben Finney" wrote: > > Steve Dower > writes: > > > FWIW, from my phone the previous email is completely ineditable and I > > can't even delete it. Hence my signature, putting the blame firmly > > where it belongs ;) > > This speaks to a deficiency in the tool. Rather than apologising for bad > etiquette, surely the better course is not to use a tool that needs such > apology? 
> > In other words: if your tool can't compose messages properly, don't > apologise for it; instead, stop using that tool for composing messages. Far easier said than done, especially in an institutional context, as there are currently *zero* readily available email clients out there that adequately cover hybrid operation for folks bridging the gap between the open source community and the world of enterprise collaboration suites. Gmail (and its associated Android app) at least attains "not entirely awful at it" status, but I'd expect Microsoft's clients to still be assuming Outlook/Exchange style models. Cheers, Nick. > > -- > \ ?A lot of people are afraid of heights. Not me, I'm afraid of | > `\ widths.? ?Steven Wright | > _o__) | > Ben Finney > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Jun 14 06:10:02 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Jun 2015 14:10:02 +1000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20150614041002.GZ20701@ando.pearwood.info> On Sat, Jun 13, 2015 at 10:53:23AM +0900, Stephen J. Turnbull wrote: > Andrew Barnert via Python-ideas writes: > > > But really, in non-trivial cases, you usually want your reply > > interleaved with parts of the original; just jumping to the very > > end of the message (past the signature and list footer) and > > replying to the whole thing in bulk isn't much better than > > top-posting. > > Wrong. :-) *Bottom*-posting is *much* worse. Agreed. The worst case I personally ever saw was somebody replying to a digest on a high-volume mailing list where they added "I agree!" and their signature to the very end. I actually counted how many pages of quoting there were: 29 pages, based on ~60 lines per page. (That's full A4 pages mind, it was about 50 keypresses in mutt to page through it a screen at a time.) Naturally there was no indication of which of the two dozen messages they agreed with. > For better or worse, top-posting is here to stay. It doesn't work > very well in forums like this one, but it's not too bad if you do it > the way Guido does (which of course is one of the reasons we can't get > rid of it). The basic rules: [...] This is the best description of good top-posting practice I've ever seen, thanks. For what it's worth, I think inline posters also need to follow good practice too: if the reader cannot see new content (i.e. what you wrote) within the first screen full of text, you're probably quoting too much. This rule does not apply to readers trying to read email on a phone that shows only a handful of lines at a time. If you are reading email on a screen the size of a credit card (or smaller), you cannot expect others to accomodate your choice of technology in a discussion group like this. I can't think of *any* good reason to bottom-post without trimming the quoted content. 
-- Steve From ncoghlan at gmail.com Sun Jun 14 07:25:08 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Jun 2015 15:25:08 +1000 Subject: [Python-ideas] PEP 432 modular bootstrap status update Message-ID: I took some time today to resync my PEP 432 modular bootstrap branch with the current default branch of CPython. While it's still very much a work in progress (with outright test failures for at least isolated mode and subinterpreter support at the moment), https://bitbucket.org/ncoghlan/cpython_sandbox/commits/5e0d1bba5d6dab7b2121a0c0e8df47ce8df74126 represents a significant milestone, as for the first time I was able to replace the current call in Py_Main that initialises the hash randomisation with the new Py_BeginInitialization API and still build an interpreter that at least basically works. The full details of what that branch is about are in https://www.python.org/dev/peps/pep-0432/, but the general idea is to split the interpreter bootstrapping sequence up into two distinct phases: * Initializing: the eval loop works, builtin data types work, frozen and builtin imports work, the compiler works, but most operating system interfaces (including external module imports) don't work * Initialized: fully configured interpreter My main goal here is actually to make the startup code easier to hack on, by having more of it take place with the interpreter in a well-defined state, rather than having a lot of APIs that may or may not work depending on exactly where we are in the bootstrapping process. The main potential benefit for end users these changes should make it easier to embed the CPython runtime in other applications (including command line applications), and *skip the initialisation steps you don't need*. A secondary potential benefit is this should make it easier to have subinterpreters that are *configured differently* from the main interpreter (so, for example, you could have a subinterpreter that had no import system configured), which opens the door to various improvements in the way subinterpreters work in general. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From techtonik at gmail.com Sun Jun 14 13:08:31 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 14 Jun 2015 14:08:31 +0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> Message-ID: On Fri, Jun 12, 2015 at 6:02 PM, Ryan Gonzalez wrote: > On June 12, 2015 8:14:47 AM CDT, anatoly techtonik wrote: >>I would gladly use forum interface like http://www.discourse.org/ >>if there was any linked in message footers. Bottom posting is not >>automated in Gmail and takes a lot of energy and dumb repeated >>keypresses to complete. > > Not really. I just did it right now. ... > Sent from my Android device with K-9 Mail. Please excuse my brevity. Hmm.. 
From techtonik at gmail.com Sun Jun 14 13:16:43 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 14 Jun 2015 14:16:43 +0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: On Fri, Jun 12, 2015 at 8:09 PM, Andrew Barnert wrote: > On Jun 12, 2015, at 08:02, Ryan Gonzalez wrote: >> >>> On June 12, 2015 8:14:47 AM CDT, anatoly techtonik wrote: >>> I would gladly use forum interface like http://www.discourse.org/ >>> if there was any linked in message footers. Bottom posting is not >>> automated in Gmail and takes a lot of energy and dumb repeated >>> keypresses to complete. >> >> Not really. I just did it right now. > > And if that's too much work, there are Greasemonkey scripts to do it for you. (And, unlike Yahoo, Gmail doesn't seem to go out of their way to break user scripts every two weeks.) And where are those scripts? I find references dated 2008 on some mirror sites, and I doubt that they work. Another problem is that I don't use the same Pip-Boy all the time. The process that need to be automated for Gmail is "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is the annoying combo that I have to write every time to enter bottom posting insert mode. -- anatoly t. From brett at python.org Sun Jun 14 15:11:27 2015 From: brett at python.org (Brett Cannon) Date: Sun, 14 Jun 2015 13:11:27 +0000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <20150614041002.GZ20701@ando.pearwood.info> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> <20150614041002.GZ20701@ando.pearwood.info> Message-ID: On Sun, Jun 14, 2015, 00:10 Steven D'Aprano wrote: On Sat, Jun 13, 2015 at 10:53:23AM +0900, Stephen J. Turnbull wrote: > Andrew Barnert via Python-ideas writes: > > > But really, in non-trivial cases, you usually want your reply > > interleaved with parts of the original; just jumping to the very > > end of the message (past the signature and list footer) and > > replying to the whole thing in bulk isn't much better than > > top-posting. > > Wrong. :-) *Bottom*-posting is *much* worse. Agreed. The worst case I personally ever saw was somebody replying to a digest on a high-volume mailing list where they added "I agree!" and their signature to the very end. I actually counted how many pages of quoting there were: 29 pages, based on ~60 lines per page. (That's full A4 pages mind, it was about 50 keypresses in mutt to page through it a screen at a time.) Naturally there was no indication of which of the two dozen messages they agreed with. +1 from me as well. > For better or worse, top-posting is here to stay. It doesn't work > very well in forums like this one, but it's not too bad if you do it > the way Guido does (which of course is one of the reasons we can't get > rid of it). The basic rules: [...] This is the best description of good top-posting practice I've ever seen, thanks. For what it's worth, I think inline posters also need to follow good practice too: if the reader cannot see new content (i.e. what you wrote) within the first screen full of text, you're probably quoting too much. This rule does not apply to readers trying to read email on a phone that shows only a handful of lines at a time. 
If you are reading email on a screen the size of a credit card (or smaller), you cannot expect others to accomodate your choice of technology in a discussion group like this. Well, we will see how long that lasts. With mobile now the predominant platform for consumption we might be approaching a point where mobiles are actually how most of us follow mailing lists (says the man writing this email from a tablet). And I bet for a lot of people it is becoming more common to follow things like this list in their spare time on their phones when they have a moment here and there. -brett I can't think of *any* good reason to bottom-post without trimming the quoted content. -- Steve _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mertz at gnosis.cx Sun Jun 14 17:36:15 2015 From: mertz at gnosis.cx (David Mertz) Date: Sun, 14 Jun 2015 08:36:15 -0700 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik wrote: > The process that need to be automated for Gmail is > "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is > the annoying combo that I have to write every time to enter bottom > posting insert mode. > How is it that Anatoly has such great difficulty editing email to intersperse responses in Gmail, and yet I find the process entirely easy and quick? Oh yeah... I guess I answered my own question by mentioning the name of the complainer. -- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tyler at tylercrompton.com Sun Jun 14 18:00:51 2015 From: tyler at tylercrompton.com (Tyler Crompton) Date: Sun, 14 Jun 2015 11:00:51 -0500 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: May we please continue this discussion elsewhere? I neither feel that this conversation will lead to anything constructive nor see this to be fitting for Python-ideas. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 801 bytes Desc: Message signed with OpenPGP using GPGMail URL: From p.f.moore at gmail.com Sun Jun 14 18:51:29 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 14 Jun 2015 17:51:29 +0100 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: On 14 June 2015 at 12:16, anatoly techtonik wrote: > The process that need to be automated for Gmail is > "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is > the annoying combo that I have to write every time to enter bottom > posting insert mode. 
And you consider that the time it takes you to press six keys is worth more than the time it takes all the people who want to read your message and understand its context, to scroll down, read the message from the bottom up, and then *fix* your unhelpful quoting style if they wish to quote your comment in the context of a reply? Enough said. Paul From schesis at gmail.com Sun Jun 14 19:30:42 2015 From: schesis at gmail.com (Zero Piraeus) Date: Sun, 14 Jun 2015 14:30:42 -0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <20150614041002.GZ20701@ando.pearwood.info> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> <20150614041002.GZ20701@ando.pearwood.info> Message-ID: <20150614173042.GA2629@piedra> : On Sun, Jun 14, 2015 at 02:10:02PM +1000, Steven D'Aprano wrote: > For what it's worth, I think inline posters also need to follow good > practice too: if the reader cannot see new content (i.e. what you wrote) > within the first screen full of text, you're probably quoting too much. While I agree in principle, others' failure to trim is a handy time-saving measure for me: if there's nothing but quoted text on the first screenful, I take it as a signal that a similar lack of thought went into the original content, and skip to the next message. If there *was* something worth reading after all, there's at least a chance someone who actually knows how to write email replied to it, so I'll see it anyway. -[]z. -- Zero Piraeus: respice finem http://etiol.net/pubkey.asc From techtonik at gmail.com Mon Jun 15 03:45:44 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 15 Jun 2015 04:45:44 +0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <87lhfoidbg.fsf@uwakimon.sk.tsukuba.ac.jp> <20150614041002.GZ20701@ando.pearwood.info> Message-ID: On Sun, Jun 14, 2015 at 4:11 PM, Brett Cannon wrote: > Well, we will see how long that lasts. With mobile now the predominant > platform for consumption we might be approaching a point where mobiles are > actually how most of us follow mailing lists (says the man writing this > email from a tablet). And I bet for a lot of people it is becoming more > common to follow things like this list in their spare time on their phones > when they have a moment here and there. Can Mailman provide statistics about those mobile devices? -- anatoly t. From techtonik at gmail.com Mon Jun 15 03:57:09 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 15 Jun 2015 04:57:09 +0300 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: On Sun, Jun 14, 2015 at 6:36 PM, David Mertz wrote: > On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik > wrote: >> >> The process that need to be automated for Gmail is >> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is >> the annoying combo that I have to write every time to enter bottom >> posting insert mode. > > > How is it that Anatoly has such great difficulty editing email to > intersperse responses in Gmail, and yet I find the process entirely easy and > quick? Oh yeah... 
I guess I answered my own question by mentioning the name > of the complainer. The requirements of the machine <-> human interface are very subjective and depend on the age of a person being introduced to enabling communication technology. I doubt that people younger than 25 are considering email as a communication method at all, and I know that annoying interfaces are the reason why people pretend not to use them. So, it may happen that @python.org discussions are limited to a certain age group because of that. -- anatoly t. From rymg19 at gmail.com Mon Jun 15 04:17:32 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Sun, 14 Jun 2015 21:17:32 -0500 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> Message-ID: <24C52955-6FB8-46E1-B37B-37D0DAF07B05@gmail.com> On June 14, 2015 8:57:09 PM CDT, anatoly techtonik wrote: >On Sun, Jun 14, 2015 at 6:36 PM, David Mertz wrote: >> On Sun, Jun 14, 2015 at 4:16 AM, anatoly techtonik > >> wrote: >>> >>> The process that need to be automated for Gmail is >>> "Down->Enter->Del->Del->Down->Down" when I enter reply mode. This is >>> the annoying combo that I have to write every time to enter bottom >>> posting insert mode. >> >> >> How is it that Anatoly has such great difficulty editing email to >> intersperse responses in Gmail, and yet I find the process entirely >easy and >> quick? Oh yeah... I guess I answered my own question by mentioning >the name >> of the complainer. > >The requirements of the machine <-> human interface are very subjective >and >depend on the age of a person being introduced to enabling >communication >technology. I doubt that people younger than 25 are considering email >as a >communication method at all, and I know that annoying interfaces are >the >reason why people pretend not to use them. So, it may happen that >@python.org >discussions are limited to a certain age group because of that. I'm under 25! It's not *annoying interfaces*; everything has some annoying interface aspect. That's very subjective. Personally, I prefer email to IM. Remember, you're referring to the nerdy youth, not the normal ones. :) -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From ncoghlan at gmail.com Mon Jun 15 07:03:10 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Jun 2015 15:03:10 +1000 Subject: [Python-ideas] Meta: Email netiquette In-Reply-To: <24C52955-6FB8-46E1-B37B-37D0DAF07B05@gmail.com> References: <20150602224420.GA9022@phdru.name> <469EF9A9-6B1D-44C4-9C74-611E9625B5BA@gmail.com> <639258B4-0D67-4224-9393-DE2E47587706@yahoo.com> <24C52955-6FB8-46E1-B37B-37D0DAF07B05@gmail.com> Message-ID: On 15 June 2015 at 12:17, Ryan Gonzalez wrote: > I'm under 25! It's not *annoying interfaces*; everything has some annoying interface aspect. That's very subjective. Personally, I prefer email to IM. > > Remember, you're referring to the nerdy youth, not the normal ones. :) Folks, before we continue further down the rabbit hole, please take into account that Anatoly has already been banned from bugs.python.org and the core-mentorship mailing list for being demonstrably unable to adjust his own behaviour to appropriately account for the needs and interests of other members of the Python community, especially the CPython core development team. 
(Some additional background on the latter: https://mail.python.org/pipermail//python-committers/2012-December/002289.html ) This isn't a case of "poor communication practices from someone that doesn't yet understand our expectations of appropriate behaviour when it comes to considering the needs and perspectives of others", it's "poor communication practices from someone that actively refuses to meet our standards of expected behaviour despite years of collective coaching from a range of core developers". Making a request to the respective list moderators for Anatoly's ban to be extended to also cover python-dev and python-ideas as the main core development lists (as has clearly become necessary) has been approved by the Python Software Foundation Board, but actually putting that request into effect is unfortunately a somewhat complicated distributed task with Mailman 2 so it's still a work in progress at this point in time. (It's my understanding that Mailman 3 improves the tooling made available to list moderators, but the migration of the python.org mailing lists from Mailman 2 to Mailman 3 is going to be a time consuming infrastructure maintenance task in and of itself) Regards, Nick. P.S. Pondering the broader question of managing counterproductive attempts at contributing to a community, my view on the key reason that Anatoly's ongoing attempts to "help" the core development team have proven to be a particularly challenging situation to address relates to the fact that there are two key aspects to effecting change in a community or organisation: * vocally criticising it from the outside (allowing the existing community leaders to decide whether or not they agree with the concerns raised, and subsequently come up with their preferred approach to addressing them) * working to change it from the inside (often by gaining personal credibility through contribution in non-controversial areas before pushing for potentially controversial changes in other areas of interest) Making significant structural changes to a community or organisation usually requires a combination of both activities, as influential insiders echo and amplify the voices of critical outsiders that they have come to agree with, as former insiders leave and adopt the role of critical outsider, and as formerly critical outsiders are brought into the fold as new insiders to help address the problems they noted. Refusing to listen to criticism at all is a recipe for stagnation and decline, so folks are understandably wary of shutting out critical voices in the general case. However, these "critical outsider" and "influential insider" roles in advocating for structural change are also largely mutually exclusive - for an outsider, "how could we make such a change in practice?" isn't their problem, while for insiders, agreeing with a diagnosis of a problem or concern is only the first step, as actually addressing the concern is then a complex exercise in determining where the time and energy to address the issue is going to come from, and how the particular concern stacks up against all the other problems and challenges that need to be addressed. Expanding the available pool of contributor time and energy doesn't necessarily eliminate the latter requirement for collaborative prioritisation, as collective ambition often grows right along with the size of the contributor base. 
It *is* possible for skilled communicators to pursue both roles at the same time (which can be an amazingly effective approach when handled well), but it's a difficult task that requires context-dependent moderation of their own behaviour, such that when they're using community specific communication channels, they operate primarily in "influential insider" mode, and largely reserve "critical outsider" mode for raising their concerns on their own platforms (e.g. a personal blog). More commonly, folks will choose one approach or the other based on their current level of engagement with the community concerned. By contrast, someone using community specific communication channels while persisting in operating in "critical outsider" mode counts as deliberately disruptive behaviour, as it involves privileging our own personal view of what we think the group's collective priorities *should* be and *forcing* a discussion on those topics, rather than gracefully accepting that there's almost always going to be a gap between our personal priorities and the collective priorities of the communities we choose to participate in, so we need to adjust our expectations accordingly. The combination of "insists on being directly involved in a particular community" and "refuses to address feedback they receive on the inappropriateness of their behaviour in that community" is fortunately rare - most folks will either move on voluntarily once it is made clear that their priorities and the group's priorities aren't aligned, or else they will adjust their behaviour to be in line with community norms whilst participating in that community. For veterans of entirely unmoderated Usenet newsgroups, the historical answer to the problematic "doesn't leave voluntarily when it is made clear that their behaviour is not welcome" pattern has been to adopt personal filters that automatically delete messages from particularly unhelpful group participants. One of the realisations that has come with the growth of online community management as a field of expertise is that this individual filtering based approach is hostile to new participants - newcomers don't know who to avoid yet, so they attempt to engage productively with folks that aren't actually interested in collaborative discussion (whether they know it or not). In the absence of enforced bans, experienced group participants then face a choice between appearing generally hostile (if they warn the newcomers away from the participants known to regularly exhibit toxic behaviour), or generally uncaring (if they leave the newcomers to their own devices). Commercially backed open source communities have actually lead the way in addressing this, as they're generally far more comfortable with asserting their authority to ban folks in the interests of fostering a more effective collaborative environment, and critical voices like Model View Culture [1] have also had a major part to play in pointing out the problems resulting from the historical approach of leaving folks to find their own means of coping with toxic behaviour. Continuing this meta-discussion here would be taking us even further off-topic for python-ideas, though, so if anyone would like to continue, I would suggest the comment thread on http://www.curiousefficiency.org/posts/2015/01/abuse-is-not-ok.html as a possible venue. 
[1] https://modelviewculture.com/pieces/leaving-toxic-open-source-communities -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From amber.yust at gmail.com Wed Jun 17 20:58:00 2015 From: amber.yust at gmail.com (Amber Yust) Date: Wed, 17 Jun 2015 18:58:00 +0000 Subject: [Python-ideas] Keyword-only arguments? Message-ID: One thing that has been a source of bugs and frustration in the past is the inability to designate a named keyword argument that cannot be passed as a positional argument (short of **kwargs and then keying into the dict directly). Has there been any previous discussion on the possibility of a means to designate named arguments as explicitly non-positional? Not a solid proposal, but to capture the essential difference of what I'm thinking of, along the lines of... def foo(bar, baz=None, qux: None): where bar is a required positional argument, baz is an optional argument that can have a value passed positionally or by name, and qux is an optional argument that must always be passed by keyword. Such a means would help avoid cases where a misremembered function signature results in a subtle and likely unnoticed bug due to unintended parameter/argument mismatch. (It's possible that this has been discussed before - a cursory search of python-ideas didn't bring up any direct discussion, but I may have missed something. If you have a link to prior discussion, please by all means point me at it!) -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Wed Jun 17 21:00:39 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 17 Jun 2015 15:00:39 -0400 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: https://www.python.org/dev/peps/pep-3102/ On Wed, Jun 17, 2015 at 2:58 PM, Amber Yust wrote: > One thing that has been a source of bugs and frustration in the past is > the inability to designate a named keyword argument that cannot be passed > as a positional argument (short of **kwargs and then keying into the dict > directly). Has there been any previous discussion on the possibility of a > means to designate named arguments as explicitly non-positional? > > Not a solid proposal, but to capture the essential difference of what I'm > thinking of, along the lines of... > > def foo(bar, baz=None, qux: None): > > where bar is a required positional argument, baz is an optional argument > that can have a value passed positionally or by name, and qux is an > optional argument that must always be passed by keyword. > > Such a means would help avoid cases where a misremembered function > signature results in a subtle and likely unnoticed bug due to unintended > parameter/argument mismatch. > > (It's possible that this has been discussed before - a cursory search of > python-ideas didn't bring up any direct discussion, but I may have missed > something. If you have a link to prior discussion, please by all means > point me at it!) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Wed Jun 17 21:01:20 2015 From: toddrjen at gmail.com (Todd) Date: Wed, 17 Jun 2015 21:01:20 +0200 Subject: [Python-ideas] Keyword-only arguments? 
In-Reply-To: References: Message-ID: On Jun 17, 2015 8:58 PM, "Amber Yust" wrote: > > One thing that has been a source of bugs and frustration in the past is the inability to designate a named keyword argument that cannot be passed as a positional argument (short of **kwargs and then keying into the dict directly). Has there been any previous discussion on the possibility of a means to designate named arguments as explicitly non-positional? > > Not a solid proposal, but to capture the essential difference of what I'm thinking of, along the lines of... > > def foo(bar, baz=None, qux: None): > > where bar is a required positional argument, baz is an optional argument that can have a value passed positionally or by name, and qux is an optional argument that must always be passed by keyword. > > Such a means would help avoid cases where a misremembered function signature results in a subtle and likely unnoticed bug due to unintended parameter/argument mismatch. > > (It's possible that this has been discussed before - a cursory search of python-ideas didn't bring up any direct discussion, but I may have missed something. If you have a link to prior discussion, please by all means point me at it!) > Already present in python 3: https://www.python.org/dev/peps/pep-3102/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ckaynor at zindagigames.com Wed Jun 17 21:01:08 2015 From: ckaynor at zindagigames.com (Chris Kaynor) Date: Wed, 17 Jun 2015 12:01:08 -0700 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust wrote: > One thing that has been a source of bugs and frustration in the past is > the inability to designate a named keyword argument that cannot be passed > as a positional argument (short of **kwargs and then keying into the dict > directly). Has there been any previous discussion on the possibility of a > means to designate named arguments as explicitly non-positional? > > Not a solid proposal, but to capture the essential difference of what I'm > thinking of, along the lines of... > > def foo(bar, baz=None, qux: None): > > where bar is a required positional argument, baz is an optional argument > that can have a value passed positionally or by name, and qux is an > optional argument that must always be passed by keyword. > > Such a means would help avoid cases where a misremembered function > signature results in a subtle and likely unnoticed bug due to unintended > parameter/argument mismatch. > > (It's possible that this has been discussed before - a cursory search of > python-ideas didn't bring up any direct discussion, but I may have missed > something. If you have a link to prior discussion, please by all means > point me at it!) > This feature was added to Python 3 about 9 years ago, see https://www.python.org/dev/peps/pep-3102/. A quick search for "python keyword only arguments" on Google found it. Guido's time machine strikes again! Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From amber.yust at gmail.com Wed Jun 17 21:11:23 2015 From: amber.yust at gmail.com (Amber Yust) Date: Wed, 17 Jun 2015 19:11:23 +0000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: Interesting. I don't think I've ever seen it used, even having looked at Python 3 code. For those who have worked with more Python 3 code than I have, do you ever see it used? 
On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor wrote: > On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust wrote: > >> One thing that has been a source of bugs and frustration in the past is >> the inability to designate a named keyword argument that cannot be passed >> as a positional argument (short of **kwargs and then keying into the dict >> directly). Has there been any previous discussion on the possibility of a >> means to designate named arguments as explicitly non-positional? >> >> Not a solid proposal, but to capture the essential difference of what I'm >> thinking of, along the lines of... >> >> def foo(bar, baz=None, qux: None): >> >> where bar is a required positional argument, baz is an optional argument >> that can have a value passed positionally or by name, and qux is an >> optional argument that must always be passed by keyword. >> >> Such a means would help avoid cases where a misremembered function >> signature results in a subtle and likely unnoticed bug due to unintended >> parameter/argument mismatch. >> >> (It's possible that this has been discussed before - a cursory search of >> python-ideas didn't bring up any direct discussion, but I may have missed >> something. If you have a link to prior discussion, please by all means >> point me at it!) >> > > This feature was added to Python 3 about 9 years ago, see > https://www.python.org/dev/peps/pep-3102/. A quick search for "python > keyword only arguments" on Google found it. > > Guido's time machine strikes again! > > Chris > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jun 17 21:25:22 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 17 Jun 2015 12:25:22 -0700 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: <5581C9A2.5000902@stoneleaf.us> On 06/17/2015 12:11 PM, Amber Yust wrote: > > Interesting. I don't think I've ever seen it used, even having looked at Python 3 code. For those who have worked with more Python 3 code than I have, do you ever see it used? We don't typically go back and modify existing code to use new features, so your best bet to see it used is to find new features in Python 3. Or, do a grep on the source code: base64.py:def a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False): base64.py:def a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v'): codecs.py: *, _is_text_encoding=None): configparser.py: allow_no_value=False, *, delimiters=('=', ':'), configparser.py: def get(self, section, option, *, raw=False, vars=None, fallback=_UNSET): configparser.py: def _get_conv(self, section, option, conv, *, raw=False, vars=None, configparser.py: def getint(self, section, option, *, raw=False, vars=None, configparser.py: def getfloat(self, section, option, *, raw=False, vars=None, configparser.py: def getboolean(self, section, option, *, raw=False, vars=None, configparser.py: def _validate_value_types(self, *, section="", option="", value=""): configparser.py: def get(self, option, fallback=None, *, raw=False, vars=None, datetime.py:# standard time. 
Since that's what the local clock *does*, we want to map both difflib.py: context=False, numlines=5, *, charset='utf-8'): dis.py:def dis(x=None, *, file=None): dis.py:def distb(tb=None, *, file=None): dis.py:def show_code(co, *, file=None): dis.py:def get_instructions(x, *, first_line=None): dis.py:def disassemble(co, lasti=-1, *, file=None): dis.py: *, file=None, line_offset=0): dis.py:def _disassemble_str(source, *, file=None): dis.py: def __init__(self, x, *, first_line=None, current_offset=None): enum.py: def __call__(cls, value, names=None, *, module=None, qualname=None, type=None, start=1): enum.py: def _create_(cls, class_name, names=None, *, module=None, qualname=None, type=None, start=1): functools.py: # an ABC *base*, insert said ABC to its MRO. glob.py:def glob(pathname, *, recursive=False): glob.py:def iglob(pathname, *, recursive=False): inspect.py:attributes (co_*, im_*, tb_*, etc.) in a friendlier fashion. inspect.py:def unwrap(func, *, stop=None): inspect.py: # signature: "(*, a='spam', b, c)". Because attempting inspect.py:def _signature_from_callable(obj, *, inspect.py: def __init__(self, name, kind, *, default=_empty, annotation=_empty): inspect.py: def replace(self, *, name=_void, kind=_void, inspect.py: def __init__(self, parameters=None, *, return_annotation=_empty, inspect.py: def from_callable(cls, obj, *, follow_wrapped=True): inspect.py: def replace(self, *, parameters=_void, return_annotation=_void): inspect.py: def _bind(self, args, kwargs, *, partial=False): inspect.py: # separator to the parameters list ("foo(arg1, *, arg2)" case) inspect.py:def signature(obj, *, follow_wrapped=True): lzma.py: def __init__(self, filename=None, mode="r", *, lzma.py:def open(filename, mode="rb", *, lzma.py: optional arguments *format*, *check*, *preset* and *filters*. lzma.py: optional arguments *format*, *check* and *filters*. nntplib.py: def newgroups(self, date, *, file=None): nntplib.py: def newnews(self, group, date, *, file=None): nntplib.py: def list(self, group_pattern=None, *, file=None): nntplib.py: def help(self, *, file=None): nntplib.py: def head(self, message_spec=None, *, file=None): nntplib.py: def body(self, message_spec=None, *, file=None): nntplib.py: def article(self, message_spec=None, *, file=None): nntplib.py: def xhdr(self, hdr, str, *, file=None): nntplib.py: def xover(self, start, end, *, file=None): nntplib.py: def over(self, message_spec, *, file=None): nntplib.py: def xgtitle(self, group, *, file=None): ntpath.py:# See also module 'glob' for expansion of *, ? and [...] in pathnames. numbers.py: *, /, abs(), .conjugate, ==, and !=. 
os.py: def fwalk(top=".", topdown=True, onerror=None, *, follow_symlinks=False, dir_fd=None): pickle.py: def __init__(self, file, protocol=None, *, fix_imports=True): pickle.py: def __init__(self, file, *, fix_imports=True, pickle.py: Optional keyword arguments are *fix_imports*, *encoding* and pickle.py: *errors*, which are used to control compatiblity support for pickle.py:def _dump(obj, file, protocol=None, *, fix_imports=True): pickle.py:def _dumps(obj, protocol=None, *, fix_imports=True): pickle.py:def _load(file, *, fix_imports=True, encoding="ASCII", errors="strict"): pickle.py:def _loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"): plistlib.py:def load(fp, *, fmt=None, use_builtin_types=True, dict_type=dict): plistlib.py:def loads(value, *, fmt=None, use_builtin_types=True, dict_type=dict): plistlib.py:def dump(value, fp, *, fmt=FMT_XML, sort_keys=True, skipkeys=False): plistlib.py:def dumps(value, *, fmt=FMT_XML, skipkeys=False, sort_keys=True): posixpath.py:# See also module 'glob' for expansion of *, ? and [...] in pathnames. pprint.py:def pprint(object, stream=None, indent=1, width=80, depth=None, *, pprint.py:def pformat(object, indent=1, width=80, depth=None, *, compact=False): pprint.py: def __init__(self, indent=1, width=80, depth=None, stream=None, *, pydoc.py:def browse(port=0, *, open_browser=True): _pyio.py: *opener* with (*file*, *flags*). *opener* must return an open file _pyio.py: """Read up to len(b) bytes into *b*, using at most one system call _pyio.py: object is then obtained by calling opener with (*name*, *flags*). shutil.py:def copyfile(src, dst, *, follow_symlinks=True): shutil.py:def copymode(src, dst, *, follow_symlinks=True): shutil.py: def _copyxattr(src, dst, *, follow_symlinks=True): shutil.py:def copystat(src, dst, *, follow_symlinks=True): shutil.py:def copy(src, dst, *, follow_symlinks=True): shutil.py:def copy2(src, dst, *, follow_symlinks=True): socket.py: def makefile(self, mode="r", buffering=None, *, ssl.py:def create_default_context(purpose=Purpose.SERVER_AUTH, *, cafile=None, ssl.py:def _create_unverified_context(protocol=PROTOCOL_SSLv23, *, cert_reqs=None, tarfile.py: def list(self, verbose=True, *, members=None): tarfile.py: def add(self, name, arcname=None, recursive=True, exclude=None, *, filter=None): tarfile.py: def extractall(self, path=".", members=None, *, numeric_owner=False): tarfile.py: def extract(self, member, path="", set_attrs=True, *, numeric_owner=False): textwrap.py: *, textwrap.py: the *width*, it is returned as is. Otherwise, as many words threading.py: args=(), kwargs=None, *, daemon=None): timeit.py:def main(args=None, *, _wrap_timer=None): traceback.py: def __init__(self, filename, lineno, name, *, lookup_line=True, traceback.py: def extract(klass, frame_gen, *, limit=None, lookup_lines=True, traceback.py: def __init__(self, exc_type, exc_value, exc_traceback, *, limit=None, traceback.py: def format(self, *, chain=True): traceback.py: If chain is not *True*, *__cause__* and *__context__* will not be formatted. typing.py: def __new__(cls, name, bases, namespace, *, _root=False): warnings.py: def __init__(self, *, record=False, module=None): -- ~Ethan~ From guido at python.org Wed Jun 17 21:29:31 2015 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Jun 2015 21:29:31 +0200 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: It's used all over the asyncio code. On Wed, Jun 17, 2015 at 9:11 PM, Amber Yust wrote: > Interesting. 
I don't think I've ever seen it used, even having looked at > Python 3 code. For those who have worked with more Python 3 code than I > have, do you ever see it used? > > On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor > wrote: > >> On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust >> wrote: >> >>> One thing that has been a source of bugs and frustration in the past is >>> the inability to designate a named keyword argument that cannot be passed >>> as a positional argument (short of **kwargs and then keying into the dict >>> directly). Has there been any previous discussion on the possibility of a >>> means to designate named arguments as explicitly non-positional? >>> >>> Not a solid proposal, but to capture the essential difference of what >>> I'm thinking of, along the lines of... >>> >>> def foo(bar, baz=None, qux: None): >>> >>> where bar is a required positional argument, baz is an optional argument >>> that can have a value passed positionally or by name, and qux is an >>> optional argument that must always be passed by keyword. >>> >>> Such a means would help avoid cases where a misremembered function >>> signature results in a subtle and likely unnoticed bug due to unintended >>> parameter/argument mismatch. >>> >>> (It's possible that this has been discussed before - a cursory search of >>> python-ideas didn't bring up any direct discussion, but I may have missed >>> something. If you have a link to prior discussion, please by all means >>> point me at it!) >>> >> >> This feature was added to Python 3 about 9 years ago, see >> https://www.python.org/dev/peps/pep-3102/. A quick search for "python >> keyword only arguments" on Google found it. >> >> Guido's time machine strikes again! >> >> Chris >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From carl at oddbird.net Wed Jun 17 21:02:03 2015 From: carl at oddbird.net (Carl Meyer) Date: Wed, 17 Jun 2015 13:02:03 -0600 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: <5581C42B.3050908@oddbird.net> Hi Amber, On 06/17/2015 12:58 PM, Amber Yust wrote: > One thing that has been a source of bugs and frustration in the past is > the inability to designate a named keyword argument that cannot be > passed as a positional argument (short of **kwargs and then keying into > the dict directly). Has there been any previous discussion on the > possibility of a means to designate named arguments as explicitly > non-positional? > > Not a solid proposal, but to capture the essential difference of what > I'm thinking of, along the lines of... > > def foo(bar, baz=None, qux: None): > > where bar is a required positional argument, baz is an optional argument > that can have a value passed positionally or by name, and qux is an > optional argument that must always be passed by keyword. > > Such a means would help avoid cases where a misremembered function > signature results in a subtle and likely unnoticed bug due to unintended > parameter/argument mismatch. 
> > (It's possible that this has been discussed before - a cursory search of > python-ideas didn't bring up any direct discussion, but I may have > missed something. If you have a link to prior discussion, please by all > means point me at it!) I can do better than prior discussion - this already exists in Python 3: Python 3.4.2 (default, Dec 12 2014, 17:46:08) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> def foo(bar, *, baz=None): ... print(bar, baz) ... >>> foo('a', 'b') Traceback (most recent call last): File "", line 1, in TypeError: foo() takes 1 positional argument but 2 were given >>> foo('a', baz='b') a b See https://www.python.org/dev/peps/pep-3102/ Carl -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From njs at pobox.com Wed Jun 17 21:32:07 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 17 Jun 2015 12:32:07 -0700 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: On Jun 17, 2015 12:11 PM, "Amber Yust" wrote: > > Interesting. I don't think I've ever seen it used, even having looked at Python 3 code. For those who have worked with more Python 3 code than I have, do you ever see it used? Unfortunately, no, because at this point almost all py3 APIs I see are still aiming for py2/py3 compatibility, and there's really no good way to accomplish kw only args in py2. It can be done, but it's very cumbersome; not like range or print or whatever where you can just import something from six or add some parentheses. In retrospect I wish this had been backported to 2.7, because they're super useful for making better APIs, but that ship has sailed. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Wed Jun 17 21:36:45 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 17 Jun 2015 15:36:45 -0400 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: > In retrospect I wish this had been backported to 2.7, because they're super useful for making better APIs, but that ship has sailed Why would this not be able to be ported now? It does not clash with any existing python 2 syntax so all current python 2 is still valid and has no behaviour change. On Wed, Jun 17, 2015 at 3:32 PM, Nathaniel Smith wrote: > On Jun 17, 2015 12:11 PM, "Amber Yust" wrote: > > > > Interesting. I don't think I've ever seen it used, even having looked at > Python 3 code. For those who have worked with more Python 3 code than I > have, do you ever see it used? > > Unfortunately, no, because at this point almost all py3 APIs I see are > still aiming for py2/py3 compatibility, and there's really no good way to > accomplish kw only args in py2. It can be done, but it's very cumbersome; > not like range or print or whatever where you can just import something > from six or add some parentheses. > > In retrospect I wish this had been backported to 2.7, because they're > super useful for making better APIs, but that ship has sailed. > > -n > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joejev at gmail.com Wed Jun 17 21:32:36 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Wed, 17 Jun 2015 15:32:36 -0400 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: I use it for all of my python3 only code for all of the reasons that you mentioned. One of the main reasons that it is not used is that many people try to make their code work with 2 and 3. On Wed, Jun 17, 2015 at 3:11 PM, Amber Yust wrote: > Interesting. I don't think I've ever seen it used, even having looked at > Python 3 code. For those who have worked with more Python 3 code than I > have, do you ever see it used? > > On Wed, Jun 17, 2015 at 12:02 PM Chris Kaynor > wrote: > >> On Wed, Jun 17, 2015 at 11:58 AM, Amber Yust >> wrote: >> >>> One thing that has been a source of bugs and frustration in the past is >>> the inability to designate a named keyword argument that cannot be passed >>> as a positional argument (short of **kwargs and then keying into the dict >>> directly). Has there been any previous discussion on the possibility of a >>> means to designate named arguments as explicitly non-positional? >>> >>> Not a solid proposal, but to capture the essential difference of what >>> I'm thinking of, along the lines of... >>> >>> def foo(bar, baz=None, qux: None): >>> >>> where bar is a required positional argument, baz is an optional argument >>> that can have a value passed positionally or by name, and qux is an >>> optional argument that must always be passed by keyword. >>> >>> Such a means would help avoid cases where a misremembered function >>> signature results in a subtle and likely unnoticed bug due to unintended >>> parameter/argument mismatch. >>> >>> (It's possible that this has been discussed before - a cursory search of >>> python-ideas didn't bring up any direct discussion, but I may have missed >>> something. If you have a link to prior discussion, please by all means >>> point me at it!) >>> >> >> This feature was added to Python 3 about 9 years ago, see >> https://www.python.org/dev/peps/pep-3102/. A quick search for "python >> keyword only arguments" on Google found it. >> >> Guido's time machine strikes again! >> >> Chris >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geoffspear at gmail.com Wed Jun 17 22:26:55 2015 From: geoffspear at gmail.com (Geoffrey Spear) Date: Wed, 17 Jun 2015 16:26:55 -0400 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: On Wed, Jun 17, 2015 at 3:36 PM, Joseph Jevnik wrote: > > In retrospect I wish this had been backported to 2.7, because they're > super useful for making better APIs, but that ship has sailed > > Why would this not be able to be ported now? It does not clash with any > existing python 2 syntax so all current python 2 is still valid and has no > behaviour change. > > > Python 2.7 is not getting new features (ssl changes notwithstanding), and there will never be a Python 2.8. 
There's certainly no desire to add new syntax so there would be code that will run in Python 2.7.11 only, and not in earlier 2.7 releases. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jun 18 00:52:49 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 18 Jun 2015 08:52:49 +1000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: On 18 Jun 2015 6:27 am, "Geoffrey Spear" wrote: > > > > On Wed, Jun 17, 2015 at 3:36 PM, Joseph Jevnik wrote: >> >> > In retrospect I wish this had been backported to 2.7, because they're super useful for making better APIs, but that ship has sailed >> >> Why would this not be able to be ported now? It does not clash with any existing python 2 syntax so all current python 2 is still valid and has no behaviour change. >> >> > > Python 2.7 is not getting new features (ssl changes notwithstanding), and there will never be a Python 2.8. There's certainly no desire to add new syntax so there would be code that will run in Python 2.7.11 only, and not in earlier 2.7 releases. Exactly - it's only in truly exceptional cases like the PEP 466 & 476 network security changes that we'll add features to Python 2.7. Keyword-only arguments are certainly a nice enhancement, but their absence isn't actively harmful the way the aging network security capabilities were. Regards, Nick. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaiser.yann at gmail.com Sat Jun 20 19:52:03 2015 From: kaiser.yann at gmail.com (Yann Kaiser) Date: Sat, 20 Jun 2015 17:52:03 +0000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith wrote: > there's really no good way to accomplish kw only args in py2. It can be > done, but it's very cumbersome; not like range or print or whatever where > you can just import something from six or add some parentheses. > As you correctly point out, it can't be done without friction. I've attempted backporting kw-only parameters through decorators: from sigtools import modifiers @modifiers.kwoargs('kwop') def func(abc, kwop): ... @modifiers.autokwoargs def func(abc, kwop=False): ... http://sigtools.readthedocs.org/en/latest/#sigtools.modifiers.kwoargs -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Jun 20 20:13:46 2015 From: flying-sheep at web.de (Philipp A.) Date: Sat, 20 Jun 2015 18:13:46 +0000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: OK, i think it?s time to finally switch to python 3 instead of writing more horrible crutches: # coding: utf-8from __future__ import absolute_import, division, print_function, unicode_literalsfrom builtins import * import trolliusfrom trollius import From, Return from other import stuff @trollius.coroutine at modifiers.kwoargs('b')def awesome_stuff(a, b=5): res = (yield From(stuff())) raise Return(res) vs. import asyncio from other import stuff @asyncio.coroutinedef awesome_stuff(a, *, b=5): res = (yield from stuff()) return res or soon: from other import stuff async def awesome_stuff(a, *, b=5): res = await stuff() return res Yann Kaiser kaiser.yann at gmail.com schrieb am Sa., 20. 
Juni 2015 um 19:52 Uhr: On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith wrote: > >> there's really no good way to accomplish kw only args in py2. It can be >> done, but it's very cumbersome; not like range or print or whatever where >> you can just import something from six or add some parentheses. >> > As you correctly point out, it can't be done without friction. > > I've attempted backporting kw-only parameters through decorators: > > from sigtools import modifiers > > @modifiers.kwoargs('kwop') > def func(abc, kwop): > ... > > @modifiers.autokwoargs > def func(abc, kwop=False): > ... > > http://sigtools.readthedocs.org/en/latest/#sigtools.modifiers.kwoargs > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kaiser.yann at gmail.com Sat Jun 20 20:28:18 2015 From: kaiser.yann at gmail.com (Yann Kaiser) Date: Sat, 20 Jun 2015 18:28:18 +0000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: Definitely agree with wanting to move on and focus on Python 3. I design my stuff with Python 3 first in mind, but reality tells me I need to keep supporting Python 2 users, even if that means uglifying front-page examples with things such as sigtools.modifiers.kwoargs. The existence of this thread and of some of the emails in the "signature documentation" thread only serves as proof that I'd confuse too many by keeping "kwoargs" & co a side-note. Developing on Python 3.4 is great, things like chained tracebacks are fantastic. If I can do it on Python 3, I will. But what if I want to deploy to GAE? Stuck with 2.7. So are many users. It's ugly, it sucks, but so far it is out of my hands and necessary. On Sat, 20 Jun 2015 at 20:13 Philipp A. wrote: > OK, i think it?s time to finally switch to python 3 instead of writing > more horrible crutches: > > # coding: utf-8from __future__ import absolute_import, division, print_function, unicode_literalsfrom builtins import * > import trolliusfrom trollius import From, Return > from other import stuff > @trollius.coroutine at modifiers.kwoargs('b')def awesome_stuff(a, b=5): > res = (yield From(stuff())) > raise Return(res) > > vs. > > import asyncio > from other import stuff > @asyncio.coroutinedef awesome_stuff(a, *, b=5): > res = (yield from stuff()) > return res > > or soon: > > from other import stuff > > async def awesome_stuff(a, *, b=5): > res = await stuff() > return res > > Yann Kaiser kaiser.yann at gmail.com > schrieb am Sa., 20. Juni 2015 um 19:52 Uhr: > > On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith wrote: >> >>> there's really no good way to accomplish kw only args in py2. It can be >>> done, but it's very cumbersome; not like range or print or whatever where >>> you can just import something from six or add some parentheses. >>> >> As you correctly point out, it can't be done without friction. >> >> I've attempted backporting kw-only parameters through decorators: >> >> from sigtools import modifiers >> >> @modifiers.kwoargs('kwop') >> def func(abc, kwop): >> ... >> >> @modifiers.autokwoargs >> def func(abc, kwop=False): >> ... 
>> >> http://sigtools.readthedocs.org/en/latest/#sigtools.modifiers.kwoargs >> > _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > ? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From flying-sheep at web.de Sat Jun 20 21:05:52 2015 From: flying-sheep at web.de (Philipp A.) Date: Sat, 20 Jun 2015 19:05:52 +0000 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: sure! and having those workarounds if you really need them is great. it?s just really frustrating that so many of you are stuck with 2.7 or even less without more reason than ?sysadmin won?t install a SCL?. Yann Kaiser schrieb am Sa., 20. Juni 2015 um 20:28 Uhr: > Definitely agree with wanting to move on and focus on Python 3. > > I design my stuff with Python 3 first in mind, but reality tells me I need > to keep supporting Python 2 users, even if that means uglifying front-page > examples with things such as sigtools.modifiers.kwoargs. The existence of > this thread and of some of the emails in the "signature documentation" > thread only serves as proof that I'd confuse too many by keeping "kwoargs" > & co a side-note. > > Developing on Python 3.4 is great, things like chained tracebacks are > fantastic. If I can do it on Python 3, I will. But what if I want to deploy > to GAE? Stuck with 2.7. So are many users. > > It's ugly, it sucks, but so far it is out of my hands and necessary. > > On Sat, 20 Jun 2015 at 20:13 Philipp A. wrote: > >> OK, i think it?s time to finally switch to python 3 instead of writing >> more horrible crutches: >> >> # coding: utf-8from __future__ import absolute_import, division, print_function, unicode_literalsfrom builtins import * >> import trolliusfrom trollius import From, Return >> from other import stuff >> @trollius.coroutine at modifiers.kwoargs('b')def awesome_stuff(a, b=5): >> res = (yield From(stuff())) >> raise Return(res) >> >> vs. >> >> import asyncio >> from other import stuff >> @asyncio.coroutinedef awesome_stuff(a, *, b=5): >> res = (yield from stuff()) >> return res >> >> or soon: >> >> from other import stuff >> >> async def awesome_stuff(a, *, b=5): >> res = await stuff() >> return res >> >> Yann Kaiser kaiser.yann at gmail.com >> schrieb am Sa., 20. Juni 2015 um 19:52 Uhr: >> >> On Wed, 17 Jun 2015 at 21:33 Nathaniel Smith wrote: >>> >>>> there's really no good way to accomplish kw only args in py2. It can be >>>> done, but it's very cumbersome; not like range or print or whatever where >>>> you can just import something from six or add some parentheses. >>>> >>> As you correctly point out, it can't be done without friction. >>> >>> I've attempted backporting kw-only parameters through decorators: >>> >>> from sigtools import modifiers >>> >>> @modifiers.kwoargs('kwop') >>> def func(abc, kwop): >>> ... >>> >>> @modifiers.autokwoargs >>> def func(abc, kwop=False): >>> ... >>> >>> http://sigtools.readthedocs.org/en/latest/#sigtools.modifiers.kwoargs >>> >> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> ? >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ericsnowcurrently at gmail.com Sat Jun 20 23:42:33 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 20 Jun 2015 15:42:33 -0600 Subject: [Python-ideas] solving multi-core Python Message-ID: tl;dr Let's exploit multiple cores by fixing up subinterpreters, exposing them in Python, and adding a mechanism to safely share objects between them. This proposal is meant to be a shot over the bow, so to speak. I plan on putting together a more complete PEP some time in the future, with content that is more refined along with references to the appropriate online resources. Feedback appreciated! Offers to help even more so! :) -eric -------- Python's multi-core story is murky at best. Not only can we be more clear on the matter, we can improve Python's support. The result of any effort must make multi-core (i.e. parallelism) support in Python obvious, unmistakable, and undeniable (and keep it Pythonic). Currently we have several concurrency models represented via threading, multiprocessing, asyncio, concurrent.futures (plus others in the cheeseshop). However, in CPython the GIL means that we don't have parallelism, except through multiprocessing which requires trade-offs. (See Dave Beazley's talk at PyCon US 2015.) This is a situation I'd like us to solve once and for all for a couple of reasons. Firstly, it is a technical roadblock for some Python developers, though I don't see that as a huge factor. Regardless, secondly, it is especially a turnoff to folks looking into Python and ultimately a PR issue. The solution boils down to natively supporting multiple cores in Python code. This is not a new topic. For a long time many have clamored for death to the GIL. Several attempts have been made over the years and failed to do it without sacrificing single-threaded performance. Furthermore, removing the GIL is perhaps an obvious solution but not the only one. Others include Trent Nelson's PyParallels, STM, and other Python implementations.. Proposal ======= In some personal correspondence Nick Coghlan, he summarized my preferred approach as "the data storage separation of multiprocessing, with the low message passing overhead of threading". For Python 3.6: * expose subinterpreters to Python in a new stdlib module: "subinterpreters" * add a new SubinterpreterExecutor to concurrent.futures * add a queue.Queue-like type that will be used to explicitly share objects between subinterpreters This is less simple than it might sound, but presents what I consider the best option for getting a meaningful improvement into Python 3.6. Also, I'm not convinced that the word "subinterpreter" properly conveys the intent, for which subinterpreters is only part of the picture. So I'm open to a better name. Influences ======== Note that I'm drawing quite a bit of inspiration from elsewhere. The idea of using subinterpreters to get this (more) efficient isolated execution is not my own (I heard it from Nick). I have also spent quite a bit of time and effort researching for this proposal. As part of that, a number of people have provided invaluable insight and encouragement as I've prepared, including Guido, Nick, Brett Cannon, Barry Warsaw, and Larry Hastings. Additionally, Hoare's "Communicating Sequential Processes" (CSP) has been a big influence on this proposal. FYI, CSP is also the inspiration for Go's concurrency model (e.g. goroutines, channels, select). Dr. 
Sarah Mount, who has expertise in this area, has been kind enough to agree to collaborate and even co-author the PEP that I hope comes out of this proposal. My interest in this improvement has been building for several years. Recent events, including this year's language summit, have driven me to push for something concrete in Python 3.6. The subinterpreter Module ===================== The subinterpreters module would look something like this (a la threading/multiprocessing): settrace() setprofile() stack_size() active_count() enumerate() get_ident() current_subinterpreter() Subinterpreter(...) id is_alive() running() -> Task or None run(...) -> Task # wrapper around PyRun_*, auto-calls Task.start() destroy() Task(...) # analogous to a CSP process id exception() # other stuff? # for compatibility with threading.Thread: name ident is_alive() start() run() join() Channel(...) # shared by passing as an arg to the subinterpreter-running func # this API is a bit uncooked still... pop() push() poison() # maybe select() # maybe Note that Channel objects will necessarily be shared in common between subinterpreters (where bound). This sharing will happen when the one or more of the parameters to the function passed to Task() is a Channel. Thus the channel would be open to the (sub)interpreter calling Task() (or Subinterpreter.run()) and to the new subinterpreter. Also, other channels could be fed into such a shared channel, whereby those channels would then likewise be shared between the interpreters. I don't know yet if this module should include *all* the essential pieces to implement a complete CSP library. Given the inspiration that CSP is providing, it may make sense to support it fully. It would be interesting then if the implementation here allowed the (complete?) formalisms provided by CSP (thus, e.g. rigorous proofs of concurrent system models). I expect there will also be a _subinterpreters module with low-level implementation-specific details. Related Ideas and Details Under Consideration ==================================== Some of these are details that need to be sorted out. Some are secondary ideas that may be appropriate to address in this proposal or may need to be tabled. I have some others but these should be sufficient to demonstrate the range of points to consider. 
* further coalesce the (concurrency/parallelism) abstractions between threading, multiprocessing, asyncio, and this proposal * only allow one running Task at a time per subinterpreter * disallow threading within subinterpreters (with legacy support in C) + ignore/remove the GIL within subinterpreters (since they would be single-threaded) * use the GIL only in the main interpreter and for interaction between subinterpreters (and a "Local Interpreter Lock" for within a subinterpreter) * disallow forking within subinterpreters * only allow passing plain functions to Task() and Subinterpreter.run() (exclude closures, other callables) * object ownership model + read-only in all but 1 subinterpreter + RW in all subinterpreters + only allow 1 subinterpreter to have any refcounts to an object (except for channels) * only allow immutable objects to be shared between subinterpreters * for better immutability, move object ref counts into a separate table * freeze (new machinery or memcopy or something) objects to make them (at least temporarily) immutable * expose a more complete CSP implementation in the stdlib (or make the subinterpreters module more compliant) * treat the main interpreter differently than subinterpreters (or treat it exactly the same) * add subinterpreter support to asyncio (the interplay between them could be interesting) Key Dependencies ================ There are a few related tasks/projects that will likely need to be resolved before subinterpreters in CPython can be used in the proposed manner. The proposal could implemented either way, but it will help the multi-core effort if these are addressed first. * fixes to subinterpreter support (there are a couple individuals who should be able to provide the necessary insight) * PEP 432 (will simplify several key implementation details) * improvements to isolation between subinterpreters (file descriptors, env vars, others) Beyond those, the scale and technical scope of this project means that I am unlikely to be able to do all the work myself to land this in Python 3.6 (though I'd still give it my best shot). That will require the involvement of various experts. I expect that the project is divisible into multiple mostly independent pieces, so that will help. Python Implementations =================== They can correct me if I'm wrong, but from what I understand both Jython and IronPython already have subinterpreter support. I'll be soliciting feedback from the different Python implementors about subinterpreter support. C Extension Modules ================= Subinterpreters already isolate extension modules (and built-in modules, including sys). PEP 384 provides some help too. However, global state in C can easily leak data between subinterpreters, breaking the desired data isolation. This is something that will need to be addressed as part of the effort. From yselivanov.ml at gmail.com Sun Jun 21 00:04:47 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 20 Jun 2015 18:04:47 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <5585E37F.4060403@gmail.com> Eric, On 2015-06-20 5:42 PM, Eric Snow wrote: > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. > This is really great. Big +1 from me, and I'd be glad to help with the PEP/implementation. [...] 
> * only allow immutable objects to be shared between subinterpreters Even if this is the only thing we have -- an efficient way for sharing immutable objects (such as bytes, strings, ints, and, stretching the definition of immutable, FDs) that will allow us to do a lot. Yury From njs at pobox.com Sun Jun 21 00:08:40 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 20 Jun 2015 15:08:40 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Jun 20, 2015 2:42 PM, "Eric Snow" wrote: > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. This all sounds really cool if you can pull it off, and shared-nothing threads do seem like the least impossible model to pull off. But "least impossible" and "possible" are different :-). From your email I can't tell whether this plan is viable while preserving backcompat and memory safety. Suppose I have a queue between two subinterpreters, and on this queue I place a list of dicts of user-defined-in-python objects, each of which holds a reference to a user-defined-via-the-C-api object. What happens next? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sun Jun 21 00:54:04 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 20 Jun 2015 16:54:04 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Jun 20, 2015 4:08 PM, "Nathaniel Smith" wrote: > > On Jun 20, 2015 2:42 PM, "Eric Snow" wrote: > > > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > > exposing them in Python, and adding a mechanism to safely share > > objects between them. > > This all sounds really cool if you can pull it off, and shared-nothing threads do seem like the least impossible model to pull off. Agreed. > But "least impossible" and "possible" are different :-). From your email I can't tell whether this plan is viable while preserving backcompat and memory safety. I agree that those issues must be clearly solved in the proposal before it can be approved. I'm confident the approach I'm pursuing will afford us the necessary guarantees. I'll address those specific points directly when I can sit down and organize my thoughts. > > Suppose I have a queue between two subinterpreters, and on this queue I place a list of dicts of user-defined-in-python objects, each of which holds a reference to a user-defined-via-the-C-api object. What happens next? You've hit upon exactly the trickiness involved and why I'm thinking the best approach initially is to only allow *strictly* immutable objects to pass between interpreters. Admittedly, my description of channels is very vague.:) There are a number of possibilities with them that I'm still exploring (CSP has particular opinions...), but immutability is a characteristic that may provide the simplest *initial* approach. Going that route shouldn't preclude adding some sort of support for mutable objects later. Keep in mind that by "immutability" I'm talking about *really* immutable, perhaps going so far as treating the full memory space associated with an object as frozen. For instance, we'd have to ensure that "immutable" Python objects like strings, ints, and tuples do not change (i.e. via the C API). The contents of involved tuples/containers would have to be likewise immutable. 
Even changing refcounts could be too much, hence the idea of moving refcounts out to a separate table. This level of immutability would be something new to Python. We'll see if it's necessary. If it isn't too much work it might be a good idea regardless of the multi-core proposal. Also note that Barry has a (rejected) PEP from a number of years ago about freezing objects... That idea is likely out of scope as relates to my proposal, but it certainly factors in the problem space. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Sun Jun 21 00:54:15 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 20 Jun 2015 15:54:15 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <5585E37F.4060403@gmail.com> References: <5585E37F.4060403@gmail.com> Message-ID: It's worthwhile to consider fork as an alternative. IMO we'd get a lot out of making forking safer, easier, and more efficient. (e.g. respectively: adding an atfork registration mechanism; separating out the bits of multiprocessing that use pickle from those that don't; moving the refcount to a separate page, or allowing it to be frozen prior to a fork.) It sounds to me like this approach would use more memory than either regular threaded code or forking, so its main advantages are being cross-platform and less bug-prone. Is that right? Note: I don't count the IPC cost of forking, because at least on linux, any way to efficiently share objects between independent interpreters in separate threads can also be ported to independent interpreters in forked subprocesses, and *should* be. See also: multiprocessing.Value/Array. This is probably a good opportunity for that unification you mentioned. :) On Sat, Jun 20, 2015 at 3:04 PM, Yury Selivanov wrote: > On 2015-06-20 5:42 PM, Eric Snow wrote: >> * only allow immutable objects to be shared between subinterpreters > > Even if this is the only thing we have -- an efficient way > for sharing immutable objects (such as bytes, strings, ints, > and, stretching the definition of immutable, FDs) that will > allow us to do a lot. +1, this has a lot of utility, and can be extended naturally to other types and circumstances. -- Devin From ericsnowcurrently at gmail.com Sun Jun 21 01:16:37 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 20 Jun 2015 17:16:37 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" wrote: > > It's worthwhile to consider fork as an alternative. IMO we'd get a > lot out of making forking safer, easier, and more efficient. (e.g. > respectively: adding an atfork registration mechanism; separating out > the bits of multiprocessing that use pickle from those that don't; > moving the refcount to a separate page, or allowing it to be frozen > prior to a fork.) So leverage a common base of code with the multiprocessing module? > > It sounds to me like this approach would use more memory than either > regular threaded code or forking, so its main advantages are being > cross-platform and less bug-prone. Is that right? I would expect subinterpreters to use less memory. Furthermore creating them would be significantly faster. Passing objects between them would be much more efficient. And, yes, cross-platform. 
> > > Note: I don't count the IPC cost of forking, because at least on > linux, any way to efficiently share objects between independent > interpreters in separate threads can also be ported to independent > interpreters in forked subprocesses, How so? Subinterpreters are in the same process. For this proposal each would be on its own thread. Sharing objects between them through channels would be more efficient than IPC. Perhaps I've missed something? > and *should* be. > > See also: multiprocessing.Value/Array. This is probably a good > opportunity for that unification you mentioned. :) I'll look. > > On Sat, Jun 20, 2015 at 3:04 PM, Yury Selivanov wrote: > > On 2015-06-20 5:42 PM, Eric Snow wrote: > >> * only allow immutable objects to be shared between subinterpreters > > > > Even if this is the only thing we have -- an efficient way > > for sharing immutable objects (such as bytes, strings, ints, > > and, stretching the definition of immutable, FDs) that will > > allow us to do a lot. > > +1, this has a lot of utility, and can be extended naturally to other > types and circumstances. Agreed. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sun Jun 21 02:41:54 2015 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Jun 2015 10:41:54 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow wrote: > * disallow forking within subinterpreters I love the idea as a whole (if only because the detractors can be told "Just use subinterpreters, then you get concurrency"), but this seems like a tricky restriction. That means no subprocess.Popen, no shelling out to other applications. And I don't know what of other restrictions might limit any given program. Will it feel like subinterpreters are "write your code according to these tight restrictions and it'll work", or will it be more of "most programs will run in parallel just fine, but there are a few things to be careful of"? ChrisA From ericsnowcurrently at gmail.com Sun Jun 21 02:58:18 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 20 Jun 2015 18:58:18 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sat, Jun 20, 2015 at 6:41 PM, Chris Angelico wrote: > On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow wrote: >> * disallow forking within subinterpreters > > I love the idea as a whole (if only because the detractors can be told > "Just use subinterpreters, then you get concurrency"), but this seems > like a tricky restriction. That means no subprocess.Popen, no shelling > out to other applications. And I don't know what of other restrictions > might limit any given program. This is just something I'm thinking about. To be honest, forking probably won't be a problem. Furthermore, if there were any restriction it would likely just be on forking Python (a la multiprocessing). However, I doubt there will be a need to pursue such a restriction. As I said, there are still a lot of open questions and subtle details to sort out. > Will it feel like subinterpreters are > "write your code according to these tight restrictions and it'll > work", or will it be more of "most programs will run in parallel just > fine, but there are a few things to be careful of"? I expect that will be somewhat the case no matter what. The less restrictions the better, though. 
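To make "passing objects between them" a little more concrete, here is a purely illustrative sketch using the provisional names from my original message (Subinterpreter, Task, Channel). The actual signatures, the blocking behavior of push()/pop(), and the exact set of shareable objects are all still open questions:

    import subinterpreters  # proposed module; does not exist yet

    def worker(ch):
        # Runs in the subinterpreter; ch is the Channel shared by passing
        # it as an argument along with the function.
        data = ch.pop()        # receive an immutable object (a tuple of ints)
        ch.push(sum(data))     # send back an immutable result

    ch = subinterpreters.Channel()
    sub = subinterpreters.Subinterpreter()

    ch.push((1, 2, 3))          # strictly immutable objects only, at least initially
    task = sub.run(worker, ch)  # returns a Task; the Channel is now open to both sides
    task.join()

    print(ch.pop())             # 6
    sub.destroy()

The intent is that no pickling and no fork is involved; handing the immutable tuple across the interpreter boundary should cost far less than multiprocessing's serialization + IPC.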
:) It's a balancing act because I expect that with some initial restrictions we can land the feature sooner. Then we could look into how to relax the restrictions. I just want to be careful that we don't paint ourselves into a corner in that regard. -eric From ncoghlan at gmail.com Sun Jun 21 03:28:12 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 11:28:12 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 21 June 2015 at 10:41, Chris Angelico wrote: > On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow wrote: >> * disallow forking within subinterpreters > > I love the idea as a whole (if only because the detractors can be told > "Just use subinterpreters, then you get concurrency"), but this seems > like a tricky restriction. That means no subprocess.Popen, no shelling > out to other applications. And I don't know what of other restrictions > might limit any given program. Will it feel like subinterpreters are > "write your code according to these tight restrictions and it'll > work", or will it be more of "most programs will run in parallel just > fine, but there are a few things to be careful of"? To calibrate expectations appropriately, it's worth thinking about the concept of Python level subinterpreter support as being broadly comparable to the JavaScript concept of web worker threads. mod_wsgi's use of the existing CPython specific subinterpreter support when embedding CPython in Apache httpd means we already know subinterpreters largely "just work" in the absence of low level C shenanigans in extension modules, but we also know keeping subinterpreters clearly subordinate to the main interpreter simplifies a number of design and implementation aspects (just as having a main thread simplified various aspects of the threading implementation), and that there will likely be things the main interpreter can do that subinterpreters can't. A couple of possible examples: * as Eric noted, we don't know yet if we'll be able to safely let subinterpreters launch subprocesses (especially via fork) * there may be restrictions on some extension modules that limit them to "main interpreter only" (e.g. if the extension module itself isn't thread-safe, then it will need to remain fully protected by the GIL) The analogous example with web workers is the fact that they don't have any access to the window object, document object or parent object in the browser DOM. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rustompmody at gmail.com Sun Jun 21 05:04:44 2015 From: rustompmody at gmail.com (Rustom Mody) Date: Sat, 20 Jun 2015 20:04:44 -0700 (PDT) Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <6edc3367-0892-42c5-94f2-608db72fcb51@googlegroups.com> On Sunday, June 21, 2015 at 6:12:22 AM UTC+5:30, Chris Angelico wrote: > > On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow > wrote: > > * disallow forking within subinterpreters > > I love the idea as a whole (if only because the detractors can be told > "Just use subinterpreters, then you get concurrency"), but this seems > like a tricky restriction. That means no subprocess.Popen, no shelling > out to other applications. And I don't know what of other restrictions > might limit any given program. Will it feel like subinterpreters are > "write your code according to these tight restrictions and it'll > work", or will it be more of "most programs will run in parallel just > fine, but there are a few things to be careful of"? 
> > ChrisA > Its good to get our terminology right: Are we talking parallelism or concurrency? Some references on the distinction: Bob Harper: https://existentialtype.wordpress.com/2011/03/17/parallelism-is-not-concurrency/ Rob Pike: http://concur.rspace.googlecode.com/hg/talk/concur.html#landing-slide [Or if you prefer the more famous https://www.youtube.com/watch?v=cN_DpYBzKso ] -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Jun 21 05:38:14 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 21 Jun 2015 13:38:14 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <20150621033813.GJ20701@ando.pearwood.info> On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote: > * only allow passing plain functions to Task() and > Subinterpreter.run() (exclude closures, other callables) That doesn't sound very Pythonic to me. That's going to limit the usefulness of these subinterpreters. > * object ownership model > + read-only in all but 1 subinterpreter > + RW in all subinterpreters Isn't that a contradiction? If objects are read-only in all subinterpreters (except one), how can they be read/write in all subinterpreters? All this talk about subinterpreters reminds me of an interesting blog post by Armin Ronacher: http://lucumr.pocoo.org/2014/8/16/the-python-i-would-like-to-see He's quite critical of a number of internal details of the CPython interpreter. But what I take from his post is that there could be significant advantages to giving the CPython interpreter its own local environment, like Lua and Javascript typically do, rather than the current model where there is a single process-wide global environment. Instead of having multiple subinterpreters all running inside the main interpreter, you could have multiple interpreters running in the same process, each with their own environment. I may be completely misinterpreting things here, but as I understand it, this would remove the need for the GIL, allowing even plain old threads to take advantage of multiple cores. But that's a separate issue. Armin writes: I would like to see an internal interpreter design could be based on interpreters that work independent of each other, with local base types and more, similar to how JavaScript works. This would immediately open up the door again for embedding and concurrency based on message passing. CPUs won't get any faster :) (He also talks about CPython's tp_slots system, but that's a separate issue, I think.) Now I have no idea if Armin is correct, or whether I am even interpreting his post correctly. But I'd like to hear people's thoughts on how this might interact with Eric's suggestion. -- Steve From ericsnowcurrently at gmail.com Sun Jun 21 07:01:20 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 20 Jun 2015 23:01:20 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150621033813.GJ20701@ando.pearwood.info> References: <20150621033813.GJ20701@ando.pearwood.info> Message-ID: On Jun 20, 2015 9:38 PM, "Steven D'Aprano" wrote: > > On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote: > > > * only allow passing plain functions to Task() and > > Subinterpreter.run() (exclude closures, other callables) > > That doesn't sound very Pythonic to me. That's going to limit the > usefulness of these subinterpreters. It certainly would limit their usefulness. It's a tradeoff to make the project tractable. 
I'm certainly not opposed to dropping such restrictions, now or as a follow-up project. Also keep in mind that the restriction is only something I'm considering. It's too early to settle on many of these details. > > > > * object ownership model > > + read-only in all but 1 subinterpreter > > + RW in all subinterpreters > > Isn't that a contradiction? If objects are read-only in all > subinterpreters (except one), how can they be read/write in all > subinterpreters? True. The two statements, like the rest in the section, are summarizing different details and ideas into which I've been looking. Several of them are mutually exclusive. > > > All this talk about subinterpreters reminds me of an interesting blog > post by Armin Ronacher: > > http:// lucumr.pocoo.org /2014/8/16/the-python-i-would-like-to-see > Interesting. I'd read that before, but not recently. Armin has some interesting points but I can't say that I agree with his analysis or his conclusions. Regardless... > He's quite critical of a number of internal details of the CPython > interpreter. But what I take from his post is that there could be > significant advantages to giving the CPython interpreter its own local > environment, like Lua and Javascript typically do, rather than the > current model where there is a single process-wide global environment. > Instead of having multiple subinterpreters all running inside the main > interpreter, you could have multiple interpreters running in the same > process, each with their own environment. But that's effectively the goal! This proposal will not work if the interpreters are not isolated. I'm not clear on what Armin thinks is shared between interpreters. The only consequential shared piece is the GIL and my proposal should render the GIL irrelevant for the most part. > > I may be completely misinterpreting things here, but as I understand it, > this would remove the need for the GIL, allowing even plain old threads > to take advantage of multiple cores. But that's a separate issue. If we restrict each subinterpreter to a single thread and are careful with how objects are shared (and sort out exrension modules) then there will be no need for the GIL *within* each subinterpreter. However there are a couple of things that will keep the GIL around for now. > > Armin writes: > > I would like to see an internal interpreter design could be based on > interpreters that work independent of each other, with local base > types and more, similar to how JavaScript works. This would > immediately open up the door again for embedding and concurrency > based on message passing. CPUs won't get any faster :) That's almost exactly what I'm aiming for. :) -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Jun 21 07:25:07 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 20 Jun 2015 22:25:07 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Jun 20, 2015 3:54 PM, "Eric Snow" wrote: > > > On Jun 20, 2015 4:08 PM, "Nathaniel Smith" wrote: > > > > On Jun 20, 2015 2:42 PM, "Eric Snow" wrote: > > > > > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > > > exposing them in Python, and adding a mechanism to safely share > > > objects between them. > > > > This all sounds really cool if you can pull it off, and shared-nothing threads do seem like the least impossible model to pull off. > > Agreed. > > > But "least impossible" and "possible" are different :-). 
From your email I can't tell whether this plan is viable while preserving backcompat and memory safety. > > I agree that those issues must be clearly solved in the proposal before it can be approved. I'm confident the approach I'm pursuing will afford us the necessary guarantees. I'll address those specific points directly when I can sit down and organize my thoughts. I'd love to see just a hand wavy, verbal proof-of-concept walking through how this might work in some simple but realistic case. To me a single compelling example could make this proposal feel much more concrete and achievable. > > Suppose I have a queue between two subinterpreters, and on this queue I place a list of dicts of user-defined-in-python objects, each of which holds a reference to a user-defined-via-the-C-api object. What happens next? > > You've hit upon exactly the trickiness involved and why I'm thinking the best approach initially is to only allow *strictly* immutable objects to pass between interpreters. Admittedly, my description of channels is very vague.:) There are a number of possibilities with them that I'm still exploring (CSP has particular opinions...), but immutability is a characteristic that may provide the simplest *initial* approach. Going that route shouldn't preclude adding some sort of support for mutable objects later. There aren't really many options for mutable objects, right? If you want shared nothing semantics, then transmitting a mutable object either needs to make a copy, or else be a real transfer, where the sender no longer has it (cf. Rust). I guess for the latter you'd need some new syntax for send-and-del, that requires the object to be self contained (all mutable objects reachable from it are only referenced by each other) and have only one reference in the sending process (which is the one being sent and then destroyed). > Keep in mind that by "immutability" I'm talking about *really* immutable, perhaps going so far as treating the full memory space associated with an object as frozen. For instance, we'd have to ensure that "immutable" Python objects like strings, ints, and tuples do not change (i.e. via the C API). This seems like a red herring to me. It's already the case that you can't legally use the c api to mutate tuples, ints, for any object that's ever been, say, passed to a function. So for these objects, the subinterpreter setup doesn't actually add any new constraints on user code. C code is always going to be *able* to break memory safety so long as you're using shared-memory threading at the c level to implement this stuff. We just need to make it easy not to. Refcnts and garbage collection are another matter, of course. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jun 21 08:31:33 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 16:31:33 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 21 June 2015 at 15:25, Nathaniel Smith wrote: > On Jun 20, 2015 3:54 PM, "Eric Snow" wrote: >> >> >> On Jun 20, 2015 4:08 PM, "Nathaniel Smith" wrote: >> > >> > On Jun 20, 2015 2:42 PM, "Eric Snow" >> > wrote: >> > > >> > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, >> > > exposing them in Python, and adding a mechanism to safely share >> > > objects between them. >> > >> > This all sounds really cool if you can pull it off, and shared-nothing >> > threads do seem like the least impossible model to pull off. >> >> Agreed. 
>> >> > But "least impossible" and "possible" are different :-). From your email >> > I can't tell whether this plan is viable while preserving backcompat and >> > memory safety. >> >> I agree that those issues must be clearly solved in the proposal before it >> can be approved. I'm confident the approach I'm pursuing will afford us the >> necessary guarantees. I'll address those specific points directly when I >> can sit down and organize my thoughts. > > I'd love to see just a hand wavy, verbal proof-of-concept walking through > how this might work in some simple but realistic case. To me a single > compelling example could make this proposal feel much more concrete and > achievable. I was one of the folks pushing Eric in this direction, and that's because it's a possibility that was conceived of a few years back, but never tried due to lack of time (and inclination for those of us that are using Python primarily as an orchestration tool and hence spend most of our time on IO bound problems rather than CPU bound ones): http://www.curiousefficiency.org/posts/2012/07/volunteer-supported-free-threaded-cross.html As mentioned there, I've at least spent some time with Graham Dumpleton over the past few years figuring out (and occasionally trying to address) some of the limitations of mod_wsgi's existing subinterpreter based WSGI app separation: https://code.google.com/p/modwsgi/wiki/ProcessesAndThreading#Python_Sub_Interpreters The fact that mod_wsgi can run most Python web applications in a subinterpreter quite happily means we already know the core mechanism works fine, and there don't appear to be any insurmountable technical hurdles between the status quo and getting to a point where we can either switch the GIL to a read/write lock where a write lock is only needed for inter-interpreter communications, or else find a way for subinterpreters to release the GIL entirely by restricting them appropriately. For inter-interpreter communication, the worst case scenario is having to rely on a memcpy based message passing system (which would still be faster than multiprocessing's serialisation + IPC overhead), but there don't appear to be any insurmountable barriers to setting up an object ownership based system instead (code that accesses PyObject_HEAD fields directly rather than through the relevant macros and functions seems to be the most likely culprit for breaking, but I think "don't do that" is a reasonable answer there). There's plenty of prior art here (including a system I once wrote in C myself atop TI's DSP/BIOS MBX and TSK APIs), so I'm comfortable with Eric's "simple matter of engineering" characterisation of the problem space. The main reason that subinterpreters have never had a Python API before is that they have enough rough edges that having to write a custom C extension module to access the API is the least of your problems if you decide you need them. At the same time, not having a Python API not only makes them much harder to test, which means various aspects of their operation are more likely to be broken, but also makes them inherently CPython specific. Eric's proposal essentially amounts to three things: 1. Filing off enough of the rough edges of the subinterpreter support that we're comfortable giving them a public Python level API that other interpreter implementations can reasonably support 2. Providing the primitives needed for safe and efficient message passing between subinterpreters 3. 
Allowing subinterpreters to truly execute in parallel on multicore machines All 3 of those are useful enhancements in their own right, which offers the prospect of being able to make incremental progress towards the ultimate goal of native Python level support for distributing across multiple cores within a single process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wes.turner at gmail.com Sun Jun 21 08:41:21 2015 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 21 Jun 2015 01:41:21 -0500 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: Exciting! * http://zero-buffer.readthedocs.org/en/latest/api-reference/#zero_buffer.BufferView * https://www.google.com/search?q=python+channels * https://docs.python.org/2/library/asyncore.html#module-asyncore * https://chan.readthedocs.org/en/latest/ * https://goless.readthedocs.org/en/latest/ * other approaches to the problem (with great APIs): * http://celery.readthedocs.org/en/latest/userguide/canvas.html#chords * http://discodb.readthedocs.org/en/latest/ On Jun 20, 2015 5:55 PM, "Eric Snow" wrote: > > On Jun 20, 2015 4:08 PM, "Nathaniel Smith" wrote: > > > > On Jun 20, 2015 2:42 PM, "Eric Snow" > wrote: > > > > > > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > > > exposing them in Python, and adding a mechanism to safely share > > > objects between them. > > > > This all sounds really cool if you can pull it off, and shared-nothing > threads do seem like the least impossible model to pull off. > > Agreed. > > > But "least impossible" and "possible" are different :-). From your email > I can't tell whether this plan is viable while preserving backcompat and > memory safety. > > I agree that those issues must be clearly solved in the proposal before it > can be approved. I'm confident the approach I'm pursuing will afford us > the necessary guarantees. I'll address those specific points directly when > I can sit down and organize my thoughts. > > > > > Suppose I have a queue between two subinterpreters, and on this queue I > place a list of dicts of user-defined-in-python objects, each of which > holds a reference to a user-defined-via-the-C-api object. What happens next? > > You've hit upon exactly the trickiness involved and why I'm thinking the > best approach initially is to only allow *strictly* immutable objects to > pass between interpreters. Admittedly, my description of channels is very > vague.:) There are a number of possibilities with them that I'm still > exploring (CSP has particular opinions...), but immutability is a > characteristic that may provide the simplest *initial* approach. Going > that route shouldn't preclude adding some sort of support for mutable > objects later. > > Keep in mind that by "immutability" I'm talking about *really* immutable, > perhaps going so far as treating the full memory space associated with an > object as frozen. For instance, we'd have to ensure that "immutable" > Python objects like strings, ints, and tuples do not change (i.e. via the C > API). The contents of involved tuples/containers would have to be likewise > immutable. Even changing refcounts could be too much, hence the idea of > moving refcounts out to a separate table. > > This level of immutability would be something new to Python. We'll see if > it's necessary. If it isn't too much work it might be a good idea > regardless of the multi-core proposal. 
> > Also note that Barry has a (rejected) PEP from a number of years ago about > freezing objects... That idea is likely out of scope as relates to my > proposal, but it certainly factors in the problem space. > > -eric > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jun 21 11:11:49 2015 From: guido at python.org (Guido van Rossum) Date: Sun, 21 Jun 2015 11:11:49 +0200 Subject: [Python-ideas] Keyword-only arguments? In-Reply-To: References: Message-ID: My approach to this particular case has always been, if I need to port keyword-only args to older Python versions, I'll just remove the '*' and make it a documented convention that those arguments must be passed by keyword only, without enforcement. The code obfuscation is just not worth it -- it's just rarely super-important to strictly enforce this convention. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Jun 21 11:48:46 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Jun 2015 11:48:46 +0200 Subject: [Python-ideas] solving multi-core Python References: Message-ID: <20150621114846.06bc8dc8@fsol> On Sun, 21 Jun 2015 16:31:33 +1000 Nick Coghlan wrote: > > For inter-interpreter communication, the worst case scenario is having > to rely on a memcpy based message passing system (which would still be > faster than multiprocessing's serialisation + IPC overhead) And memcpy() updates pointer references to dependent objects magically? Surely you meant the memdeepcopy() function that's part of every standard C library! Regards Antoine. From solipsis at pitrou.net Sun Jun 21 11:54:43 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Jun 2015 11:54:43 +0200 Subject: [Python-ideas] solving multi-core Python References: <20150621033813.GJ20701@ando.pearwood.info> Message-ID: <20150621115443.70ddcf28@fsol> On Sat, 20 Jun 2015 23:01:20 -0600 Eric Snow wrote: > The only consequential shared piece is the > GIL and my proposal should render the GIL irrelevant for the most part. All singleton objects, built-in types are shared and probably a number of other things hidden in dark closets... Not to mention the memory allocator. By the way, what you're aiming to do is conceptually quite similar to Trent's PyParallel (thought Trent doesn't use subinterpreters, his main work is around trying to making object sharing safe without any GIL to trivially protect the sharing), so you may want to pair with him. Of course, you may end up with a Windows-only Python interpreter :-) I'm under the impression you're underestimating the task at hand here. Or perhaps you're not and you're just willing to present it in a positive way :-) Regards Antoine. 
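[Aside: a small illustration of the keyword-only workaround Guido mentions a few messages up. The function and parameter names here are made up; the point is just the difference between syntactic enforcement and a documented convention.]

    # Python 3 only: the bare '*' makes timeout and retries keyword-only,
    # so connect("example.org", 8080, 5) raises a TypeError.
    def connect(host, port, *, timeout=10, retries=3):
        return (host, port, timeout, retries)

    # The portable variant Guido describes: drop the '*' and merely document
    # that timeout and retries should be passed by keyword.  Nothing enforces
    # it, but the same code also runs on older Python versions.
    def connect_portable(host, port, timeout=10, retries=3):
        """timeout and retries should be passed by keyword only."""
        return (host, port, timeout, retries)

    connect("example.org", 8080, timeout=5)           # OK
    connect_portable("example.org", 8080, timeout=5)  # OK, by convention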
From ncoghlan at gmail.com Sun Jun 21 12:25:47 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 20:25:47 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150621114846.06bc8dc8@fsol> References: <20150621114846.06bc8dc8@fsol> Message-ID: On 21 June 2015 at 19:48, Antoine Pitrou wrote: > On Sun, 21 Jun 2015 16:31:33 +1000 > Nick Coghlan wrote: >> >> For inter-interpreter communication, the worst case scenario is having >> to rely on a memcpy based message passing system (which would still be >> faster than multiprocessing's serialisation + IPC overhead) > > And memcpy() updates pointer references to dependent objects magically? > Surely you meant the memdeepcopy() function that's part of every > standard C library! We already have the tools to do deep copies of object trees (although I'll concede I *was* actually thinking in terms of the classic C/C++ mistake of carelessly copying pointers around when I wrote that particular message). One of the options for deep copies tends to be a pickle/unpickle round trip, which will still incur the serialisation overhead, but not the IPC overhead. "Faster message passing than multiprocessing" sets the baseline pretty low, after all. However, this is also why Eric mentions the notions of object ownership or limiting channels to less than the full complement of Python objects. As an *added* feature at the Python level, it's possible to initially enforce restrictions that don't exist in the C level subinterpeter API, and then work to relax those restrictions over time. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sun Jun 21 12:40:43 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 21 Jun 2015 12:40:43 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> Message-ID: Nick Coghlan schrieb am 21.06.2015 um 12:25: > On 21 June 2015 at 19:48, Antoine Pitrou wrote: >> On Sun, 21 Jun 2015 16:31:33 +1000 Nick Coghlan wrote: >>> >>> For inter-interpreter communication, the worst case scenario is having >>> to rely on a memcpy based message passing system (which would still be >>> faster than multiprocessing's serialisation + IPC overhead) >> >> And memcpy() updates pointer references to dependent objects magically? >> Surely you meant the memdeepcopy() function that's part of every >> standard C library! > > We already have the tools to do deep copies of object trees (although > I'll concede I *was* actually thinking in terms of the classic C/C++ > mistake of carelessly copying pointers around when I wrote that > particular message). One of the options for deep copies tends to be a > pickle/unpickle round trip, which will still incur the serialisation > overhead, but not the IPC overhead. > > "Faster message passing than multiprocessing" sets the baseline pretty > low, after all. > > However, this is also why Eric mentions the notions of object > ownership or limiting channels to less than the full complement of > Python objects. As an *added* feature at the Python level, it's > possible to initially enforce restrictions that don't exist in the C > level subinterpeter API, and then work to relax those restrictions > over time. If objects can make it explicit that they support sharing (and preferably are allowed to implement the exact details themselves), I'm sure we'll find ways to share NumPy arrays across subinterpreters. 
That feature alone tends to be a quick way to make a lot of people happy. Stefan From solipsis at pitrou.net Sun Jun 21 12:41:05 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Jun 2015 12:41:05 +0200 Subject: [Python-ideas] solving multi-core Python References: <20150621114846.06bc8dc8@fsol> Message-ID: <20150621124105.52c194c9@fsol> On Sun, 21 Jun 2015 20:25:47 +1000 Nick Coghlan wrote: > On 21 June 2015 at 19:48, Antoine Pitrou wrote: > > On Sun, 21 Jun 2015 16:31:33 +1000 > > Nick Coghlan wrote: > >> > >> For inter-interpreter communication, the worst case scenario is having > >> to rely on a memcpy based message passing system (which would still be > >> faster than multiprocessing's serialisation + IPC overhead) > > > > And memcpy() updates pointer references to dependent objects magically? > > Surely you meant the memdeepcopy() function that's part of every > > standard C library! > > We already have the tools to do deep copies of object trees [...] > "Faster message passing than multiprocessing" sets the baseline pretty > low, after all. What's the goal? 10% faster? Or 10x? copy.deepcopy() uses similar internal mechanisms as pickle... Regards Antoine. From stefan_ml at behnel.de Sun Jun 21 12:54:52 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 21 Jun 2015 12:54:52 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: Eric Snow schrieb am 20.06.2015 um 23:42: > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. > [...] > In some personal correspondence Nick Coghlan, he summarized my > preferred approach as "the data storage separation of multiprocessing, > with the low message passing overhead of threading". > > For Python 3.6: > > * expose subinterpreters to Python in a new stdlib module: "subinterpreters" > * add a new SubinterpreterExecutor to concurrent.futures > * add a queue.Queue-like type that will be used to explicitly share > objects between subinterpreters > [...] > C Extension Modules > ================= > > Subinterpreters already isolate extension modules (and built-in > modules, including sys). PEP 384 provides some help too. However, > global state in C can easily leak data between subinterpreters, > breaking the desired data isolation. This is something that will need > to be addressed as part of the effort. I also had some discussions about these things with Nick before. Not sure if you really meant PEP 384 (you might have) or rather PEP 489: https://www.python.org/dev/peps/pep-0489/ I consider that one more important here, as it will eventually allow Cython modules to support subinterpreters. Unless, as you mentioned, they use global C state, but only in external C code, e.g. wrapped libraries. Cython should be able to handle most of the module internal global state on a per-interpreter basis itself, without too much user code impact. I'm totally +1 for the idea. I hope that I'll find the time (well, and money) to work on PEP 489 in Cython soon, so that I can prove it right for actual real-world code in Python 3.5. We'll then see about subinterpreter support. That's certainly the next step. 
Stefan From ncoghlan at gmail.com Sun Jun 21 12:57:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 20:57:42 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150621124105.52c194c9@fsol> References: <20150621114846.06bc8dc8@fsol> <20150621124105.52c194c9@fsol> Message-ID: On 21 June 2015 at 20:41, Antoine Pitrou wrote: > On Sun, 21 Jun 2015 20:25:47 +1000 > Nick Coghlan wrote: >> On 21 June 2015 at 19:48, Antoine Pitrou wrote: >> > On Sun, 21 Jun 2015 16:31:33 +1000 >> > Nick Coghlan wrote: >> >> >> >> For inter-interpreter communication, the worst case scenario is having >> >> to rely on a memcpy based message passing system (which would still be >> >> faster than multiprocessing's serialisation + IPC overhead) >> > >> > And memcpy() updates pointer references to dependent objects magically? >> > Surely you meant the memdeepcopy() function that's part of every >> > standard C library! >> >> We already have the tools to do deep copies of object trees > [...] >> "Faster message passing than multiprocessing" sets the baseline pretty >> low, after all. > > What's the goal? 10% faster? Or 10x? copy.deepcopy() uses similar > internal mechanisms as pickle... I'd want us to eventually aim for zero-copy speed for at least known immutable values (int, str, float, etc), immutable containers of immutable values (tuple, frozenset), and for types that support both publishing and consuming data via the PEP 3118 buffer protocol without making a copy. For everything else I'd be fine with a starting point that was at least no slower than multiprocessing (which shouldn't be difficult, since we'll at least save the IPC overhead even if there are cases where communication between subinterpreters falls back to serialisation rather than doing something more CPU and memory efficient). As an implementation strategy, I'd actually suggest starting with *only* the latter for simplicity's sake, even though it misses out on some of the potential speed benefits of sharing an address space. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jean-charles.douet at laposte.net Sun Jun 21 13:03:37 2015 From: jean-charles.douet at laposte.net (jean-charles.douet at laposte.net) Date: Sun, 21 Jun 2015 13:03:37 +0200 (CEST) Subject: [Python-ideas] Python-ideas Digest, Vol 103, Issue 100 In-Reply-To: References: Message-ID: <471341051.777863.1434884617911.JavaMail.zimbra@laposte.net> Hello, My dix cents may not seem to be neither very clear, nor too argumented, but I wanted to let you know that there seem to be already interesting pythonic studies about how to release the so-called GIL and about, all, for which pragmatic use cases. Reading the following article, I discovered that Python by itself seems not to be a language, but language specification : http://www.toptal.com/python/why-are-there-so-many-pythons Thus, it justifies that PyPy is a more general approach than CPython, which looks like a particular case of Python, even if the most frequently used (?) Now, there is a specific study of PyPy aimed at removing the GIL, called "Software Transactional Memory". Here it is : http://doc.pypy.org/en/latest/stm.html Hope it helps : Best regards, Jean-Charles. 
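[Aside: Nick's zero-copy target a couple of messages up is easiest to picture with the buffer protocol inside a single interpreter. The snippet below is only that -- an ordinary memoryview example -- and not the proposed cross-interpreter mechanism.]

    # PEP 3118 in miniature: a memoryview shares the underlying buffer rather
    # than copying it, which is the kind of zero-copy sharing the proposal
    # would like for buffer-supporting types passed between subinterpreters.
    data = bytearray(b"x" * 1000000)

    view = memoryview(data)      # no copy of the megabyte is made here
    chunk = view[:10]            # slicing a memoryview is also zero-copy

    data[0:5] = b"hello"
    print(bytes(chunk))          # b'helloxxxxx' -- the view sees the change

    copied = bytes(data)         # by contrast, this allocates a fresh buffer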
From sturla.molden at gmail.com Sun Jun 21 13:41:30 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 21 Jun 2015 13:41:30 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 20/06/15 23:42, Eric Snow wrote: > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. > > This proposal is meant to be a shot over the bow, so to speak. I plan > on putting together a more complete PEP some time in the future, with > content that is more refined along with references to the appropriate > online resources. > > Feedback appreciated! Offers to help even more so! :) From the perspective of software design, it would be good if the CPython interpreter provided an environment instead of using global objects. It would mean that all functions in the C API would need to take the environment pointer as their first variable, which will be a major rewrite. It would also allow the "one interpreter per thread" design similar to tcl and .NET application domains. However, from the perspective of multi-core parallel computing, I am not sure what this offers over using multiple processes. Yes, you avoid the process startup time, but on POSIX systems a fork is very fast. And certainly, forking is much more efficient than serializing Python objects. It then boils down to a workaround for the fact that Windows cannot fork, which makes it particularly bad for running CPython. You also have to start up a subinterpreter and a thread, which is not instantaneous. So I am not sure there is a lot to gain here over calling os.fork. A non-valid argument for this kind of design is that only code which uses threads for parallel computing is "real" multi-core code. So Python does not support multi-cores because multiprocessing or os.fork is just faking it. This is an argument that belongs in the intellectual junk yard. It stems from the abuse of threads among Windows and Java developers, and is rooted in the absence of fork on Windows and the formerly slow fork on Solaris. And thus they are only able to think in terms of threads. If threading.Thread does not scale the way they want, they think multicores are out of reach. So the question is, how do you want to share objects between subinterpreters? And why is it better than IPC, when your idea is to isolate subinterpreters like application domains? If you think avoiding IPC is clever, you are wrong. IPC is very fast, in fact programs written to use MPI tend to perform and scale better than programs written to use OpenMP in parallel computing. Not only is IPC fast, but you also avoid an issue called "false sharing", which can be even more detrimental than the GIL: You have parallel code, but it seems to run in serial, even though there is no explicit serialization anywhere. And since Murphy's law is working against us, Python reference counts will be false shared unless we use multiple processes. The reason IPC in multiprocessing is slow is due to calling pickle, it is not the IPC in itself. 
A pipe or a Unix socket (named pipe on Windows) has the overhead of a memcpy in the kernel, which is equal to a memcpy plus some tiny constant overhead. And if you need two processes to share memory, there is something called shared memory. Thus, we can send data between processes just as fast as between subinterpreters. All in all, I think we are better off finding a better way to share Python objects between processes. P.S. Another thing to note is that with sub-interpreters, you can forget about using ctypes or anything else that uses the simplified GIL API (e.g. certain Cython generated extensions). Sturla From solipsis at pitrou.net Sun Jun 21 13:52:36 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Jun 2015 13:52:36 +0200 Subject: [Python-ideas] solving multi-core Python References: Message-ID: <20150621135236.6c31b605@fsol> On Sun, 21 Jun 2015 13:41:30 +0200 Sturla Molden wrote: > From the perspective of software design, it would be good if the > CPython interpreter provided an environment instead of using global > objects. It would mean that all functions in the C API would need to > take the environment pointer as their first variable, which will be a > major rewrite. It would also allow the "one interpreter per thread" > design similar to tcl and .NET application domains. From the point of view of API compatibility, it's unfortunately a no-no. > The reason IPC in multiprocessing is slow is due to calling pickle, it > is not the IPC in itself. No need to be pedantic :-) The "C" means communication, and pickling objects is part of the communication between Python processes. > All in all, I think we are better off finding a better way to share > Python objects between processes. Sure. This is however a complex and experimental topic (how to share a graph of garbage-collected objects between independent processes), with no guarantees of showing any results at the end. > P.S. Another thing to note is that with sub-interpreters, you can forget > about using ctypes or anything else that uses the simplified GIL API > (e.g. certain Cython generated extensions). Indeed, the PyGILState API is still not subinterpreter-compatible. There's a proposal on the tracker, IIRC, but the interested parties never made any progress on it. Regards Antoine. From jeanpierreda at gmail.com Sun Jun 21 13:55:54 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 21 Jun 2015 04:55:54 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow wrote: > > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" wrote: >> >> It's worthwhile to consider fork as an alternative. IMO we'd get a >> lot out of making forking safer, easier, and more efficient. (e.g. >> respectively: adding an atfork registration mechanism; separating out >> the bits of multiprocessing that use pickle from those that don't; >> moving the refcount to a separate page, or allowing it to be frozen >> prior to a fork.) > > So leverage a common base of code with the multiprocessing module? What is this question in response to? I don't understand. > I would expect subinterpreters to use less memory. Furthermore creating > them would be significantly faster. Passing objects between them would be > much more efficient. And, yes, cross-platform. Maybe I don't understand how subinterpreters work. AIUI, the whole point of independent subinterpreters is that they share no state. 
So if I have a web server, each independent serving thread has to do all of the initialization (import HTTP libraries, etc.), right? Compare with forking, where the initialization is all done and then you fork, and you are immediately ready to serve, using the data structures shared with all the other workers, which is only copied when it is written to. So forking starts up faster and uses less memory (due to shared memory.) Re passing objects, see below. I do agree it's cross-platform, but right now that's the only thing I agree with. >> Note: I don't count the IPC cost of forking, because at least on >> linux, any way to efficiently share objects between independent >> interpreters in separate threads can also be ported to independent >> interpreters in forked subprocesses, > > How so? Subinterpreters are in the same process. For this proposal each > would be on its own thread. Sharing objects between them through channels > would be more efficient than IPC. Perhaps I've missed something? You might be missing that memory can be shared between processes, not just threads, but I don't know. The reason passing objects between processes is so slow is currently *nearly entirely* the cost of serialization. That is, it's the fact that you are passing an object to an entirely separate interpreter, and need to serialize the whole object graph and so on. If you can make that fast without serialization, for shared memory threads, then all the serialization becomes unnecessary, and you can either write to a pipe (fast, if it's a non-container), or used shared memory from the beginning (instantaneous). This is possible on any POSIX OS. Linux lets you go even further. -- Devin From jeanpierreda at gmail.com Sun Jun 21 14:13:40 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 21 Jun 2015 05:13:40 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sat, Jun 20, 2015 at 11:31 PM, Nick Coghlan wrote: > For inter-interpreter communication, the worst case scenario is having > to rely on a memcpy based message passing system (which would still be > faster than multiprocessing's serialisation + IPC overhead), but there > don't appear to be any insurmountable barriers to setting up an object > ownership based system instead (code that accesses PyObject_HEAD > fields directly rather than through the relevant macros and functions > seems to be the most likely culprit for breaking, but I think "don't > do that" is a reasonable answer there). The comparison is unfair -- if you can share between subinterpreters using memcpy, then you can share between processes using just a socket write, and multiprocessing becomes nearly just as fast. > Eric's proposal essentially amounts to three things: > > 1. Filing off enough of the rough edges of the subinterpreter support > that we're comfortable giving them a public Python level API that > other interpreter implementations can reasonably support > 2. Providing the primitives needed for safe and efficient message > passing between subinterpreters > 3. Allowing subinterpreters to truly execute in parallel on multicore machines > > All 3 of those are useful enhancements in their own right, which > offers the prospect of being able to make incremental progress towards > the ultimate goal of native Python level support for distributing > across multiple cores within a single process. Why is that the goal? Whatever faults processes have, those are the problems, surely not processes in and of themselves, right? e.g. 
if the reason we don't like multiprocessed python is extra memory use, it's memory use we're opposed to. A solution that gives us parallel threads, but doesn't decrease memory consumption, doesn't solve anything. The solution has threads that are remarkably like processes, so I think it's really important to be careful about the differences and why this solution has the advantage. I'm not seeing that. And remember that we *do* have many examples of people using parallelized Python code in production. Are you sure you're satisfying their concerns, or whose concerns are you trying to satisfy? -- Devin From ncoghlan at gmail.com Sun Jun 21 14:55:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 22:55:42 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 21 June 2015 at 07:42, Eric Snow wrote: > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. > > This proposal is meant to be a shot over the bow, so to speak. I plan > on putting together a more complete PEP some time in the future, with > content that is more refined along with references to the appropriate > online resources. > > Feedback appreciated! Offers to help even more so! :) For folks interested in more of the background and design trade-offs involved here, with Eric's initial post published, I've now extracted and updated my old answer about the GIL from the Python 3 Q & A page, and turned it into its own article: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html Cheers, Nick. P.S. The entry for the old Q&A answer is still there, but now redirects to the new article: http://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html#but-but-surely-fixing-the-gil-is-more-important-than-fixing-unicode -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan_ml at behnel.de Sun Jun 21 15:06:57 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 21 Jun 2015 15:06:57 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: Nick Coghlan schrieb am 21.06.2015 um 03:28: > * there may be restrictions on some extension modules that limit them > to "main interpreter only" (e.g. if the extension module itself isn't > thread-safe, then it will need to remain fully protected by the GIL) Just an idea, but C extensions could opt-in to this. Calling into them has to go through some kind of callable type, usually PyCFunction. We could protect all calls to extension types and C functions with a global runtime lock (per process, not per interpreter) and Extensions could set a flag on their functions and methods (or get it inherited from their extension types etc.) that says "I don't need the lock". That allows for a very fine-grained transition. Stefan From ncoghlan at gmail.com Sun Jun 21 15:09:35 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Jun 2015 23:09:35 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 21 June 2015 at 21:41, Sturla Molden wrote: > On 20/06/15 23:42, Eric Snow wrote: >> >> tl;dr Let's exploit multiple cores by fixing up subinterpreters, >> exposing them in Python, and adding a mechanism to safely share >> objects between them. >> >> This proposal is meant to be a shot over the bow, so to speak. 
I plan >> on putting together a more complete PEP some time in the future, with >> content that is more refined along with references to the appropriate >> online resources. >> >> Feedback appreciated! Offers to help even more so! :) > > > > From the perspective of software design, it would be good it the CPython > interpreter provided an environment instead of using global objects. It > would mean that all functions in the C API would need to take the > environment pointer as their first variable, which will be a major rewrite. > It would also allow the "one interpreter per thread" design similar to tcl > and .NET application domains. > > However, from the perspective of multi-core parallel computing, I am not > sure what this offers over using multiple processes. > > Yes, you avoid the process startup time, but on POSIX systems a fork is very > fast. An certainly, forking is much more efficient than serializing Python > objects. It then boils down to a workaround for the fact that Windows cannot > fork, which makes it particularly bad for running CPython. You also have to > start up a subinterpreter and a thread, which is not instantaneous. So I am > not sure there is a lot to gain here over calling os.fork. Please give Eric and I the courtesy of assuming we know how CPython works. This article, which is an update of a Python 3 Q&A answer I wrote some time ago, goes into more detail on the background of this proposed investigation: http://python-notes.curiousefficiency.org/en/latest/python3/multicore_python.html > A non-valid argument for this kind of design is that only code which uses > threads for parallel computing is "real" multi-core code. So Python does not > support multi-cores because multiprocessing or os.fork is just faking it. > This is an argument that belongs in the intellectual junk yard. > It stems > from the abuse of threads among Windows and Java developers, and is rooted > in the absence of fork on Windows and the formerly slow fork on Solaris. And > thus they are only able to think in terms of threads. If threading.Thread > does not scale the way they want, they think multicores are out of reach. Sturla, expressing out and out contempt for entire communities of capable, competent developers (both the creators of Windows and Java, and the users of those platforms) has no place on the core Python mailing lists. Please refrain from casually insulting entire groups of people merely because you don't approve of their technical choices. > The reason IPC in multiprocessing is slow is due to calling pickle, it is > not the IPC in itself. A pipe or an Unix socket (named pipe on Windows) have > the overhead of a memcpy in the kernel, which is equal to a memcpy plus some > tiny constant overhead. And if you need two processes to share memory, there > is something called shared memory. Thus, we can send data between processes > just as fast as between subinterpreters. Avoiding object serialisation is indeed the main objective. With subinterpreters, we have a lot more options for that than we do with any form of IPC, including shared references to immutable objects, and the PEP 3118 buffer API. > All in all, I think we are better off finding a better way to share Python > objects between processes. This is not an either/or question, as other folks remain free to work on improving multiprocessing's IPC efficiency if they want to. We don't seem to have folks clamouring at the door to work on that, though. > P.S. 
Another thing to note is that with sub-interpreters, you can forget > about using ctypes or anything else that uses the simplified GIL API (e.g. > certain Cython generated extensions). Those aren't fundamental conceptual limitations, they're incidental limitations of the current design and implementation of the simplified GIL state API. One of the benefits of introducing a Python level API for subinterpreters is that it makes it easier to start testing, and hence fixing, some of those limitations (I actually just suggested to Eric off list that adding subinterpreter controls to _testcapi might be a good place to start, as that's beneficial regardless of what, if anything, ends up happening from a public API perspective) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sun Jun 21 15:21:12 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Jun 2015 15:21:12 +0200 Subject: [Python-ideas] solving multi-core Python References: <20150621135236.6c31b605@fsol> Message-ID: <20150621152112.16fedf02@fsol> On Sun, 21 Jun 2015 13:52:36 +0200 Antoine Pitrou wrote: > > > P.S. Another thing to note is that with sub-interpreters, you can forget > > about using ctypes or anything else that uses the simplified GIL API > > (e.g. certain Cython generated extensions). > > Indeed, the PyGILState API is still not subinterpreter-compatible. > There's a proposal on the tracker, IIRC, but the interested parties > never made any progress on it. For reference: https://bugs.python.org/issue10915 https://bugs.python.org/issue15751 Regards Antoine. From rosuav at gmail.com Sun Jun 21 16:12:24 2015 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 22 Jun 2015 00:12:24 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 9:41 PM, Sturla Molden wrote: > However, from the perspective of multi-core parallel computing, I am not > sure what this offers over using multiple processes. > > Yes, you avoid the process startup time, but on POSIX systems a fork is very > fast. An certainly, forking is much more efficient than serializing Python > objects. It then boils down to a workaround for the fact that Windows cannot > fork, which makes it particularly bad for running CPython. You also have to > start up a subinterpreter and a thread, which is not instantaneous. So I am > not sure there is a lot to gain here over calling os.fork. That's all very well for sending stuff *to* a subprocess. If you fork for a single job, do the job, and have the subprocess send the result directly back to the origin (eg over its socket), then terminate, then sure, you don't need a lot of IPC. But for models where there's ongoing work, maybe interacting with other subinterpreters periodically, there could be a lot of benefit. It's very easy to slip into a CGI style of mentality where requests are entirely fungible and independent, and all you're doing is parallelization, but not everything fits into that model :) I run a MUD server, for instance, where currently every connection gets its own thread; if I wanted to make use of multiple CPU cores, I would not want to have the connections handled by separate processes, because they are constantly interacting with each other, so IPC would get expensive. 
ChrisA From sturla.molden at gmail.com Sun Jun 21 17:14:54 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 21 Jun 2015 15:14:54 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: Message-ID: <2010780096456591925.856644sturla.molden-gmail.com@news.gmane.org> Devin Jeanpierre wrote: > The comparison is unfair -- if you can share between subinterpreters > using memcpy, then you can share between processes using just a socket > write, and multiprocessing becomes nearly just as fast. That is the main issue here. Writing to a pipe or a Unix socket is implemented with a memcpy in the kernel. So there is just a tiny constant overhead compared to using memcpy within a process. And with shared memory as IPC even this tiny overhead can be removed. The main overhead in communicating Python objects in multiprocessing is the serialization with pickle. So there is basically nothing to gain unless this part can be omitted. There is an erroneous belief among Windows programmers that "IPC is slow". But that is because they are using out-of-process DCOM servers, CORBA, XMLRPC or something equally atrocious. A plain named pipe transaction is not in any way slow on Windows. Sturla From sturla.molden at gmail.com Sun Jun 21 17:45:05 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 21 Jun 2015 15:45:05 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: Message-ID: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> Nick Coghlan wrote: > Sturla, expressing out and out contempt for entire communities of > capable, competent developers (both the creators of Windows and Java, > and the users of those platforms) has no place on the core Python > mailing lists. Please refrain from casually insulting entire groups of > people merely because you don't approve of their technical choices. I am not sure what you mean. Using threads on Windows and Java comes from a necessity, not because developers are incompetent. Windows does not provide a fork and processes are heavy-weight, hence multi-threading is the obvious choice. > Avoiding object serialisation is indeed the main objective. Good. > With > subinterpreters, we have a lot more options for that than we do with > any form of IPC, including shared references to immutable objects, and > the PEP 3118 buffer API. Perhaps. One could do this with shared memory as well, but a complicating factor is that the base address must be the same (or corrected for). But one could probably do low-level magic with memory mapping to work around this. Particularly on 64-bit it is not really difficult to make sure a page is mapped to the same address in two processes. It is certainly easier to achieve within a process. But if the plan for Erlang-style "share nothing" threads is to pickle and memcpy objects, there is little or nothing to gain over using multiprocessing. Sturla From sturla.molden at gmail.com Sun Jun 21 18:13:01 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 21 Jun 2015 16:13:01 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: <20150621135236.6c31b605@fsol> Message-ID: <941749705456595776.090729sturla.molden-gmail.com@news.gmane.org> Antoine Pitrou wrote: >> The reason IPC in multiprocessing is slow is due to calling pickle, it >> is not the IPC in itself. > > No need to be pedantic :-) The "C" means communication, and pickling > objects is part of the communication between Python processes. Yes, currently it is. 
But is does not mean that it has to be. Clearly it is easier to avoid with multiple interpreters in the same process. But it does not mean it is unsolvable. Sturla From ron3200 at gmail.com Sun Jun 21 20:54:48 2015 From: ron3200 at gmail.com (Ron Adam) Date: Sun, 21 Jun 2015 14:54:48 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 06/20/2015 06:54 PM, Eric Snow wrote: > Also note that Barry has a (rejected) PEP from a number of years ago about > freezing objects... That idea is likely out of scope as relates to my > proposal, but it certainly factors in the problem space. How about instead of freezing, just modify a flag or counter if it's mutated. That could be turned off by default. Then have a way to turn on an ObjectMutated warning or exception if any objects is modified within a routine, code block. or function. With something like that, small parts of python can be tested and made less mutable in small sections at a time. Possibly working from the inside out. It doesn't force immutability but instead asks for it. A small but not quite so impossible step. (?) Cheers, Ron From abarnert at yahoo.com Sun Jun 21 23:08:09 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 21 Jun 2015 14:08:09 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreters, except for a very small (but possibly extensible) subset of Python? For example, after fork, you can no longer access any channels, and you also can't use signals, threads, fork again, imports, assignments to builtins, raising exceptions, or a whole host of other things (but of course if you exec an entirely new Python interpreter, it can do any of those things). C extension modules could just have a flag that marks whether the whole module is fork-safe or not (defaulting to not). So, this allows a subinterpreter to use subprocess (or even multiprocessing, as long as you use the forkserver or spawn mechanism), and it gives code that intentionally wants to do tricky/dangerous things a way to do them, but it avoids all of the problems with accidentally breaking a subinterpreter by forking it and then doing bad things. Second, a major question: In this proposal, are builtins and the modules map shared, or copied? If they're copied, it seems like it would be hard to do that even as efficiently as multiprocessing, much less more efficiently. Of course you could fake this with CoW, but I'm not sure how you'd do that, short of CoWing the entire heap (by using clone instead of pthreads on Linux, or by doing a bunch of explicit mmap and related calls on other POSIX systems), at which point you're pretty close to just implementing fork or vfork yourself to avoid calling fork or vfork, and unlikely to get it as efficient or as robust as what's already there. If they're shared, on the other hand, then it seems like it becomes very difficult to implement subinterpreter-safe code, because it's no longer safe to import a module, set a flag, call a registration function, etc. 
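To make that last worry concrete, the pattern I have in mind is the ubiquitous module-level registry. A toy sketch (the module and names here are made up, not anything in the stdlib):

    # plugin_registry.py -- hypothetical module, but the pattern is everywhere
    _handlers = {}

    def register(name, func):
        # If module state is fully shared, two subinterpreters calling this
        # concurrently mutate the same dict (and the same module object), so
        # code that was safe single-threaded suddenly needs locking. If module
        # state is copied per interpreter, each one silently sees its own,
        # possibly empty, registry instead.
        _handlers[name] = func

    def dispatch(name, *args):
        return _handlers[name](*args)

Either answer (shared or copied) changes what "correct" looks like for code written in that style, which is why I think the proposal needs to spell it out.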
From abarnert at yahoo.com Sun Jun 21 23:24:19 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Sun, 21 Jun 2015 14:24:19 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <3572BE97-5A2B-4A0F-B593-2BE6605E40DA@yahoo.com> On Jun 21, 2015, at 06:09, Nick Coghlan wrote: > > Avoiding object serialisation is indeed the main objective. With > subinterpreters, we have a lot more options for that than we do with > any form of IPC, including shared references to immutable objects, and > the PEP 3118 buffer API. It seems like you could provide a way to efficiently copy and share deeper objects than integers and buffers without sharing everything, assuming the user code knows, at the time those objects are created, that they will be copied or shared. Basically, you allocate the objects into a separate arena (along with allocating their refcounts on a separate page, as already mentioned). You can't add a reference to an outside object in an arena-allocated object, although you can copy that outside object into the arena. And then you just pass or clone (possibly by using CoW memory-mapping calls, only falling back to memcpy on platforms that can't do that) entire arenas instead of individual objects (so you don't need the fictitious memdeepcpy function that someone ridiculed earlier in this thread, but you get 90% of the benefits of having one). This has the same basic advantages of forking, but it's doable efficiently on Windows, and doable less efficiently (but still better than spawn and pass) on even weird embedded platforms, and it forces code to be explicit about what gets shared and copied without forcing it to work through less-natural queue-like APIs. Also, it seems like you could fake this entire arena API on top of pickle/copy for a first implementation, then just replace the underlying implementation separately. From solipsis at pitrou.net Mon Jun 22 00:41:57 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Jun 2015 00:41:57 +0200 Subject: [Python-ideas] solving multi-core Python References: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> Message-ID: <20150622004157.52bc3239@fsol> On Sun, 21 Jun 2015 14:08:09 -0700 Andrew Barnert via Python-ideas wrote: > First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreters, except for a very small (but possibly extensible) subset of Python? It's actually already the case in POSIX that most things are illegal between fork() and exec(). However, to make fork() practical, many libraries or frameworks tend to ignore those problems deliberately. Regards Antoine. From ncoghlan at gmail.com Mon Jun 22 01:31:06 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 22 Jun 2015 09:31:06 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> References: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> Message-ID: On 22 Jun 2015 01:45, "Sturla Molden" wrote: > > Nick Coghlan wrote: > > > Sturla, expressing out and out contempt for entire communities of > > capable, competent developers (both the creators of Windows and Java, > > and the users of those platforms) has no place on the core Python > > mailing lists. Please refrain from casually insulting entire groups of > > people merely because you don't approve of their technical choices. > > I am not sure what you mean. 
Using threads on Windows and Java comes from a > necessity, not because developers are incompetent. The folks *designing* Windows and Java are also people, and as creators of development platforms go, it's hard to dispute their success in helping folks solve real problems. We should be mindful of that when drawing lessons from their experience. > Windows does not provide > a fork and processes are heavy-weight, hence multi-threading is the obvious > choice. Windows actually has superior native parallel execution APIs to Linux in some respects, but open source programming languages tend not to support them, presumably due to a combination of Microsoft's longstanding hostile perspective on open source licencing (which seems to finally be moderating with their new CEO), and the even longer standing POSIX mindset that "fork and file descriptors ought to be enough for anyone" (even if the workload in the child processes is wildly different from that in the main process). asyncio addresses that problem for Python in regards to IOCP vs select (et al), and the configurable subprocess creation options addressed it for multiprocessing, but I'm not aware of any efforts to get greenlets to use fibres when they're available. > > With > > subinterpreters, we have a lot more options for that than we do with > > any form of IPC, including shared references to immutable objects, and > > the PEP 3118 buffer API. > > Perhaps. One could do this with shared memory as well, but a complicating > factor is that the base address must be the same (or corrected for). But > one could probably do low-level magic with memory mapping to work around > this. Particularly on 64-bit it is not really difficult to make sure a page > is mapped to the same address in two processes. > > It is certainly easier to achieve within a process. But if the plan for > Erlang-style "share nothing" threads is to pickle and memcpy objects, there > is little or nothing to gain over using multiprocessing. The Python level *semantics* should be as if the objects were being copied (for ease of use), but the *implementation* should try to avoid actually doing that (for speed of execution). Assuming that can be done effectively *within* a process between subinterpreters, then the possibility arises of figuring out how to use shared memory to federate that approach across multiple processes. That could then provide a significant performance improvement for multiprocessing. But since we have the option of tackling the simpler problem of subinterpreters *first*, it makes sense to do that before diving into the cross-platform arcana involved in similarly improving the efficiency of multiprocessing's IPC. Regards, Nick. > > > Sturla > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Mon Jun 22 01:39:20 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Jun 2015 01:39:20 +0200 Subject: [Python-ideas] solving multi-core Python References: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> Message-ID: <20150622013920.7c678330@fsol> On Mon, 22 Jun 2015 09:31:06 +1000 Nick Coghlan wrote: > > Windows actually has superior native parallel execution APIs to Linux in > some respects, but open source programming languages tend not to support > them, presumably due to a combination of Microsoft's longstanding hostile > perspective on open source licencing (which seems to finally be moderating > with their new CEO), and the even longer standing POSIX mindset that "fork > and file descriptors ought to be enough for anyone" (even if the workload > in the child processes is wildly different from that in the main process). Or perhaps the fact that those superiors APIs are a PITA. select() and friends may be crude performance-wise (though, strangely, we don't see providers migrating massively to Windows in order to improve I/O throughput), but they are simple to use. Regards Antoine. From ncoghlan at gmail.com Mon Jun 22 01:47:29 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 22 Jun 2015 09:47:29 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150622013920.7c678330@fsol> References: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> <20150622013920.7c678330@fsol> Message-ID: On 22 Jun 2015 09:40, "Antoine Pitrou" wrote: > > On Mon, 22 Jun 2015 09:31:06 +1000 > Nick Coghlan wrote: > > > > Windows actually has superior native parallel execution APIs to Linux in > > some respects, but open source programming languages tend not to support > > them, presumably due to a combination of Microsoft's longstanding hostile > > perspective on open source licencing (which seems to finally be moderating > > with their new CEO), and the even longer standing POSIX mindset that "fork > > and file descriptors ought to be enough for anyone" (even if the workload > > in the child processes is wildly different from that in the main process). > > Or perhaps the fact that those superiors APIs are a PITA. > select() and friends may be crude performance-wise (though, strangely, > we don't see providers migrating massively to Windows in order to > improve I/O throughput), but they are simple to use. Aye, there's a reason using a smart IDE like Visual Studio, IntelliJ or Eclipse is pretty much essential for both Windows and Java programming. These platforms fall squarely on the "tools maven" side of Oliver Steele's "IDE Divide": http://blog.osteele.com/posts/2004/11/ides/ The opportunity I think we have with Python is to put a cross platform text editor friendly abstraction layer across these kinds of underlying capabilities :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Jun 22 02:41:49 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 22 Jun 2015 02:41:49 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150622013920.7c678330@fsol> References: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> <20150622013920.7c678330@fsol> Message-ID: On 22/06/15 01:39, Antoine Pitrou wrote: > Or perhaps the fact that those superiors APIs are a PITA. Not all of them, no. HeapAlloc is a good example. 
Very easy to use, and the "one heap per thread" design often gives excellent performance compared to a single global heap. But on Linux we only have malloc et al., allocating from the global heap. How many Linux programmers have even considered using multiple heaps in combination with multi-threading? I can assure you it is not common. A good idea is to look at the Python C API. We have PyMem_Malloc, but nothing that compares to Windows' HeapAlloc. Not only does HeapAlloc remove the contention for the global heap, it can also serialize. Instead of serializing an object by traversing all references in the object tree, we just serialize the heap from which it was allocated. And as for garbage collection, why not deallocate the whole heap in one blow? Is the any reason to pair each malloc with free if one could just zap the whole heap? That is what HeapDestroy does. On Linux we would typically homebrew a memory pool to achieve the same thing. But a memory pool needs to traverse a chain of pointers and call free() multiple times, each time with contention for the spinlock protecting the global heap. And when allocating from a memory pool we also have contention for the global heap. It cannot in any way compare to the performance of the Win API HeapCreate/HeapDestroy and HeapAlloc/HeapFree. Sturla From solipsis at pitrou.net Mon Jun 22 02:49:38 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Jun 2015 02:49:38 +0200 Subject: [Python-ideas] solving multi-core Python References: <1848180938456593126.641370sturla.molden-gmail.com@news.gmane.org> <20150622013920.7c678330@fsol> Message-ID: <20150622024938.39ffba70@fsol> On Mon, 22 Jun 2015 09:47:29 +1000 Nick Coghlan wrote: > > > > Or perhaps the fact that those superiors APIs are a PITA. > > select() and friends may be crude performance-wise (though, strangely, > > we don't see providers migrating massively to Windows in order to > > improve I/O throughput), but they are simple to use. > > Aye, there's a reason using a smart IDE like Visual Studio, IntelliJ or > Eclipse is pretty much essential for both Windows and Java programming. > These platforms fall squarely on the "tools maven" side of Oliver Steele's > "IDE Divide": http://blog.osteele.com/posts/2004/11/ides/ It's not about using an IDE, it's the more complex and delicate control flow that asynchronous IO (IOCP / Overlapped) imposes compared to non-blocking IO (e.g. select()). Not to mention that lifetime issues are hard to handle safely and generically before Vista (that is, before CancelIOEx(): https://msdn.microsoft.com/en-us/library/windows/desktop/aa363792%28v=3Dv= s.85%29.aspx -- "The CancelIoEx function allows you to cancel requests in threads other than the calling thread. The CancelIo function only cancels requests in the same thread that called the CancelIo function") Regards Antoine. From ncoghlan at gmail.com Mon Jun 22 03:47:45 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 22 Jun 2015 11:47:45 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 21 June 2015 at 07:42, Eric Snow wrote: > tl;dr Let's exploit multiple cores by fixing up subinterpreters, > exposing them in Python, and adding a mechanism to safely share > objects between them. > > This proposal is meant to be a shot over the bow, so to speak. I plan > on putting together a more complete PEP some time in the future, with > content that is more refined along with references to the appropriate > online resources. > > Feedback appreciated! 
Offers to help even more so! :) It occurred to me in the context of another conversation that you (or someone else!) may be able to prototype some of the public API ideas for this using Jython and Vert.x: http://vertx.io/ That idea and some of the initial feedback in this thread also made me realise that it is going to be essential to keep in mind that there are key goals at two different layers here: * design a compelling implementation independent public API for CSP style programming in Python * use subinterpreters to implement that API efficiently in CPython There's a feedback loop between those two goals where limitations on what's feasible in CPython may constrain the design of the public API, and the design of the API may drive enhancements to the existing subinterpreter capability, but we shouldn't lose sight of the fact that they're *separate* goals. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jeanpierreda at gmail.com Mon Jun 22 04:16:32 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 21 Jun 2015 19:16:32 -0700 Subject: [Python-ideas] Responsive signal handling Message-ID: On the topic of obscure concurrency trivia, signal handling in Python is not very friendly, and I'm wondering if anyone is interested in reviewing changes to fix it. The world today: handlers only run in the main thread, but if the main thread is running a bytecode (e.g. a call to a C function), it will wait for that first. For example, signal handlers don't get run if you are in the middle of a lock acquisition, thread join, or (sometimes) a select call, until after the call returns (which may take a very long time). This makes it difficult to be responsive to signals without acrobatics, or without using a library that does those acrobatics for you (such as Twisted.) Being responsive to SIGTERM and SIGINT is, IMO, important for running programs in the cloud, since otherwise they may be forcefully killed by the job manager, causing user-facing errors. (It's also annoying as a command line user when you can't kill a process with anything less than SIGKILL.) I would stress that, by and large, the signal module is a trap that most people get wrong, and I don't think the stdlib solution should be the way it is. (e.g. it looks to me like gunicorn gets it wrong, and certainly asyncio did in early versions.) One could implement one or both of the following: - Keep only running signal handlers in the main thread, but allow them to run even in the middle of a call to a C function for as many C functions as we can. This is not possible in general, but it can be made to work for all blocking operations in the stdlib. Operations that run in C but just take a long time, or that are part of third-party code, will continue to inhibit responsiveness. - Run signal handlers in a dedicated separate thread. IMO this is generally better than running signal handlers in the main thread, because it eliminates the separate concept of "async-safe" and just requires "thread-safe". So you can use regular threading synchronization primitives for safety, instead of relying on luck / memorized lists of atomic/re-entrant operations. Something still needs to run in the main thread though, for e.g. KeyboardInterrupt, so this is not super straightforward. Also, it could break any code that really relies on signal handlers running in the main thread. Either approach can be turned into a library, albeit potentially hackily in the first case. 
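For what it's worth, a rough, untested sketch of the second option is already possible in pure Python by pointing the wakeup fd at a pipe and draining it from a dedicated thread. This assumes Python 3.5 behaviour, where set_wakeup_fd writes the signal number into the fd (older versions write a NUL byte, so you'd only know *a* signal arrived, not which one):

    import os
    import signal
    import threading

    rfd, wfd = os.pipe()
    os.set_blocking(wfd, False)    # set_wakeup_fd wants a non-blocking fd
    signal.set_wakeup_fd(wfd)      # the C-level handler writes here immediately

    # A Python-level handler must be installed so the C handler runs at all;
    # it can be a no-op because the real work happens in the watcher thread.
    signal.signal(signal.SIGTERM, lambda signum, frame: None)

    def watch_signals():
        # Ordinary thread, so ordinary thread-safety rules apply here
        # instead of the much stricter async-signal-safe rules.
        while True:
            for signum in os.read(rfd, 512):    # blocks until a signal lands
                print("handling signal", signum)  # e.g. start a clean shutdown

    threading.Thread(target=watch_signals, daemon=True).start()

Since SIGINT is left alone, KeyboardInterrupt still shows up in the main thread as usual; everything routed through the watcher can use plain locks and queues.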
-- Devin From cs at zip.com.au Mon Jun 22 09:52:07 2015 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 22 Jun 2015 17:52:07 +1000 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <20150622075207.GA60571@cskk.homeip.net> On 21Jun2015 19:16, Devin Jeanpierre wrote: >On the topic of obscure concurrency trivia, signal handling in Python >is not very friendly, and I'm wondering if anyone is interested in >reviewing changes to fix it. > >The world today: handlers only run in the main thread, but if the main >thread is running a bytecode (e.g. a call to a C function), it will >wait for that first. For example, signal handlers don't get run if you >are in the middle of a lock acquisition, thread join, or (sometimes) a >select call, until after the call returns (which may take a very long >time). > >This makes it difficult to be responsive to signals without >acrobatics, or without using a library that does those acrobatics for >you (such as Twisted.) Being responsive to SIGTERM and SIGINT is, IMO, >important for running programs in the cloud, since otherwise they may >be forcefully killed by the job manager, causing user-facing errors. >(It's also annoying as a command line user when you can't kill a >process with anything less than SIGKILL.) I agree with all of this, but I do think that handling signals in the main program by default is a sensible default: it gives very predictable behaviour. [...] >- Keep only running signal handlers in the main thread, but allow them >to run even in the middle of a call to a C function for as many C >functions as we can. This feels fragile: this means that former one could expect C calls to be "atomic" from the main thread's point of view and conversely the C functions can expect the main thread (or whatever calling thread called them) is paused during their execution. As soon as the calling thread can reactivate these guarrentees are broken. Supposing the C call is doing things to thread local Python variables, for just one scenario. So I'm -1 on this on the face of it. >This is not possible in general, but it can be made to work for all >blocking operations in the stdlib. Hmm. I'm not sure that you will find this universally so. No, I have no examples proving my intuition here. >Operations that run in C but just >take a long time, or that are part of third-party code, will continue >to inhibit responsiveness. > >- Run signal handlers in a dedicated separate thread. > >IMO this is generally better than running signal handlers in the main >thread, because it eliminates the separate concept of "async-safe" and >just requires "thread-safe". So you can use regular threading >synchronization primitives for safety, instead of relying on luck / >memorized lists of atomic/re-entrant operations. Yes, I am in favour of this or something like it. Personally I would go for either or both of: - a stdlib function to specify the thread to handle signals instead of main - a stdlib function to declare that signals should immediately place a nice descriptive "signal" object on a Queue, and leaves it to the user to handle the queue (for example, by spawning a thread to consume it) >Something still needs to run in the main thread though, for e.g. >KeyboardInterrupt, so this is not super straightforward. Is this necessarily true? >Also, it >could break any code that really relies on signal handlers running in >the main thread. 
Which is why it should never be the default; I am firmly of the opinion that that changed handling should be requested by the program. Cheers, Cameron Simpson Facts do not discourage the conspiracy-minded. - Robert Crawford From mal at egenix.com Mon Jun 22 10:16:52 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 22 Jun 2015 10:16:52 +0200 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <5587C474.7020507@egenix.com> On 22.06.2015 04:16, Devin Jeanpierre wrote: > On the topic of obscure concurrency trivia, signal handling in Python > is not very friendly, and I'm wondering if anyone is interested in > reviewing changes to fix it. > > The world today: handlers only run in the main thread, but if the main > thread is running a bytecode (e.g. a call to a C function), it will > wait for that first. For example, signal handlers don't get run if you > are in the middle of a lock acquisition, thread join, or (sometimes) a > select call, until after the call returns (which may take a very long > time). IMO, the above can easily be solved by going with an application design which doesn't use the main thread for any long running tasks, but instead runs these in separate threads. I don't know what the overall situation is today, but at least in the past, signal handling only worked reliably across platforms in the main thread of the application. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 22 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-06-16: Released eGenix pyOpenSSL 0.13.10 ... http://egenix.com/go78 2015-06-10: Released mxODBC Plone/Zope DA 2.2.2 http://egenix.com/go76 2015-07-20: EuroPython 2015, Bilbao, Spain ... 28 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From guido at python.org Mon Jun 22 10:28:04 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Jun 2015 10:28:04 +0200 Subject: [Python-ideas] Responsive signal handling In-Reply-To: <5587C474.7020507@egenix.com> References: <5587C474.7020507@egenix.com> Message-ID: I would regret losing the behavior where just raising an exception in a signal handler causes the main thread to be interrupted by that exception. I agree it would be nice if handlers ran when the main thread is waiting for I/O. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From aquavitae69 at gmail.com Mon Jun 22 10:30:26 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Mon, 22 Jun 2015 10:30:26 +0200 Subject: [Python-ideas] Pathlib additions & changes Message-ID: Hi Recently I've been trying out pathlib in some real code, and while it is a vast improvement on messing around with os, os.path and shutil there are a couple of suggestions I'd like to make. ** TL;DR: Add Path.copy, add Path.remove (replaces Path.rmdir and Path.unlink) and add flags to several methods. Details ====== 1. Add a copy method, i.e. source_path.copy(target_path), which by default should behave like shutil.copy2. 2. Why bother with the distinction between Path.unlink and Path.rmdir? 
Obviously they apply to different types of paths, but to a user, either way you just want to remove whatever is at the path, so perhaps just have a single Path.remove instead (but see point 3 below). 3. There are several other minor irritations where a common pattern requires several lines or the use of a lower-level library such as shutil. For example: * Mkdir where path exists, but we don't care (common pattern on scripts) if not path.exists(): path.mkdir(parent=True) * Recursively remove a directory (no sane way using pathlib alone) shutil.rmtree(str(path)) * Move a file, creating parents if necessary py> if not target.parent.exists(): target.parent.mkdir(parents=true) source.rename(target) There are others, but these are a couple that spring to mind. There are three options. Either we add a bunch of specific functions for each of these (e.g. Path.rmtree, Path.rename_with_mkdir, etc), or we add a whole lot of boolean arguments (e.g. Path.rename(make_parents=True), or we use flags, e.g. Path.rename(flags=MAKE_PARENTS) Using flags is, IMHO the neatest solution, and could replace some boolean arguments already included. What follows is a suggestion of where flags might be useful, including the new methods suggested above. I haven't put a huge amount of thought into these, wanting to just get the general idea on the table, so I'm sure that upon closer inspection some won't make much sense or could be better named. chmod: RECURSIVE | DONT_FOLLOW_SYMLINKS copy: WITH_STATS | MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING iterdir: RECURSIVE (maybe not worth it because of globbing) lchmod: RECURSIVE (Could be dropped in favour of chmod(flags=DONT_FOLLOW_SYMLINKS)) lstat: (Could be dropped in favour of stat(flags=DONT_FOLLOW_SYMLINKS)) mkdir: MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING remove: RECURSIVE rename: MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING replace: (Could be dropped in favour of rename(flags=OVERWRITE_EXISTING) ) rmdir: (Could be dropped in favour of remove) stat: DONT_FOLLOW_SYMLINKS touch: MAKE_PARENTS | IGNORE_EXISTING unlink: (Could be dropped in favour of remove) Regards David -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Jun 22 10:41:17 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 22 Jun 2015 10:41:17 +0200 Subject: [Python-ideas] Responsive signal handling References: Message-ID: <20150622104117.31b69ddc@fsol> On Sun, 21 Jun 2015 19:16:32 -0700 Devin Jeanpierre wrote: > > One could implement one or both of the following: > > - Keep only running signal handlers in the main thread, but allow them > to run even in the middle of a call to a C function for as many C > functions as we can. > This is not possible in general, but it can be made to work for all > blocking operations in the stdlib. Are you aware that it is already the case today (perhaps not for "all blocking operations", but at least for those that return EINTR when interrupted)? By the way, have you read https://www.python.org/dev/peps/pep-0475/ ? Regards Antoine. 
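For point 3, what we have in mind is plain modular arithmetic on the wrapped counter, something along these lines (a sketch only; the names and the 2**30 wrap point are placeholders, not a settled API):

    _TICKS_PERIOD = 1 << 30          # assumed wrap point of monotonic_ms()
    _TICKS_HALF = _TICKS_PERIOD // 2

    def ticks_diff(new, old):
        # Signed difference new - old, correct across a single wrap-around,
        # as long as the real interval is shorter than half the period.
        diff = (new - old) & (_TICKS_PERIOD - 1)
        if diff >= _TICKS_HALF:
            diff -= _TICKS_PERIOD
        return diff

With that, a timeout loop like "while ticks_diff(monotonic_ms(), start) < timeout_ms: ..." keeps working when the counter rolls over, without needing floats or arbitrary-precision integers.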
From jeanpierreda at gmail.com Mon Jun 22 10:42:38 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 22 Jun 2015 01:42:38 -0700 Subject: [Python-ideas] Responsive signal handling In-Reply-To: <20150622075207.GA60571@cskk.homeip.net> References: <20150622075207.GA60571@cskk.homeip.net> Message-ID: On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson wrote: > On 21Jun2015 19:16, Devin Jeanpierre wrote: >> This is not possible in general, but it can be made to work for all >> blocking operations in the stdlib. > > > Hmm. I'm not sure that you will find this universally so. No, I have no > examples proving my intuition here. If you fix select, everything else can hypothetically follow, as you can run all C functions in another thread and signal the result is ready using an fd. (This is my idea for a hacky third-party library. It only ruins stack traces!) >> Operations that run in C but just >> take a long time, or that are part of third-party code, will continue >> to inhibit responsiveness. >> >> - Run signal handlers in a dedicated separate thread. >> >> IMO this is generally better than running signal handlers in the main >> thread, because it eliminates the separate concept of "async-safe" and >> just requires "thread-safe". So you can use regular threading >> synchronization primitives for safety, instead of relying on luck / >> memorized lists of atomic/re-entrant operations. > > > Yes, I am in favour of this or something like it. Personally I would go for > either or both of: > > - a stdlib function to specify the thread to handle signals instead of main This just moves the problem to another thread. One can already today try to keep the main thread free to handle signals, it's just hard. > - a stdlib function to declare that signals should immediately place a nice > descriptive "signal" object on a Queue, and leaves it to the user to handle > the queue (for example, by spawning a thread to consume it) I like this. It mirror's Linux's selectfd, too. One small correction, it can't literally be a Queue, because those aren't safe to use in signal handlers. (It can be a pipe that is wrapped in a Queue-like interface, though, and if we do that, we can even use native signalfd if we want.) It also resolves an unspoken concern I had, which is that silently starting threads for the user feels icky. >> Something still needs to run in the main thread though, for e.g. >> KeyboardInterrupt, so this is not super straightforward. > > > Is this necessarily true? What I mean is that there needs to be a way to raise KeyboardInterrupt in the main thread from a signal handler. If, as you suggest, the old behavior stays around, then that's enough. Another option, if we went with a dedicated signal handling thread, would be that uncaught exceptions propagate to the main thread when it gets around to it. -- Devin From jeanpierreda at gmail.com Mon Jun 22 10:56:54 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 22 Jun 2015 01:56:54 -0700 Subject: [Python-ideas] Responsive signal handling In-Reply-To: <20150622104117.31b69ddc@fsol> References: <20150622104117.31b69ddc@fsol> Message-ID: On Mon, Jun 22, 2015 at 1:41 AM, Antoine Pitrou wrote: > On Sun, 21 Jun 2015 19:16:32 -0700 > Devin Jeanpierre > wrote: >> >> One could implement one or both of the following: >> >> - Keep only running signal handlers in the main thread, but allow them >> to run even in the middle of a call to a C function for as many C >> functions as we can. 
>> This is not possible in general, but it can be made to work for all >> blocking operations in the stdlib. > > Are you aware that it is already the case today (perhaps not for "all > blocking operations", but at least for those that return EINTR when > interrupted)? That only applies when the signal arrives during the system call. For example, if you call Python select.select() and a signal is received during argument parsing, EINTR is not returned because C select() is not running yet. See https://bugs.python.org/issue5315 . The only cross-platform fix for this that I am aware of is to make select() use the self-pipe trick: - set up a pipe, and use set_wakeup_fd to make signals write to that pipe - check for signals in case any arrived before you called set_wakeup_fd - call select() as before, but also select on the pipe Without something like this, where a signal is handled no matter when it comes in, even a call which returns EINTR usually can miss signals, resulting in potentially drastically reduced responsiveness. This exact trick doesn't work for every blocking call, just for the ones in the select module. I provided a patch on that issue which does this. It is atrocious. :( If I rewrote it, I'd prefer to write it as a pure-python wrapper around select(). > By the way, have you read https://www.python.org/dev/peps/pep-0475/ ? I did once, but I reread it now. I think the PEP is focused not on making signal handling more responsive, but on making EINTR less of a trap. Although it does mention responsiveness in use case 2, it doesn't go far enough. I think the following cases matter: - Signals that arrive before the system call starts, but after the Python function call begins - Signals that arrive during a call to a blocking function which doesn't return EINTR - Signals that arrive during a call to a C function which doesn't block at all, but is just slow -- Devin From andrew.svetlov at gmail.com Mon Jun 22 12:13:18 2015 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 22 Jun 2015 13:13:18 +0300 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: <20150622104117.31b69ddc@fsol> Message-ID: IIRC signal handler may be blocked by threading synchronization primitives etc. on Python 2.7 but it's not an issue for Python 3. I don't recall exact version -- it's 3.2 likely. Maybe Benjamin Peterson would provide more info -- hg blame says he is author of EINTR processing in _threadmodule.c On Mon, Jun 22, 2015 at 11:56 AM, Devin Jeanpierre wrote: > On Mon, Jun 22, 2015 at 1:41 AM, Antoine Pitrou wrote: >> On Sun, 21 Jun 2015 19:16:32 -0700 >> Devin Jeanpierre >> wrote: >>> >>> One could implement one or both of the following: >>> >>> - Keep only running signal handlers in the main thread, but allow them >>> to run even in the middle of a call to a C function for as many C >>> functions as we can. >>> This is not possible in general, but it can be made to work for all >>> blocking operations in the stdlib. >> >> Are you aware that it is already the case today (perhaps not for "all >> blocking operations", but at least for those that return EINTR when >> interrupted)? > > That only applies when the signal arrives during the system call. For > example, if you call Python select.select() and a signal is received > during argument parsing, EINTR is not returned because C select() is > not running yet. See https://bugs.python.org/issue5315 . 
> The only cross-platform fix for this that I am aware of is to make > select() use the self-pipe trick: > > - set up a pipe, and use set_wakeup_fd to make signals write to that pipe > - check for signals in case any arrived before you called set_wakeup_fd > - call select() as before, but also select on the pipe > > Without something like this, where a signal is handled no matter when > it comes in, even a call which returns EINTR usually can miss signals, > resulting in potentially drastically reduced responsiveness. This > exact trick doesn't work for every blocking call, just for the ones in > the select module. > > I provided a patch on that issue which does this. It is atrocious. :( > If I rewrote it, I'd prefer to write it as a pure-python wrapper > around select(). > >> By the way, have you read https://www.python.org/dev/peps/pep-0475/ ? > > I did once, but I reread it now. I think the PEP is focused not on > making signal handling more responsive, but on making EINTR less of a > trap. Although it does mention responsiveness in use case 2, it > doesn't go far enough. > > I think the following cases matter: > > - Signals that arrive before the system call starts, but after the > Python function call begins > - Signals that arrive during a call to a blocking function which > doesn't return EINTR > - Signals that arrive during a call to a C function which doesn't > block at all, but is just slow > > -- Devin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Thanks, Andrew Svetlov From rymg19 at gmail.com Mon Jun 22 16:30:22 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Mon, 22 Jun 2015 09:30:22 -0500 Subject: [Python-ideas] Pathlib additions & changes In-Reply-To: References: Message-ID: On June 22, 2015 3:30:26 AM CDT, David Townshend wrote: >Hi > >Recently I've been trying out pathlib in some real code, and while it >is a >vast improvement on messing around with os, os.path and shutil there >are a >couple of suggestions I'd like to make. > >** TL;DR: Add Path.copy, add Path.remove (replaces Path.rmdir and >Path.unlink) and add flags to several methods. > >Details >====== > >1. Add a copy method, i.e. source_path.copy(target_path), which by >default >should behave like shutil.copy2. > >2. Why bother with the distinction between Path.unlink and Path.rmdir? >Obviously they apply to different types of paths, but to a user, either >way >you just want to remove whatever is at the path, so perhaps just have a >single Path.remove instead (but see point 3 below). > >3. There are several other minor irritations where a common pattern >requires several lines or the use of a lower-level library such as >shutil. >For example: > >* Mkdir where path exists, but we don't care (common pattern on >scripts) > if not path.exists(): > path.mkdir(parent=True) You can just do' try: path.mkdir(parent=True) except FileExistsError: pass > > * Recursively remove a directory (no sane way using pathlib alone) > shutil.rmtree(str(path)) > > * Move a file, creating parents if necessary > py> if not target.parent.exists(): > target.parent.mkdir(parents=true) > source.rename(target) > >There are others, but these are a couple that spring to mind. There >are >three options. Either we add a bunch of specific functions for each of >these (e.g. Path.rmtree, Path.rename_with_mkdir, etc), or we add a >whole >lot of boolean arguments (e.g. 
Path.rename(make_parents=True), or we >use >flags, e.g. Path.rename(flags=MAKE_PARENTS) > >Using flags is, IMHO the neatest solution, and could replace some >boolean >arguments already included. What follows is a suggestion of where >flags >might be useful, including the new methods suggested above. I haven't >put >a huge amount of thought into these, wanting to just get the general >idea >on the table, so I'm sure that upon closer inspection some won't make >much >sense or could be better named. I prefer keyword-only arguments. Flags aren't really Pythonic, IMO. > > chmod: RECURSIVE | DONT_FOLLOW_SYMLINKS > copy: WITH_STATS | MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING > iterdir: RECURSIVE (maybe not worth it because of globbing) > lchmod: RECURSIVE (Could be dropped in favour of >chmod(flags=DONT_FOLLOW_SYMLINKS)) >lstat: (Could be dropped in favour of stat(flags=DONT_FOLLOW_SYMLINKS)) > mkdir: MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING > remove: RECURSIVE > rename: MAKE_PARENTS | OVERWRITE_EXISTING | IGNORE_EXISTING >replace: (Could be dropped in favour of >rename(flags=OVERWRITE_EXISTING) >) > rmdir: (Could be dropped in favour of remove) > stat: DONT_FOLLOW_SYMLINKS > touch: MAKE_PARENTS | IGNORE_EXISTING > unlink: (Could be dropped in favour of remove) > >Regards >David > > >------------------------------------------------------------------------ > >_______________________________________________ >Python-ideas mailing list >Python-ideas at python.org >https://mail.python.org/mailman/listinfo/python-ideas >Code of Conduct: http://python.org/psf/codeofconduct/ -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From rosuav at gmail.com Mon Jun 22 16:59:18 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 23 Jun 2015 00:59:18 +1000 Subject: [Python-ideas] Pathlib additions & changes In-Reply-To: References: Message-ID: On Mon, Jun 22, 2015 at 6:30 PM, David Townshend wrote: > 3. There are several other minor irritations where a common pattern > requires several lines or the use of a lower-level library such as shutil. > For example: > > * Recursively remove a directory (no sane way using pathlib alone) > shutil.rmtree(str(path)) I'm not sure shutil should be considered a lower-level library. It's a separate set of tools aimed at shell-like functionality. Removing a directory tree seems right for shutil; what if shutil.rmtree() would accept a Path object as an alternative to a str? That'd make reasonable sense, and it'd feel like the two modules were working well together. (Or can it already?) ChrisA From p.f.moore at gmail.com Mon Jun 22 17:53:24 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 22 Jun 2015 16:53:24 +0100 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: <20150622075207.GA60571@cskk.homeip.net> Message-ID: On 22 June 2015 at 09:42, Devin Jeanpierre wrote: > On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson wrote: >> On 21Jun2015 19:16, Devin Jeanpierre wrote: >>> This is not possible in general, but it can be made to work for all >>> blocking operations in the stdlib. >> >> >> Hmm. I'm not sure that you will find this universally so. No, I have no >> examples proving my intuition here. > > If you fix select, everything else can hypothetically follow, as you > can run all C functions in another thread and signal the result is > ready using an fd. (This is my idea for a hacky third-party library. > It only ruins stack traces!) This particular approach presumably only works on Unix? 
(On Windows, select is not a general signalling operation, it only works for sockets). Presumably a cross-platform solution would need to use appropriate OS-native signalling based on the platform? Paul From p.f.moore at gmail.com Mon Jun 22 17:59:26 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 22 Jun 2015 16:59:26 +0100 Subject: [Python-ideas] Pathlib additions & changes In-Reply-To: References: Message-ID: On 22 June 2015 at 15:59, Chris Angelico wrote: > On Mon, Jun 22, 2015 at 6:30 PM, David Townshend wrote: >> 3. There are several other minor irritations where a common pattern >> requires several lines or the use of a lower-level library such as shutil. >> For example: >> >> * Recursively remove a directory (no sane way using pathlib alone) >> shutil.rmtree(str(path)) > > I'm not sure shutil should be considered a lower-level library. It's a > separate set of tools aimed at shell-like functionality. Removing a > directory tree seems right for shutil; what if shutil.rmtree() would > accept a Path object as an alternative to a str? That'd make > reasonable sense, and it'd feel like the two modules were working well > together. Agreed, shutil is higher level than pathlib, not lower. Having more stdlib functions (shutil is the most obvious example, but there are others) take pathlib.Path objects as well as strings would be a good change (and would set a nice example for 3rd party file manipulation modules). I'm sure the usual "patches welcome" applies :-) The main irritation about using "higher level" modules with path objects is the proliferation of str() calls. Accepting path objects natively fixes that: from shutil import rmtree rmtree(path) looks fine to me. Paul From aquavitae69 at gmail.com Mon Jun 22 18:44:27 2015 From: aquavitae69 at gmail.com (David Townshend) Date: Mon, 22 Jun 2015 18:44:27 +0200 Subject: [Python-ideas] Pathlib additions & changes In-Reply-To: References: Message-ID: On 22 Jun 2015 17:59, "Paul Moore" wrote: > > On 22 June 2015 at 15:59, Chris Angelico wrote: > > On Mon, Jun 22, 2015 at 6:30 PM, David Townshend wrote: > >> 3. There are several other minor irritations where a common pattern > >> requires several lines or the use of a lower-level library such as shutil. > >> For example: > >> > >> * Recursively remove a directory (no sane way using pathlib alone) > >> shutil.rmtree(str(path)) > > > > I'm not sure shutil should be considered a lower-level library. It's a > > separate set of tools aimed at shell-like functionality. Removing a > > directory tree seems right for shutil; what if shutil.rmtree() would > > accept a Path object as an alternative to a str? That'd make > > reasonable sense, and it'd feel like the two modules were working well > > together. > > Agreed, shutil is higher level than pathlib, not lower. > > Having more stdlib functions (shutil is the most obvious example, but > there are others) take pathlib.Path objects as well as strings would > be a good change (and would set a nice example for 3rd party file > manipulation modules). I'm sure the usual "patches welcome" applies > :-) > > The main irritation about using "higher level" modules with path > objects is the proliferation of str() calls. Accepting path objects > natively fixes that: > > from shutil import rmtree > rmtree(path) > > looks fine to me. 
> > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ I was going on the fact that the PEP talks about possibly including shutil functions, but I have no problem with making them accept Paths instead. If that's the best approach I'll see if I can put together a patch. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Jun 22 19:03:11 2015 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 22 Jun 2015 17:03:11 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sat, Jun 20, 2015 at 5:42 PM Chris Angelico wrote: > On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow > wrote: > > * disallow forking within subinterpreters > > I love the idea as a whole (if only because the detractors can be told > "Just use subinterpreters, then you get concurrency"), but this seems > like a tricky restriction. That means no subprocess.Popen, no shelling > out to other applications. And I don't know what of other restrictions > might limit any given program. Will it feel like subinterpreters are > "write your code according to these tight restrictions and it'll > work", or will it be more of "most programs will run in parallel just > fine, but there are a few things to be careful of"? > It wouldn't disallow use of subprocess, only os.fork(). C extension modules can alway fork. The restriction being placed in this scheme is: "if your extension module code forks from a subinterpreter, the child process MUST not return control to Python." I'm not sure if this restriction would actually be *needed* or not but I agree with it regardless. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Mon Jun 22 19:37:01 2015 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 22 Jun 2015 17:37:01 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Sun, Jun 21, 2015 at 4:56 AM Devin Jeanpierre wrote: > On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow > wrote: > > > > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" > wrote: > >> > >> It's worthwhile to consider fork as an alternative. IMO we'd get a > >> lot out of making forking safer, easier, and more efficient. (e.g. > >> respectively: adding an atfork registration mechanism; separating out > >> the bits of multiprocessing that use pickle from those that d, I still > disagreeon't; > >> moving the refcount to a separate page, or allowing it to be frozen > >> prior to a fork.) > > > > So leverage a common base of code with the multiprocessing module? > > What is this question in response to? I don't understand. > > > I would expect subinterpreters to use less memory. Furthermore creating > > them would be significantly faster. Passing objects between them would > be > > much more efficient. And, yes, cross-platform. > > Maybe I don't understand how subinterpreters work. AIUI, the whole > point of independent subinterpreters is that they share no state. So > if I have a web server, each independent serving thread has to do all > of the initialization (import HTTP libraries, etc.), right? 
Compare > with forking, where the initialization is all done and then you fork, > and you are immediately ready to serve, using the data structures > shared with all the other workers, which is only copied when it is > written to. > Unfortunately CPython subinterpreters do share some state, though it is not visible to the running code in many cases. Thus the other mentions of "wouldn't it be nice if CPython didn't assume a single global state per process" (100% agreed, but tangential to this discussion)... https://docs.python.org/3/c-api/init.html#sub-interpreter-support You are correct that some things that could make sense to share, such as imported modules, would not be shared as they are in a forked environment. This is an important oddity of subinterpreters: They have to re-import everything other than extension modules. When you've got a big process with a ton of modules (like, say, 100s of protocol buffers...), that's going to be a non-starter (pun intended) for the use of threads+subinterpreters as a fast form of concurrency if they need to import most of those from each subinterpreter. startup latency and cpu usage += lots. (possibly uses more memory as well but given our existing refcount implementation forcing needless PyObject page writes during a read causing fork to copy-on-write... impossible to guess) What this means for subinterpreters in this case is not much different from starting up multiple worker processes: You need to start them up and wait for them to be ready to serve, then reuse them as long as feasible before recycling them to start up a new one. The startup cost is high. I'm not entirely sold on this overall proposal, but I think a result of it *could* be to make our subinterpreter support better which would be a good thing. We have had to turn people away from subinterpreters in the past for use as part of their multithreaded C++ server where they wanted to occasionally run some Python code in embedded interpreters as part of serving some requests. Doing that would suddenly single thread their application (GIIIIIIL!) for all requests currently executing Python code despite multiple subinterpreters. The general advice for that: Run multiple Python processes and make RPCs to those from the C++ code. It allows for parallelism and ultimately scales better, if ever needed, as it can be easily spread across machines. Which one is more complex to maintain? Good question. -gps > > Re passing objects, see below. > > I do agree it's cross-platform, but right now that's the only thing I > agree with. > > >> Note: I don't count the IPC cost of forking, because at least on > >> linux, any way to efficiently share objects between independent > >> interpreters in separate threads can also be ported to independent > >> interpreters in forked subprocesses, > > > > How so? Subinterpreters are in the same process. For this proposal each > > would be on its own thread. Sharing objects between them through > channels > > would be more efficient than IPC. Perhaps I've missed something? > > You might be missing that memory can be shared between processes, not > just threads, but I don't know. > > The reason passing objects between processes is so slow is currently > *nearly entirely* the cost of serialization. That is, it's the fact > that you are passing an object to an entirely separate interpreter, > and need to serialize the whole object graph and so on. 
If you can > make that fast without serialization, for shared memory threads, then > all the serialization becomes unnecessary, and you can either write to > a pipe (fast, if it's a non-container), or used shared memory from the > beginning (instantaneous). This is possible on any POSIX OS. Linux > lets you go even further. > > -- Devin > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 23 00:30:13 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Jun 2015 08:30:13 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On 23 Jun 2015 03:37, "Gregory P. Smith" wrote: > > > > On Sun, Jun 21, 2015 at 4:56 AM Devin Jeanpierre wrote: >> >> On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow wrote: >> > >> > On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" wrote: >> >> >> >> It's worthwhile to consider fork as an alternative. IMO we'd get a >> >> lot out of making forking safer, easier, and more efficient. (e.g. >> >> respectively: adding an atfork registration mechanism; separating out >> >> the bits of multiprocessing that use pickle from those that d, I still disagreeon't; >> >> moving the refcount to a separate page, or allowing it to be frozen >> >> prior to a fork.) >> > >> > So leverage a common base of code with the multiprocessing module? >> >> What is this question in response to? I don't understand. >> >> > I would expect subinterpreters to use less memory. Furthermore creating >> > them would be significantly faster. Passing objects between them would be >> > much more efficient. And, yes, cross-platform. >> >> Maybe I don't understand how subinterpreters work. AIUI, the whole >> point of independent subinterpreters is that they share no state. So >> if I have a web server, each independent serving thread has to do all >> of the initialization (import HTTP libraries, etc.), right? Compare >> with forking, where the initialization is all done and then you fork, >> and you are immediately ready to serve, using the data structures >> shared with all the other workers, which is only copied when it is >> written to. > > > Unfortunately CPython subinterpreters do share some state, though it is not visible to the running code in many cases. Thus the other mentions of "wouldn't it be nice if CPython didn't assume a single global state per process" (100% agreed, but tangential to this discussion)... > > https://docs.python.org/3/c-api/init.html#sub-interpreter-support > > You are correct that some things that could make sense to share, such as imported modules, would not be shared as they are in a forked environment. > > This is an important oddity of subinterpreters: They have to re-import everything other than extension modules. When you've got a big process with a ton of modules (like, say, 100s of protocol buffers...), that's going to be a non-starter (pun intended) for the use of threads+subinterpreters as a fast form of concurrency if they need to import most of those from each subinterpreter. startup latency and cpu usage += lots. (possibly uses more memory as well but given our existing refcount implementation forcing needless PyObject page writes during a read causing fork to copy-on-write... 
impossible to guess) > > What this means for subinterpreters in this case is not much different from starting up multiple worker processes: You need to start them up and wait for them to be ready to serve, then reuse them as long as feasible before recycling them to start up a new one. The startup cost is high. While I don't believe it's clear from the current text in the PEP (mostly because I only figured it out while hacking on the prototype implementation), PEP 432 should actually give us much better control over how subinterpreters are configured, as many more interpreter settings move out of global variables and into the interpreter state: https://www.python.org/dev/peps/pep-0432/ (the global variables will still exist, but primarily as an input to the initial configuration of the main interpreter) The current state of that work can be seen at https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits While a lot of things are broken there, it's at least to the point where it can start running the regression test suite under the new 2-phase initialisation model. Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Jun 23 00:50:09 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 22 Jun 2015 22:50:09 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: <5585E37F.4060403@gmail.com> Message-ID: <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> "Gregory P. Smith" wrote: > What this means for subinterpreters in this case is not much different from > starting up multiple worker processes: You need to start them up and wait > for them to be ready to serve, then reuse them as long as feasible before > recycling them to start up a new one. The startup cost is high. The statup cost for worker processes is high on Windows. It is very small on nearly any other OS. Sturla From pmiscml at gmail.com Tue Jun 23 01:15:30 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 23 Jun 2015 02:15:30 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats Message-ID: <20150623021530.74ce1ebe@x230> Hello from MicroPython, a lean Python implementation scaling down to run even on microcontrollers (https://github.com/micropython/micropython). Our target hardware base oftentimes lacks floating point support, and using software emulation is expensive. So, we would like to have versions of some timing functions, taking/returning millisecond and/or microsecond values as integers. The most functionality we're interested in: 1. Delays 2. Relative time (from an arbitrary starting point, expected to be wrapped) 3. Calculating time differences, with immunity to wrap-around. The first presented assumption is to use "time.sleep()" for delays, "time.monotonic()" for relative time as the base. Would somebody gave alternative/better suggestions? Second question is how to modify their names for millisecond/microsecond versions. For sleep(), "msleep" and "usleep" would be concise possibilities, but that doesn't map well to monotonic(), leading to "mmonotonic". So, better idea is to use "_ms" and "_us" suffixes: sleep_ms() sleep_us() monotonic_ms() monotonic_us() Point 3 above isn't currently addressed by time module at all. https://www.python.org/dev/peps/pep-0418/ mentions some internal workaround for overflows/wrap-arounds on some systems. 
Due to lean-ness of our hardware base, we'd like to make this matter explicit to the applications and avoid internal workarounds. Proposed solution is to have time.elapsed(time1, time2) function, which can take values as returned by monotonic_ms(), monotonic_us(). Assuming that results of both functions are encoded and wrap consistently (this is reasonable assumption), there's no need for 2 separate elapsed_ms(), elapsed_us() function. So, the above are rough ideas we (well, I) have. We'd like to get wider Python community feedback on them, see if there're better/alternative ideas, how Pythonic it is, etc. To clarify, this should not be construed as proposal to add the above functions to CPython. -- Best regards, Paul mailto:pmiscml at gmail.com From greg at krypto.org Tue Jun 23 01:29:17 2015 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 22 Jun 2015 23:29:17 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Jun 22, 2015 at 3:51 PM Sturla Molden wrote: > "Gregory P. Smith" wrote: > > > What this means for subinterpreters in this case is not much different > from > > starting up multiple worker processes: You need to start them up and wait > > for them to be ready to serve, then reuse them as long as feasible before > > recycling them to start up a new one. The startup cost is high. > > The statup cost for worker processes is high on Windows. It is very small > on nearly any other OS. > While I understand that Windows adds some overhead there, startup time for Python worker processes is high on all OSes. Python startup is slow in general. It slows down further based on the modules you must import before you can begin work. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Tue Jun 23 01:51:34 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Mon, 22 Jun 2015 16:51:34 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Jun 22, 2015 at 4:29 PM, Gregory P. Smith wrote: > > On Mon, Jun 22, 2015 at 3:51 PM Sturla Molden > wrote: >> >> "Gregory P. Smith" wrote: >> >> > What this means for subinterpreters in this case is not much different >> > from >> > starting up multiple worker processes: You need to start them up and >> > wait >> > for them to be ready to serve, then reuse them as long as feasible >> > before >> > recycling them to start up a new one. The startup cost is high. >> >> The statup cost for worker processes is high on Windows. It is very small >> on nearly any other OS. > > While I understand that Windows adds some overhead there, startup time for > Python worker processes is high on all OSes. > > Python startup is slow in general. It slows down further based on the > modules you must import before you can begin work. Python does *very* little work on fork, which is what Sturla is alluding to. (Fork doesn't exist on Windows.) The only part I've found forking to be slow with is if you need to delay initialization of a thread pool and everything that depends on a thread pool until after the fork. This could hypothetically be made faster with subinterpreters if the thread pool was shared among all subinterpreters (e.g. 
if it was written in C.), but I would *expect* fork to be faster overall. That said, worker startup time is not actually very interesting anyway, since workers should restart rarely. I think its biggest impact is probably the time it takes to start your entire task from scratch. -- Devin From njs at pobox.com Tue Jun 23 01:59:40 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 22 Jun 2015 16:59:40 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith wrote: > This is an important oddity of subinterpreters: They have to re-import > everything other than extension modules. When you've got a big process with > a ton of modules (like, say, 100s of protocol buffers...), that's going to > be a non-starter (pun intended) for the use of threads+subinterpreters as a > fast form of concurrency if they need to import most of those from each > subinterpreter. startup latency and cpu usage += lots. (possibly uses more > memory as well but given our existing refcount implementation forcing > needless PyObject page writes during a read causing fork to copy-on-write... > impossible to guess) > > What this means for subinterpreters in this case is not much different from > starting up multiple worker processes: You need to start them up and wait > for them to be ready to serve, then reuse them as long as feasible before > recycling them to start up a new one. The startup cost is high. One possibility would be for subinterpreters to copy modules from the main interpreter -- I guess your average module is mostly dicts, strings, type objects, and functions; strings and functions are already immutable and could be shared without copying, and I guess copying the dicts and type objects into the subinterpreter is much cheaper than hitting the disk etc. to do a real import. (Though certainly not free.) This would have interesting semantic implications -- it would give similar effects to fork(), with subinterpreters starting from a snapshot of the main interpreter's global state. > I'm not entirely sold on this overall proposal, but I think a result of it > could be to make our subinterpreter support better which would be a good > thing. > > We have had to turn people away from subinterpreters in the past for use as > part of their multithreaded C++ server where they wanted to occasionally run > some Python code in embedded interpreters as part of serving some requests. > Doing that would suddenly single thread their application (GIIIIIIL!) for > all requests currently executing Python code despite multiple > subinterpreters. I've also talked to HPC users who discovered this problem the hard way (e.g. http://www-atlas.lbl.gov/, folks working on the Large Hadron Collider) -- they've been using Python as an extension language in some large physics codes but are now porting those bits to C++ because of the GIL issues. (In this context startup overhead should be easily amortized, but switching to an RPC model is not going to happen.) -n -- Nathaniel J. Smith -- http://vorpus.org From rosuav at gmail.com Tue Jun 23 01:59:51 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 23 Jun 2015 09:59:51 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Tue, Jun 23, 2015 at 3:03 AM, Gregory P. 
Smith wrote: > On Sat, Jun 20, 2015 at 5:42 PM Chris Angelico wrote: >> >> On Sun, Jun 21, 2015 at 7:42 AM, Eric Snow >> wrote: >> > * disallow forking within subinterpreters >> >> I love the idea as a whole (if only because the detractors can be told >> "Just use subinterpreters, then you get concurrency"), but this seems >> like a tricky restriction. That means no subprocess.Popen, no shelling >> out to other applications. And I don't know what of other restrictions >> might limit any given program. Will it feel like subinterpreters are >> "write your code according to these tight restrictions and it'll >> work", or will it be more of "most programs will run in parallel just >> fine, but there are a few things to be careful of"? > > > It wouldn't disallow use of subprocess, only os.fork(). C extension modules > can alway fork. The restriction being placed in this scheme is: "if your > extension module code forks from a subinterpreter, the child process MUST > not return control to Python." > > I'm not sure if this restriction would actually be needed or not but I agree > with it regardless. Oh! That's fine, then. Sounds good to me! ChrisA From rosuav at gmail.com Tue Jun 23 02:03:10 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 23 Jun 2015 10:03:10 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith wrote: > One possibility would be for subinterpreters to copy modules from the > main interpreter -- I guess your average module is mostly dicts, > strings, type objects, and functions; strings and functions are > already immutable and could be shared without copying, and I guess > copying the dicts and type objects into the subinterpreter is much > cheaper than hitting the disk etc. to do a real import. (Though > certainly not free.) FWIW, functions aren't immutable, but code objects are. ChrisA From greg at krypto.org Tue Jun 23 02:03:14 2015 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 23 Jun 2015 00:03:14 +0000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150623021530.74ce1ebe@x230> References: <20150623021530.74ce1ebe@x230> Message-ID: On Mon, Jun 22, 2015 at 4:15 PM Paul Sokolovsky wrote: > > > Hello from MicroPython, a lean Python implementation > scaling down to run even on microcontrollers > (https://github.com/micropython/micropython). > > Our target hardware base oftentimes lacks floating point support, and > using software emulation is expensive. So, we would like to have > versions of some timing functions, taking/returning millisecond and/or > microsecond values as integers. > > The most functionality we're interested in: > > 1. Delays > 2. Relative time (from an arbitrary starting point, expected to be > wrapped) > 3. Calculating time differences, with immunity to wrap-around. > > The first presented assumption is to use "time.sleep()" for delays, > "time.monotonic()" for relative time as the base. Would somebody gave > alternative/better suggestions? > > Second question is how to modify their names for > millisecond/microsecond versions. For sleep(), "msleep" and "usleep" > would be concise possibilities, but that doesn't map well to > monotonic(), leading to "mmonotonic". So, better idea is to use "_ms" > and "_us" suffixes: > > sleep_ms() > sleep_us() > monotonic_ms() > monotonic_us() > If you're going to add new function names, going with the _unit suffix seems best. 
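(Purely as an illustration of that suffix spelling, not a concrete
proposal -- a trivial pure-Python shim over the existing float-based
API, where the integer truncation is just an assumption on my part:

    import time

    def sleep_ms(ms):
        # Delegate to the existing float-based sleep().
        time.sleep(ms / 1000.0)

    def monotonic_ms():
        # Integer milliseconds derived from the float clock.
        return int(time.monotonic() * 1000)

An implementation without floating point would of course provide these
natively rather than via such a shim.)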
Another option to consider: keyword only arguments. time.sleep(ms=31416) time.sleep(us=31415927) time.sleep(ns=31415296536) # We could use the long form names milliseconds, microseconds and nanoseconds but i worry with those that people would inevitably confuse ms with microseconds as times and APIs usually given the standard abbreviations rather than spelled out. time.monotonic(return_int_ns=True) ? # This seems ugly. time.monotonic_ns() seems better. These should be acceptable to add to Python 3.6 for consistency. I do not think we should have functions for each ms/us/ns unit if adding functions. Just choose the most useful high precision unit and let people do the math as needed for the others. Point 3 above isn't currently addressed by time module at all. > https://www.python.org/dev/peps/pep-0418/ mentions some internal > workaround for overflows/wrap-arounds on some systems. Due to > lean-ness of our hardware base, we'd like to make this matter explicit > to the applications and avoid internal workarounds. Proposed solution > is to have time.elapsed(time1, time2) function, which can take values > as returned by monotonic_ms(), monotonic_us(). Assuming that results of > both functions are encoded and wrap consistently (this is reasonable > assumption), there's no need for 2 separate elapsed_ms(), elapsed_us() > function. > Reading the PEP my takeaway is that wrap-around of underlying deficient system APIs should be handled by the Python VM for the user. It sounds like we should explicitly spell this out though. I don't think time.elapsed() could ever provide any utility in either case, just use subtraction. time.elapsed() wouldn't know when and where the time values came from and magically be able to apply wrap around or not to them. -gps So, the above are rough ideas we (well, I) have. We'd like to get wider > Python community feedback on them, see if there're better/alternative > ideas, how Pythonic it is, etc. To clarify, this should not be construed > as proposal to add the above functions to CPython. > > > -- > Best regards, > Paul mailto:pmiscml at gmail.com > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Tue Jun 23 01:00:41 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 23 Jun 2015 09:00:41 +1000 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <20150622230041.GA46183@cskk.homeip.net> On 22Jun2015 01:42, Devin Jeanpierre wrote: >On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson wrote: >> On 21Jun2015 19:16, Devin Jeanpierre wrote: >>> Operations that run in C but just >>> take a long time, or that are part of third-party code, will continue >>> to inhibit responsiveness. >>> >>> - Run signal handlers in a dedicated separate thread. >>> >>> IMO this is generally better than running signal handlers in the main >>> thread, because it eliminates the separate concept of "async-safe" and >>> just requires "thread-safe". So you can use regular threading >>> synchronization primitives for safety, instead of relying on luck / >>> memorized lists of atomic/re-entrant operations. >> >> Yes, I am in favour of this or something like it. 
Personally I would go for >> either or both of: >> >> - a stdlib function to specify the thread to handle signals instead of main > >This just moves the problem to another thread. One can already today >try to keep the main thread free to handle signals, it's just hard. Yes. But it is very easy to ensure that a specifial purpose Thread is free to handle signals. And it is arguably the minimalist change. >> - a stdlib function to declare that signals should immediately place a nice >> descriptive "signal" object on a Queue, and leaves it to the user to handle >> the queue (for example, by spawning a thread to consume it) > >I like this. It mirror's Linux's selectfd, too. One small correction, >it can't literally be a Queue, because those aren't safe to use in >signal handlers. (It can be a pipe that is wrapped in a Queue-like >interface, though, and if we do that, we can even use native signalfd >if we want.) > >It also resolves an unspoken concern I had, which is that silently >starting threads for the user feels icky. I wasn't proposing silently starting threads. I imagined the former suggestion would be handed a thread as the signal target. >>> Something still needs to run in the main thread though, for e.g. >>> KeyboardInterrupt, so this is not super straightforward. >> >> Is this necessarily true? > >What I mean is that there needs to be a way to raise KeyboardInterrupt >in the main thread from a signal handler. If, as you suggest, the old >behavior stays around, then that's enough. I was imagining the old behaviour stayed around by default, not necessarily as fixed behaviour. But "KeyboardInterrupt occurs in the main thread" is handy. Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e. always going to the main thread) but to extend "raise" to accept a thread argument: raise blah in thread Given that signals are already presented as occuring between Python opcodes, it seems reasonable to me that the signal situation could be addressed with a common mechanism extended to exceptions. How often is the question "how do I terminate another thread?" raised on python-list? Often. The standard answer is "set a flag and have the thread consult it". That is very sensitive to how often the flag is polled: too often and it infuses the code with noise (not to mention ungainly loop termination logic etc), too infrequently and the response is much like your issue here with signals: it can be arbitrarily delayed. Suppose one could raise signals in another thread? Then the answer becomes "raise exception in other_thread". And the other thread will abort as soon as the next python opcode would fire. It has several advantages: it removes any need to poll some shared state, or the set up shared state it lets the target thread remain nice and pythonic, letting unhandled exceptions simply abort the thread automatically as they would anyway it lets the target thread catch the exception and handle it if desired it dovetails neatly with our hypothetical special signal handling thread: the handling thread has merely to "raise KeyboardInterrupt in main_thread" to get the behaviour you seek to preserve, _without_ making SIGINT specially handled - the specialness is not an aspect of the handling thread's code, not hardwired >Another option, if we went with a dedicated signal handling thread, >would be that uncaught exceptions propagate to the main thread when it >gets around to it. Perhaps. 
But I'd rather not; you _can_ always catch every exception and if we have "raise exception in thread" we can implement the above trivially for programs which want it. Cheers, Cameron Simpson Nothing is impossible for the man who doesn't have to do it. From alexander.belopolsky at gmail.com Tue Jun 23 02:35:45 2015 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 22 Jun 2015 20:35:45 -0400 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> Message-ID: On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. Smith wrote: > # We could use the long form names milliseconds, microseconds and > nanoseconds but i worry with those that people would inevitably confuse ms > with microseconds as times and APIs usually given the standard > abbreviations rather than spelled out. Note that datetime.timedelta uses long names: >>> timedelta(milliseconds=5, microseconds=3) datetime.timedelta(0, 0, 5003) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg at krypto.org Tue Jun 23 02:39:14 2015 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 23 Jun 2015 00:39:14 +0000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> Message-ID: On Mon, Jun 22, 2015 at 5:35 PM Alexander Belopolsky < alexander.belopolsky at gmail.com> wrote: > > On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. Smith wrote: > >> # We could use the long form names milliseconds, microseconds and >> nanoseconds but i worry with those that people would inevitably confuse ms >> with microseconds as times and APIs usually given the standard >> abbreviations rather than spelled out. > > > Note that datetime.timedelta uses long names: > > >>> timedelta(milliseconds=5, microseconds=3) > datetime.timedelta(0, 0, 5003) > That is a good vote for consistency with its API... -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Jun 23 05:52:47 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 23 Jun 2015 13:52:47 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On 23 June 2015 at 10:03, Chris Angelico wrote: > On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith wrote: >> One possibility would be for subinterpreters to copy modules from the >> main interpreter -- I guess your average module is mostly dicts, >> strings, type objects, and functions; strings and functions are >> already immutable and could be shared without copying, and I guess >> copying the dicts and type objects into the subinterpreter is much >> cheaper than hitting the disk etc. to do a real import. (Though >> certainly not free.) > > FWIW, functions aren't immutable, but code objects are. Anything we come up with for optimised data sharing via channels could be applied to passing a prebuilt sys.modules dictionary through to subinterpreters. The key for me is to start from a well-defined "shared nothing" semantic model, but then look for ways to exploit the fact that we actually *are* running in the same address space to avoid copy objects. 
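(To make the "shared nothing" idea concrete, something along these
lines is the kind of Python-level API being discussed -- the names here
are purely hypothetical, nothing like this exists today:

    # Hypothetical API sketch, illustrative only.
    sub = interpreters.create()
    reader, writer = interpreters.create_channel()
    writer.send({"setting": 42})   # starts life as a copy across the boundary
    sub.run("handle(channel.recv())", channel=reader)

The semantics start out as "send means copy"; avoiding the copy for
immutable or specially marked objects then becomes an optimisation
detail rather than part of the model.)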
The current reference-counts-embedded-in-the-object-structs memory layout also plays havoc with the all-or-nothing page level copy-on-write semantics used by the fork() syscall at the operating system layer, so some of the ideas we've been considering (specifically, those related to moving the reference counter bookkeeping out of the object structs themselves) would potentially help with that as well (but would also have other hard to predict performance consequences). There's a reason Eric announced this as the *start* of a research project, rather than as a finished proposal - while it seems conceptually sound overall, there are a vast number of details to be considered that will no doubt hold a great many devils :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cs at zip.com.au Tue Jun 23 06:37:46 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 23 Jun 2015 14:37:46 +1000 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <20150623043746.GA32614@cskk.homeip.net> On 22Jun2015 16:53, Paul Moore wrote: >On 22 June 2015 at 09:42, Devin Jeanpierre wrote: >> On Mon, Jun 22, 2015 at 12:52 AM, Cameron Simpson wrote: >>> On 21Jun2015 19:16, Devin Jeanpierre wrote: >>>> This is not possible in general, but it can be made to work for all >>>> blocking operations in the stdlib. >>> >>> Hmm. I'm not sure that you will find this universally so. No, I have no >>> examples proving my intuition here. >> >> If you fix select, everything else can hypothetically follow, as you >> can run all C functions in another thread and signal the result is >> ready using an fd. (This is my idea for a hacky third-party library. >> It only ruins stack traces!) > >This particular approach presumably only works on Unix? (On Windows, >select is not a general signalling operation, it only works for >sockets). Presumably a cross-platform solution would need to use >appropriate OS-native signalling based on the platform? Certainly. Cheers, Cameron Simpson The thought of suicide is a comforting one, for with it has come a calm passage through many a bad night. - Fred Nieztsche From cs at zip.com.au Tue Jun 23 06:44:11 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 23 Jun 2015 14:44:11 +1000 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <20150623044411.GA36548@cskk.homeip.net> On 22Jun2015 10:28, Guido van Rossum wrote: >I would regret losing the behavior where just raising an exception in a >signal handler causes the main thread to be interrupted by that exception. I don't think any of us is sguuesting losing that as the default situation. I can see that losing this could be a side effect of a program choosing one of these alternatives. >I agree it would be nice if handlers ran when the main thread is waiting >for I/O. Hmm. That sounds doable (I speak as one totally unfamiliar with CPython's internals:-) Is this affected or improved by the recent discussions about I/O restarting over a signal? Cheers, Cameron Simpson No system, regardless of how sophisticated, can repeal the laws of physics or overcome careless driving actions. 
- Mercedes Benz From abarnert at yahoo.com Tue Jun 23 07:42:13 2015 From: abarnert at yahoo.com (Andrew Barnert) Date: Mon, 22 Jun 2015 22:42:13 -0700 Subject: [Python-ideas] Responsive signal handling In-Reply-To: <20150622230041.GA46183@cskk.homeip.net> References: <20150622230041.GA46183@cskk.homeip.net> Message-ID: On Jun 22, 2015, at 16:00, Cameron Simpson wrote: > > Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e. always going to the main thread) but to extend "raise" to accept a thread argument: > > raise blah in thread Does this need to be syntax? Why not just: mythread.throw(blah) This could even use the same mechanism as signals in 3.6, while possibly being backportable to something hackier in a C extension module for older versions. From random832 at fastmail.us Tue Jun 23 07:56:50 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Tue, 23 Jun 2015 01:56:50 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150622004157.52bc3239@fsol> References: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> <20150622004157.52bc3239@fsol> Message-ID: <1435039010.1862325.305213161.0848CAEB@webmail.messagingengine.com> On Sun, Jun 21, 2015, at 18:41, Antoine Pitrou wrote: > It's actually already the case in POSIX that most things are illegal > between fork() and exec(). However, to make fork() practical, many > libraries or frameworks tend to ignore those problems deliberately. I'm not _entirely_ sure that this applies to single-threaded programs, or even to multi-threaded programs that don't use constructs that will cause problems. The text is: "A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls." Note that it uses "may only" (which is ambiguous) rather than "shall only". It could be read that "only [stuff] until exec" is a suggestion of what the child process "may" do, under the circumstances described, to avoid the particular problems being discussed, rather than as a general prohibition. And the next paragraph is "When the application calls fork() from a signal handler and any of the fork handlers registered by pthread_atfork() calls a function that is not async-signal-safe, the behavior is undefined." suggesting that the behavior is _not_ likewise undefined when it was not called from a signal handler. Now, *vfork* is a ridiculous can of worms, which is why nobody uses it anymore, and certainly not within Python. From njs at pobox.com Tue Jun 23 08:18:24 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 22 Jun 2015 23:18:24 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <1435039010.1862325.305213161.0848CAEB@webmail.messagingengine.com> References: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> <20150622004157.52bc3239@fsol> <1435039010.1862325.305213161.0848CAEB@webmail.messagingengine.com> Message-ID: On Mon, Jun 22, 2015 at 10:56 PM, wrote: > On Sun, Jun 21, 2015, at 18:41, Antoine Pitrou wrote: >> It's actually already the case in POSIX that most things are illegal >> between fork() and exec(). 
However, to make fork() practical, many >> libraries or frameworks tend to ignore those problems deliberately. > > I'm not _entirely_ sure that this applies to single-threaded programs, > or even to multi-threaded programs that don't use constructs that will > cause problems. > > The text is: "A process shall be created with a single thread. If a > multi-threaded process calls fork(), the new process shall contain a > replica of the calling thread and its entire address space, possibly > including the states of mutexes and other resources. Consequently, to > avoid errors, the child process may only execute async-signal-safe > operations until such time as one of the exec functions is called. Fork > handlers may be established by means of the pthread_atfork() function in > order to maintain application invariants across fork() calls." > > Note that it uses "may only" (which is ambiguous) rather than "shall > only". It could be read that "only [stuff] until exec" is a suggestion > of what the child process "may" do, under the circumstances described, > to avoid the particular problems being discussed, rather than as a > general prohibition. Yeah, basically the way this works out is: (a) in practice on mainstream systems you can get away with forking and then doing whatever, so long as none of the threads in the parent process were holding any crucial locks, and the child is prepared for them to have all disappeared. (b) But, if something does break, then system builders reserve the right to laugh in your face. You can argue about things being technically ambiguous or whatever, but that's how it works. E.g. if you have a single-threaded program that does a matrix multiply, then forks, and then the child does a matrix multiply, and you run it on OS X linked to Apple's standard libraries, then the child will lock up, and if you report this to Apple they will close it as not-a-bug. > And the next paragraph is "When the application calls fork() from a > signal handler and any of the fork handlers registered by > pthread_atfork() calls a function that is not async-signal-safe, the > behavior is undefined." suggesting that the behavior is _not_ likewise > undefined when it was not called from a signal handler. I wouldn't read anything into this. pthread_atfork registers three handlers, and two of them are run in the parent process, where normally they'd be allowed to call any functions they like. -n -- Nathaniel J. Smith -- http://vorpus.org From cs at zip.com.au Tue Jun 23 08:08:45 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 23 Jun 2015 16:08:45 +1000 Subject: [Python-ideas] Responsive signal handling In-Reply-To: References: Message-ID: <20150623060845.GA55283@cskk.homeip.net> On 22Jun2015 22:42, Andrew Barnert wrote: >On Jun 22, 2015, at 16:00, Cameron Simpson wrote: >> >> Perhaps a better solution here is not to keep KeyboardInterrupt special (i.e. always going to the main thread) but to extend "raise" to accept a thread argument: >> >> raise blah in thread > >Does this need to be syntax? Why not just: > > mythread.throw(blah) > >This could even use the same mechanism as signals in 3.6, while possibly being backportable to something hackier in a C extension module for older versions. Indeed. I think that extending raise's syntax is a little easier on the eye, but the advantage is small. Certainly giving threads a throw method would function as well. 
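(For what it's worth, CPython already exposes something close to this at
the C level in PyThreadState_SetAsyncExc; a rough, unsupported
approximation of such a throw(), for illustration only:

    import ctypes

    def throw(thread, exc_type):
        # Ask the interpreter to raise exc_type in `thread` at its next
        # bytecode boundary. exc_type must be an exception class.
        res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_long(thread.ident), ctypes.py_object(exc_type))
        if res == 0:
            raise ValueError("no such thread")
        elif res > 1:
            # More than one thread state affected: undo and bail out.
            ctypes.pythonapi.PyThreadState_SetAsyncExc(
                ctypes.c_long(thread.ident), None)
            raise SystemError("PyThreadState_SetAsyncExc failed")

As with signals, the exception is only delivered when the target thread
next executes Python bytecode, so a thread blocked in C would still not
notice until control returns to the interpreter.)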
I was indeed hoping that signals and exceptions could be delivered the same way via such a mechanism, which would also allow signals to be delivered to a chosen thread. Cheers, Cameron Simpson Thus spake the master programmer: "A well written program is its own heaven; a poorly-written program its own hell." From sturla.molden at gmail.com Tue Jun 23 13:57:47 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 23 Jun 2015 13:57:47 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> Message-ID: On 23/06/15 01:29, Gregory P. Smith wrote: > While I understand that Windows adds some overhead there, startup time > for Python worker processes is high on all OSes. No it is not. A fork() will clone the process. You don't need to run any initialization code after that. You don't need to start a new Python interpreter -- you already have one. You don't need to run module imports -- they are already imported. You don't need to pickle and build Python objects -- they are already there. Everything you had in the parent process is ready to use the child process. This magic happens so fast it is comparable to the time it takes Windows to start a thread. On Windows, CreateProcess starts an "almost empty" process. You therefore have a lot of setup code to run. This is what makes starting Python processes with multiprocessing so much slower on Windows. It is not that Windows processes are more hevy-weight than threads, they are, but the real issue is all the setup code you need to run. On Linux and Mac, you don't need to run any setup code code after a fork(). Sturla From j.wielicki at sotecware.net Tue Jun 23 15:14:25 2015 From: j.wielicki at sotecware.net (Jonas Wielicki) Date: Tue, 23 Jun 2015 15:14:25 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> Message-ID: <55895BB1.30609@sotecware.net> On 23.06.2015 13:57, Sturla Molden wrote: > On 23/06/15 01:29, Gregory P. Smith wrote: > >> While I understand that Windows adds some overhead there, startup time >> for Python worker processes is high on all OSes. > > No it is not. > > A fork() will clone the process. You don't need to run any > initialization code after that. You don't need to start a new Python > interpreter -- you already have one. You don't need to run module > imports -- they are already imported. You don't need to pickle and build > Python objects -- they are already there. Everything you had in the > parent process is ready to use the child process. This magic happens so > fast it is comparable to the time it takes Windows to start a thread. To be fair, you will nevertheless get a slowdown when copy-on-write kicks in while first using whatever was cloned from the parent. This is nothing which blocks execution, but slows down execution. That is no time which can directly be measured during the fork() call, but I would still count it into start up cost. regards, jwi -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From trent at snakebite.org Tue Jun 23 15:53:01 2015 From: trent at snakebite.org (Trent Nelson) Date: Tue, 23 Jun 2015 09:53:01 -0400 Subject: [Python-ideas] PyParallel update (was: solving multi-core Python) Message-ID: <20150623135257.GA94530@trent.me> On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote: > Furthermore, removing the GIL is perhaps an obvious solution but not > the only one. Others include Trent Nelson's PyParallels, STM, and > other Python implementations.. So, I've been sprinting relentlessly on PyParallel since Christmas, and recently reached my v0.0 milestone of being able to handle all the TEFB tests, plus get the "instantaneous wiki search" thing working too. The TEFB (Techempower Framework Benchmarks) implementation is here: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/tefb/tefb.py?at=3.3-px (The aim was to have it compete in this: https://www.techempower.com/benchmarks/#section=data-r10, but unfortunately they broke their Windows support after round 9, so there's no way to get PyParallel into the official results without fixing that first.) The wiki thing is here: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/wiki/wiki.py?at=3.3-px I particularly like the wiki example as it leverages a lot of benefits afforded by PyParallel's approach to parallelism, concurrency and asynchronous I/O: - Load a digital search trie (datrie.Trie) that contains every Wikipedia title and the byte-offset within the wiki.xml where the title was found. (Once loaded the RSS of python.exe is about 11GB; the trie itself has about 16 million items in it.) - Load a numpy array of sorted 64-bit integer offsets. This allows us to do a searchsorted() (binary search) against a given offset in order to derive the next offset. - Once we have a way of getting two byte offsets, we can use ranged HTTP requests (and TransmitFile behind the scenes) to efficiently read random chunks of the file asynchronously. (Windows has a huge advantage here -- there's simply no way to achieve similar functionality on POSIX in a non-blocking fashion (sendfile can block, a disk read() can block, a memory reference into a mmap'd file that isn't in memory will page fault, which will block).) The performance has far surpassed anything I could have imagined back during the async I/O discussions in September 2012, so, time to stick a fork in it and document the experience, which is what I'll be working on in the coming weeks. In the mean time: - There are installers available here for those that wish to play around with the current state of things: http://download.pyparallel.org/ - I wrote a little helper thing that diffs the hg tree against the original v3.3.5 tag I based the work off and committed the diffs directly -- this provides a way to review the changes that were made in order to get to the current level of functionality: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/diffs/?at=3.3-px (It only includes files that existed in the v3.3.5 tag, I don't include diffs for new files I've added.) It's probably useful reviewing the diffs after perusing pyparallel.h: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Include/pyparallel.h?at=3.3-px#cl-345 ....as you'll see lots of guards in place in most of the diffs. 
E.g.: Py_GUARD() -- make sure we never hit this from a parallel context Px_GUARD() -- make sure we never hit this from a main thread Py_GUARD_OBJ(o) -- make sure object o is always a main thread object Px_GUARD_OBJ(o) -- make sure object o is always a parallel object PyPx_GUARD_OBJ(o) -- if we're a parallel context, make sure it's a parallel object, if we're a main thread, make sure it's a main thread object. If you haven't heard of PyParallel before, this might be a good place to start: https://speakerdeck.com/trent/. The core concepts haven't really changed since here (re: parallel contexts, main thread, main thread objects, parallel thread objects): https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores?slide=91 Basically, if we're a main thread, "do what we normally do", if we're a parallel thread, "divert to a thread-safe alternative". And a final note: I like the recent async additions. I mean, it's unfortunate that the new keyword clashes with the module name I used to hide all the PyParallel trickery, but I'm at the point now where calling something like this from within a parallel context is exactly what I need: async f.write(...) async cursor.execute(...) I've been working on PyParallel on-and-off now for ~2.5 years and have learned a lot and churned out a lot of code -- documenting it all is actually somewhat daunting (where do I start?!), so, if anyone has specific questions about how I addressed certain things, I'm more than happy to elicit more detail on specifics. Trent. From trent at snakebite.org Tue Jun 23 16:03:55 2015 From: trent at snakebite.org (Trent Nelson) Date: Tue, 23 Jun 2015 10:03:55 -0400 Subject: [Python-ideas] PyParallel update (was: solving multi-core Python) In-Reply-To: <20150623135257.GA94530@trent.me> References: <20150623135257.GA94530@trent.me> Message-ID: <20150623140354.GB94530@trent.me> On Tue, Jun 23, 2015 at 09:53:01AM -0400, Trent Nelson wrote: > On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote: > > Furthermore, removing the GIL is perhaps an obvious solution but not > > the only one. Others include Trent Nelson's PyParallels, STM, and > > other Python implementations.. > > So, I've been sprinting relentlessly on PyParallel since Christmas, and > recently reached my v0.0 milestone of being able to handle all the TEFB > tests, plus get the "instantaneous wiki search" thing working too. > > The TEFB (Techempower Framework Benchmarks) implementation is here: > https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/tefb/tefb.py?at=3.3-px > (The aim was to have it compete in this: https://www.techempower.com/benchmarks/#section=data-r10, but unfortunately they broke their Windows support after round 9, so there's no way to get PyParallel into the official results without fixing that first.) > > The wiki thing is here: > > https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/wiki/wiki.py?at=3.3-px > > I particularly like the wiki example as it leverages a lot of benefits > afforded by PyParallel's approach to parallelism, concurrency and > asynchronous I/O: > - Load a digital search trie (datrie.Trie) that contains every > Wikipedia title and the byte-offset within the wiki.xml where > the title was found. (Once loaded the RSS of python.exe is about > 11GB; the trie itself has about 16 million items in it.) 
Oops, I was off by about 12 million: C:\PyParallel33>python.exe PyParallel 3.3.5 (3.3-px:829ae345012e+, Jun 15 2015, 16:54:16) [MSC v.1600 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> os.chdir('examples\\wiki') >>> import wiki as w About to load titles trie, this will take a while... >>> len(w.titles) 27962169 From sturla.molden at gmail.com Tue Jun 23 16:55:31 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 23 Jun 2015 16:55:31 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <55895BB1.30609@sotecware.net> References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> <55895BB1.30609@sotecware.net> Message-ID: On 23/06/15 15:14, Jonas Wielicki wrote: > To be fair, you will nevertheless get a slowdown when copy-on-write > kicks in while first using whatever was cloned from the parent. This is > nothing which blocks execution, but slows down execution. Yes, particularly because of reference counts. Unfortunately Python stores refcounts within the PyObject struct. And when a refcount is updated a copy of the entire 4 KB page is triggered. There would be fare less of this if refcounts was kept in dedicated pages. Sturla From barry at python.org Tue Jun 23 18:01:18 2015 From: barry at python.org (Barry Warsaw) Date: Tue, 23 Jun 2015 12:01:18 -0400 Subject: [Python-ideas] solving multi-core Python References: <5585E37F.4060403@gmail.com> Message-ID: <20150623120118.333922bb@anarchist.wooz.org> On Jun 23, 2015, at 01:52 PM, Nick Coghlan wrote: >The current reference-counts-embedded-in-the-object-structs memory >layout also plays havoc with the all-or-nothing page level >copy-on-write semantics used by the fork() syscall at the operating >system layer, so some of the ideas we've been considering >(specifically, those related to moving the reference counter >bookkeeping out of the object structs themselves) would potentially >help with that as well (but would also have other hard to predict >performance consequences). A crazy offshoot idea would be something like Emacs' unexec, where during the build process you could preload a bunch of always-used immutable modules, then freeze the state in such a way that starting up again later would be much faster, because the imports (and probably more importantly, the searching) could be avoided. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From jsbueno at python.org.br Tue Jun 23 18:55:09 2015 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 23 Jun 2015 13:55:09 -0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> Message-ID: For new functions altogether, maybe namespaces could be the a nice option -- from time.milliseconds import sleep, monotonic Named parameters would be a better way to implement it,though - I just don't know if having to go through the function that does have to be ready to handle floats anyway won't be in the way of the desired optimization On 22 June 2015 at 21:39, Gregory P. Smith wrote: > > > On Mon, Jun 22, 2015 at 5:35 PM Alexander Belopolsky > wrote: >> >> >> On Mon, Jun 22, 2015 at 8:03 PM, Gregory P. 
Smith wrote: >>> >>> # We could use the long form names milliseconds, microseconds and >>> nanoseconds but i worry with those that people would inevitably confuse ms >>> with microseconds as times and APIs usually given the standard abbreviations >>> rather than spelled out. >> >> >> Note that datetime.timedelta uses long names: >> >> >>> timedelta(milliseconds=5, microseconds=3) >> datetime.timedelta(0, 0, 5003) > > > That is a good vote for consistency with its API... > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From trent at snakebite.org Tue Jun 23 20:29:40 2015 From: trent at snakebite.org (Trent Nelson) Date: Tue, 23 Jun 2015 14:29:40 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> Message-ID: <20150623182939.GD94530@trent.me> On Sun, Jun 21, 2015 at 12:40:43PM +0200, Stefan Behnel wrote: > Nick Coghlan schrieb am 21.06.2015 um 12:25: > > On 21 June 2015 at 19:48, Antoine Pitrou wrote: > >> On Sun, 21 Jun 2015 16:31:33 +1000 Nick Coghlan wrote: > >>> > >>> For inter-interpreter communication, the worst case scenario is having > >>> to rely on a memcpy based message passing system (which would still be > >>> faster than multiprocessing's serialisation + IPC overhead) > >> > >> And memcpy() updates pointer references to dependent objects magically? > >> Surely you meant the memdeepcopy() function that's part of every > >> standard C library! > > > > We already have the tools to do deep copies of object trees (although > > I'll concede I *was* actually thinking in terms of the classic C/C++ > > mistake of carelessly copying pointers around when I wrote that > > particular message). One of the options for deep copies tends to be a > > pickle/unpickle round trip, which will still incur the serialisation > > overhead, but not the IPC overhead. > > > > "Faster message passing than multiprocessing" sets the baseline pretty > > low, after all. > > > > However, this is also why Eric mentions the notions of object > > ownership or limiting channels to less than the full complement of > > Python objects. As an *added* feature at the Python level, it's > > possible to initially enforce restrictions that don't exist in the C > > level subinterpeter API, and then work to relax those restrictions > > over time. > > If objects can make it explicit that they support sharing (and preferably > are allowed to implement the exact details themselves), I'm sure we'll find > ways to share NumPy arrays across subinterpreters. That feature alone tends > to be a quick way to make a lot of people happy. FWIW, the following commit was all it took to get NumPy playing nicely with PyParallel: https://github.com/pyparallel/numpy/commit/046311ac1d66cec789fa8fd79b1b582a3dea26a8 It uses thread-local buckets instead of static ones, and calls out to PyMem_Raw(Malloc|Realloc|Calloc|Free) instead of the normal libc counterparts. This means PyParallel will intercept the call within a parallel context and divert it to the per-context heap. 
Example parallel callback using NumPy: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/examples/wiki/wiki.py?at=3.3-px#cl-285 (Also, datrie is a Cython module, and that seems to work fine as well, which is neat, as it means you could sub out the entire Python callback with a Cythonized version, including all the relatively-slow-compared-to-C http header parsing that happens in async.http.server.) Trent. From greg at krypto.org Tue Jun 23 21:26:23 2015 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 23 Jun 2015 19:26:23 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150623120118.333922bb@anarchist.wooz.org> References: <5585E37F.4060403@gmail.com> <20150623120118.333922bb@anarchist.wooz.org> Message-ID: On Tue, Jun 23, 2015 at 9:01 AM Barry Warsaw wrote: > On Jun 23, 2015, at 01:52 PM, Nick Coghlan wrote: > > >The current reference-counts-embedded-in-the-object-structs memory > >layout also plays havoc with the all-or-nothing page level > >copy-on-write semantics used by the fork() syscall at the operating > >system layer, so some of the ideas we've been considering > >(specifically, those related to moving the reference counter > >bookkeeping out of the object structs themselves) would potentially > >help with that as well (but would also have other hard to predict > >performance consequences). > > A crazy offshoot idea would be something like Emacs' unexec, where during > the > build process you could preload a bunch of always-used immutable modules, > then > freeze the state in such a way that starting up again later would be much > faster, because the imports (and probably more importantly, the searching) > could be avoided. > I actually would like something like this for Python, but I want it to work with hash randomization rather than freezing a single fixed hash seed. That means you'd need to record the location of all hash tables and cached hashes and fix them up after loading such a binary image at process start time, much like processing relocations when loading a binary executable. Non trivial. -gps -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Tue Jun 23 22:25:00 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Tue, 23 Jun 2015 23:25:00 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> Message-ID: <20150623232500.45efccdf@x230> Hello, On Tue, 23 Jun 2015 00:03:14 +0000 "Gregory P. Smith" wrote: [] > > sleep_ms() > > sleep_us() > > monotonic_ms() > > monotonic_us() > > > > If you're going to add new function names, going with the _unit suffix > seems best. > > Another option to consider: keyword only arguments. > > time.sleep(ms=31416) > time.sleep(us=31415927) > time.sleep(ns=31415296536) That doesn't immediately map to usage for monotonic(), as you mention below. Another issue is that keywords arguments on average (and for MicroPython all the time) are less efficient than positional. Put it other way, t = monotonic_ns() t = monotonic_ns() - t is going to give lower number than t = monotonic(ns=True) t = monotonic(ns=True) - t , and the closer it to 0, the better. > # We could use the long form names milliseconds, microseconds and > nanoseconds but i worry with those that people would inevitably > confuse ms with microseconds as times and APIs usually given the > standard abbreviations rather than spelled out. Another issue is that full spellings are rather long. 
Logistically, while function names can be expected to have autocompletion support, keyword arguments not necessarily. > time.monotonic(return_int_ns=True) ? > # This seems ugly. time.monotonic_ns() seems better. > > These should be acceptable to add to Python 3.6 for consistency. Well, as I mentioned, I'm personally not looking for this to be implemented in CPython right away. Ideally, this should be tested by >1 independent "embedded" Python implementation first, and only then, based on the actual experience, submitted as a PEP. That's rather better than "desktop" CPython, which doesn't care about all the subtle "embedded" aspects "forced" a way to implement it. > I do not think we should have functions for each ms/us/ns unit if > adding functions. Just choose the most useful high precision unit > and let people do the math as needed for the others. Well, that's one of examples of that "desktop" thinking ;-). Consider for example that 2^32 microseconds is just over an hour, so expressing everything in microseconds would require arbitrary-precision integers, which may be just the same kind of burden for an embedded system as floats. > > Point 3 above isn't currently addressed by time module at all. > > https://www.python.org/dev/peps/pep-0418/ mentions some internal [] > Reading the PEP my takeaway is that wrap-around of underlying > deficient system APIs should be handled by the Python VM for the > user. It sounds like we should explicitly spell this out though. This is another point which is overlooked by "desktop" programmers - time counters can, will, and do wrap around. Big OSes try hard to to hide this fact, and indeed succeed well enough, so in cases when they do fail, it has shattering effect (at least PR-wise) - Y2K, Y2K38 problems. For an embedded programmer wrapping counters is objective reality, and we wouldn't like to hide that fact in MicroPython (of course, only for these, newly introduced real-time precision time functions). > I don't think time.elapsed() could ever provide any utility in either > case, just use subtraction. Can't work. Previous value of monotonic_us() is 65530, next value is 10, what does it tell you? > time.elapsed() wouldn't know when and > where the time values came from and magically be able to apply wrap > around or not to them. Well, as I mentioned, it's an API contract that elapsed() takes values of monotonic_ms(), monotonic_us(), etc. functions, and knows law how their values change (likely, apply unsigned power-of-2 modular arithmetics). There's additional restriction that this change law for all of monotonic_ms(), monotonic_us() is the same, but I personally find this an acceptable restriction to not bloat API even further. (But it is a restriction, for example, if nano/microsecond time source is 24-bit counter, than millisecond time is limited to 24 bits too). > > -gps > -- Best regards, Paul mailto:pmiscml at gmail.com From solipsis at pitrou.net Tue Jun 23 22:46:23 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 23 Jun 2015 22:46:23 +0200 Subject: [Python-ideas] solving multi-core Python References: <20150621114846.06bc8dc8@fsol> <20150623182939.GD94530@trent.me> Message-ID: <20150623224623.70e98b07@fsol> Hey Trent, You may be interested in this PR for Numpy: https://github.com/numpy/numpy/pull/5470 Regards Antoine. 
> FWIW, the following commit was all it took to get NumPy playing > nicely with PyParallel: > > https://github.com/pyparallel/numpy/commit/046311ac1d66cec789fa8fd79b1b582a3dea26a8 From greg at krypto.org Wed Jun 24 01:32:55 2015 From: greg at krypto.org (Gregory P. Smith) Date: Tue, 23 Jun 2015 23:32:55 +0000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150623232500.45efccdf@x230> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> Message-ID: On Tue, Jun 23, 2015 at 1:25 PM Paul Sokolovsky wrote: > Hello, > > On Tue, 23 Jun 2015 00:03:14 +0000 > "Gregory P. Smith" wrote: > > [] > > > > sleep_ms() > > > sleep_us() > > > monotonic_ms() > > > monotonic_us() > > > > > > > If you're going to add new function names, going with the _unit suffix > > seems best. > > > > Another option to consider: keyword only arguments. > > > > time.sleep(ms=31416) > > time.sleep(us=31415927) > > time.sleep(ns=31415296536) > > That doesn't immediately map to usage for monotonic(), as you mention > below. > > Another issue is that keywords arguments on average (and for > MicroPython all the time) are less efficient than positional. Put it > other way, > > t = monotonic_ns() > t = monotonic_ns() - t > > is going to give lower number than > > t = monotonic(ns=True) > t = monotonic(ns=True) - t > > , and the closer it to 0, the better. > > > # We could use the long form names milliseconds, microseconds and > > nanoseconds but i worry with those that people would inevitably > > confuse ms with microseconds as times and APIs usually given the > > standard abbreviations rather than spelled out. > > Another issue is that full spellings are rather long. Logistically, > while function names can be expected to have autocompletion support, > keyword arguments not necessarily. > > > time.monotonic(return_int_ns=True) ? > > # This seems ugly. time.monotonic_ns() seems better. > > > > These should be acceptable to add to Python 3.6 for consistency. > > Well, as I mentioned, I'm personally not looking for this to be > implemented in CPython right away. Ideally, this should be tested by >1 > independent "embedded" Python implementation first, and only then, based > on the actual experience, submitted as a PEP. That's rather better than > "desktop" CPython, which doesn't care about all the subtle "embedded" > aspects "forced" a way to implement it. > > > I do not think we should have functions for each ms/us/ns unit if > > adding functions. Just choose the most useful high precision unit > > and let people do the math as needed for the others. > > Well, that's one of examples of that "desktop" thinking ;-). > Consider for example that 2^32 microseconds is just over an hour, so > expressing everything in microseconds would require arbitrary-precision > integers, which may be just the same kind of burden for an embedded > system as floats. > > I know. I was actually hoping you'd respond on that point because I haven't used micropython yet. I assumed it had bignum, or at least fixed "big" 64-bit number, support. But if it does not, having specific functions for the needed resolutions makes a lot of sense. > > Point 3 above isn't currently addressed by time module at all. > > > https://www.python.org/dev/peps/pep-0418/ mentions some internal > > [] > > > Reading the PEP my takeaway is that wrap-around of underlying > > deficient system APIs should be handled by the Python VM for the > > user. It sounds like we should explicitly spell this out though. 
> > This is another point which is overlooked by "desktop" programmers - > time counters can, will, and do wrap around. Big OSes try hard to to > hide this fact, and indeed succeed well enough, so in cases when they > do fail, it has shattering effect (at least PR-wise) - Y2K, Y2K38 > problems. For an embedded programmer wrapping counters is objective > reality, and we wouldn't like to hide that fact in MicroPython > (of course, only for these, newly introduced real-time precision time > functions). > I still don't see how an elapsed() function taking two arbitrary integer arguments could work in a meaningful manner. Even if you assume they are the same units, the only assumption that can be made is that if the second int is lower than the first, at least one wraparound occurred. > I don't think time.elapsed() could ever provide any utility in either > > case, just use subtraction. > > Can't work. Previous value of monotonic_us() is 65530, next value is > 10, what does it tell you? > At least one wrap around occurred. without more information you cannot know how many. > time.elapsed() wouldn't know when and > > where the time values came from and magically be able to apply wrap > > around or not to them. > > Well, as I mentioned, it's an API contract that elapsed() takes values > of monotonic_ms(), monotonic_us(), etc. functions, and knows law how > their values change (likely, apply unsigned power-of-2 modular > arithmetics). There's additional restriction that this change law for > all of monotonic_ms(), monotonic_us() is the same, but I personally > find this an acceptable restriction to not bloat API even further. (But > it is a restriction, for example, if nano/microsecond time source is > 24-bit counter, than millisecond time is limited to 24 bits too). > I guess what I'm missing is how you intend to tell elapsed() which of the _ms vs _us vs _ns functions the values came from. I'm assuming that all functions are likely to exist at once rather than there being only one high resolution integer time function. Given that, yes, you can make elapsed() do what you want. But I really think you should call it something more specific than elapsed if the function is serving as a common source of information on how a particular type of timer on the system works. monotonic_elapsed() perhaps? etc.. Also, agreed, we don't need these in 3.6. I'm not seeing anything really objectionable for inclusion in a future 3.x which is all I'm really looking out for. It sounds like avoiding keyword arguments and adding _ms _us and _ns variants of functions is the practical solution for micropython. -gps (awaiting his WiPys :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Wed Jun 24 01:32:47 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Tue, 23 Jun 2015 16:32:47 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> <55895BB1.30609@sotecware.net> Message-ID: On Tue, Jun 23, 2015 at 7:55 AM, Sturla Molden wrote: > On 23/06/15 15:14, Jonas Wielicki wrote: > >> To be fair, you will nevertheless get a slowdown when copy-on-write >> kicks in while first using whatever was cloned from the parent. This is >> nothing which blocks execution, but slows down execution. > > > Yes, particularly because of reference counts. Unfortunately Python stores > refcounts within the PyObject struct. 
And when a refcount is updated a copy > of the entire 4 KB page is triggered. There would be fare less of this if > refcounts was kept in dedicated pages. A coworker of mine wrote a patch to Python that allows you to freeze refcounts for all existing objects before forking, if the correct compile options are set. This adds overhead to incref/decref, but dramatically changes the python+fork memory usage story. (I haven't personally played with it much, but it sounds decent.) If there's any interest I can try to upstream this change, guarded behind a compiler flag. We've also tried moving refcounts to their own pages, like you and Nick suggest, but it breaks a *lot* of third-party code. I can try to upstream it. If it's guarded by a compiler flag it is probably still useful, just any users would have to grep through their dependencies to make sure nothing directly accesses the refcount. (The stdlib can be made to work.) It sounds like it would also be useful for the main project in the topic of this thread, so I imagine there's more momentum behind it. -- Devin From njs at pobox.com Wed Jun 24 01:46:05 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 23 Jun 2015 16:46:05 -0700 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> Message-ID: On Tue, Jun 23, 2015 at 4:32 PM, Gregory P. Smith wrote: > > I still don't see how an elapsed() function taking two arbitrary integer > arguments could work in a meaningful manner. Even if you assume they are > the same units, the only assumption that can be made is that if the second > int is lower than the first, at least one wraparound occurred. Assuming you have an n-bit clock: (1) if you have arbitrary storage and the ability to do some sort of interrupt handling at least once per wraparound period, then you can reliably measure any duration. (2) if you don't have that, but can assume that at most one wraparound has occurred, then you can reliably measure any duration up to 2**n time units. (3) if you can't even make that assumption, then you can't reliably measure any duration whatsoever, so there's no point in even having the clock. I guess micropython is targeting platforms that can't afford option (1), but would like to at least take advantage of option (2)? -n -- Nathaniel J. Smith -- http://vorpus.org From ericsnowcurrently at gmail.com Wed Jun 24 04:18:36 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 20:18:36 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sat, Jun 20, 2015 at 11:25 PM, Nathaniel Smith wrote: > I'd love to see just a hand wavy, verbal proof-of-concept walking through > how this might work in some simple but realistic case. To me a single > compelling example could make this proposal feel much more concrete and > achievable. Here's a vague example: ------------------ from subinterpreters import Subinterpreter, Channel def handle_job(val): if not isinstance(val, (int, float)): raise RuntimeError("{!r} not a valid arg".format(val)) # something potentially expensive... 
def runner(ch): while True: value = ch.pop() # blocks if value is None: break handle_job(value) ch = Channel() sub = Subinterpreter() task = sub.run(runner, ch) data = get_data() for immutable_item in data: ch.push(immutable_item) if task.is_alive(): ch.push(None) task.join() exc = task.exception() if exc is not None: raise RuntimeError from exc def verify(data): # make sure runner did its job ... task = sub.run(verify, data) # do other stuff while we wait task.join() sub.destroy() ------------------ > There aren't really many options for mutable objects, right? If you want > shared nothing semantics, then transmitting a mutable object either needs to > make a copy, or else be a real transfer, where the sender no longer has it > (cf. Rust). > > I guess for the latter you'd need some new syntax for send-and-del, that > requires the object to be self contained (all mutable objects reachable from > it are only referenced by each other) and have only one reference in the > sending process (which is the one being sent and then destroyed). Right. The idea of a self-contained object graph is something we'd need if we went that route. That's why initially we should focus on sharing only immutable objects. > >> Keep in mind that by "immutability" I'm talking about *really* immutable, >> perhaps going so far as treating the full memory space associated with an >> object as frozen. For instance, we'd have to ensure that "immutable" Python >> objects like strings, ints, and tuples do not change (i.e. via the C API). > > This seems like a red herring to me. It's already the case that you can't > legally use the c api to mutate tuples, ints, for any object that's ever > been, say, passed to a function. So for these objects, the subinterpreter > setup doesn't actually add any new constraints on user code. Fair enough. > > C code is always going to be *able* to break memory safety so long as you're > using shared-memory threading at the c level to implement this stuff. We > just need to make it easy not to. Exactly. > > Refcnts and garbage collection are another matter, of course. Agreed. :) -eric From ericsnowcurrently at gmail.com Wed Jun 24 04:37:43 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 20:37:43 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 12:31 AM, Nick Coghlan wrote: > The fact that mod_wsgi can run most Python web applications in a > subinterpreter quite happily means we already know the core mechanism > works fine, This is a pretty important point. > and there don't appear to be any insurmountable technical > hurdles between the status quo and getting to a point where we can > either switch the GIL to a read/write lock where a write lock is only > needed for inter-interpreter communications, or else find a way for > subinterpreters to release the GIL entirely by restricting them > appropriately. Proper multi-core operation will require at least some changes relative to the GIL. My goal is to execute the least amount of change at first. We can build on that. > > For inter-interpreter communication, the worst case scenario is having > to rely on a memcpy based message passing system (which would still be > faster than multiprocessing's serialisation + IPC overhead), By initially focusing on immutable objects we shouldn't need to go that far. That said, a memcpy-based solution may very well be a good next step once the basic goals of the project are met. 
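For what it's worth, the multiprocessing baseline mentioned above is easy to measure; a rough sketch along these lines (the payload and iteration count are arbitrary) shows the per-round-trip cost of pickling plus a pipe:

------------------
import time
from multiprocessing import Pipe, Process

def echo(conn, n):
    # Echo every message back; each round trip pays pickle + IPC twice.
    for _ in range(n):
        conn.send(conn.recv())

if __name__ == "__main__":
    payload = tuple(range(1000))   # a small immutable object graph
    n = 10000
    parent, child = Pipe()
    worker = Process(target=echo, args=(child, n))
    worker.start()
    start = time.perf_counter()
    for _ in range(n):
        parent.send(payload)
        parent.recv()
    elapsed = time.perf_counter() - start
    worker.join()
    print("{:.1f} us per round trip".format(elapsed / n * 1e6))
------------------

Whatever the subinterpreter channels end up doing for immutable objects would, per Nick's point, want to come in well under that number.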
> but there > don't appear to be any insurmountable barriers to setting up an object > ownership based system instead Agreed. That's something we can experiment with once we get the core of the project working. > (code that accesses PyObject_HEAD > fields directly rather than through the relevant macros and functions > seems to be the most likely culprit for breaking, but I think "don't > do that" is a reasonable answer there). :) > > There's plenty of prior art here (including a system I once wrote in C > myself atop TI's DSP/BIOS MBX and TSK APIs), so I'm comfortable with > Eric's "simple matter of engineering" characterisation of the problem > space. Good. :) > > The main reason that subinterpreters have never had a Python API > before is that they have enough rough edges that having to write a > custom C extension module to access the API is the least of your > problems if you decide you need them. At the same time, not having a > Python API not only makes them much harder to test, which means > various aspects of their operation are more likely to be broken, but > also makes them inherently CPython specific. > > Eric's proposal essentially amounts to three things: > > 1. Filing off enough of the rough edges of the subinterpreter support > that we're comfortable giving them a public Python level API that > other interpreter implementations can reasonably support > 2. Providing the primitives needed for safe and efficient message > passing between subinterpreters > 3. Allowing subinterpreters to truly execute in parallel on multicore machines > > All 3 of those are useful enhancements in their own right, which > offers the prospect of being able to make incremental progress towards > the ultimate goal of native Python level support for distributing > across multiple cores within a single process. Yep. That sums it up pretty well. That decomposition should make it a bit easier to move the project forward. -eric From ericsnowcurrently at gmail.com Wed Jun 24 04:39:32 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 20:39:32 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 12:41 AM, Wes Turner wrote: > Exciting! > > * > http://zero-buffer.readthedocs.org/en/latest/api-reference/#zero_buffer.BufferView > * https://www.google.com/search?q=python+channels > * https://docs.python.org/2/library/asyncore.html#module-asyncore > * https://chan.readthedocs.org/en/latest/ > * https://goless.readthedocs.org/en/latest/ > * other approaches to the problem (with great APIs): > * http://celery.readthedocs.org/en/latest/userguide/canvas.html#chords > * http://discodb.readthedocs.org/en/latest/ Thanks. -eric From ericsnowcurrently at gmail.com Wed Jun 24 05:05:13 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 21:05:13 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150621115443.70ddcf28@fsol> References: <20150621033813.GJ20701@ando.pearwood.info> <20150621115443.70ddcf28@fsol> Message-ID: On Sun, Jun 21, 2015 at 3:54 AM, Antoine Pitrou wrote: > On Sat, 20 Jun 2015 23:01:20 -0600 > Eric Snow > wrote: >> The only consequential shared piece is the >> GIL and my proposal should render the GIL irrelevant for the most part. > > All singleton objects, built-in types are shared and probably a number > of other things hidden in dark closets... Yep. I expect we'll be able to sort those out under the assumption that 99% of the time they can be treated as immutable. 
We'll then have to find a way to keep the corner cases from breaking the subinterpreter isolation. >Not to mention the memory allocator. This is a sticky part that I've been considering from almost day 1. It's not the #1 problem to solve, but it will be an important one if we want to have truly parallel subinterpreters. > > By the way, what you're aiming to do is conceptually quite similar to > Trent's PyParallel (thought Trent doesn't use subinterpreters, his main > work is around trying to making object sharing safe without any GIL to > trivially protect the sharing), so you may want to pair with him. Of > course, you may end up with a Windows-only Python interpreter :-) Right. I read through Trent's work on several occasions and have gleaned a couple lessons related to object sharing. I was planning on getting in touch with Trent in the near future. > > I'm under the impression you're underestimating the task at hand here. > Or perhaps you're not and you're just willing to present it in a > positive way :-) I'd like to think it's the latter. :) The main reason why I'm hopeful we can make a meaningful change for 3.6 is that I don't foresee any major changes to CPython's internals. Nearly all the necessary pieces are already there. I'm also intent on taking a minimal approach initially. We can build on it from there, easing restrictions that allowed us to roll out the initial implementation more quickly. All that said, I won't be surprised if it takes the entire 3.6 dev cycle to get it right. -eric From ericsnowcurrently at gmail.com Wed Jun 24 05:08:36 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 21:08:36 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> Message-ID: On Sun, Jun 21, 2015 at 4:25 AM, Nick Coghlan wrote: > We already have the tools to do deep copies of object trees (although > I'll concede I *was* actually thinking in terms of the classic C/C++ > mistake of carelessly copying pointers around when I wrote that > particular message). One of the options for deep copies tends to be a > pickle/unpickle round trip, which will still incur the serialisation > overhead, but not the IPC overhead. This does make me wonder if it would be worth pursuing a mechanism for encapsulating an object graph, such that it would be easier to manage/copy the graph as a whole. > > "Faster message passing than multiprocessing" sets the baseline pretty > low, after all. > > However, this is also why Eric mentions the notions of object > ownership or limiting channels to less than the full complement of > Python objects. As an *added* feature at the Python level, it's > possible to initially enforce restrictions that don't exist in the C > level subinterpeter API, and then work to relax those restrictions > over time. Precisely. -eric From ericsnowcurrently at gmail.com Wed Jun 24 06:15:10 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 22:15:10 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> Message-ID: On Sun, Jun 21, 2015 at 4:40 AM, Stefan Behnel wrote: > If objects can make it explicit that they support sharing (and preferably > are allowed to implement the exact details themselves), I'm sure we'll find > ways to share NumPy arrays across subinterpreters. That feature alone tends > to be a quick way to make a lot of people happy. Are you thinking of something along the lines of a dunder method (e.g. __reduce__)? 
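Purely as an illustration of what an opt-in hook in that spirit could look like (the __share__/__unshare__ names and the channel methods below are hypothetical, not an actual proposal):

------------------
class SharedArray:
    """Example of a type that opts in to cross-interpreter sharing."""

    def __init__(self, buf):
        self._view = memoryview(buf)

    def __share__(self):
        # Hypothetical hook: hand the channel something it can transmit
        # without copying, e.g. a PEP 3118 buffer.
        return self._view

    @classmethod
    def __unshare__(cls, view):
        # Hypothetical counterpart: rebuild a lightweight wrapper around
        # the same buffer on the receiving interpreter's side.
        return cls(view)

def channel_send(channel, obj):
    # A channel could prefer the opt-in path and fall back to copying.
    if hasattr(type(obj), "__share__"):
        channel.send_shared(type(obj), obj.__share__())  # hypothetical zero-copy path
    else:
        channel.send_copy(obj)                           # hypothetical copy/pickle path
------------------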
-eric From ericsnowcurrently at gmail.com Wed Jun 24 06:19:07 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 22:19:07 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 4:54 AM, Stefan Behnel wrote: > I also had some discussions about these things with Nick before. Not sure > if you really meant PEP 384 (you might have) or rather PEP 489: > > https://www.python.org/dev/peps/pep-0489/ > I did mean PEP 384, but PEP 489 is certainly related as I expect we'll make participation in this subinterpreter model by extension modules opt-in. Basically they will need to promise that they will work within the restricted environment. > I consider that one more important here, as it will eventually allow Cython > modules to support subinterpreters. Unless, as you mentioned, they use > global C state, but only in external C code, e.g. wrapped libraries. Cython > should be able to handle most of the module internal global state on a > per-interpreter basis itself, without too much user code impact. Great. > > I'm totally +1 for the idea. I hope that I'll find the time (well, and > money) to work on PEP 489 in Cython soon, so that I can prove it right for > actual real-world code in Python 3.5. We'll then see about subinterpreter > support. That's certainly the next step. That would be super. -eric From ericsnowcurrently at gmail.com Wed Jun 24 06:21:33 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 22:21:33 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> <20150621124105.52c194c9@fsol> Message-ID: On Sun, Jun 21, 2015 at 4:57 AM, Nick Coghlan wrote: > I'd want us to eventually aim for zero-copy speed for at least known > immutable values (int, str, float, etc), immutable containers of > immutable values (tuple, frozenset), and for types that support both > publishing and consuming data via the PEP 3118 buffer protocol without > making a copy. > > For everything else I'd be fine with a starting point that was at > least no slower than multiprocessing (which shouldn't be difficult, > since we'll at least save the IPC overhead even if there are cases > where communication between subinterpreters falls back to > serialisation rather than doing something more CPU and memory > efficient). Makes sense. -eric From storchaka at gmail.com Wed Jun 24 06:59:12 2015 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 24 Jun 2015 07:59:12 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150623021530.74ce1ebe@x230> References: <20150623021530.74ce1ebe@x230> Message-ID: On 23.06.15 02:15, Paul Sokolovsky wrote: > Hello from MicroPython, a lean Python implementation > scaling down to run even on microcontrollers > (https://github.com/micropython/micropython). > > Our target hardware base oftentimes lacks floating point support, and > using software emulation is expensive. So, we would like to have > versions of some timing functions, taking/returning millisecond and/or > microsecond values as integers. What about returning decimals or special fixed-precision numbers (internally implemented as 64-bit integer with constant scale 1000 or 1000000)? 
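A minimal sketch of such a fixed-precision value, assuming a microsecond scale (the class name and API are only illustrative):

------------------
class FixedTime:
    """A time value stored as a plain integer count of microseconds.

    All arithmetic stays in integers, so no float support is required;
    only formatting ever divides by the scale.
    """

    SCALE = 1000000  # ticks per second

    __slots__ = ("_ticks",)

    def __init__(self, ticks):
        self._ticks = int(ticks)

    def __add__(self, other):
        return FixedTime(self._ticks + other._ticks)

    def __sub__(self, other):
        return FixedTime(self._ticks - other._ticks)

    def seconds(self):
        # Whole seconds, still integer-only.
        return self._ticks // self.SCALE

    def __repr__(self):
        return "FixedTime(%d us)" % self._ticks
------------------

On 32-bit hardware this still wants a 64-bit (or larger) integer underneath, which is the trade-off raised elsewhere in this thread.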
From ericsnowcurrently at gmail.com Wed Jun 24 07:01:24 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:01:24 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 5:41 AM, Sturla Molden wrote: > From the perspective of software design, it would be good it the CPython > interpreter provided an environment instead of using global objects. It > would mean that all functions in the C API would need to take the > environment pointer as their first variable, which will be a major rewrite. > It would also allow the "one interpreter per thread" design similar to tcl > and .NET application domains. While perhaps a worthy goal, I don't know that it fits in well with my goals. I'm aiming for an improved multi-core story with a minimum of change in the interpreter. > > However, from the perspective of multi-core parallel computing, I am not > sure what this offers over using multiple processes. > > Yes, you avoid the process startup time, but on POSIX systems a fork is very > fast. An certainly, forking is much more efficient than serializing Python > objects. You still need the mechanism to safely and efficiently share (at least some) objects between interpreters after forking. I expect this will be simpler within the same process. > It then boils down to a workaround for the fact that Windows cannot > fork, which makes it particularly bad for running CPython. We cannot leave Windows out in the cold. > You also have to > start up a subinterpreter and a thread, which is not instantaneous. So I am > not sure there is a lot to gain here over calling os.fork. One key difference is that with a subinterpreter you are basically starting with a clean slate. The isolation between interpreters extends to the initial state. That level of isolation is a desirable feature because you can more clearly reason about the state of the running tasks. > > A non-valid argument for this kind of design is that only code which uses > threads for parallel computing is "real" multi-core code. So Python does not > support multi-cores because multiprocessing or os.fork is just faking it. > This is an argument that belongs in the intellectual junk yard. It stems > from the abuse of threads among Windows and Java developers, and is rooted > in the absence of fork on Windows and the formerly slow fork on Solaris. And > thus they are only able to think in terms of threads. If threading.Thread > does not scale the way they want, they think multicores are out of reach. Well, perception is 9/10ths of the law. :) If the multi-core problem is already solved in Python then why does it fail in the court of public opinion. The perception that Python lacks a good multi-core story is real, leads organizations away from Python, and will not improve without concrete changes. Contrast that with Go or Rust or many other languages that make it simple to leverage multiple cores (even if most people never need to). > > So the question is, how do you want to share objects between > subinterpreters? And why is it better than IPC, when your idea is to isolate > subinterpreters like application domains? In return, my question is, what is the level of effort to get fork+IPC to do what we want vs. subinterpreters? Note that we need to accommodate Windows as more than an afterthought (or second-class citizen), as well as other execution environments (e.g. embedded) where we may not be able to fork. > > If you think avoiding IPC is clever, you are wrong. 
IPC is very fast, in > fact programs written to use MPI tends to perform and scale better than > programs written to use OpenMP in parallel computing. I'd love to learn more about that. I'm sure there are some great lessons on efficiently and safely sharing data between isolated execution environments. That said, how does IPC compare to passing objects around within the same process? > Not only is IPC fast, > but you also avoid an issue called "false sharing", which can be even more > detrimental than the GIL: You have parallel code, but it seems to run in > serial, even though there is no explicit serialization anywhere. And by > since Murphy's law is working against us, Python reference counts will be > false shared unless we use multiple processes. Solving reference counts in this situation is a separate issue that will likely need to be resolved, regardless of which machinery we use to isolate task execution. > The reason IPC in multiprocessing is slow is due to calling pickle, it is > not the IPC in itself. A pipe or an Unix socket (named pipe on Windows) have > the overhead of a memcpy in the kernel, which is equal to a memcpy plus some > tiny constant overhead. And if you need two processes to share memory, there > is something called shared memory. Thus, we can send data between processes > just as fast as between subinterpreters. IPC sounds great, but how well does it interact with Python's memory management/allocator? I haven't looked closely but I expect that multiprocessing does not use IPC anywhere. > > All in all, I think we are better off finding a better way to share Python > objects between processes. I expect that whatever solution we would find for subinterpreters would have a lot in common with the same thing for processes. > P.S. Another thing to note is that with sub-interpreters, you can forget > about using ctypes or anything else that uses the simplified GIL API (e.g. > certain Cython generated extensions). On the one hand there are some rough edges with subinterpreters that need to be fixed. On the other hand, we will have to restrict the subinterpreter model (at least initially) in ways that would likely preclude operation of existing extension modules. -eric From ericsnowcurrently at gmail.com Wed Jun 24 07:26:08 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:26:08 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Sun, Jun 21, 2015 at 5:55 AM, Devin Jeanpierre wrote: > On Sat, Jun 20, 2015 at 4:16 PM, Eric Snow wrote: >> >> On Jun 20, 2015 4:55 PM, "Devin Jeanpierre" wrote: >>> >>> It's worthwhile to consider fork as an alternative. IMO we'd get a >>> lot out of making forking safer, easier, and more efficient. (e.g. >>> respectively: adding an atfork registration mechanism; separating out >>> the bits of multiprocessing that use pickle from those that d, I still disagreeon't; >>> moving the refcount to a separate page, or allowing it to be frozen >>> prior to a fork.) >> >> So leverage a common base of code with the multiprocessing module? > > What is this question in response to? I don't understand. It sounded like you were suggesting that we factor out a common code base that could be used by multiprocessing and the other machinery and that only multiprocessing would keep the pickle-related code. > >> I would expect subinterpreters to use less memory. Furthermore creating >> them would be significantly faster. 
Passing objects between them would be >> much more efficient. And, yes, cross-platform. > > Maybe I don't understand how subinterpreters work. AIUI, the whole > point of independent subinterpreters is that they share no state. So > if I have a web server, each independent serving thread has to do all > of the initialization (import HTTP libraries, etc.), right? Yes. However, I expect that we could mitigate that cost to some extent. > Compare > with forking, where the initialization is all done and then you fork, > and you are immediately ready to serve, using the data structures > shared with all the other workers, which is only copied when it is > written to. So forking starts up faster and uses less memory (due to > shared memory.) But we are aiming for a share-nothing model with an efficient object-passing mechanism. Furthermore, subinterpreters do not have to be single-use. My proposal includes running tasks in an existing subinterpreter (e.g. executor pool), so that start-up cost is mitigated in cases where it matters. Note that ultimately my goal is to make it obvious and undeniable that Python (3.6+) has a good multi-core story. In my proposal, subinterpreters are a means to an end. If there's a better solution then great! As long as the real goal is met I'll be satisfied. :) For now I'm still confident that the subinterpreter approach is the best option for meeting the goal. > > Re passing objects, see below. > > I do agree it's cross-platform, but right now that's the only thing I > agree with. > >>> Note: I don't count the IPC cost of forking, because at least on >>> linux, any way to efficiently share objects between independent >>> interpreters in separate threads can also be ported to independent >>> interpreters in forked subprocesses, >> >> How so? Subinterpreters are in the same process. For this proposal each >> would be on its own thread. Sharing objects between them through channels >> would be more efficient than IPC. Perhaps I've missed something? > > You might be missing that memory can be shared between processes, not > just threads, but I don't know. > > The reason passing objects between processes is so slow is currently > *nearly entirely* the cost of serialization. That is, it's the fact > that you are passing an object to an entirely separate interpreter, > and need to serialize the whole object graph and so on. If you can > make that fast without serialization, That is a worthy goal! > for shared memory threads, then > all the serialization becomes unnecessary, and you can either write to > a pipe (fast, if it's a non-container), or used shared memory from the > beginning (instantaneous). This is possible on any POSIX OS. Linux > lets you go even further. And this is faster than passing objects around within the same process? Does it play well with Python's memory model? -eric From ericsnowcurrently at gmail.com Wed Jun 24 07:30:13 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:30:13 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 6:13 AM, Devin Jeanpierre wrote: > The solution has threads that are remarkably like > processes, so I think it's really important to be careful about the > differences and why this solution has the advantage. I'm not seeing > that. Good point. I still think there are some significant differences (as already explained). > > And remember that we *do* have many examples of people using > parallelized Python code in production. 
Are you sure you're satisfying > their concerns, or whose concerns are you trying to satisfy? Another good point. What would you suggest is the best way to find out? -eric From ericsnowcurrently at gmail.com Wed Jun 24 07:33:23 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:33:23 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel wrote: > Nick Coghlan schrieb am 21.06.2015 um 03:28: >> * there may be restrictions on some extension modules that limit them >> to "main interpreter only" (e.g. if the extension module itself isn't >> thread-safe, then it will need to remain fully protected by the GIL) > > Just an idea, but C extensions could opt-in to this. Calling into them has > to go through some kind of callable type, usually PyCFunction. We could > protect all calls to extension types and C functions with a global runtime > lock (per process, not per interpreter) and Extensions could set a flag on > their functions and methods (or get it inherited from their extension types > etc.) that says "I don't need the lock". That allows for a very > fine-grained transition. Exactly. PEP 489 helps facilitate opting in as well, right? -eric From ericsnowcurrently at gmail.com Wed Jun 24 07:48:00 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:48:00 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> References: <5BBAF248-7669-4277-A067-0257BECE50AB@yahoo.com> Message-ID: On Sun, Jun 21, 2015 at 3:08 PM, Andrew Barnert wrote: > First, a minor question: instead of banning fork entirely within subinterpreters, why not just document that it is illegal to do anything between fork and exec in a subinterpreters, except for a very small (but possibly extensible) subset of Python? For example, after fork, you can no longer access any channels, and you also can't use signals, threads, fork again, imports, assignments to builtins, raising exceptions, or a whole host of other things (but of course if you exec an entirely new Python interpreter, it can do any of those things). Sure. I expect the quickest approach, though, will be to initially have blanket restrictions and then ease them once the core functionality is complete. > C extension modules could just have a flag that marks whether the whole module is fork-safe or not (defaulting to not). That may make sense independently from my proposal. > So, this allows a subinterpreter to use subprocess (or even multiprocessing, as long as you use the forkserver or spawn mechanism), and it gives code that intentionally wants to do tricky/dangerous things a way to do them, but it avoids all of the problems with accidentally breaking a subinterpreter by forking it and then doing bad things. > > Second, a major question: In this proposal, are builtins and the modules map shared, or copied? > > If they're copied, it seems like it would be hard to do that even as efficiently as multiprocessing, much less more efficiently. Of course you could fake this with CoW, but I'm not sure how you'd do that, short of CoWing the entire heap (by using clone instead of pthreads on Linux, or by doing a bunch of explicit mmap and related calls on other POSIX systems), at which point you're pretty close to just implementing fork or vfork yourself to avoid calling fork or vfork, and unlikely to get it as efficient or as robust as what's already there. 
> > If they're shared, on the other hand, then it seems like it becomes very difficult to implement subinterpreter-safe code, because it's no longer safe to import a module, set a flag, call a registration function, etc. > > I expect that ultimately the builtins will be shared in some fashion. To some extent they already are. sys.modules (and the rest of the import machinery) will mostly not be shared, though I expect that likewise we will have some form of sharing where we can get away with it. -eric From ericsnowcurrently at gmail.com Wed Jun 24 07:51:00 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 23 Jun 2015 23:51:00 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <3572BE97-5A2B-4A0F-B593-2BE6605E40DA@yahoo.com> References: <3572BE97-5A2B-4A0F-B593-2BE6605E40DA@yahoo.com> Message-ID: On Sun, Jun 21, 2015 at 3:24 PM, Andrew Barnert via Python-ideas wrote: > On Jun 21, 2015, at 06:09, Nick Coghlan wrote: >> >> Avoiding object serialisation is indeed the main objective. With >> subinterpreters, we have a lot more options for that than we do with >> any form of IPC, including shared references to immutable objects, and >> the PEP 3118 buffer API. > > It seems like you could provide a way to efficiently copy and share deeper objects than integers and buffers without sharing everything, assuming the user code knows, at the time those objects are created, that they will be copied or shared. Basically, you allocate the objects into a separate arena (along with allocating their refcounts on a separate page, as already mentioned). You can't add a reference to an outside object in an arena-allocated object, although you can copy that outside object into the arena. And then you just pass or clone (possibly by using CoW memory-mapping calls, only falling back to memcpy on platforms that can't do that) entire arenas instead of individual objects (so you don't need the fictitious memdeepcpy function that someone ridiculed earlier in this thread, but you get 90% of the benefits of having one). Yeah, I've been thinking of something along these lines. However, it's not the #1 issue to address so I haven't gotten too far into it. -eric > > This has the same basic advantages of forking, but it's doable efficiently on Windows, and doable less efficiently (but still better than spawn and pass) on even weird embedded platforms, and it forces code to be explicit about what gets shared and copied without forcing it to work through less-natural queue-like APIs. > > Also, it seems like you could fake this entire arena API on top of pickle/copy for a first implementation, then just replace the underlying implementation separately. From ericsnowcurrently at gmail.com Wed Jun 24 08:01:31 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:01:31 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Sun, Jun 21, 2015 at 7:47 PM, Nick Coghlan wrote: > It occurred to me in the context of another conversation that you (or > someone else!) may be able to prototype some of the public API ideas > for this using Jython and Vert.x: http://vertx.io/ I'll take a look. 
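Andrew's suggestion above of faking the arena API on top of pickle/copy for a first implementation could be prototyped in a few lines (the Arena name and its methods are invented here for illustration; a real version would replace the pickle round trip with something smarter):

------------------
import pickle

class Arena:
    """Prototype of an explicit copy/share container backed by pickle.

    The object graph put into the arena must be self-contained; handing
    the arena to another interpreter (or process) is then just a matter
    of moving bytes around.
    """

    def __init__(self):
        self._blobs = []

    def put(self, obj):
        # Serialise eagerly; a real implementation would instead allocate
        # the object graph into dedicated pages/arenas.
        self._blobs.append(pickle.dumps(obj))
        return len(self._blobs) - 1

    def get(self, index):
        # Every receiver gets an independent copy, much as with fork + CoW.
        return pickle.loads(self._blobs[index])
------------------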
> > That idea and some of the initial feedback in this thread also made me > realise that it is going to be essential to keep in mind that there > are key goals at two different layers here: > > * design a compelling implementation independent public API for CSP > style programming in Python > * use subinterpreters to implement that API efficiently in CPython > > There's a feedback loop between those two goals where limitations on > what's feasible in CPython may constrain the design of the public API, > and the design of the API may drive enhancements to the existing > subinterpreter capability, but we shouldn't lose sight of the fact > that they're *separate* goals. Yep. I've looked at it that way from the beginning. When I get to the point of writing an actual PEP, I'm thinking it will actually be multiple PEPs covering the different pieces. I've also been considering how to implement that high-level API in terms of a low-level API (threading vs. _thread) and it it make sense to focus less on subinterpreters in that context. At this point it makes sense to me to expose subinterpreters in Python, so for now I was planning on that for the low-level API. -eric From ericsnowcurrently at gmail.com Wed Jun 24 08:11:16 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:11:16 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Mon, Jun 22, 2015 at 5:59 PM, Nathaniel Smith wrote: > On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith wrote: >> ... > > One possibility would be for subinterpreters to copy modules from the > main interpreter -- I guess your average module is mostly dicts, > strings, type objects, and functions; strings and functions are > already immutable and could be shared without copying, and I guess > copying the dicts and type objects into the subinterpreter is much > cheaper than hitting the disk etc. to do a real import. (Though > certainly not free.) Yeah, I think there are a number of mechanisms we can explore to improve the efficiency of subinterpreter startup (and sharing). > > This would have interesting semantic implications -- it would give > similar effects to fork(), with subinterpreters starting from a > snapshot of the main interpreter's global state. > >> I'm not entirely sold on this overall proposal, but I think a result of it >> could be to make our subinterpreter support better which would be a good >> thing. >> >> We have had to turn people away from subinterpreters in the past for use as >> part of their multithreaded C++ server where they wanted to occasionally run >> some Python code in embedded interpreters as part of serving some requests. >> Doing that would suddenly single thread their application (GIIIIIIL!) for >> all requests currently executing Python code despite multiple >> subinterpreters. > > I've also talked to HPC users who discovered this problem the hard way > (e.g. http://www-atlas.lbl.gov/, folks working on the Large Hadron > Collider) -- they've been using Python as an extension language in > some large physics codes but are now porting those bits to C++ because > of the GIL issues. (In this context startup overhead should be easily > amortized, but switching to an RPC model is not going to happen.) Would this proposal make a difference for them? 
-eric From ericsnowcurrently at gmail.com Wed Jun 24 08:12:37 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:12:37 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Mon, Jun 22, 2015 at 9:52 PM, Nick Coghlan wrote: > On 23 June 2015 at 10:03, Chris Angelico wrote: >> On Tue, Jun 23, 2015 at 9:59 AM, Nathaniel Smith wrote: >>> One possibility would be for subinterpreters to copy modules from the >>> main interpreter -- I guess your average module is mostly dicts, >>> strings, type objects, and functions; strings and functions are >>> already immutable and could be shared without copying, and I guess >>> copying the dicts and type objects into the subinterpreter is much >>> cheaper than hitting the disk etc. to do a real import. (Though >>> certainly not free.) >> >> FWIW, functions aren't immutable, but code objects are. > > Anything we come up with for optimised data sharing via channels could > be applied to passing a prebuilt sys.modules dictionary through to > subinterpreters. > > The key for me is to start from a well-defined "shared nothing" > semantic model, but then look for ways to exploit the fact that we > actually *are* running in the same address space to avoid copy > objects. Exactly. > > The current reference-counts-embedded-in-the-object-structs memory > layout also plays havoc with the all-or-nothing page level > copy-on-write semantics used by the fork() syscall at the operating > system layer, so some of the ideas we've been considering > (specifically, those related to moving the reference counter > bookkeeping out of the object structs themselves) would potentially > help with that as well (but would also have other hard to predict > performance consequences). > > There's a reason Eric announced this as the *start* of a research > project, rather than as a finished proposal - while it seems > conceptually sound overall, there are a vast number of details to be > considered that will no doubt hold a great many devils :) And they keep multiplying! :) -eric From ericsnowcurrently at gmail.com Wed Jun 24 08:15:58 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:15:58 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150623120118.333922bb@anarchist.wooz.org> References: <5585E37F.4060403@gmail.com> <20150623120118.333922bb@anarchist.wooz.org> Message-ID: On Tue, Jun 23, 2015 at 10:01 AM, Barry Warsaw wrote: > A crazy offshoot idea would be something like Emacs' unexec, where during the > build process you could preload a bunch of always-used immutable modules, then > freeze the state in such a way that starting up again later would be much > faster, because the imports (and probably more importantly, the searching) > could be avoided. +1 -eric From ericsnowcurrently at gmail.com Wed Jun 24 08:18:39 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:18:39 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> <55895BB1.30609@sotecware.net> Message-ID: On Tue, Jun 23, 2015 at 5:32 PM, Devin Jeanpierre wrote: > A coworker of mine wrote a patch to Python that allows you to freeze > refcounts for all existing objects before forking, if the correct > compile options are set. This adds overhead to incref/decref, but > dramatically changes the python+fork memory usage story. 
(I haven't > personally played with it much, but it sounds decent.) If there's any > interest I can try to upstream this change, guarded behind a compiler > flag. > > We've also tried moving refcounts to their own pages, like you and > Nick suggest, but it breaks a *lot* of third-party code. I can try to > upstream it. If it's guarded by a compiler flag it is probably still > useful, just any users would have to grep through their dependencies > to make sure nothing directly accesses the refcount. (The stdlib can > be made to work.) It sounds like it would also be useful for the main > project in the topic of this thread, so I imagine there's more > momentum behind it. I'd be interested in more info on both the refcount freezing and the sepatate refcounts pages. -eric From ericsnowcurrently at gmail.com Wed Jun 24 08:19:54 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:19:54 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> <323935056456705826.434603sturla.molden-gmail.com@news.gmane.org> <55895BB1.30609@sotecware.net> Message-ID: On Tue, Jun 23, 2015 at 5:32 PM, Devin Jeanpierre wrote: > We've also tried moving refcounts to their own pages, like you and > Nick suggest, but it breaks a *lot* of third-party code. I can try to > upstream it. If it's guarded by a compiler flag it is probably still > useful, just any users would have to grep through their dependencies > to make sure nothing directly accesses the refcount. (The stdlib can > be made to work.) It sounds like it would also be useful for the main > project in the topic of this thread, so I imagine there's more > momentum behind it. Any indication of the performance impact? -eric From ericsnowcurrently at gmail.com Wed Jun 24 08:21:42 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 00:21:42 -0600 Subject: [Python-ideas] PyParallel update (was: solving multi-core Python) In-Reply-To: <20150623135257.GA94530@trent.me> References: <20150623135257.GA94530@trent.me> Message-ID: On Tue, Jun 23, 2015 at 7:53 AM, Trent Nelson wrote: > On Sat, Jun 20, 2015 at 03:42:33PM -0600, Eric Snow wrote: >> Furthermore, removing the GIL is perhaps an obvious solution but not >> the only one. Others include Trent Nelson's PyParallels, STM, and >> other Python implementations.. > > So, I've been sprinting relentlessly on PyParallel since Christmas, and > recently reached my v0.0 milestone of being able to handle all the TEFB > tests, plus get the "instantaneous wiki search" thing working too. Thanks for the update, Trent. I've skimmed through it and will be reading more in-depth when I get a chance. I'm sure I'll have more questions for you. :) -eric From njs at pobox.com Wed Jun 24 09:19:47 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 24 Jun 2015 00:19:47 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Tue, Jun 23, 2015 at 11:11 PM, Eric Snow wrote: > On Mon, Jun 22, 2015 at 5:59 PM, Nathaniel Smith wrote: >> On Mon, Jun 22, 2015 at 10:37 AM, Gregory P. Smith wrote: >>> >>> We have had to turn people away from subinterpreters in the past for use as >>> part of their multithreaded C++ server where they wanted to occasionally run >>> some Python code in embedded interpreters as part of serving some requests. >>> Doing that would suddenly single thread their application (GIIIIIIL!) 
for >>> all requests currently executing Python code despite multiple >>> subinterpreters. >> >> I've also talked to HPC users who discovered this problem the hard way >> (e.g. http://www-atlas.lbl.gov/, folks working on the Large Hadron >> Collider) -- they've been using Python as an extension language in >> some large physics codes but are now porting those bits to C++ because >> of the GIL issues. (In this context startup overhead should be easily >> amortized, but switching to an RPC model is not going to happen.) > > Would this proposal make a difference for them? I'm not sure -- it was just a conversation, so I've never seen their actual code. I'm pretty sure they're still on py2, for one thing :-). But putting that aside, I *think* it potentially could help -- my guess is that at a high level they have an API where they basically want to register a callback once, and then call it in parallel from multiple threads. This kind of usage would require some extra machinery, I guess, to spawn a subinterpreter for each thread and import the relevant libraries so the callback could run, but I can't see any reason one couldn't build that on top of the mechanisms you're talking about. -n -- Nathaniel J. Smith -- http://vorpus.org From solipsis at pitrou.net Wed Jun 24 09:19:55 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Jun 2015 09:19:55 +0200 Subject: [Python-ideas] millisecond and microsecond times without floats References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> Message-ID: <20150624091955.2efef148@fsol> On Tue, 23 Jun 2015 23:25:00 +0300 Paul Sokolovsky wrote: > > Well, that's one of examples of that "desktop" thinking ;-). > Consider for example that 2^32 microseconds is just over an hour, so > expressing everything in microseconds would require arbitrary-precision > integers, which may be just the same kind of burden for an embedded > system as floats. I'd like to suggest micropython first acquire the ability to handle 64-bit numbers (or something close to that, e.g. 60-bit, if it likes to use tags for typing), if it wants to become appropriate for precise datetime computations. That should be less of a heavy requirement than arbitrary-precision ints. Regards Antoine. From mal at egenix.com Wed Jun 24 09:50:16 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Jun 2015 09:50:16 +0200 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150623021530.74ce1ebe@x230> References: <20150623021530.74ce1ebe@x230> Message-ID: <558A6138.4010101@egenix.com> On 23.06.2015 01:15, Paul Sokolovsky wrote: > > > Hello from MicroPython, a lean Python implementation > scaling down to run even on microcontrollers > (https://github.com/micropython/micropython). > > Our target hardware base oftentimes lacks floating point support, and > using software emulation is expensive. So, we would like to have > versions of some timing functions, taking/returning millisecond and/or > microsecond values as integers. > > The most functionality we're interested in: > > 1. Delays > 2. Relative time (from an arbitrary starting point, expected to be > wrapped) > 3. Calculating time differences, with immunity to wrap-around. > > The first presented assumption is to use "time.sleep()" for delays, > "time.monotonic()" for relative time as the base. Would somebody gave > alternative/better suggestions? > > Second question is how to modify their names for > millisecond/microsecond versions. 
For sleep(), "msleep" and "usleep" > would be concise possibilities, but that doesn't map well to > monotonic(), leading to "mmonotonic". So, better idea is to use "_ms" > and "_us" suffixes: > > sleep_ms() > sleep_us() > monotonic_ms() > monotonic_us() > > Point 3 above isn't currently addressed by time module at all. > https://www.python.org/dev/peps/pep-0418/ mentions some internal > workaround for overflows/wrap-arounds on some systems. Due to > lean-ness of our hardware base, we'd like to make this matter explicit > to the applications and avoid internal workarounds. Proposed solution > is to have time.elapsed(time1, time2) function, which can take values > as returned by monotonic_ms(), monotonic_us(). Assuming that results of > both functions are encoded and wrap consistently (this is reasonable > assumption), there's no need for 2 separate elapsed_ms(), elapsed_us() > function. > > > So, the above are rough ideas we (well, I) have. We'd like to get wider > Python community feedback on them, see if there're better/alternative > ideas, how Pythonic it is, etc. To clarify, this should not be construed > as proposal to add the above functions to CPython. You may want to use a similar approach as I have used in mxDateTime to express date/time values: http://www.egenix.com/products/python/mxBase/mxDateTime/ It uses an integer to represent days and a float to represent seconds since midnight (i.e. time of day). The concept has worked out really well and often makes date/time calculations a lot easier than trying to stuff everything into a single number and then having to deal things like leap seconds and rounding errors. In your case you'd use integers for both and nanoseconds as basis for the time of day integer. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 24 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-06-16: Released eGenix pyOpenSSL 0.13.10 ... http://egenix.com/go78 2015-07-20: EuroPython 2015, Bilbao, Spain ... 26 days to go 2015-07-29: Python Meeting Duesseldorf ... 35 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Wed Jun 24 10:22:42 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 24 Jun 2015 18:22:42 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 24 June 2015 at 15:33, Eric Snow wrote: > On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel wrote: >> Nick Coghlan schrieb am 21.06.2015 um 03:28: >>> * there may be restrictions on some extension modules that limit them >>> to "main interpreter only" (e.g. if the extension module itself isn't >>> thread-safe, then it will need to remain fully protected by the GIL) >> >> Just an idea, but C extensions could opt-in to this. Calling into them has >> to go through some kind of callable type, usually PyCFunction. We could >> protect all calls to extension types and C functions with a global runtime >> lock (per process, not per interpreter) and Extensions could set a flag on >> their functions and methods (or get it inherited from their extension types >> etc.) 
that says "I don't need the lock". That allows for a very >> fine-grained transition. > > Exactly. PEP 489 helps facilitate opting in as well, right? Yep, as PEP 489 requires subinterpreter compatibility as a precondition for using multi-phase initialisation :) Cheers, Nick. P.S. Technically, what it actually requires is support for "multiple instances of the module existing in the same process at the same time", as it really recreates the module if you remove it from sys.modules and import it again, unlike single phase initialisation. But that's a mouthful, so "must support subinterpreters" is an easier shorthand. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From drekin at gmail.com Wed Jun 24 11:00:18 2015 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Wed, 24 Jun 2015 11:00:18 +0200 Subject: [Python-ideas] Are there asynchronous generators? Message-ID: Hello, I had a generator producing pairs of values and wanted to feed all the first members of the pairs to one consumer and all the second members to another consumer. For example: def pairs(): for i in range(4): yield (i, i ** 2) biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) The point is I wanted the consumers to be suspended and resumed in a coordinated manner: The first producer is invoked, it wants the first element. The coordinator implemented by biconsumer function invokes pairs(), gets the first pair and yields its first member to the first consumer. Then it wants the next element, but now it's the second consumer's turn, so the first consumer is suspended and the second consumer is invoked and fed with the second member of the first pair. Then the second producer wants the next element, but it's the first consumer's turn? and so on. In the end, when the stream of pairs is exhausted, StopIteration is thrown to both consumers and their results are combined. The cooperative asynchronous nature of the execution reminded me asyncio and coroutines, so I thought that biconsumer may be implemented using them. However, it seems that it is imposible to write an "asynchronous generator" since the "yielding pipe" is already used for the communication with the scheduler. And even if it was possible to make an asynchronous generator, it is not clear how to feed it to a synchronous consumer like sum() or list() function. With PEP 492 the concepts of generators and coroutines were separated, so asyncronous generators may be possible in theory. An ordinary function has just the returning pipe ? for returning the result to the caller. A generator has also a yielding pipe ? used for yielding the values during iteration, and its return pipe is used to finish the iteration. A native coroutine has a returning pipe ? to return the result to a caller just like an ordinary function, and also an async pipe ? used for communication with a scheduler and execution suspension. An asynchronous generator would just have both yieling pipe and async pipe. So my question is: was the code like the following considered? Does it make sense? Or are there not enough uses cases for such code? I found only a short mention in https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so possibly these coroutine-generators are the same idea. async def f(): number_string = await fetch_data() for n in number_string.split(): yield int(n) async def g(): result = async/await? 
sum(f()) return result async def h(): the_sum = await g() As for explanation about the execution of h() by an event loop: h is a native coroutine called by the event loop, having both returning pipe and async pipe. The returning pipe leads to the end of the task, the async pipe is used for cummunication with the scheduler. Then, g() is called asynchronously ? using the await keyword means the the access to the async pipe is given to the callee. Then g() invokes the asyncronous generator f() and gives it the access to its async pipe, so when f() is yielding values to sum, it can also yield a future to the scheduler via the async pipe and suspend the whole task. Regards, Adam Barto? -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Wed Jun 24 12:13:49 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 24 Jun 2015 13:13:49 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150624091955.2efef148@fsol> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> Message-ID: <20150624131349.01ee7634@x230> Hello, On Wed, 24 Jun 2015 09:19:55 +0200 Antoine Pitrou wrote: > On Tue, 23 Jun 2015 23:25:00 +0300 > Paul Sokolovsky wrote: > > > > Well, that's one of examples of that "desktop" thinking ;-). > > Consider for example that 2^32 microseconds is just over an hour, so > > expressing everything in microseconds would require > > arbitrary-precision integers, which may be just the same kind of > > burden for an embedded system as floats. > > I'd like to suggest micropython first acquire the ability to handle > 64-bit numbers (or something close to that, e.g. 60-bit, if it likes > to use tags for typing), if it wants to become appropriate for precise > datetime computations. MicroPython has such support. Long integers can be implemented either as variable-size arbitrary-precisions integers or and C long long type. But that doesn't change the fact that 64-bit values still overflow, or that we don't want to force need for any kind of long integer on any particular implementation. We don't even want to mask the fact that fixed-size (time) counters overflow - for various reasons, including the fact that we want to follow Python's tradition of being nice teaching/learning language, and learning embedded programming means learning to deal with timer, etc. overflows. So, the question is not how to "appropriate for precise datetime computations" - MicroPython inherits that ability by being a Python, but how to scale into the opposite direction, how to integrate into stdlib "realtime" time handling, which is simple, fast (getting timing value itself is low-overhead) and modular-arithmetic by its nature. > > That should be less of a heavy requirement than arbitrary-precision > ints. > > Regards > > Antoine. -- Best regards, Paul mailto:pmiscml at gmail.com From andrew.svetlov at gmail.com Wed Jun 24 12:13:53 2015 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 24 Jun 2015 13:13:53 +0300 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: Your idea is clean and maybe we will allow `yield` inside `async def` in Python 3.6. For PEP 492 it was too big change. On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? wrote: > Hello, > > I had a generator producing pairs of values and wanted to feed all the first > members of the pairs to one consumer and all the second members to another > consumer. 
For example: > > def pairs(): > for i in range(4): > yield (i, i ** 2) > > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) > > The point is I wanted the consumers to be suspended and resumed in a > coordinated manner: The first producer is invoked, it wants the first > element. The coordinator implemented by biconsumer function invokes pairs(), > gets the first pair and yields its first member to the first consumer. Then > it wants the next element, but now it's the second consumer's turn, so the > first consumer is suspended and the second consumer is invoked and fed with > the second member of the first pair. Then the second producer wants the next > element, but it's the first consumer's turn? and so on. In the end, when the > stream of pairs is exhausted, StopIteration is thrown to both consumers and > their results are combined. > > The cooperative asynchronous nature of the execution reminded me asyncio and > coroutines, so I thought that biconsumer may be implemented using them. > However, it seems that it is imposible to write an "asynchronous generator" > since the "yielding pipe" is already used for the communication with the > scheduler. And even if it was possible to make an asynchronous generator, it > is not clear how to feed it to a synchronous consumer like sum() or list() > function. > > With PEP 492 the concepts of generators and coroutines were separated, so > asyncronous generators may be possible in theory. An ordinary function has > just the returning pipe ? for returning the result to the caller. A > generator has also a yielding pipe ? used for yielding the values during > iteration, and its return pipe is used to finish the iteration. A native > coroutine has a returning pipe ? to return the result to a caller just like > an ordinary function, and also an async pipe ? used for communication with a > scheduler and execution suspension. An asynchronous generator would just > have both yieling pipe and async pipe. > > So my question is: was the code like the following considered? Does it make > sense? Or are there not enough uses cases for such code? I found only a > short mention in > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so possibly > these coroutine-generators are the same idea. > > async def f(): > number_string = await fetch_data() > for n in number_string.split(): > yield int(n) > > async def g(): > result = async/await? sum(f()) > return result > > async def h(): > the_sum = await g() > > As for explanation about the execution of h() by an event loop: h is a > native coroutine called by the event loop, having both returning pipe and > async pipe. The returning pipe leads to the end of the task, the async pipe > is used for cummunication with the scheduler. Then, g() is called > asynchronously ? using the await keyword means the the access to the async > pipe is given to the callee. Then g() invokes the asyncronous generator f() > and gives it the access to its async pipe, so when f() is yielding values to > sum, it can also yield a future to the scheduler via the async pipe and > suspend the whole task. > > Regards, Adam Barto? 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -- Thanks, Andrew Svetlov From solipsis at pitrou.net Wed Jun 24 12:40:10 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Jun 2015 12:40:10 +0200 Subject: [Python-ideas] millisecond and microsecond times without floats References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> Message-ID: <20150624124010.24cd3613@fsol> On Wed, 24 Jun 2015 13:13:49 +0300 Paul Sokolovsky wrote: > > So, the question is not how to "appropriate for precise datetime > computations" - MicroPython inherits that ability by being a Python, > but how to scale into the opposite direction, how to integrate into > stdlib "realtime" time handling, which is simple, fast (getting timing > value itself is low-overhead) and modular-arithmetic by its nature. I'm sorry, I don't understand. If you have 64-bit ints then why would you use anything smaller for timestamps? Regards Antoine. From pmiscml at gmail.com Wed Jun 24 12:59:08 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 24 Jun 2015 13:59:08 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150624124010.24cd3613@fsol> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> Message-ID: <20150624135908.4f85d415@x230> Hello, On Wed, 24 Jun 2015 12:40:10 +0200 Antoine Pitrou wrote: > On Wed, 24 Jun 2015 13:13:49 +0300 > Paul Sokolovsky wrote: > > > > So, the question is not how to "appropriate for precise datetime > > computations" - MicroPython inherits that ability by being a Python, > > but how to scale into the opposite direction, how to integrate into > > stdlib "realtime" time handling, which is simple, fast (getting > > timing value itself is low-overhead) and modular-arithmetic by its > > nature. > > I'm sorry, I don't understand. If you have 64-bit ints then why would > you use anything smaller for timestamps? Because MicroPython stays close (== may stay close) to hardware and does not depend on any OS (even those smaller embedded OSes, which are called RTOS'es). Then, it's usual case for embedded hardware to have hardware timers of the same size or smaller as the architecture machine word. For example, on a 32-bit CPU, timers are usually 32-, 24-, or 16- bit. On 16-bit CPUs, timers are 16- or 8-bit. Put it otherwise way, there's simply nowhere to get 64-bit time value from, except by building software abstractions, and MicroPython does not *require* them (if they exist - good, they will be helpful for other things, if not - MicroPython can still run and do a large subset of useful things). Another reason is that MicroPython exactly uses tagged pointers scheme, and small integers are value, not reference, objects. Dealing with them is largely faster (MicroPython easily beats CPython on (small) integer performance), and doesn't require memory allocation (the latter is another important feature for embedded systems). > > Regards > > Antoine. 
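For illustration, the wrap-around-immune time difference discussed earlier in the thread is plain modular arithmetic over the counter's period. A minimal sketch (the 30-bit width and the ticks_diff name are assumptions for the example, not a proposed API):

    TICKS_PERIOD = 1 << 30            # assumed counter modulus, so values stay small ints
    HALF_PERIOD = TICKS_PERIOD // 2

    def ticks_diff(end, start):
        # Correct even if the counter wrapped between the two readings,
        # as long as the real elapsed time is under half the period.
        return ((end - start + HALF_PERIOD) % TICKS_PERIOD) - HALF_PERIOD
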
-- Best regards, Paul mailto:pmiscml at gmail.com From solipsis at pitrou.net Wed Jun 24 13:03:38 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 24 Jun 2015 13:03:38 +0200 Subject: [Python-ideas] millisecond and microsecond times without floats References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> <20150624135908.4f85d415@x230> Message-ID: <20150624130338.0b222ca3@fsol> On Wed, 24 Jun 2015 13:59:08 +0300 Paul Sokolovsky wrote: > Hello, > > On Wed, 24 Jun 2015 12:40:10 +0200 > Antoine Pitrou wrote: > > > On Wed, 24 Jun 2015 13:13:49 +0300 > > Paul Sokolovsky wrote: > > > > > > So, the question is not how to "appropriate for precise datetime > > > computations" - MicroPython inherits that ability by being a Python, > > > but how to scale into the opposite direction, how to integrate into > > > stdlib "realtime" time handling, which is simple, fast (getting > > > timing value itself is low-overhead) and modular-arithmetic by its > > > nature. > > > > I'm sorry, I don't understand. If you have 64-bit ints then why would > > you use anything smaller for timestamps? > > Because MicroPython stays close (== may stay close) to hardware and does > not depend on any OS (even those smaller embedded OSes, which are > called RTOS'es). Then, it's usual case for embedded hardware to have > hardware timers of the same size or smaller as the architecture machine > word. For example, on a 32-bit CPU, timers are usually 32-, 24-, or 16- > bit. On 16-bit CPUs, timers are 16- or 8-bit. I don't think such timers have a place in the CPython standard library, though. Don't you have an additional namespace for micropython-specific features? Regards Antoine. From pmiscml at gmail.com Wed Jun 24 13:38:08 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Wed, 24 Jun 2015 14:38:08 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150624130338.0b222ca3@fsol> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol> Message-ID: <20150624143808.29844019@x230> Hello, On Wed, 24 Jun 2015 13:03:38 +0200 Antoine Pitrou wrote: [] > > Because MicroPython stays close (== may stay close) to hardware and > > does not depend on any OS (even those smaller embedded OSes, which > > are called RTOS'es). Then, it's usual case for embedded hardware to > > have hardware timers of the same size or smaller as the > > architecture machine word. For example, on a 32-bit CPU, timers are > > usually 32-, 24-, or 16- bit. On 16-bit CPUs, timers are 16- or > > 8-bit. > > I don't think such timers have a place in the CPython standard > library, though. They don't, that was said in the very first message. They do have their place in MicroPython's stdlib and arguably in any other embedded Python's stdlib. There're number of embedded Python ports, I don't know if they tried to address to wider Python community regarding aspects peculiar to them. As you can see, we try to do the homework on our side. > Don't you have an additional namespace for micropython-specific > features? I treat it as a good sign that it's ~8th message in the thread and it's only the first time we get a hint that we should get out with our stuff into a separate namespace ;-). 
But of course, digging own hole and putting random stuff in there is everyone's first choice. And MicroPython has its "catch-all" module for random stuff imaginatively called "pyb", and in (user-friendly) embedded, the de-facto API standard is Arduino's, so that's what taken as a base for function names. So, MicroPython currently has: pyb.delay(ms) pyb.udelay(us) pyb.millis() pyb.micros() pyb.elapsed_millis() pyb.elapsed_micros() As can be seen, while these deal with time measurement/delays, they have little in common with how Python does it. And the main question we seek to answer is - what's more beneficial: to keep digging own hole or try to take Python's API as a close affinity (while still adhering to requirements posed by embedded platforms). > > Regards > > Antoine. -- Best regards, Paul mailto:pmiscml at gmail.com From mal at egenix.com Wed Jun 24 13:43:55 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Jun 2015 13:43:55 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <558A97FB.9070908@egenix.com> On 24.06.2015 10:22, Nick Coghlan wrote: > On 24 June 2015 at 15:33, Eric Snow wrote: >> On Sun, Jun 21, 2015 at 7:06 AM, Stefan Behnel wrote: >>> Nick Coghlan schrieb am 21.06.2015 um 03:28: >>>> * there may be restrictions on some extension modules that limit them >>>> to "main interpreter only" (e.g. if the extension module itself isn't >>>> thread-safe, then it will need to remain fully protected by the GIL) >>> >>> Just an idea, but C extensions could opt-in to this. Calling into them has >>> to go through some kind of callable type, usually PyCFunction. We could >>> protect all calls to extension types and C functions with a global runtime >>> lock (per process, not per interpreter) and Extensions could set a flag on >>> their functions and methods (or get it inherited from their extension types >>> etc.) that says "I don't need the lock". That allows for a very >>> fine-grained transition. >> >> Exactly. PEP 489 helps facilitate opting in as well, right? > > Yep, as PEP 489 requires subinterpreter compatibility as a > precondition for using multi-phase initialisation :) > > Cheers, > Nick. > > P.S. Technically, what it actually requires is support for "multiple > instances of the module existing in the same process at the same > time", as it really recreates the module if you remove it from > sys.modules and import it again, unlike single phase initialisation. > But that's a mouthful, so "must support subinterpreters" is an easier > shorthand. Note that extension modules often interface to other C libraries which typically use some setup logic that is not thread safe, but is used to initialize the other thread safe parts. E.g. setting up locks and shared memory for all threads to use is a typical scenario you find in such libs. A requirement to be able to import modules multiple times would pretty much kill the idea for those modules. That said, I don't think this is really needed. Modules would only have to be made aware that there is a global first time setup phase and a later shutdown/reinit phase. As a result, the module DLL would load only once, but then use the new module setup logic to initialize its own state multiple times. That said, I still think the multiple-process is a better one (more robust, more compatible, fewer problems). 
We'd just need a way more efficient approach to sharing objects between the Python processes than using pickle and shared memory or pipes :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 24 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-06-16: Released eGenix pyOpenSSL 0.13.10 ... http://egenix.com/go78 2015-07-20: EuroPython 2015, Bilbao, Spain ... 26 days to go 2015-07-29: Python Meeting Duesseldorf ... 35 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From jonathan at slenders.be Wed Jun 24 13:54:28 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Wed, 24 Jun 2015 13:54:28 +0200 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: In my experience, it's much easier to use asyncio Queues for this. Instead of yielding, push to a queue. The consumer can then use "await queue.get()". I think the semantics of the generator become too complicated otherwise, or maybe impossible. Maybe have a look at this article: http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return Jonathan 2015-06-24 12:13 GMT+02:00 Andrew Svetlov : > Your idea is clean and maybe we will allow `yield` inside `async def` > in Python 3.6. > For PEP 492 it was too big change. > > On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? wrote: > > Hello, > > > > I had a generator producing pairs of values and wanted to feed all the > first > > members of the pairs to one consumer and all the second members to > another > > consumer. For example: > > > > def pairs(): > > for i in range(4): > > yield (i, i ** 2) > > > > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) > > > > The point is I wanted the consumers to be suspended and resumed in a > > coordinated manner: The first producer is invoked, it wants the first > > element. The coordinator implemented by biconsumer function invokes > pairs(), > > gets the first pair and yields its first member to the first consumer. > Then > > it wants the next element, but now it's the second consumer's turn, so > the > > first consumer is suspended and the second consumer is invoked and fed > with > > the second member of the first pair. Then the second producer wants the > next > > element, but it's the first consumer's turn? and so on. In the end, when > the > > stream of pairs is exhausted, StopIteration is thrown to both consumers > and > > their results are combined. > > > > The cooperative asynchronous nature of the execution reminded me asyncio > and > > coroutines, so I thought that biconsumer may be implemented using them. > > However, it seems that it is imposible to write an "asynchronous > generator" > > since the "yielding pipe" is already used for the communication with the > > scheduler. And even if it was possible to make an asynchronous > generator, it > > is not clear how to feed it to a synchronous consumer like sum() or > list() > > function. > > > > With PEP 492 the concepts of generators and coroutines were separated, so > > asyncronous generators may be possible in theory. An ordinary function > has > > just the returning pipe ? 
for returning the result to the caller. A > > generator has also a yielding pipe ? used for yielding the values during > > iteration, and its return pipe is used to finish the iteration. A native > > coroutine has a returning pipe ? to return the result to a caller just > like > > an ordinary function, and also an async pipe ? used for communication > with a > > scheduler and execution suspension. An asynchronous generator would just > > have both yieling pipe and async pipe. > > > > So my question is: was the code like the following considered? Does it > make > > sense? Or are there not enough uses cases for such code? I found only a > > short mention in > > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so > possibly > > these coroutine-generators are the same idea. > > > > async def f(): > > number_string = await fetch_data() > > for n in number_string.split(): > > yield int(n) > > > > async def g(): > > result = async/await? sum(f()) > > return result > > > > async def h(): > > the_sum = await g() > > > > As for explanation about the execution of h() by an event loop: h is a > > native coroutine called by the event loop, having both returning pipe and > > async pipe. The returning pipe leads to the end of the task, the async > pipe > > is used for cummunication with the scheduler. Then, g() is called > > asynchronously ? using the await keyword means the the access to the > async > > pipe is given to the callee. Then g() invokes the asyncronous generator > f() > > and gives it the access to its async pipe, so when f() is yielding > values to > > sum, it can also yield a future to the scheduler via the async pipe and > > suspend the whole task. > > > > Regards, Adam Barto? > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > -- > Thanks, > Andrew Svetlov > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From oreilldf at gmail.com Wed Jun 24 16:01:45 2015 From: oreilldf at gmail.com (Dan O'Reilly) Date: Wed, 24 Jun 2015 14:01:45 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 2:01 AM Eric Snow wrote: > On Sun, Jun 21, 2015 at 7:47 PM, Nick Coghlan wrote: > > It occurred to me in the context of another conversation that you (or > > someone else!) may be able to prototype some of the public API ideas > > for this using Jython and Vert.x: http://vertx.io/ > > I'll take a look. > > Note that Vert.x 3 was just released today, which (at least for now) drops support for Python. There is work underway to support it under version 3, but it's using CPython and Py4J, not Jython. You'd need to use Vert.x 2 to get Jython support: http://vertx.io/vertx2 -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Wed Jun 24 15:07:59 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 24 Jun 2015 16:07:59 +0300 Subject: [Python-ideas] natively logging sys.path modifications Message-ID: Hi, sys.path is kind of important thing for troubleshooting. 
It may worth to ship it with a logging mechanism that allows to quickly dump who added what and (optionally) why. I see the log as a circular memory buffer of limited size, to say 256 entries, that contains tuples in the following format: path, who, where, why path -- actual path added to sys.path who -- the context - package.module:function or - package.module:class.method or - package.module:__toplevel__ where -- full filename and line number to the instruction why -- advanced API may allow to set this field -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Wed Jun 24 15:19:46 2015 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 24 Jun 2015 16:19:46 +0300 Subject: [Python-ideas] web API to get a list of all module in stdlib Message-ID: Hi, People from core-workflow are not too active about the idea, so I finally found a time to repost it here. The original is here: http://comments.gmane.org/gmane.comp.python.devel.core-workflow/130 The idea is that docs.python.org site should export the list of Python modules shipped in stdlib for particular Python version in a machine readable format. There are recipes like these to get the list of modules: https://stackoverflow.com/questions/6463918/how-can-i-get-a-list-of-all-the-python-standard-library-modules But they give only the modules enabled for specific interpreter/platform. Not the list of modules that is included in de-facto standard for this stdlib version. This is need for processing information, for all Python versions, so instead parsing HTML tables, it would be more useful to directly fetch csv or json. That way anybody can quickly validate the processing algorithm without wasting time on extracting and normalizing the data. I see the data as the necessary step to organize a work around "externally evolving standard library", so a way to query it should be somewhat sustainable and obvious. Docs looks like an obvious way yo do so, like: https://docs.python.org/2.7.2/dataset/modules.json -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jun 24 16:59:34 2015 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 24 Jun 2015 23:59:34 +0900 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150623120118.333922bb@anarchist.wooz.org> References: <5585E37F.4060403@gmail.com> <20150623120118.333922bb@anarchist.wooz.org> Message-ID: <87zj3pw3s9.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > A crazy offshoot idea would be something like Emacs' unexec, where > during the build process you could preload a bunch of always-used > immutable modules, XEmacs doesn't do this any more if it can avoid it, we now have a portable dumper that we use on almost all platforms. And everybody at GNU Emacs who works with the unexec code wants to get rid of it. XEmacs's legacy unexec requires defeating address space randomization as well as certain optimizations that combine segments. I believe Emacs's does too. From a security standpoint, Emacsen are a child's garden of diseases and it will take decades, maybe centuries, to fix that, so those aren't huge problems for us. But I suppose Python needs to be able to work and play nicely with high-security environments, and would like to take advantage of security-oriented OS facilities like base address randomization. 
That kind of thing hugely complicates unexec -- last I heard it wasn't just "way too much work to be worth it", the wonks who created the portable dumper didn't know how to do it and weren't sure it could be done. XEmacs's default "portable dumper" is a poor man's relocating loader. I don't know exactly how it works, can't give details. Unlike the unexecs of some Lisps, however, this is a "do it once per build process" design. There's no explicit provision for keeping multiple dumpfiles around, although I believe it can be done "by hand" by someone with a little bit of knowledge. The reason for this is that the dumpfile is actually added to the executable. Regarding performance, the dumper itself is fast enough to be imperceptible to humans at load time, and doesn't take very long to build the dump file containing the "frozen" objects when building. I suspect Python has applications where it would be like to be faster than that, but I don't have benchmarks so don't know if this approach would be fast enough. This approach has the feature (disadvantage?) that some objects can't be dumped including editing buffers, network connections, and processes. I suppose those restrictions are very similar to the restrictions imposed by pickle. If somebody wants to know more about the portable dumper, I can probably connect them with the authors of that feature. From sturla.molden at gmail.com Wed Jun 24 17:26:59 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 24 Jun 2015 17:26:59 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 24/06/15 07:01, Eric Snow wrote: > In return, my question is, what is the level of effort to get fork+IPC > to do what we want vs. subinterpreters? Note that we need to > accommodate Windows as more than an afterthought Windows is really the problem. The absence of fork() is especially hurtful for an interpreted language like Python, in my opinion. >> If you think avoiding IPC is clever, you are wrong. IPC is very fast, in >> fact programs written to use MPI tends to perform and scale better than >> programs written to use OpenMP in parallel computing. > > I'd love to learn more about that. I'm sure there are some great > lessons on efficiently and safely sharing data between isolated > execution environments. That said, how does IPC compare to passing > objects around within the same process? There are two major competing standards for parallel computing in science and engineering: OpenMP and MPI. OpenMP is based on a shared memory model. MPI is based on a distributed memory model and use message passing (hence its name). The common implementations of OpenMP (GNU, Intel, Microsoft) are all implemented with threads. There are also OpenMP implementations for clusters (e.g. Intel), but from the programmer's perspective OpenMP is a shared memory model. The common implementations of MPI (MPICH, OpenMPI, Microsoft MPI) use processes instead of threads. Processes can run on the same computer or on different computers (aka "clusters"). On localhost shared memory is commonly used for message passing, on clusters MPI implementations will use networking protocols. The take-home message is that OpenMP is conceptually easier to use, but programs written to use MPI tend to be faster and scale better. This is even true when using a single computer, e.g. a laptop with one multicore CPU. 
Here is tl;dr explanation: As for ease of programming, it is easier to create a deadlock or livelock with MPI than OpenMP, even though programs written to use MPI tend to need fewer synchronization points. There is also less boilerplate code to type when using OpenMP, because we do not have to code object serialization, message passing, and object deserialization. For performance, programs written to use MPI seems to have a larger overhead because they require object serialization and message passing, whereas OpenMP threads can just share the same objects. The reality is actually the opposite, and is due to the internals of modern CPU, particularly hierarchichal memory, branch prediction and long pipelines. Because of hierarchichal memory, the cache used by CPUs and CPU cores must be kept in synch. Thus when using OpenMP (threads) there will be a lot of synchronization going on that the programmer does not see, but which the hardware will do behind the scenes. There will also be a lot of data passing between various cache levels on the CPU and RAM. If a core writes to a pice of memory it keeps in a cache line, a cascade of data traffic and synchronization can be triggered across all CPUs and cores. Not only will this stop the CPUs and prompt them to synchronize cache with RAM, it also invalidates their branch prediction and they must flush their pipelines and throw away work they have already done. The end result is a program that does not scale or perform very well, even though it does not seem to have any explicit synchronization points that could explain this. The term "false sharing" is often used to describe this problem. Programs written to use MPI are the opposite. There every instance of synchronization and message passing is visible. When a CPU core writes to memory kept in a cache line, it will never trigger synchronization and data traffic across all the CPUs. The scalability is as the program predicts. And even though memory and objects are not shared, there is actually much less data traffic going on. Which to use? Most people find it easier to use OpenMP, and it does not require a big runtime environment to be installed. But programs using MPI tend to be the faster and more scalable. If you need to ensure scalability on multicores, multiple processes are better than multiple threads. The scalability of MPI also applies to Python's multiprocessing. It is the isolated virtual memory of each process that allows the cores to run at full speed. Another thing to note is that Windows is not a second-class citizen when using MPI. The MPI runtime (usually an executable called mpirun or mpiexec) starts and manages a group of processes. It does not matter if they are started by fork() or CreateProcess(). > Solving reference counts in this situation is a separate issue that > will likely need to be resolved, regardless of which machinery we use > to isolate task execution. As long as we have a GIL, and we need the GIL to update a reference count, it does not hurt so much as it otherwise would. The GIL hides most of the scalability impact by serializing flow of execution. > IPC sounds great, but how well does it interact with Python's memory > management/allocator? I haven't looked closely but I expect that > multiprocessing does not use IPC anywhere. multiprocessing does use IPC. Otherwise the processes could not communicate. One example is multiprocessing.Queue, which uses a pipe and a semaphore. 
Sturla From rosuav at gmail.com Wed Jun 24 17:47:25 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Jun 2015 01:47:25 +1000 Subject: [Python-ideas] natively logging sys.path modifications In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 11:07 PM, anatoly techtonik wrote: > sys.path is kind of important thing for troubleshooting. It may worth > to ship it with a logging mechanism that allows to quickly dump > who added what and (optionally) why. > > I see the log as a circular memory buffer of limited size, to say > 256 entries, that contains tuples in the following format: > > path, who, where, why > > path -- actual path added to sys.path > who -- the context - package.module:function or > - package.module:class.method or > - package.module:__toplevel__ > where -- full filename and line number to the instruction > why -- advanced API may allow to set this field > It should be possible for you to replace sys.path with an object of your own invention, a subclass of list that records the above information whenever it's modified. Install that early, then let all the other changes get logged. Or have you tried this and found that it breaks something? ChrisA From breamoreboy at yahoo.co.uk Wed Jun 24 17:58:06 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 24 Jun 2015 16:58:06 +0100 Subject: [Python-ideas] natively logging sys.path modifications In-Reply-To: References: Message-ID: On 24/06/2015 14:07, anatoly techtonik wrote: > Hi, > > sys.path is kind of important thing for troubleshooting. It may worth > to ship it with a logging mechanism that allows to quickly dump > who added what and (optionally) why. > > I see the log as a circular memory buffer of limited size, to say > 256 entries, that contains tuples in the following format: > > path, who, where, why > > path -- actual path added to sys.path > who -- the context - package.module:function or > - package.module:class.method or > - package.module:__toplevel__ > where -- full filename and line number to the instruction > why -- advanced API may allow to set this field > You see the log and somebody else does the work for you as you refuse to sign the CLA. Do you want your bread buttered on both sides, or will one side suffice? -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From breamoreboy at yahoo.co.uk Wed Jun 24 18:03:10 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 24 Jun 2015 17:03:10 +0100 Subject: [Python-ideas] web API to get a list of all module in stdlib In-Reply-To: References: Message-ID: On 24/06/2015 14:19, anatoly techtonik wrote: > Hi, > > People from core-workflow are not too active about the idea, > If you want to resurrect this please sign the CLA and provide some code. Otherwise please go away permanently, thank you. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From rymg19 at gmail.com Wed Jun 24 18:15:49 2015 From: rymg19 at gmail.com (Ryan Gonzalez) Date: Wed, 24 Jun 2015 11:15:49 -0500 Subject: [Python-ideas] web API to get a list of all module in stdlib In-Reply-To: References: Message-ID: <4D0D7292-C3AE-4644-A27F-2D8E898E161B@gmail.com> On June 24, 2015 11:03:10 AM CDT, Mark Lawrence wrote: >On 24/06/2015 14:19, anatoly techtonik wrote: >> Hi, >> >> People from core-workflow are not too active about the idea, >> > >If you want to resurrect this please sign the CLA and provide some >code. 
> Otherwise please go away permanently, thank you. FYI, there are nicer ways to say that... -- Sent from my Android device with K-9 Mail. Please excuse my brevity. From sturla.molden at gmail.com Wed Jun 24 18:28:54 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 24 Jun 2015 18:28:54 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 24/06/15 07:01, Eric Snow wrote: > Well, perception is 9/10ths of the law. :) If the multi-core problem > is already solved in Python then why does it fail in the court of > public opinion. The perception that Python lacks a good multi-core > story is real, leads organizations away from Python, and will not > improve without concrete changes. I think it is a combination of FUD and the lack of fork() on Windows. There is a lot of utterly wrong information about CPython and its GIL. The reality is that Python is used on even the largest supercomputers. The scalability problem that is seen on those systems is not the GIL, but the module import. If we have 1000 CPython processes importing modules like NumPy simultaneously, they will do a "denial of service attack" on the file system. This happens when the module importer generates a huge number of failed open() calls while trying to locate the module files. There is even described in a paper on how to avoid this on an IBM Blue Brain: "As an example, on Blue Gene P just starting up Python and importing NumPy and GPAW with 32768 MPI tasks can take 45 minutes!" http://www.cs.uoregon.edu/research/paracomp/papers/iccs11/iccs_paper_final.pdf And while CPython is being used for massive parallel computing to e.g. model the global climate system, there is this FUD that CPython does not even scale up on a laptop with a single multicore CPU. I don't know where it is coming from, but it is more FUD than truth. The main answers to FUD about the GIL and Python in scientific computing are these: 1. Python in itself generates a 200x to 2000x performance hit compared to C or Fortran. Do not write compute kernels in Python, unless you can compile with Cython or Numba. If you have need for speed, start by moving the performance critical parts to Cython instead of optimizing for a few CPU cores. 2. If you can release the GIL, e.g. in Cython code, Python threads scale like any other native OS thread. They are real threads, not fake threads in the interpreter. 3. The 80-20, 90-10, or 99-1 rule: The majority of the code accounts for a small portion of the runtime. It is wasteful to optimize "everything". The more speed you need, the stronger this asymmetry will be. Identify the bottlenecks with a profiler and optimize those. 4. Using C or Java does not give you ha faster hard-drive or faster network connection. You cannot improve on network access by using threads in C or Java instead of threads in Python. If your code is i/o bound, Python's GIL does not matter. Python threads do execute i/o tasks in parallel. (This is the major misunderstanding.) 5. Computational intensive parts of a program is usually taken case of in libraries like BLAS, LAPACK, and FFTW. The Fortran code in LAPACK does not care if you called it from Python. It will be as fast as it can be, independent of Python. The Fortran code in LAPACK also have no concept of Python's GIL. LAPACK libraries like Intel MKL can use threads internally without asking Python for permission. 6. The scalability problem when using Python on a massive supercomputer is not the GIL but the module import. 7. 
When using OpenCL we write kernels as plain text. Python is excellent at manipulating text, more so than C. This also applies to using OpenGL for computer graphics with GLSL shaders and vetexbuffer objects. If you need the GPU, you can just as well use Python on the CPU. Sturla From sturla.molden at gmail.com Wed Jun 24 18:58:01 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 24 Jun 2015 18:58:01 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <558A97FB.9070908@egenix.com> References: <558A97FB.9070908@egenix.com> Message-ID: On 24/06/15 13:43, M.-A. Lemburg wrote: > That said, I still think the multiple-process is a better one (more > robust, more compatible, fewer problems). We'd just need a way more > efficient approach to sharing objects between the Python processes > than using pickle and shared memory or pipes :-) It is hard to get around shared memory, Unix domain sockets, or pipes. There must be some sort of IPC, regardless. One idea I have played with is to use a specialized queue instead of the current multiprocessing.Queue. In scientific computing we often need to pass arrays, so it would make sense to have a queue that could bypass pickle for NumPy arrays, scalars and dtypes, simply by using the NumPy C API to process the data. It could also have specialized code for a number of other objects -- at least str, int, float, complex, and PEP 3118 buffers, but perhaps also simple lists, tuples and dicts with these types. I think it should be possible to make a queue that would avoid the pickle issue for 99 % of scientific computing. It would be very easy to write such a queue with Cython and e.g. have it as a part of NumPy or SciPy. One thing I did some years ago was to have NumPy arrays that would store the data in shared memory. And when passed to multiprocessing.Queue they would not pickle the data buffer, only the metadata. However this did not improve on performance, because the pickle overhead was still there, and passing a lot of binary data over a pipe was not comparably expensive. So while it would save memory, it did not make programs using multiprocessing and NumPy more efficient. Sturla From phd at phdru.name Wed Jun 24 21:08:48 2015 From: phd at phdru.name (Oleg Broytman) Date: Wed, 24 Jun 2015 21:08:48 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <20150624190848.GA21178@phdru.name> Hi! On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden wrote: > The absence of fork() is especially > hurtful for an interpreted language like Python, in my opinion. I don't think fork is of major help for interpreted languages. When most of your "code" is actually data most of your data pages are prone to copy-on-write slowdown. > Sturla Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From greg at krypto.org Wed Jun 24 21:31:56 2015 From: greg at krypto.org (Gregory P. Smith) Date: Wed, 24 Jun 2015 19:31:56 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 8:27 AM Sturla Molden wrote: > On 24/06/15 07:01, Eric Snow wrote: > > > In return, my question is, what is the level of effort to get fork+IPC > > to do what we want vs. subinterpreters? Note that we need to > > accommodate Windows as more than an afterthought > > Windows is really the problem. The absence of fork() is especially > hurtful for an interpreted language like Python, in my opinion. 
> You cannot assume that fork() is safe on any OS as a general solution for anything. This isn't a Windows specific problem, It simply cannot be relied upon in a general purpose library at all. It is incompatible with threads. The ways fork() can be used safely are in top level application decisions: There must be a guarantee of no threads running before all forking is done. (thus the impossibility of relying on it as a mechanism to do anything useful in a generic library - you are a library, you don't know what the whole application is doing or when you were called as part of it) A concurrency model that assumes that it is fine to fork() and let child processes continue to execute is not usable by everyone. (ie: multiprocessing until http://bugs.python.org/issue8713 was implemented). -gps > > >> If you think avoiding IPC is clever, you are wrong. IPC is very fast, in > >> fact programs written to use MPI tends to perform and scale better than > >> programs written to use OpenMP in parallel computing. > > > > I'd love to learn more about that. I'm sure there are some great > > lessons on efficiently and safely sharing data between isolated > > execution environments. That said, how does IPC compare to passing > > objects around within the same process? > > There are two major competing standards for parallel computing in > science and engineering: OpenMP and MPI. OpenMP is based on a shared > memory model. MPI is based on a distributed memory model and use message > passing (hence its name). > > The common implementations of OpenMP (GNU, Intel, Microsoft) are all > implemented with threads. There are also OpenMP implementations for > clusters (e.g. Intel), but from the programmer's perspective OpenMP is a > shared memory model. > > The common implementations of MPI (MPICH, OpenMPI, Microsoft MPI) use > processes instead of threads. Processes can run on the same computer or > on different computers (aka "clusters"). On localhost shared memory is > commonly used for message passing, on clusters MPI implementations will > use networking protocols. > > The take-home message is that OpenMP is conceptually easier to use, but > programs written to use MPI tend to be faster and scale better. This is > even true when using a single computer, e.g. a laptop with one multicore > CPU. > > > Here is tl;dr explanation: > > As for ease of programming, it is easier to create a deadlock or > livelock with MPI than OpenMP, even though programs written to use MPI > tend to need fewer synchronization points. There is also less > boilerplate code to type when using OpenMP, because we do not have to > code object serialization, message passing, and object deserialization. > > For performance, programs written to use MPI seems to have a larger > overhead because they require object serialization and message passing, > whereas OpenMP threads can just share the same objects. The reality is > actually the opposite, and is due to the internals of modern CPU, > particularly hierarchichal memory, branch prediction and long pipelines. > > Because of hierarchichal memory, the cache used by CPUs and CPU cores > must be kept in synch. Thus when using OpenMP (threads) there will be a > lot of synchronization going on that the programmer does not see, but > which the hardware will do behind the scenes. There will also be a lot > of data passing between various cache levels on the CPU and RAM. 
If a > core writes to a pice of memory it keeps in a cache line, a cascade of > data traffic and synchronization can be triggered across all CPUs and > cores. Not only will this stop the CPUs and prompt them to synchronize > cache with RAM, it also invalidates their branch prediction and they > must flush their pipelines and throw away work they have already done. > The end result is a program that does not scale or perform very well, > even though it does not seem to have any explicit synchronization points > that could explain this. The term "false sharing" is often used to > describe this problem. > > Programs written to use MPI are the opposite. There every instance of > synchronization and message passing is visible. When a CPU core writes > to memory kept in a cache line, it will never trigger synchronization > and data traffic across all the CPUs. The scalability is as the program > predicts. And even though memory and objects are not shared, there is > actually much less data traffic going on. > > Which to use? Most people find it easier to use OpenMP, and it does not > require a big runtime environment to be installed. But programs using > MPI tend to be the faster and more scalable. If you need to ensure > scalability on multicores, multiple processes are better than multiple > threads. The scalability of MPI also applies to Python's > multiprocessing. It is the isolated virtual memory of each process that > allows the cores to run at full speed. > > Another thing to note is that Windows is not a second-class citizen when > using MPI. The MPI runtime (usually an executable called mpirun or > mpiexec) starts and manages a group of processes. It does not matter if > they are started by fork() or CreateProcess(). > > > > > Solving reference counts in this situation is a separate issue that > > will likely need to be resolved, regardless of which machinery we use > > to isolate task execution. > > As long as we have a GIL, and we need the GIL to update a reference > count, it does not hurt so much as it otherwise would. The GIL hides > most of the scalability impact by serializing flow of execution. > > > > > IPC sounds great, but how well does it interact with Python's memory > > management/allocator? I haven't looked closely but I expect that > > multiprocessing does not use IPC anywhere. > > multiprocessing does use IPC. Otherwise the processes could not > communicate. One example is multiprocessing.Queue, which uses a pipe and > a semaphore. > > > > Sturla > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Wed Jun 24 22:11:29 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 24 Jun 2015 21:11:29 +0100 Subject: [Python-ideas] web API to get a list of all module in stdlib In-Reply-To: <4D0D7292-C3AE-4644-A27F-2D8E898E161B@gmail.com> References: <4D0D7292-C3AE-4644-A27F-2D8E898E161B@gmail.com> Message-ID: On 24/06/2015 17:15, Ryan Gonzalez wrote: > > > On June 24, 2015 11:03:10 AM CDT, Mark Lawrence wrote: >> On 24/06/2015 14:19, anatoly techtonik wrote: >>> Hi, >>> >>> People from core-workflow are not too active about the idea, >>> >> >> If you want to resurrect this please sign the CLA and provide some >> code. >> Otherwise please go away permanently, thank you. 
> > FYI, there are nicer ways to say that... > > Regretably to the OP there aren't, he has no concept of taking other people into account. Possibly he's autistic the same as me, who knows? All I do know is that he's driven a highly respected member of the community as in Nick Coghlan away from the core workflow mailing list. Still like one of the Piranha brothers he used to buy his mother flowers and things, so that's okay. -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From mal at egenix.com Wed Jun 24 22:16:56 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Jun 2015 22:16:56 +0200 Subject: [Python-ideas] web API to get a list of all module in stdlib In-Reply-To: References: <4D0D7292-C3AE-4644-A27F-2D8E898E161B@gmail.com> Message-ID: <558B1038.4090807@egenix.com> Please keep discussions on topic and avoid heading off into the woods - there are snakes out there and those are not the kinds we're discussing here :-) Thank you, -- Marc-Andre Lemburg Director Python Software Foundation http://www.python.org/psf/ From breamoreboy at yahoo.co.uk Wed Jun 24 22:49:53 2015 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Wed, 24 Jun 2015 21:49:53 +0100 Subject: [Python-ideas] web API to get a list of all module in stdlib In-Reply-To: <558B1038.4090807@egenix.com> References: <4D0D7292-C3AE-4644-A27F-2D8E898E161B@gmail.com> <558B1038.4090807@egenix.com> Message-ID: On 24/06/2015 21:16, M.-A. Lemburg wrote: > Please keep discussions on topic and avoid heading off into the > woods - there are snakes out there and those are not the kinds > we're discussing here :-) > > Thank you, > Thank you for bringing me so gently back to earth, I most seriously appreciate it. Should any of you ever head into Mudeford, Christchurch, Dorset, UK, beers are on me, at the inaugural meeting of the local Python Users Group. This would obviously have to be called MudPy or MudePy :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence From mal at egenix.com Wed Jun 24 22:50:58 2015 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 24 Jun 2015 22:50:58 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <558A97FB.9070908@egenix.com> Message-ID: <558B1832.60505@egenix.com> On 24.06.2015 18:58, Sturla Molden wrote: > On 24/06/15 13:43, M.-A. Lemburg wrote: > >> That said, I still think the multiple-process is a better one (more >> robust, more compatible, fewer problems). We'd just need a way more >> efficient approach to sharing objects between the Python processes >> than using pickle and shared memory or pipes :-) > > It is hard to get around shared memory, Unix domain sockets, or pipes. There must be some sort of > IPC, regardless. Sure, but the current approach of pickling Python objects for communication is just too much overhead in many cases - it also duplicates the memory requirements when using the multiple process approach since you eventually end up having n copies of the same data in memory (with n = number of parallel workers). > One idea I have played with is to use a specialized queue instead of the current > multiprocessing.Queue. In scientific computing we often need to pass arrays, so it would make sense > to have a queue that could bypass pickle for NumPy arrays, scalars and dtypes, simply by using the > NumPy C API to process the data. 
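For the NumPy case specifically, much of this can already be approximated from pure Python by allocating the array in shared memory and sending only metadata to the worker, along the lines of the 2009 experiment Sturla describes below. A minimal sketch; the shape, dtype and function names are purely illustrative, and it assumes the shared block is handed to the child at Process creation, which multiprocessing supports for shared ctypes objects:

    import multiprocessing as mp
    import numpy as np

    def scale(raw, shape, factor):
        # Rebuild a view onto the shared buffer; only (raw, shape, factor)
        # cross the process boundary, never the element data itself.
        arr = np.frombuffer(raw, dtype=np.float64).reshape(shape)
        arr *= factor

    if __name__ == '__main__':
        shape = (1000, 1000)
        raw = mp.RawArray('d', shape[0] * shape[1])    # one shared block
        arr = np.frombuffer(raw, dtype=np.float64).reshape(shape)
        arr[:] = 1.0
        worker = mp.Process(target=scale, args=(raw, shape, 3.0))
        worker.start()
        worker.join()
        print(arr[0, :3])    # [ 3.  3.  3.] -- modified in place by the child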
It could also have specialized code for a number of other objects > -- at least str, int, float, complex, and PEP 3118 buffers, but perhaps also simple lists, tuples > and dicts with these types. I think it should be possible to make a queue that would avoid the > pickle issue for 99 % of scientific computing. It would be very easy to write such a queue with > Cython and e.g. have it as a part of NumPy or SciPy. The tricky part is managing pointers in those data structures, e.g. a container types for other Python objects will have to store all referenced objects in the shared memory segment as well. For NumPy arrays using simple types this is a lot easier, since you don't have to deal with pointers to other objects. > One thing I did some years ago was to have NumPy arrays that would store the data in shared memory. > And when passed to multiprocessing.Queue they would not pickle the data buffer, only the metadata. > However this did not improve on performance, because the pickle overhead was still there, and > passing a lot of binary data over a pipe was not comparably expensive. So while it would save > memory, it did not make programs using multiprocessing and NumPy more efficient. When saying "passing a lot of binary data over a pipe" you mean the meta-data ? I had discussed the idea of Python object sharing with Larry Hastings back in 2013, but decided that trying to get all references of containers managed in the shared memory would be too fragile an approach to pursue further. Still, after some more research later that year, I found that someone already had investigated the idea in 2003: http://poshmodule.sourceforge.net/ Reading the paper on this: http://poshmodule.sourceforge.net/posh/posh.pdf made me wonder why this idea never received more attention in all these years. His results are clearly positive and show that the multiple process approach can provide better scalability than using threads when combined with shared memory object storage. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 24 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-06-16: Released eGenix pyOpenSSL 0.13.10 ... http://egenix.com/go78 2015-07-20: EuroPython 2015, Bilbao, Spain ... 26 days to go 2015-07-29: Python Meeting Duesseldorf ... 35 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From sturla.molden at gmail.com Wed Jun 24 23:41:02 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 24 Jun 2015 23:41:02 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <558B1832.60505@egenix.com> References: <558A97FB.9070908@egenix.com> <558B1832.60505@egenix.com> Message-ID: On 24/06/15 22:50, M.-A. Lemburg wrote: > The tricky part is managing pointers in those data structures, > e.g. a container types for other Python objects will have to > store all referenced objects in the shared memory segment as > well. If a container type for Python objects contains some unknown object type we would have to use pickle as fallback. 
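A toy sketch of that dispatch-with-fallback structure, purely to show its shape; the actual win under discussion would come from doing the fast paths in C (writing the raw buffer straight to the pipe or into shared memory) rather than from the Python-level copying done below. The tag names and the set of supported types are arbitrary:

    import pickle
    import numpy as np

    def pack(obj):
        # Fast paths for the common scientific types...
        if isinstance(obj, np.ndarray) and obj.dtype != object:
            return ('ndarray', obj.dtype.str, obj.shape, obj.tobytes())
        if isinstance(obj, (bytes, bytearray, str, int, float, complex)):
            return ('plain', obj)
        # ...and pickle only as the fallback for everything else.
        return ('pickle', pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))

    def unpack(msg):
        if msg[0] == 'ndarray':
            _, dtype, shape, buf = msg
            # frombuffer gives a read-only view; copy() if mutation is needed.
            return np.frombuffer(buf, dtype=dtype).reshape(shape)
        if msg[0] == 'pickle':
            return pickle.loads(msg[1])
        return msg[1]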
> For NumPy arrays using simple types this is a lot easier, > since you don't have to deal with pointers to other objects. The objects we deal with in scientific computing are usually arrays with a rather regular structure, not deeply nested Python objects. Even a more complex object like scipy.spatial.cKDTree is just a collection of a few contiguous arrays under the hood. So we could for most parts squash the pickle overhead that anyone will encounter by specializing a queue that has knowledge about a small set of Python types. > When saying "passing a lot of binary data over a pipe" you mean > the meta-data ? No, I mean the buffer pointed to by PyArray_DATA(obj) when using the NumPy C API. We have to send a lot of raw bytes over an IPC mechanism before this communication compares to the pickle overhead. Sturla From sturla.molden at gmail.com Wed Jun 24 23:48:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 24 Jun 2015 23:48:42 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <558A97FB.9070908@egenix.com> <558B1832.60505@egenix.com> Message-ID: On 24/06/15 23:41, Sturla Molden wrote: > So we could for most parts squash > the pickle overhead that anyone will encounter by specializing a queue > that has knowledge about a small set of Python types. But this would be very domain specific for scientific and numerical computing, it would not be a general improvement for multiprocessing with Python. Sturla From wes.turner at gmail.com Wed Jun 24 23:57:35 2015 From: wes.turner at gmail.com (Wes Turner) Date: Wed, 24 Jun 2015 16:57:35 -0500 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <558A97FB.9070908@egenix.com> <558B1832.60505@egenix.com> Message-ID: On Jun 24, 2015 4:49 PM, "Sturla Molden" wrote: > > On 24/06/15 23:41, Sturla Molden wrote: > >> So we could for most parts squash >> the pickle overhead that anyone will encounter by specializing a queue >> that has knowledge about a small set of Python types. > > > But this would be very domain specific for scientific and numerical computing, it would not be a general improvement for multiprocessing with Python. Basically C structs like Thrift or Protocol Buffers? > > > Sturla > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Thu Jun 25 00:02:04 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 24 Jun 2015 15:02:04 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 12:31 PM, Gregory P. Smith wrote: > You cannot assume that fork() is safe on any OS as a general solution for > anything. This isn't a Windows specific problem, It simply cannot be relied > upon in a general purpose library at all. It is incompatible with threads. > > The ways fork() can be used safely are in top level application decisions: > There must be a guarantee of no threads running before all forking is done. 
> (thus the impossibility of relying on it as a mechanism to do anything > useful in a generic library - you are a library, you don't know what the > whole application is doing or when you were called as part of it) > > A concurrency model that assumes that it is fine to fork() and let child > processes continue to execute is not usable by everyone. (ie: > multiprocessing until http://bugs.python.org/issue8713 was implemented). Another way of looking at it is that a concurrency model that assumes it is fine to thread and let child threads continue to execute is not usable by everyone. IMO the lesson here is don't start threads *or* fork processes behind the scenes without explicitly allowing your callers to override you, so that the top level app can orchestrate everything appropriately. This is especially important in Python, where forking is one of the best ways of getting single-machine multicore processing. Interestingly, the worker threads in OP can probably be made fork-safe. Not sure that's especially useful, but I can imagine. -- Devin From jeanpierreda at gmail.com Thu Jun 25 00:10:48 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 24 Jun 2015 15:10:48 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: I'm going to break mail client threading and also answer some of your other emails here. On Tue, Jun 23, 2015 at 10:26 PM, Eric Snow wrote: > It sounded like you were suggesting that we factor out a common code > base that could be used by multiprocessing and the other machinery and > that only multiprocessing would keep the pickle-related code. Yes, I like that idea a lot. >> Compare >> with forking, where the initialization is all done and then you fork, >> and you are immediately ready to serve, using the data structures >> shared with all the other workers, which is only copied when it is >> written to. So forking starts up faster and uses less memory (due to >> shared memory.) > > But we are aiming for a share-nothing model with an efficient > object-passing mechanism. Furthermore, subinterpreters do not have to > be single-use. My proposal includes running tasks in an existing > subinterpreter (e.g. executor pool), so that start-up cost is > mitigated in cases where it matters. > > Note that ultimately my goal is to make it obvious and undeniable that > Python (3.6+) has a good multi-core story. In my proposal, > subinterpreters are a means to an end. If there's a better solution > then great! As long as the real goal is met I'll be satisfied. :) > For now I'm still confident that the subinterpreter approach is the > best option for meeting the goal. Ahead of time: the following is my opinion. My opinions are my own, and bizarre, unlike the opinions of my employer and coworkers. (Who are also reading this maybe.) So there's two reasons I can think of to use threads for CPU parallelism: - My thing does a lot of parallel work, and so I want to save on memory by sharing an address space This only becomes an especially pressing concern if you start running tens of thousands or more of workers. Fork also allows this. - My thing does a lot of communication, and so I want fast communication through a shared address space This can become a pressing concern immediately, and so is a more visible issue. However, it's also a non-problem for many kinds of tasks which just take requests in and put output back out, without talking with other members of the pool (e.g. 
writing an RPC server or HTTP server.) I would also speculate that once you're on many machines, unless you're very specific with your design, RPC costs dominate IPC costs to the point where optimizing IPC doesn't do a lot for you. On Unix, IPC can be free or cheap due to shared memory. Threads really aren't all that important, and if we need them, we have them. When people tell me in #python that multicore in Python is bad because of the GIL, I point them at fork and at C extensions, but also at PyPy-STM and Jython. Everything has problems, but then so does this proposal, right? > And this is faster than passing objects around within the same > process? Does it play well with Python's memory model? As far as whether it plays with the memory model, multiprocessing.Value() just works, today. To make it even lower overhead (not construct an int PyObject* on the fly), you need to change things, e.g. the way refcounts work. I think it's possibly feasible. If not, at least the overhead would be negligible. Same applies to strings and other non-compound datatypes. Compound datatypes are hard even for the subinterpreter case, just because the objects you're referring to are not likely to exist on the other end, so you need a real copy. I'm sure you've thought about this. multiprocessing.Array has a solution for this, which is to unbox the contained values. It won't work with tuples. > I'd be interested in more info on both the refcount freezing and the > sepatate refcounts pages. I can describe the patches: - separate refcounts replaces refcount with a pointer to refcount, and changes incref/decref. - refcount freezing lets you walk all objects and set the reference count to a magic value. incref/decref check if the refcount is frozen before working. With freezing, unlike this approach to separate refcounts, anyone that touches the refcount manually will just dirty the page and unfreeze the refcount, rather than crashing the process. Both of them will decrease performance for non-forking python code, but for forking code it can be made up for e.g. by increased worker lifetime and decreased rate of page copying, plus the whole CPU vs memory tradeoff. I legitimately don't remember the difference in performance, which is good because I'm probably not allowed to say what it was, as it was tested on our actual app and not microbenchmarks. ;) >> And remember that we *do* have many examples of people using >> parallelized Python code in production. Are you sure you're satisfying >> their concerns, or whose concerns are you trying to satisfy? > > Another good point. What would you suggest is the best way to find out? I don't necessarily mean that. I mean that this thread feels like you posed an answer and I'm not sure what the question is. Is it about solving a real technical problem? What is that, and who does it affect? A new question I didn't ask before: is the problem with Python as a whole, or just CPython? -- Devin From ericsnowcurrently at gmail.com Thu Jun 25 00:19:27 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 16:19:27 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 9:26 AM, Sturla Molden wrote: > On 24/06/15 07:01, Eric Snow wrote: > There are two major competing standards for parallel computing in science > and engineering: OpenMP and MPI. OpenMP is based on a shared memory model. > MPI is based on a distributed memory model and use message passing (hence > its name). 
> [snip] Thanks for the great explanation! >> Solving reference counts in this situation is a separate issue that >> will likely need to be resolved, regardless of which machinery we use >> to isolate task execution. > > As long as we have a GIL, and we need the GIL to update a reference count, > it does not hurt so much as it otherwise would. The GIL hides most of the > scalability impact by serializing flow of execution. It does hurt in COW situations, e.g. forking. My expectation is that we'll at least need to take a serious look into the matter in the short term (i.e. Python 3.6). >> IPC sounds great, but how well does it interact with Python's memory >> management/allocator? I haven't looked closely but I expect that >> multiprocessing does not use IPC anywhere. > > multiprocessing does use IPC. Otherwise the processes could not communicate. > One example is multiprocessing.Queue, which uses a pipe and a semaphore. Right. I don't know quite what I was thinking. :) -eric From ericsnowcurrently at gmail.com Thu Jun 25 00:56:17 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 16:56:17 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden wrote: > On 24/06/15 07:01, Eric Snow wrote: > >> Well, perception is 9/10ths of the law. :) If the multi-core problem >> is already solved in Python then why does it fail in the court of >> public opinion. The perception that Python lacks a good multi-core >> story is real, leads organizations away from Python, and will not >> improve without concrete changes. > > > I think it is a combination of FUD and the lack of fork() on Windows. There > is a lot of utterly wrong information about CPython and its GIL. Thanks for a clear summary of the common misunderstandings. While I agreed with your points, they are mostly the same things we have been communicating for many years, to no avail. They are also oriented toward larger-scale parallelism (which I don't mean to discount). That makes it easier to misunderstand. Why? Because there are enough caveats and performance downsides (see Dave Beazley's PyCon 2015 talk) that most folks stop trying to rationalize, throw their hands up, and say "Python concurrency stinks" and "you can't *really* do multicore on Python". I have personal experience with high-profile decision makers where this is exactly what happened, with adverse consequences to support for Python within the organizations. To change this perception we need to give folks a simpler, performant concurrency model that takes advantage of multiple cores. My proposal is all about doing at least *something* that makes Python's multi-core story obvious and undeniable. *That* is my entire goal with this proposal. Clearly I have opinions on the best approach to achieve that in the 3.6 timeframe. :) However, I am quite willing to investigate all the options (as I hope this thread demonstrates). So, again, thanks for the feedback and insight. You've provided me with plenty of food for thought. 
-eric From sturla.molden at gmail.com Thu Jun 25 01:30:21 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 01:30:21 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On 25/06/15 00:10, Devin Jeanpierre wrote: > So there's two reasons I can think of to use threads for CPU parallelism: > > - My thing does a lot of parallel work, and so I want to save on > memory by sharing an address space > > This only becomes an especially pressing concern if you start running > tens of thousands or more of workers. Fork also allows this. This might not be a valid concern. Sharing address space means sharing *virtual memory*. Presumably what they really want is to save *physical memory*. Two processes can map the same physical memory into virtual memory. > - My thing does a lot of communication, and so I want fast > communication through a shared address space > > This can become a pressing concern immediately, and so is a more > visible issue. This is a valid argument. It is mainly a concern for those who use deeply nested Python objects though. > On Unix, IPC can be free or cheap due to shared memory. This is also the case on Windows. IPC mechanisms like pipes, fifos, Unix domain sockets are also very cheap on Unix. Pipes are also very cheap on Windows, as are tcp sockets on localhost. Windows named pipes are similar to Unix domain sockets in performance. > Same applies to strings and other non-compound datatypes. Compound > datatypes are hard even for the subinterpreter case, just because the > objects you're referring to are not likely to exist on the other end, > so you need a real copy. Yes. With a "share nothing" message-passing approach, one will have to make deep copies of any mutable object. And even though a tuple can be immutable, it could still contain mutable objects. It is really hard to get around the pickle overhead with subinterpreters. Since the pickle overhead is huge compared to the low-level IPC, there is very little to save in this manner. > - separate refcounts replaces refcount with a pointer to refcount, and > changes incref/decref. > - refcount freezing lets you walk all objects and set the reference > count to a magic value. incref/decref check if the refcount is frozen > before working. > > With freezing, unlike this approach to separate refcounts, anyone that > touches the refcount manually will just dirty the page and unfreeze > the refcount, rather than crashing the process. > > Both of them will decrease performance for non-forking python code, Freezing has little impact on a modern CPU with branch prediction. On GCC we can also use __builtin_expect to make sure the optimal code is generated. This is a bit similar to using typed memoryviews and NumPy arrays in Cython with and without bounds checking. A pragma like @cython.boundscheck(False) have little benefit for the performance because of the CPU's branch prediction. The CPU knows it can expect the bounds check to pass, and only if it fails will it have to flush the pipeline. But if the bounds check passes the pipeline need not be flushed, and performance wise it will be as if the test were never there. This has greatly improved the last decade, particularly because processors have been optimized for running languages like Java and .NET efficiently. A check for a thawed refcount would be similarly cheap. 
Keeping reference counts in extra pages could impair performance, but mostly if multiple threads are allowed to access the same page. Because of hierachical memory, the extra pointer lookup should not matter much. Modern CPUs have evolved to solve the aliasing problem that formerly made Fortran code run faster than similar C code. Today C code tends to be faster than similar Fortran. This helps if we keep refcounts in a separate page, and the compiler cannot know what the pointer actually refers to and what it might alias. 10 or 15 years ago it would have been a performance killer, but not today. Sturla From sturla.molden at gmail.com Thu Jun 25 01:47:07 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 01:47:07 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 25/06/15 00:56, Eric Snow wrote: > Why? Because there are enough caveats and performance downsides (see > Dave Beazley's PyCon 2015 talk) that most folks stop trying to > rationalize, throw their hands up, and say "Python concurrency stinks" > and "you can't *really* do multicore on Python". Yes, that seems to be the case. > To change this perception we need to give folks a simpler, performant > concurrency model that takes advantage of multiple cores. My proposal > is all about doing at least *something* that makes Python's multi-core > story obvious and undeniable. I think the main issue with subinterpreters and a message-passing model is that it will be very difficult to avoid deep copies of Python objects. And in that case all we have achieved compared to multiprocessing is less scalability. Also you have not removed the GIL, so the FUD about the dreaded GIL will still be around. Clearly introducing multiprocessing in the standard library did nothing to reduce this. Sturla From njs at pobox.com Thu Jun 25 01:55:31 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 24 Jun 2015 16:55:31 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre wrote: > So there's two reasons I can think of to use threads for CPU parallelism: > > - My thing does a lot of parallel work, and so I want to save on > memory by sharing an address space > > This only becomes an especially pressing concern if you start running > tens of thousands or more of workers. Fork also allows this. Not necessarily true... e.g., see two threads from yesterday (!) on the pandas mailing list, from users who want to perform queries against a large data structure shared between threads/processes: https://groups.google.com/d/msg/pydata/Emkkk9S9rUk/eh0nfiGR7O0J https://groups.google.com/forum/#!topic/pydata/wOwe21I65-I ("Are we just screwed on windows?") -n -- Nathaniel J. Smith -- http://vorpus.org From sturla.molden at gmail.com Thu Jun 25 02:02:05 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 02:02:05 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 25/06/15 00:19, Eric Snow wrote: >>> Solving reference counts in this situation is a separate issue that >>> will likely need to be resolved, regardless of which machinery we use >>> to isolate task execution. >> >> As long as we have a GIL, and we need the GIL to update a reference count, >> it does not hurt so much as it otherwise would. The GIL hides most of the >> scalability impact by serializing flow of execution. > > It does hurt in COW situations, e.g. forking. 
My expectation is that > we'll at least need to take a serious look into the matter in the > short term (i.e. Python 3.6). Yes. It hurts performance after forking as reference counting will trigger a lot of page copies. Keeping reference counts in separate pages and replacing the field in the PyObject struct would reduce this problem by a factor of up to 512 (64 bit) or 1024 (32 bit). It does not hurt performance with multi-threading, as Python threads are serialized by the GIL. But if the GIL was removed it would result in a lot of false sharing. That is a major reason we need a tracing garbage collector instead of reference counting if we shall be able to remove the GIL. Sturla From jeanpierreda at gmail.com Thu Jun 25 02:09:55 2015 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 24 Jun 2015 17:09:55 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Wed, Jun 24, 2015 at 4:30 PM, Sturla Molden wrote: > On 25/06/15 00:10, Devin Jeanpierre wrote: > >> So there's two reasons I can think of to use threads for CPU parallelism: >> >> - My thing does a lot of parallel work, and so I want to save on >> memory by sharing an address space >> >> This only becomes an especially pressing concern if you start running >> tens of thousands or more of workers. Fork also allows this. > > > This might not be a valid concern. Sharing address space means sharing > *virtual memory*. Presumably what they really want is to save *physical > memory*. Two processes can map the same physical memory into virtual memory. Yeah, physical memory. I agree, processes with shared memory can be made to work in practice. Although, threads are better for memory usage, by defaulting to sharing even on write. (Good for memory, maybe not so good for bug-freedom...) So from my perspective, this is the hard problem in multicore python. My views may be skewed by the peculiarities of the one major app I've worked on. >> Same applies to strings and other non-compound datatypes. Compound >> datatypes are hard even for the subinterpreter case, just because the >> objects you're referring to are not likely to exist on the other end, >> so you need a real copy. > > > Yes. > > With a "share nothing" message-passing approach, one will have to make deep > copies of any mutable object. And even though a tuple can be immutable, it > could still contain mutable objects. It is really hard to get around the > pickle overhead with subinterpreters. Since the pickle overhead is huge > compared to the low-level IPC, there is very little to save in this manner. I think this is giving up too easily. Here's a stupid idea for sharable interpreter-specific objects: You keep a special heap for immutable object refcounts, where each thread/process has its own region in the heap. Refcount locations are stored as offsets into the thread local heap, and incref does ++*(threadlocal_refcounts + refcount_offset); Then for the rest of a pyobject's memory, we share by default and introduce a marker for which thread originated it. Any non-threadsafe operations can check if the originating thread id is the same as the current thread id, and raise an exception if not, before even reading the memory at all. So it introduces an overhead to accessing mutable objects. Also, this won't work with extension objects that don't check, those just get shared and unsafely mutate and crash. 
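The originating-thread check can be sketched in pure Python just to make the idea concrete; in the real proposal the test would live in C inside the object implementations (and, as noted, extension types that skip it get no protection). The class and method names here are made up for the illustration:

    import threading

    class OwnedList:
        """Mutable object that only its creating thread may modify."""

        def __init__(self, items=()):
            self._owner = threading.get_ident()
            self._items = list(items)

        def _check_owner(self):
            # This is the per-access overhead mentioned above.
            if threading.get_ident() != self._owner:
                raise RuntimeError('owned by another thread')

        def append(self, item):
            self._check_owner()
            self._items.append(item)

        def snapshot(self):
            # Reads could be left unchecked, or checked the same way.
            return list(self._items)

    owned = OwnedList([1, 2])
    owned.append(3)                          # same thread: fine

    t = threading.Thread(target=owned.append, args=(4,))
    t.start()
    t.join()                                 # RuntimeError surfaces in the worker thread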
This also introduces the possibility of sharing mutable objects between interpreters, if the objects themselves choose to implement fine-grained locking. And it should work fine with fork if we change how the refcount heap is allocated, to use mmap or whatever. This is probably not acceptable for real, but I just mean to show with a straw man that the problem can be attacked. -- Devin From rosuav at gmail.com Thu Jun 25 02:12:07 2015 From: rosuav at gmail.com (Chris Angelico) Date: Thu, 25 Jun 2015 10:12:07 +1000 Subject: [Python-ideas] natively logging sys.path modifications In-Reply-To: References: Message-ID: On Thu, Jun 25, 2015 at 4:26 AM, anatoly techtonik wrote: > That object will be broken if somebody decides to use assignment: > > sys.path = [] > > And as far as I know it is not possible to prevent this case or guard > against this replacement. So what you want is for the sys module to log all assignments to a particular attribute, AND for all mutations of that attribute to be logged as well. That sounds like two completely separate problems to be solved, but neither is fundamentally impossible (although you'd need to fiddle with the sys module itself to do the other). I suggest you investigate ways of solving this that require zero core code changes, as those ways will work on all existing Python versions. Then once you run up against an actual limitation, you'll have a better argument for code changes. ChrisA From sturla.molden at gmail.com Thu Jun 25 02:45:24 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 02:45:24 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On 25/06/15 02:09, Devin Jeanpierre wrote: > Although, threads are better for memory > usage, by defaulting to sharing even on write. (Good for memory, maybe > not so good for bug-freedom...) I am not sure. Code written to use OpenMP tend to have less bugs than code written to use MPI. This suggests that shared memory is easier than message-passing, which is contrary to the common belief. My own experience with OpenMP and MPI suggests it is easier to create a deadlock with message-passing than accidentally have threads access the same address concurrently. This is also what I hear from other people who writes code for scientific computing. I see a lot of claims that message-passing is supposed to be "safer" than a shared memory model, but that is not what we see with OpenMP and MPI. With MPI, the programmer must make sure that the send and receive commands are passed in the right order at the right time, in each process. This leaves plenty of room for messing up or creating unmaintainable spaghetti code, particularly in a complex algorithm. It is easier to make sure all shared objects are protected with mutexes than to make sure a spaghetti of send and receive messages are in correct order. It might be that Python's queue method of passing messages leave less room for deadlocking than the socket-like MPI_send and MPI_recv functions. But I think message-passing are sometimes overrated as "the safe solution" to multi-core programming (cf. Go and Erlang). 
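To make the ordering pitfall concrete, here is a tiny sketch using multiprocessing queues as a stand-in for MPI_send/MPI_recv: there is no lock anywhere, yet receiving before sending on both sides hangs the program. The names and message contents are arbitrary:

    import multiprocessing as mp

    def worker(inbox, outbox, send_first):
        if send_first:
            outbox.put('hello')
        msg = inbox.get()            # blocks until the peer has sent
        if not send_first:
            outbox.put('reply to ' + msg)

    if __name__ == '__main__':
        a_to_b, b_to_a = mp.Queue(), mp.Queue()
        # One side must send before it receives.  Set send_first=False for
        # *both* workers and each blocks forever in inbox.get(): a classic
        # message-passing deadlock, created purely by message ordering.
        pa = mp.Process(target=worker, args=(b_to_a, a_to_b, True))
        pb = mp.Process(target=worker, args=(a_to_b, b_to_a, False))
        pa.start(); pb.start()
        pa.join(); pb.join()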
Sturla From njs at pobox.com Thu Jun 25 03:05:17 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 24 Jun 2015 18:05:17 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <5585E37F.4060403@gmail.com> Message-ID: On Wed, Jun 24, 2015 at 5:45 PM, Sturla Molden wrote: > On 25/06/15 02:09, Devin Jeanpierre wrote: > >> Although, threads are better for memory >> usage, by defaulting to sharing even on write. (Good for memory, maybe >> not so good for bug-freedom...) > > I am not sure. Code written to use OpenMP tend to have less bugs than code > written to use MPI. This suggests that shared memory is easier than > message-passing, which is contrary to the common belief. OpenMP is an *extremely* structured and constrained subset of shared memory multithreading, and not at all comparable to pthreads/threading.py/whatever. -n -- Nathaniel J. Smith -- http://vorpus.org From sturla.molden at gmail.com Thu Jun 25 03:31:51 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 01:31:51 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: Message-ID: <894209507456887675.977940sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > OpenMP is an *extremely* structured and constrained subset of shared > memory multithreading, and not at all comparable to > pthreads/threading.py/whatever. If you use "parallel section" it is almost as free as using pthreads directly. But if you stick to "parallel for", which most do, you have a rather constrained and more well-behaved subset. I am quite sure MPI can even be a source of more errors than pthreads used directly. Getting message passing right inside a complex algorithm is not funny. I would rather keep my mind focused on which objects to protect with a lock or when to signal a condition. Sturla From ericsnowcurrently at gmail.com Thu Jun 25 03:57:19 2015 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 24 Jun 2015 19:57:19 -0600 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden wrote: > The reality is that Python is used on even the largest supercomputers. The > scalability problem that is seen on those systems is not the GIL, but the > module import. If we have 1000 CPython processes importing modules like > NumPy simultaneously, they will do a "denial of service attack" on the file > system. This happens when the module importer generates a huge number of > failed open() calls while trying to locate the module files. > > There is even described in a paper on how to avoid this on an IBM Blue > Brain: "As an example, on Blue Gene P just starting up Python and importing > NumPy and GPAW with 32768 MPI tasks can take 45 minutes!" I'm curious what difference there is under Python 3.4 (or even 3.3). Along with being almost entirely pure Python, the import system now has some optimizations that help mitigate filesystem access (particularly stats). Regardless, have there been any attempts to address this situation? I'd be surprised if there haven't. :) Is the solution described in the cited paper sufficient? Earlier Barry brought up Emac's unexec as at least an inspiration for a solution. I expect there are a number of approaches. It would be nice to address this somehow (though unrelated to my multi-core proposal). I would expect that it could also have bearing on interpreter start-up time. If it's worth pursuing then consider posting something to import-sig. 
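As a very rough sketch of the "do the lookups once and share the result" idea (not the solution from the cited paper): a meta path finder that resolves modules from a precomputed name-to-file map instead of scanning sys.path on every import. It only handles plain single-file modules (packages would need submodule_search_locations), and the module name, path and the mpi4py-style broadcast mentioned in the comment are hypothetical:

    import importlib.util
    import sys

    class CachedLocationFinder:
        """Resolve imports from a precomputed {module name: file path} map."""

        def __init__(self, locations):
            self._locations = locations

        def find_spec(self, fullname, path=None, target=None):
            location = self._locations.get(fullname)
            if location is None:
                return None          # defer to the normal sys.path finders
            return importlib.util.spec_from_file_location(fullname, location)

    # One process (e.g. MPI rank 0) could walk sys.path once, build the map,
    # and broadcast it, e.g. locations = comm.bcast(locations, root=0)
    locations = {'mymod': '/shared/project/mymod.py'}
    sys.meta_path.insert(0, CachedLocationFinder(locations))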
-eric From trent at snakebite.org Thu Jun 25 08:50:52 2015 From: trent at snakebite.org (Trent Nelson) Date: Thu, 25 Jun 2015 02:50:52 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <20150625065050.GA15018@snakebite.org> On Wed, Jun 24, 2015 at 04:55:31PM -0700, Nathaniel Smith wrote: > On Wed, Jun 24, 2015 at 3:10 PM, Devin Jeanpierre > wrote: > > So there's two reasons I can think of to use threads for CPU parallelism: > > > > - My thing does a lot of parallel work, and so I want to save on > > memory by sharing an address space > > > > This only becomes an especially pressing concern if you start running > > tens of thousands or more of workers. Fork also allows this. > > Not necessarily true... e.g., see two threads from yesterday (!) on > the pandas mailing list, from users who want to perform queries > against a large data structure shared between threads/processes: > > https://groups.google.com/d/msg/pydata/Emkkk9S9rUk/eh0nfiGR7O0J > https://groups.google.com/forum/#!topic/pydata/wOwe21I65-I > ("Are we just screwed on windows?") Ironically (not knowing anything about Pandas' implementation details other than... "Cython... and NumPy"), there should be no difference between getting a Pandas DataFrame available to PyParallel and a NumPy ndarray or Cythonized C-struct (like datrie). The situation Ryan describes is literally the exact situation that PyParallel excels at: large reference data structures accessible in parallel contexts. Trent. From trent at snakebite.org Thu Jun 25 06:59:04 2015 From: trent at snakebite.org (Trent Nelson) Date: Thu, 25 Jun 2015 00:59:04 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <20150625045904.GA844@trent.me> On Tue, Jun 23, 2015 at 11:01:24PM -0600, Eric Snow wrote: > On Sun, Jun 21, 2015 at 5:41 AM, Sturla Molden wrote: > > From the perspective of software design, it would be good it the CPython > > interpreter provided an environment instead of using global objects. It > > would mean that all functions in the C API would need to take the > > environment pointer as their first variable, which will be a major rewrite. > > It would also allow the "one interpreter per thread" design similar to tcl > > and .NET application domains. > > While perhaps a worthy goal, I don't know that it fits in well with my > goals. I'm aiming for an improved multi-core story with a minimum of > change in the interpreter. This slide and the following two are particularly relevant: https://speakerdeck.com/trent/parallelism-and-concurrency-with-python?slide=4 I elicit three categories of contemporary problems where efficient use of multiple cores would be desirable: 1) Computationally-intensive work against large data sets (the traditional "parallel" HPC/science/engineering space, and lately, to today's "Big Data" space). 2a) Serving tens/hundreds of thousands of network clients with non-trivial computation required per-request (i.e. more than just buffer copying between two sockets); best example being the modern day web server, or: 2b) Serving far fewer clients, but striving for the lowest latency possible in an environment with "maximum permitted latency" restrictions (or percentile targets, 99s etc). In all three problem domains, there is a clear inflection point at which multiple cores would overtake a single core in either: 1) Reducing the overall computation time. 
2a|b) Serving a greater number of clients (or being able to perform more complex computation per request) before hitting maximum permitted latency limits. For PyParallel, I focused on 2a and 2b. More specifically, a TCP/IP socket server that had the ability to dynamically adjust its behavior (low latency vs concurrency vs throughput[1]), whilst maintaining optimal usage of underlying hardware[2]. That is: given sufficient load, you should be able to saturate all I/O channels (network and disk), or all cores, or both, with *useful* work. (The next step after saturation is sustained saturation (given sufficient load), which can be even harder to achieve, as you need to factor in latencies for "upcoming I/O" ahead of time if your computation is driven by the results of a disk read (or database cursor fetch).) (Sturla commented on the "import-DDoS" that you can run into on POSIX systems, which is a good example. You're saturating your underlying hardware, sure, but you're not doing useful work -- it's important to distinguish the two.) Dynamically adjusting behavior based on low latency vs concurrency vs throughput: [1]: https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores?slide=115 https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores?slide=120 Optimal hardware use: [2]: https://speakerdeck.com/trent/parallelism-and-concurrency-with-python?slide=6 So, with the focus of PyParallel established (socket server that could exploit all cores), my hypothesis was that I could find a new way of doing things that was more performant than the status quo. (In particular, I wanted to make sure I had an answer for "why not just use multiprocessing?" -- which is an important question.) https://speakerdeck.com/trent/parallelism-and-concurrency-with-python?slide=22 So, I also made the decision to leverage threads for parallelism and not processes+IPC, which it sounds like you're leaning toward as well. Actually, other than the subinterpreter implementation aspect, everything you've described is basically on par with PyParallel, more or less. Now, going back to your original comment: > While perhaps a worthy goal, I don't know that it fits in well with my > goals. I'm aiming for an improved multi-core story with a minimum of > change in the interpreter. That last sentence is very vague as multi-core means different things to different people. What is the problem domain you're going to try and initially target? Computationally-intensive parallel workloads like in 1), or the network I/O-driven socket server stuff like in 2a/2b? I'd argue it should be the latter. Reason being is that you'll rarely see the former problem tackled solely by pure Python -- e.g. Python may be gluing everything together, but the actual computation will be handled by something like NumPy/Numba/Fortran/Cython or custom C stuff, and, as Sturla's mentioned, OpenMP and MPI usually gets involved to manage the parallel aspect. For the I/O-driven socket server stuff, though, you already have this nice delineation of what would be run serially versus what would be ideal to run in parallel: import datrie import numpy as np import pyodbc import async from collections import defaultdict from async.http.server import ( router, make_routes, HttpServer, RangedRequest, ) # Tell PyParallel to invoke the tp_dealloc method explicitly # for these classes when rewinding a heap after a parallel # callback has finished. 
(Implementation detail: this toggles # the Py_TPFLAGS_PX_DEALLOC flag in the TypeObject's tp_flags; # when PyParallel intercepts PyObject_NEW/INIT (init_object), # classes (PyTypeObject *tp) with this flag set will be tracked # in a linked-list that is local to the parallel context being # used to service this client. When the context has its heaps # rewound back to the initial state at the time of the snapshot, # it will call tp_dealloc() explicitly against all objects of # this type that were encountered.) async.register_dealloc(pyodbc.Connection) async.register_dealloc(pyodbc.Cursor) async.register_dealloc(pyodbc.Row) # Load 29 million titles. RSS += ~9.5GB. TITLES = datrie.Trie.load('titles.trie') # Load 15 million 64-bit offsets. RSS += ~200MB. OFFSETS = np.load('offsets.npy') XML = 'enwiki-20150205-pages-articles.xml' class WikiServer(HttpServer): # All of these methods are automatically invoked in # parallel. HttpServer implements a data_received() # method which prepares the request object and then # calls the relevant method depending on the URL, e.g. # http://localhost/user/foo will call the user(request, # name='foo'). If we want to "write" to the client, # we return a bytes, bytearray or unicode object from # our callback (that is, we don't expose a socket.write() # to the user). # # Just before the PyParallel machinery invokes the # callback (via a simple PyObject_CallObject), though, # it takes a snapshot of its current state, such that # the exact state can be rolled back to (termed a socket # "rewind") when this callback is complete. If we don't # return a sendable object back, this rewind happens # immediately, and then we go straight into a read call. # If we do return something sendable, we send it. When # that send completes, *then* we do the rewind, then we # issue the next read/recv call. # # This approach is particularly well suited to parallel # callback execution because none of the objects we create # as part of the callback are needed when the callback # completes. No garbage can accumulate because nothing # can live longer than that callback. That obviates the # need for two things: reference counting against any object # in a parallel context, and garbage collection. Those # things are useful for the main thread, but not parallel # contexts. # # What if you do want to keep something around after # the callback? If it's a simple scalar type, the # following will work: # class Server: # name = None # @route # def set_name(self, request, name): # self.name = name.upper() # ^^^^^^^^^ we intercept that setattr and make # a copy of (the result of) name.upper() # using memory allocation from a different # heap that persists as long as the client # stays connnected. (There's actually # support for alternatively persisting # the entire heap that the object was # allocated from, which we could use if # we were persisting complex, external, # or container types where simply doing # a memcpy() of a *base + size_t wouldn't # be feasible. However, I haven't wired # up this logic to the socket context # logic yet.) # @route # def name(self, request): # return json_serialization(request, self.name) # ^^^^^^^^^ # This will return whatever # was set in the call above. # Once the client disconnects, # the value disappears. 
# # (Actually I think if you wanted to persist the object # for the lifetime of the server, you could probably # do `request.transport.parent.name = xyz`; or at least, # if that doesn't currently work, the required mechanics # definitely exist, so it would just need to be wired # up.) # # If you want to keep an object around past the lifetime of # the connected client and the server, then send it to the main # thread where it can be tracked like a normal Python object: # # USERS = async.dict() # ^^^^^^^^^^^^ shortcut for: # foo = {} # async.protect(foo) # or just: # foo = async.protect({}) # (On the backend, this instruments[3] the object such # that PyParallel can intercept setattr/setitem and # getattr/getitem calls and "do stuff"[4], depending # on the context.) [3]: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-1796 [4]: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-1632 # # class MyServer(HttpServer): # @route # ^^^^^^ Ignore the mechanics of this, it's just a helper # decorator I used to translate a HTTP GET for # /login/foo to a function call of `login(name='foo')`. # (see the bowls of async.http.server for details). # def login(self, request, name): # @call_from_main_thread # def _save_name(n): # USERS[n] = async.rdtsc() # return len(USERS) # count = _save_name(name) # return json_serialization(request, {'count': count}) # # The @call_from_main_thread decorator will enqueue a work # item to the main thread, and then wait on the main thread's # response. The main thread executes the callback and notifies # the parallel thread that the call has been completed and the # return value (in this case the value of `len(USERS)`). The # parallel thread resumes and finishes the client request. # Note that this will implicitly serialize execution; any number # of parallel requests can submit main thread work, but the # main thread can only call them one at a time. So, you'd # usually try and avoid this, or at least remove it from your # application's hot code path. connect_string = None all_users_sql = 'select * from user' one_user_sql = 'select * from user where login = ?' secret_key = None @route def wiki(self, request, name): # http://localhost/wiki/Python: name = Python if name not in TITLES: self.error(request, 404) # log(n) lookup against a trie with 29 million keys. offset = TITLES[name][0] # log(n) binary search against a numpy array with 15 # million int64s. ix = OFFSETS.searchsorted(offset, side='right') # OFFSETS[ix] = what's the offset after this? (start, end) = (ix-7, OFFSETS[ix]-11) # -7, +11 = adjust for the fact that all of the offsets # were calculated against the '<' of 'Foo'. range_request = '%d-%d' % (start, end) request.range = RangedRequest(range_request) request.response.content_type = 'text/xml; charset=utf-8' return self.sendfile(request, XML) @route def users(self, request): # ODBC driver managers that implement connection pooling # behind the scenes play very nicely with our # pyodbc.connect() call here, returning a connection # from the pool (when able) without blocking. 
con = pyodbc.connect(self.connect_string) # The next three odbc calls would all block (in the # traditional sense), so this current thread would # not be able to serve any other requests whilst # waiting for completion -- however, this is far # less of a problem for PyParallel than single-threaded # land as other threads will keep servicing requests # in the mean time. (ODBC 3.8/SQL Server 2012/Windows 8 # did introduce async notification, such that we could # request that an event be set when the cursor/query/call # has completed, which we'd tie in to PyParallel by # submitting a threadpool wait (much like we do for async # DNS lookup[5], also added in Windows 8), however, it was # going to require a bit of modification to the pyodbc # module to support the async calling style, so, all the # calls stay synchronous for now.) [5]: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-7616 cur = con.cursor() cur.execute(self.all_users_sql) return json_serialization(request, cur.fetchall()) @route def user(self, request, login): con = pyodbc.connect(self.connect_string) cur = con.cursor() cur.execute(self.one_user_sql, (login,)) return json_serialization(request, cur.fetchall()) @route def set_secret_key(self, request, key): # http://localhost/set_secret_key/foobar # An example of persisting a scalar for the lifetime # of the thread (that is, until it disconects or EOFs). try: self.secret_key = [ key, ] except ValueError: # This would be hit, because we've got guards in place # to assess the "clonability" of an object at this # point[6]. (Ok, after reviewing the code, we don't, # but at least we'd crash.) [6]: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-4944 # However, this would work fine, essentially memcpy'ing # the key object at the time of assignment using a different # heap to the one that automatically gets reset at the end # of the callback. self.secret_key = key @route def secret_key(self, request): # http://localhost/secret_key -> 'foobar' return json_serialization(request, {'key': self.secret_key}) @route def stats(self, request): # Handy little json representation of various system stats; # active parallel contexts, I/O hogs, memory load, etc. stats = { 'system': dict(sys_stats()), 'server': dict(socket_stats(request.transport.parent)), 'memory': dict(memory_stats()), 'contexts': dict(context_stats()), 'elapsed': request.transport.elapsed(), 'thread': async.thread_seq_id(), } return json_serialization(request, stats) @route def debug(self, request): # Don't call print() or any of the sys.std(err|out) # methods in a parallel context. If you want to do some # poor man's debugging with print statements in lieu of not # being able to attach a pdb debugger (tracing is disabled # in parallel threads), then use async.debug(). (On # Windows, this writes the message to the debug stream, # which you'd monitor via dbgview or VS.) async.debug("received request: %s" % request.data) # Avoid repr() at the moment in parallel threads; it uses # PyThreadState_SetDictItem() to control recursion depths, # which I haven't made safe to call from a parallel context. # If you want to attach Visual Studio debugger at this point # though, you can do so via: async.debugbreak() # (That literally just generates an INT 3.) 
@route def shutdown(self, request): # Handy helper for server shutdown (stop listening on the # bound IP:PORT, wait for all running client callbacks to # complete, then return. Totally almost works at the # moment[7].) [7]: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-11818 request.transport.shutdown() def main(): server = async.server('0.0.0.0', port) protocol = HttpServer protocol.connect_string = 'Driver={SQL Server}...' async.register(transport=server, protocol=protocol) ^^^^^^^^^^^^^^ this will create a special 'server' instance of the protocol, which will issue the bind() call. It then creates a configurable number (currently ncpu * 2) of parallel contexts and triggers parallel AcceptEx() invocation (you can prime "pre-accepted" sockets on Windows, which removes the serialization limits of accept() on POSIX). # If an exception occurs in a parallel thread, it is queued # to a special list the main thread has. The main thread # checks this list each time async.run_once() is called, so, # we call it here just to propagate any exceptions that # may have already occurred (like attempting to bind to an # invalid IP, or submitting a protocol that had an error). async.run_once() return server # (This also facilitates interactive console usage whilst # serving request in parallel.) if __name__ == '__main__': main() # Run forever. Returns when there are no active contexts # or ctrl-c is pressed. async.run() All of that works *today* with PyParallel. The main thread preps everything, does the importing, loads the huge data structures, establishes all the code objects and then, once async.run() is called, sits there dormant waiting for feedback from the parallel threads. It's not perfect; I haven't focused on clean shutdown yet, so you will 100% crash if you ctrl-C it currently. That's mainly an issue with interpreter finalization destroying the GIL, which clears our Py_MainThreadId, which makes all the instrumented macros like Py_INCREF/Py_DECREF think they're in a parallel context when they're not, which... well, you can probably guess what happens after that if you've got 8 threads still running at the time pointer dereferencing things that aren't what they think they are. None of the problems are showstoppers though, it's just a matter of prioritization and engineering effort. My strategic priorities to date have been: a) no changes to semantics of CPython API b) high performance c) real-world examples Now, given that this has been something I've mostly worked on in my own time, my tactical priority each development session (often started after an 8 hour work day where I'm operating at reduced brain power) is simply: a) forward progress at any cost The quickest hack I can think of that'll address the immediate problem is the one that gets implemented. That hack will last until it stops working, at which point, the quickest hack I can think of to replace it wins, and so on. At no time do I consider the maintainability, quality or portability of the hack -- as long as it moves the overall needle forward, perfect; it can be made elegant later. I think it's important to mention that, because if you're reviewing the source code, it helps explain things like how I implemented the persistence of an object within a client session (e.g. 
intercepting the setattr/setitem and doing the alternate heap memcpy dance alluded to above): https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/diffs/Objects/dictobject.c.patch?at=3.3-px#cl-28 Without that bit of code, you'll leak memory, with it, you won't. I attacked pyodbc a few weeks ago -- it was also leaking memory when called from parallel callbacks because tp_dealloc wasn't being called on any of the Connection, Cursor or Row objects, so handles that were allocated (i.e. SQLAllocHandle()) were never paired with a SQLFreeHandle() (because we don't refcount in a parallel context, which means there's never a Py_DECREF that hits 0, which means Py_Dealloc() never gets called for that object (which works fine for everything that allocates via PyObject/PyMem facilities, because we intercept those and roll them back in bulk)), and thus, leak. Quickest fix I could think of at the time: async.register_dealloc(pyodbc.Connection) async.register_dealloc(pyodbc.Cursor) async.register_dealloc(pyodbc.Row) Which facilitates this during our interception of PyObject_NEW/INIT: https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-3387 Which allows us to do this for each heap... https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-873 ....that we encounter as part of "socket rewinding": https://bitbucket.org/tpn/pyparallel/src/8528b11ba51003a9821ceb75683ee96ed33db28a/Python/pyparallel.c?at=3.3-px#cl-793 Absolutely horrendous hack from a software engineering perspective, but is surprisingly effective at solving the problem. Regards, Trent. From njs at pobox.com Thu Jun 25 10:58:25 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 25 Jun 2015 01:58:25 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150625045904.GA844@trent.me> References: <20150625045904.GA844@trent.me> Message-ID: On Wed, Jun 24, 2015 at 9:59 PM, Trent Nelson wrote: > (Sturla commented on the "import-DDoS" that you can run into on POSIX > systems, which is a good example. You're saturating your underlying > hardware, sure, but you're not doing useful work -- it's important > to distinguish the two.) To be clear, AFAIU the "import-DDoS" that supercomputers classically run into has nothing to do with POSIX, it has to do running systems that were designed for simulation workloads that go like: generate a bunch of data from scratch in memory, crunch on it for a while, and then spit out some summaries. So you end up with $1e11 spent on increasing the FLOP count, and the absolute minimum spent on the storage system -- basically just enough to let you load a single static binary into memory at the start of your computation, and there might even be some specific hacks in the linker to minimize cost of distributing that single binary load. (These are really weird architectures; they usually do not even have shared library support.) And the result is that when you try spinning up a Python program instead, the startup sequence produces (number of imports) * (number of entries in sys.path) * (hundreds of thousands of nodes) simultaneous stat calls hammering some poor NFS server somewhere and it falls over and dies. (I think often the network connection to the NFS server is not even using the ridiculously-fast interconnect mesh, but rather some plain-old-ethernet that gets saturated.) 
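As a rough back-of-envelope sketch of that multiplication (the three figures below are invented purely for illustration, not measurements from any real system):

    # Hypothetical figures, purely illustrative.
    imports_per_process = 300   # numpy alone drags in a few hundred modules
    sys_path_entries = 20       # directories probed for each failed lookup
    nodes = 100000              # MPI ranks all starting at once

    failed_lookups = imports_per_process * sys_path_entries * nodes
    print(failed_lookups)       # 600000000 stat()/open() calls at one NFS server

Even rounding those guesses down generously, that is hundreds of millions of metadata requests arriving at a single file server in one burst.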
I could be wrong, I don't actually work with these systems myself, but that's what I've picked up. Continuing my vague and uninformed impressions, I suspect that this would actually be relatively easy to fix by hooking the import system to do something more intelligent, like nominate one node as the leader and have it do the file lookups and then tell everyone else what it found (via the existing message-passing systems). Though there is an interesting problem of how you bootstrap the hook code. But as to whether the new import hook stuff actually helps with this... I'm pretty sure most HPC centers haven't noticed that Python 3 exists yet. See above re: extremely weird architectures -- many of us are familiar with "clinging to RHEL 5" levels of conservatism, but that's nothing on "look there's only one person who ever knew how to get a working python and numpy using our bespoke compiler toolchain on this architecture that doesn't support extension module loading (!!), and they haven't touched it in years either"... There are lots of smart people working on this stuff right now. But they are starting from a pretty different place from those of us in the consumer computing world :-). -n -- Nathaniel J. Smith -- http://vorpus.org From sturla.molden at gmail.com Thu Jun 25 11:35:35 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 09:35:35 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: <20150625065050.GA15018@snakebite.org> Message-ID: <1216801053456916539.123326sturla.molden-gmail.com@news.gmane.org> Trent Nelson wrote: > The situation Ryan describes is literally the exact situation > that PyParallel excels at: large reference data structures > accessible in parallel contexts. Back in 2009 I solved this for multiprocessing using a NumPy array that used shared memory as backend (Sys V IPC, not BSD mmap, on mac and Linux). By monkey-patching the pickling of numpy.ndarray, the contents of the shared memory buffer was not pickled, only the metadata needed to reopen the shared memory. After a while it stopped working on Mac (I haven't had time to fix it -- maybe I should), but it still works on Windows. :( Anyway, there is another library that does something similar called joblib. It is used for parallel computing in scikit-learn. It creates shared memory by mmap from /tmp, which means it is only shared memory on Linux. On Mac and Window there is no tmpfs so it ends up using a physical file on disk instead :-( Sturla From sturla.molden at gmail.com Thu Jun 25 11:35:34 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 09:35:34 +0000 (UTC) Subject: [Python-ideas] solving multi-core Python References: <20150625045904.GA844@trent.me> Message-ID: <1121002233456917217.179423sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > Continuing my vague and uninformed impressions, I suspect that this > would actually be relatively easy to fix by hooking the import system > to do something more intelligent, like nominate one node as the leader > and have it do the file lookups and then tell everyone else what it > found (via the existing message-passing systems). There are two known solutions. One is basically what you describe. The other, which at least works on IBM blue brain, is to import modules from a ramdisk. It seems to be sufficient to make sure whatever is serving the shared disk can deal with the 100k client DDoS. Sturla From mal at egenix.com Thu Jun 25 12:00:34 2015 From: mal at egenix.com (M.-A. 
Lemburg) Date: Thu, 25 Jun 2015 12:00:34 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <1121002233456917217.179423sturla.molden-gmail.com@news.gmane.org> References: <20150625045904.GA844@trent.me> <1121002233456917217.179423sturla.molden-gmail.com@news.gmane.org> Message-ID: <558BD142.5040700@egenix.com> On 25.06.2015 11:35, Sturla Molden wrote: > Nathaniel Smith wrote: > >> Continuing my vague and uninformed impressions, I suspect that this >> would actually be relatively easy to fix by hooking the import system >> to do something more intelligent, like nominate one node as the leader >> and have it do the file lookups and then tell everyone else what it >> found (via the existing message-passing systems). > > There are two known solutions. One is basically what you describe. The > other, which at least works on IBM blue brain, is to import modules from a > ramdisk. It seems to be sufficient to make sure whatever is serving the > shared disk can deal with the 100k client DDoS. Another way to solve this problem may be to use our eGenix PyRun which embeds modules right in the binary. As a result, all reading is done from the mmap'ed binary and automatically shared between processes by the OS: http://www.egenix.com/products/python/PyRun/ I don't know whether this actually works on an IBM Blue Brain with 100k clients - we are not fortunate enough to have access to one of those machines :-) Note: Even though the data reading is shared, the resulting code and modules objects are, of course, not shared, so you still have the overhead of using up memory for this, unless you init your process cluster using fork() after you've imported all necessary modules (then you benefit from the copy-on-write provided by the OS - code objects usually don't change after they have been created). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 25 2015) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2015-06-25: Released mxODBC 3.3.3 ... http://egenix.com/go79 2015-06-16: Released eGenix pyOpenSSL 0.13.10 ... http://egenix.com/go78 2015-07-20: EuroPython 2015, Bilbao, Spain ... 25 days to go eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Thu Jun 25 14:56:59 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 25 Jun 2015 22:56:59 +1000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150624143808.29844019@x230> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol> <20150624143808.29844019@x230> Message-ID: On 24 June 2015 at 21:38, Paul Sokolovsky wrote: > On Wed, 24 Jun 2015 13:03:38 +0200 > Antoine Pitrou wrote: >> Don't you have an additional namespace for micropython-specific >> features? > > I treat it as a good sign that it's ~8th message in the thread and it's > only the first time we get a hint that we should get out with our stuff > into a separate namespace ;-). 
We hadn't previously gotten to the fact that part of your motivation was helping folks learn the intricacies of low level fixed width time measurement, though. That's actually a really cool idea - HC11 assembly programming and TI C5420 DSP programming are still two of my favourite things I've ever done, and it would be nice if folks could more easily start exploring the mindset of the embedded microprocessor world without having to first deal with the incidental complexity of emulators or actual embedded hardware (even something like programming an Arduino directly is more hassle than remote controlling one from a Raspberry Pi or PC). Unfortunately, I can't think of a good alternative name that isn't ambiguous at the CPython layer - embedded CPython is very different from an embedded microprocessor, utime is taken, and microtime is confusable with microseconds. I'm tempted to suggest calling it "qtime", and using TI's Q notation to denote the formats of numbers: https://en.wikipedia.org/wiki/Q_%28number_format%29 That would conflict with your notion of making the APIs agnostic as to the exact bitwidth used, though, as well as with the meaning of the "q" prefix in qmath: https://pypi.python.org/pypi/qmath Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 25 15:24:54 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 25 Jun 2015 23:24:54 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <558A97FB.9070908@egenix.com> References: <558A97FB.9070908@egenix.com> Message-ID: On 24 June 2015 at 21:43, M.-A. Lemburg wrote: > > Note that extension modules often interface to other C libraries > which typically use some setup logic that is not thread safe, > but is used to initialize the other thread safe parts. E.g. > setting up locks and shared memory for all threads to > use is a typical scenario you find in such libs. > > A requirement to be able to import modules multiple times > would pretty much kill the idea for those modules. Yep, that's the reason earlier versions of PEP 489 included the notion of "singleton modules". We ended up deciding to back that out for the time being, and instead leave those modules using the existing single phase initialisation model. > That said, I don't think this is really needed. Modules > would only have to be made aware that there is a global > first time setup phase and a later shutdown/reinit phase. > > As a result, the module DLL would load only once, but then > use the new module setup logic to initialize its own state > multiple times. Aye, buying more time to consider alternative designs was the reason we dropped the "singleton module" idea from multi-phase initialisation until 3.6 at the earliest. I think your idea here has potential - it should just require a new Py_mod_setup slot identifier, and a bit of additional record keeping to track which modules had already had their setup slots invoked. (It's conceivable there could also be a process-wide Py_mod_teardown slot, but that gets messy in the embedded interpreter case where we might have multiple Py_Initialize/Py_Finalize cycles) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 25 16:08:07 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Jun 2015 00:08:07 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 25 June 2015 at 02:28, Sturla Molden wrote: > On 24/06/15 07:01, Eric Snow wrote: > >> Well, perception is 9/10ths of the law. :) If the multi-core problem >> is already solved in Python then why does it fail in the court of >> public opinion. The perception that Python lacks a good multi-core >> story is real, leads organizations away from Python, and will not >> improve without concrete changes. > > > I think it is a combination of FUD and the lack of fork() on Windows. There > is a lot of utterly wrong information about CPython and its GIL. > > The reality is that Python is used on even the largest supercomputers. The > scalability problem that is seen on those systems is not the GIL, but the > module import. If we have 1000 CPython processes importing modules like > NumPy simultaneously, they will do a "denial of service attack" on the file > system. This happens when the module importer generates a huge number of > failed open() calls while trying to locate the module files. Slight tangent, but folks hitting this issue on 2.7 may want to investigate Eric's importlib2: https://pypi.python.org/pypi/importlib2 It switches from stat-based searching for files to the Python 3.3+ model of directory listing based searches, which can (anecdotally) lead to a couple of orders of magnitude of improvement in startup for code loading modules from NFS mounts. > And while CPython is being used for massive parallel computing to e.g. model > the global climate system, there is this FUD that CPython does not even > scale up on a laptop with a single multicore CPU. I don't know where it is > coming from, but it is more FUD than truth. Like a lot of things in the vast sprawling Python ecosystem, I think there are aspects of this that are a discoverabiilty problem moreso than a capability problem. When you're first experimenting with parallel execution, a lot of the time folks start with computational problems like executing multiple factorials at once. That's trivial to do across multiple cores even with a threading model like JavaScript's worker threads, but can't be done in CPython without reaching for the multiprocessing module. This is the one place where I'll concede that folks learning to program on Windows or the JVM and hence getting the idea that "creating threads is fast, creating processes is slow" causes problems: folks playing this kind of thing are far more likely to go "import threading" than they are "import multiprocessing" (and likewise for the ThreadPoolExecutor vs the ProcessPoolExecutor if using concurrent.futures), and their reaction when it doesn't work is far more likely to be "Python can't do this" than it is "I need to do this differently in Python from the way I do it in C/C++/Java/JavaScript". > The main answers to FUD about the GIL and Python in scientific computing are > these: It generally isn't scientific programmers I personally hit problems with (although we have to allow for the fact many of the scientists I know I met *because* they're Pythonistas). For that use case, there's not only HPC to point to, but a number of papers that talking about Cython and Numba in the same breath as C, C++ and FORTRAN, which is pretty spectacular company to be in when it comes to numerical computation. 
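To make the ThreadPoolExecutor/ProcessPoolExecutor contrast above concrete, here is a minimal sketch of the "multiple factorials at once" toy problem (the workload sizes and worker counts are arbitrary choices for illustration only):

    import math
    import time
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    NUMBERS = [100000] * 8   # arbitrary CPU-bound toy workload

    def crunch(n):
        # CPU-bound work that holds the GIL the whole time.
        return math.factorial(n).bit_length()

    def timed(executor_cls):
        start = time.perf_counter()
        with executor_cls(max_workers=4) as pool:
            list(pool.map(crunch, NUMBERS))
        return time.perf_counter() - start

    if __name__ == '__main__':
        print('threads:   %.2fs' % timed(ThreadPoolExecutor))   # GIL-serialised
        print('processes: %.2fs' % timed(ProcessPoolExecutor))  # uses the other cores

The only difference between the two runs is which executor class gets passed in, which is exactly the one-line switch that is easy to miss if threading is the first tool someone reaches for.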
Being the fourth language Nvidia supported directly for CUDA doesn't hurt either. Instead, the folks that I think have a more valid complaint are the games developers, and the folks trying to use games development as an educational tool. They're not doing array based programming the way numeric programmers are (so the speed of the NumPy stack isn't any help), and they're operating on shared game state and frequently chattering back and forth between threads of control, so high overhead message passing poses a major performance problem. That does suggest to me a possible "archetypal problem" for the work Eric is looking to do here: a 2D canvas with multiple interacting circles bouncing around. We'd like each circle to have its own computational thread, but still be able to deal with the collision physics when they run into each other. We'll assume it's a teaching exercise, so "tell the GPU to do it" *isn't* the right answer (although it might be an interesting entrant in a zoo of solutions). Key performance metric: frames per second Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Thu Jun 25 16:31:47 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Jun 2015 00:31:47 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 26 June 2015 at 00:08, Nick Coghlan wrote: > That does suggest to me a possible "archetypal problem" for the work > Eric is looking to do here: a 2D canvas with multiple interacting > circles bouncing around. We'd like each circle to have its own > computational thread, but still be able to deal with the collision > physics when they run into each other. We'll assume it's a teaching > exercise, so "tell the GPU to do it" *isn't* the right answer > (although it might be an interesting entrant in a zoo of solutions). > Key performance metric: frames per second The more I think about it, the more I think this (or at least something along these lines) makes sense as the archetypal problem to solve here. 1. It avoids any temptation to consider the problem potentially IO bound, as the only IO is rendering the computational results to the screen 2. Scaling across multiple machines clearly isn't relevant, since we're already bound to a single machine due to the fact we're rendering to a local display 3. The potential for collisions between objects means it isn't an embarrassingly parallel problem where the different computational threads can entirely ignore the existence of the other threads 4. "Frames per second" is a nice simple metric that can be compared across threading, multiprocessing, PyParallel, subinterpreters, mpi4py and perhaps even the GPU (which will no doubt thump the others soundly, but the comparison may still be interesting) 5. It's a problem domain where we know Python isn't currently a popular choice, and there are valid technical reasons (including this one) for that lack of adoption 6. It's a problem domain we know folks in the educational community are interested in seeing Python get better at, as building simple visual animations is often a good way to introduce programming in general (just look at the design of Scratch) Cheers, Nick. 
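A rough, display-free skeleton of the benchmark being described here might look like the following (all of the numbers, and the decision to fake rendering with a short sleep, are invented for illustration; a real entrant would draw to an actual canvas):

    import math
    import random
    import threading
    import time

    WIDTH, HEIGHT, RADIUS = 640, 480, 10
    NUM_CIRCLES = 8
    RUN_SECONDS = 2.0
    stop = threading.Event()
    state_lock = threading.Lock()

    class Circle:
        def __init__(self):
            self.x = random.uniform(RADIUS, WIDTH - RADIUS)
            self.y = random.uniform(RADIUS, HEIGHT - RADIUS)
            self.vx = random.uniform(-120, 120)
            self.vy = random.uniform(-120, 120)

    circles = [Circle() for _ in range(NUM_CIRCLES)]

    def run_circle(c, dt=0.005):
        # One thread of control per circle, as in the proposed exercise.
        while not stop.is_set():
            with state_lock:
                c.x += c.vx * dt
                c.y += c.vy * dt
                if not RADIUS <= c.x <= WIDTH - RADIUS:
                    c.vx = -c.vx
                if not RADIUS <= c.y <= HEIGHT - RADIUS:
                    c.vy = -c.vy
                for other in circles:   # naive collision "physics"
                    if other is not c and math.hypot(
                            c.x - other.x, c.y - other.y) < 2 * RADIUS:
                        c.vx, other.vx = other.vx, c.vx
                        c.vy, other.vy = other.vy, c.vy
            time.sleep(dt)

    threads = [threading.Thread(target=run_circle, args=(c,)) for c in circles]
    for t in threads:
        t.start()

    frames = 0
    deadline = time.perf_counter() + RUN_SECONDS
    while time.perf_counter() < deadline:
        with state_lock:
            frame = [(c.x, c.y) for c in circles]   # stand-in for rendering
        time.sleep(0.001)                           # pretend drawing takes ~1ms
        frames += 1

    stop.set()
    for t in threads:
        t.join()
    print("frames per second:", frames / RUN_SECONDS)

Swapping the threading layer for multiprocessing, subinterpreters or anything else while keeping the same frame-counting harness is what would make the comparison across the "zoo of solutions" possible.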
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From sturla.molden at gmail.com Thu Jun 25 17:18:10 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 17:18:10 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 25/06/15 16:08, Nick Coghlan wrote: > It generally isn't scientific programmers I personally hit problems > with (although we have to allow for the fact many of the scientists I > know I met *because* they're Pythonistas). For that use case, there's > not only HPC to point to, but a number of papers that talking about > Cython and Numba in the same breath as C, C++ and FORTRAN, which is > pretty spectacular company to be in when it comes to numerical > computation. Cython can sometimes give the same performance as C or Fortran, but as soon as you start to use classes in the Cython code you run into GIL issues. It is not that the GIL is a problem per se, but because Cython compiles to C, the GIL is not released until the Cython function returns. That is, unless you manually release it inside Cython. This e.g. means that the interpreter might be locked for longer durations, and if you have a GUI it becomes unresponsive. The GIL is more painful in Cython than in Python. Personally I often end up writing a mix of Cython and C or C++. Numba is impressive but still a bit immature. It is an LLVM based JIT compiler for CPython that for simple computational tasks can give performance similar to C. It can also run Python code on Nvidia GPUs. Numba is becoming what the dead swallow should have been. > Instead, the folks that I think have a more valid complaint are the > games developers, and the folks trying to use games development as an > educational tool. I have not developed games myself, but for computer graphics with OpenGL there is certainly no reason to complain. NumPy arrays are great for storing vertex and texture data. OpenGL with NumPy is just as fast as OpenGL with C arrays. GLSL shaders are just plain text, Python is great for that. Cython and Numba are both great if you call glVertex* functions the old way, doing this as fast as C. Display lists are also equally fast from Python and C. But if you start to call glVertex* multiple times from a Python loop, then you're screwed. > That does suggest to me a possible "archetypal problem" for the work > Eric is looking to do here: a 2D canvas with multiple interacting > circles bouncing around. We'd like each circle to have its own > computational thread, but still be able to deal with the collision > physics when they run into each other. There are people doing Monte Carlo simulations with thousands or millions of particles, but not with one thread per particle. :-) Sturla From sturla.molden at gmail.com Thu Jun 25 17:25:41 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 25 Jun 2015 17:25:41 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 25/06/15 16:31, Nick Coghlan wrote: > 3. The potential for collisions between objects means it isn't an > embarrassingly parallel problem where the different computational > threads can entirely ignore the existence of the other threads Well, you can have a loop that updates all particles, e.g. by calling a coroutine associated with each particle, and then this loop is an embarrassingly parallel problem. You don't need to associate each particle with its own thread. It is bad to teach students to use one thread per particle anyway. 
Suddenly they write a system that has thousands of threads. Sturla From random832 at fastmail.us Thu Jun 25 20:06:38 2015 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 25 Jun 2015 14:06:38 -0400 Subject: [Python-ideas] millisecond and microsecond times without floats Message-ID: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote: > > > Hello from MicroPython, a lean Python implementation > scaling down to run even on microcontrollers > (https://github.com/micropython/micropython). > > Our target hardware base oftentimes lacks floating point support, and > using software emulation is expensive. So, we would like to have > versions of some timing functions, taking/returning millisecond and/or > microsecond values as integers. What about having a fixed-point decimal numeric type to be used for this purpose? Allowing time (and stat, and the relevant functions of the datetime module) to return any real numeric type rather than being required to use float would be a useful extension. From stefan_ml at behnel.de Thu Jun 25 21:00:47 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Thu, 25 Jun 2015 21:00:47 +0200 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: <20150621114846.06bc8dc8@fsol> Message-ID: Eric Snow schrieb am 24.06.2015 um 06:15: > On Sun, Jun 21, 2015 at 4:40 AM, Stefan Behnel wrote: >> If objects can make it explicit that they support sharing (and preferably >> are allowed to implement the exact details themselves), I'm sure we'll find >> ways to share NumPy arrays across subinterpreters. That feature alone tends >> to be a quick way to make a lot of people happy. > > Are you thinking of something along the lines of a dunder method (e.g. > __reduce__)? Sure. Should not be the first problem to tackle here, but dunder methods would be the obvious way to interact with whatever "share/move/copy between subinterpreters" protocol there will be. Stefan From trent at snakebite.org Thu Jun 25 09:01:19 2015 From: trent at snakebite.org (Trent Nelson) Date: Thu, 25 Jun 2015 03:01:19 -0400 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <20150625070119.GB15018@snakebite.org> On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden wrote: > On 24/06/15 07:01, Eric Snow wrote: > > >In return, my question is, what is the level of effort to get fork+IPC > >to do what we want vs. subinterpreters? Note that we need to > >accommodate Windows as more than an afterthought > > Windows is really the problem. The absence of fork() is especially hurtful > for an interpreted language like Python, in my opinion. UNIX is really the problem. The absence of tiered interrupt request levels, memory descriptor lists, I/O request packets (Irps), thread agnostic I/O, non-paged kernel memory, non-overcommitted memory management, universal page/buffer cache, better device driver architecture and most importantly, a kernel architected around waitable events, not processes, is harmful for efficiently solving contemporary problems optimally with modern hardware. VMS got it right from day one. UNIX did not. :-) Trent. From wes.turner at gmail.com Thu Jun 25 23:51:48 2015 From: wes.turner at gmail.com (Wes Turner) Date: Thu, 25 Jun 2015 16:51:48 -0500 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Thu, Jun 25, 2015 at 10:25 AM, Sturla Molden wrote: > On 25/06/15 16:31, Nick Coghlan wrote: > > 3.
The potential for collisions between objects means it isn't an >> embarrassingly parallel problem where the different computational >> threads can entirely ignore the existence of the other threads >> > > Well, you can have a loop that updates all particles, e.g. by calling a > coroutine associated with each particle, and then this loop is an > embarrassingly parallel problem. You don't need to associate each particle > with its own thread. > > It is bad to teach students to use one thread per particle anyway. > Suddenly they write a system that have thousands of threads. Understood that this is merely an example re: threading, but BSP seems to be the higher-level algorithm for iterative graphs with topology: * https://en.wikipedia.org/wiki/Bulk_synchronous_parallel * http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html * https://giraph.apache.org/ * https://spark.apache.org/docs/latest/ * https://spark.apache.org/docs/latest/graphx-programming-guide.html#pregel-api (BSP) * https://spark.apache.org/news/spark-wins-daytona-gray-sort-100tb-benchmark.html * https://spark.apache.org/docs/latest/api/python/ (no graphx BSP yet, unfortunately) * https://github.com/xslogic/phoebus (Erlang, HDFS, Thrift) * https://github.com/mnielsen/Pregel/blob/master/pregel.py (Python) Intra-machine optimization could also be useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Jun 26 00:11:40 2015 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 25 Jun 2015 15:11:40 -0700 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: <558C7C9C.8060606@stoneleaf.us> On 06/25/2015 08:25 AM, Sturla Molden wrote: > On 25/06/15 16:31, Nick Coghlan wrote: > >> 3. The potential for collisions between objects means it isn't an >> embarrassingly parallel problem where the different computational >> threads can entirely ignore the existence of the other threads > > Well, you can have a loop that updates all particles, e.g. by calling a coroutine associated with each particle, and then this loop is an embarrassingly parallel problem. You don't need to associate > each particle with its own thread. > > It is bad to teach students to use one thread per particle anyway. Suddenly they write a system that have thousands of threads. Speaking as a novice to this area, I do understand that what we learn with may not be (and usually isn't) production-ready code, I do see Nick's suggestion as being one that is easy to understand, easy to measure, and good for piquing interest. At least, I'm now interested. :) (look ma! bowling for circles!) -- ~Ethan~ From ncoghlan at gmail.com Fri Jun 26 10:00:44 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Jun 2015 18:00:44 +1000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> References: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> Message-ID: On 26 June 2015 at 04:06, wrote: > On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote: >> >> >> Hello from MicroPython, a lean Python implementation >> scaling down to run even on microcontrollers >> (https://github.com/micropython/micropython). >> >> Our target hardware base oftentimes lacks floating point support, and >> using software emulation is expensive. 
So, we would like to have >> versions of some timing functions, taking/returning millisecond and/or >> microsecond values as integers. > > What about having a fixed-point decimal numeric type to be used for this > purpose? > > Allowing time (and stat, and the relevant functions of the datetime > module) to return any real numeric type rather than being required to > use float would be a useful extension. It isn't the data type that's the problem per se, it's the additional abstraction layers - the time module assumes it's dealing with operating system provided timing functionality, rather than accessing timer hardware directly. Folks tend to think that the os and time modules are low level, but there's a wonderful saying that asks "How do you tell the difference between a software developer and a computer systems engineer?" Answer: Software developer: "In a *low* level language like C..." Computer systems engineer: "In a *high* level language like C..." Paul, what do you think of the idea of trying to come up with a "hwclock" module for MicroPython that aims to expose very, very low level timing functionality as you describe, and reserving that name for independent distribution on PyPI? Such a module could even eventually grow a plugin system to provide access to various real time clock modules in addition to the basic counter based clocks you're interested in right now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jun 26 13:12:24 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Jun 2015 21:12:24 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On 26 Jun 2015 01:27, "Sturla Molden" wrote: > > On 25/06/15 16:31, Nick Coghlan wrote: > >> 3. The potential for collisions between objects means it isn't an >> embarrassingly parallel problem where the different computational >> threads can entirely ignore the existence of the other threads > > > Well, you can have a loop that updates all particles, e.g. by calling a coroutine associated with each particle, and then this loop is an embarrassingly parallel problem. You don't need to associate each particle with its own thread. > > It is bad to teach students to use one thread per particle anyway. Suddenly they write a system that have thousands of threads. And when they hit that scaling limit is when they should need to learn why this simple approach doesn't scale very well, just as purely procedural programming doesn't handle increasing structural complexity and just as the "c10m" problem (like the "c10k" problem before it) is teaching our industry as a whole some important lessons about scalable hardware and software design: http://c10m.robertgraham.com/p/manifesto.html There are limits to the degree that education can be front loaded before all the pre-emptive "you'll understand why this is important later" concerns become a barrier to learning the fundamentals, rather than a useful aid. Sometimes folks really do need to encounter a problem themselves in order to appreciate the value of the more complex solutions that make it possible to get past those barriers. Cheers, Nick. > > > > Sturla > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Fri Jun 26 13:20:10 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Jun 2015 21:20:10 +1000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: <20150625070119.GB15018@snakebite.org> References: <20150625070119.GB15018@snakebite.org> Message-ID: On 26 Jun 2015 05:37, "Trent Nelson" wrote: > > On Wed, Jun 24, 2015 at 05:26:59PM +0200, Sturla Molden wrote: > > On 24/06/15 07:01, Eric Snow wrote: > > > > >In return, my question is, what is the level of effort to get fork+IPC > > >to do what we want vs. subinterpreters? Note that we need to > > >accommodate Windows as more than an afterthought > > > > Windows is really the problem. The absence of fork() is especially hurtful > > for an interpreted language like Python, in my opinion. > > UNIX is really the problem. The absence of tiered interrupt request > levels, memory descriptor lists, I/O request packets (Irps), thread > agnostic I/O, non-paged kernel memory, non-overcommitted memory > management, universal page/buffer cache, better device driver > architecture and most importantly, a kernel architected around > waitable events, not processes, is harmful for efficiently solving > contemporary optimally with modern hardware. Platforms are what they are :) As a cross-platform, but still platform dependent, language runtime, we're actually in a pretty good position to help foster some productive competition between Windows and the *nix platforms. However, we'll only be able to achieve that if we approach their wildly divergent execution and development models with respect for their demonstrated success and seek to learn from their respective strengths, rather than dismissing them over their respective weaknesses :) Cheers, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Fri Jun 26 17:35:51 2015 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 26 Jun 2015 15:35:51 +0000 Subject: [Python-ideas] solving multi-core Python In-Reply-To: References: Message-ID: On Thu, 25 Jun 2015 at 02:57 Eric Snow wrote: > On Wed, Jun 24, 2015 at 10:28 AM, Sturla Molden > wrote: > > The reality is that Python is used on even the largest supercomputers. > The > > scalability problem that is seen on those systems is not the GIL, but the > > module import. If we have 1000 CPython processes importing modules like > > NumPy simultaneously, they will do a "denial of service attack" on the > file > > system. This happens when the module importer generates a huge number of > > failed open() calls while trying to locate the module files. > > > > There is even described in a paper on how to avoid this on an IBM Blue > > Brain: "As an example, on Blue Gene P just starting up Python and > importing > > NumPy and GPAW with 32768 MPI tasks can take 45 minutes!" > > I'm curious what difference there is under Python 3.4 (or even 3.3). > Along with being almost entirely pure Python, the import system now > has some optimizations that help mitigate filesystem access > (particularly stats). > >From the HPC setup that I use there does appear to be some difference. 
The number of syscalls required to import numpy is significantly lower with 3.3 than 2.7 in our setup (I don't have 3.4 in there and I didn't compile either of these myself): $ strace python3.3 -c "import numpy" 2>&1 | egrep -c '(open|stat)' 1315 $ strace python2.7 -c "import numpy" 2>&1 | egrep -c '(open|stat)' 4444 It doesn't make any perceptible difference when running "time python -c 'import numpy'" on the login node. I'm not going to request 1000 cores in order to test the difference properly. Also note that profiling in these setups is often complicated by the other concurrent users of the system. -- Oscar -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Fri Jun 26 21:14:56 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 26 Jun 2015 22:14:56 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol> <20150624143808.29844019@x230> Message-ID: <20150626221456.211042b4@x230> Hello, On Thu, 25 Jun 2015 22:56:59 +1000 Nick Coghlan wrote: > On 24 June 2015 at 21:38, Paul Sokolovsky wrote: > > On Wed, 24 Jun 2015 13:03:38 +0200 > > Antoine Pitrou wrote: > >> Don't you have an additional namespace for micropython-specific > >> features? > > > > I treat it as a good sign that it's ~8th message in the thread and > > it's only the first time we get a hint that we should get out with > > our stuff into a separate namespace ;-). > > We hadn't previously gotten to the fact that part of your motivation > was helping folks learn the intricacies of low level fixed width time > measurement, though. Well, Python is nice teaching language, and a lot of embedded programming can be done with just GPIO control (i.e. being able to control 1/0 digital signal) and (properly) timed delays (this approach is know as https://en.wikipedia.org/wiki/Bit_banging). Then if keeping it simple and performant we let people do more rather than less with that. So, yes, learning/experimentation is definitely in scope of this effort. People may ask what all that has to do with very high-level language Python, but I have 2 answers: 1. Languages like JavaScript or Lua have much more limited type model, e.g. they don't even have integer numeric type per se (only float), and yet their apologists don't feel too shy to push to use them for embedded hardware programming. Certainly, Python is not less, only more suited for that with its elaborated type model and stricter typedness overall. 2. When I started with Python 1.5, I couldn't imagine there would be e.g. memoryview's. And yet they're there. So, Python (and people behind it) do care about efficiency, so it shouldn't come as surprise special-purpose Python implementation cares about efficiency in its niche either. > That's actually a really cool idea - HC11 assembly programming and TI > C5420 DSP programming are still two of my favourite things I've ever > done, and it would be nice if folks could more easily start exploring > the mindset of the embedded microprocessor world without having to > first deal with the incidental complexity of emulators or actual > embedded hardware (even something like programming an Arduino directly > is more hassle than remote controlling one from a Raspberry Pi or PC). 
Thanks, and yes, that's the idea behind MicroPython - that people new embedded programming could starter easier, while having chance to learn really cool language, and be able to go into low-level details and optimize (in my list, Python is as friendly to that as VHLL may be). And yet another idea to make MicroPython friendly to people who know Python and wanted to play with embedded. Making it play well for these 2 user groups isn't exactly easy, but we'd like to try. > Unfortunately, I can't think of a good alternative name that isn't > ambiguous at the CPython layer - embedded CPython is very different > from an embedded microprocessor, utime is taken, and microtime is > confusable with microseconds. You mean POSIX utime() function, right? How we have namespacing structured currently is that we have "u" prefix for all important buildin modules, e.g. uos, utime, etc. They contain bare minimum, and then fuller standard modules can be coded in Python. So, formally speaking, on our side it will go into separate namespace anyway. It's just we treat "utime" just as an alias for "time", and wouldn't like to put there something which couldn't be with clean conscience submitted as a PEP (in some distant future). > I'm tempted to suggest calling it "qtime", and using TI's Q notation > to denote the formats of numbers: > https://en.wikipedia.org/wiki/Q_%28number_format%29 > > That would conflict with your notion of making the APIs agnostic as to > the exact bitwidth used, though, as well as with the meaning of the > "q" prefix in qmath: https://pypi.python.org/pypi/qmath Yes, exact fixed-point nature of "Q" numbers doesn't help here, but as a designator of a special format it's pretty close to the original idea to use "_ms" and "_us" suffixes, so I treat that as a sign that we're on the right track. I however was thinking about our exchange with Antoine, and his surprise that we don't want to use 64-bit value. I guess I nailed the issue: I selected "monotonic()" because it seemed the closest to what we need, and in my list, our stuff is still "monotonic" in a sense that it goes only forward at constant pace. It just wraps around because so is the physical nature of the underlying fixes-size counter. Apparently, such "extended" treatment of "monotonic" is confusing for people who know time.monotonic() and PEP418. So, looks like we'll need to call our stuff different, I'm going to propose ticks_ms() and ticks_us() for MicroPython (hopefully "ticks" it's a well-known embedded term, and intuitive enough for other folks, at the very least, it's better than Linux kernel's jiffies ;-) ). > Cheers, > Nick. -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Fri Jun 26 21:47:42 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 26 Jun 2015 22:47:42 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: References: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> Message-ID: <20150626224742.7954cc0b@x230> Hello, On Fri, 26 Jun 2015 18:00:44 +1000 Nick Coghlan wrote: > On 26 June 2015 at 04:06, wrote: > > On Mon, Jun 22, 2015, at 19:15, Paul Sokolovsky wrote: > >> > >> > >> Hello from MicroPython, a lean Python implementation > >> scaling down to run even on microcontrollers > >> (https://github.com/micropython/micropython). > >> > >> Our target hardware base oftentimes lacks floating point support, > >> and using software emulation is expensive. 
So, we would like to > >> have versions of some timing functions, taking/returning > >> millisecond and/or microsecond values as integers. > > > > What about having a fixed-point decimal numeric type to be used for > > this purpose? > > > > Allowing time (and stat, and the relevant functions of the datetime > > module) to return any real numeric type rather than being required > > to use float would be a useful extension. > > It isn't the data type that's the problem per se, it's the additional > abstraction layers - the time module assumes it's dealing with Just to clarify, I tried to pose problem exactly as capturing right level of abstraction, but implementation-wise, data type is important for us (MicroPython) too. Specifically small integer is different from most other types is that it's value type, not reference type, and so it doesn't require memory allocation (will never trigger garbage collection => no unpredictable pauses), and faster to work with (uPy has ahead-of-type machine code compiler -> operations on word-sized values approach (unoptimized) C performance). But those are implementation details hidden by formulation of the original task - we need integer-based time (Python has integers, so no problems with that), and that time may and will wrap around at implementation-specific intervals (so a particular implementation may choose to represent it with efficient "small integer" type if it have one). Note that I also don't try to bloat the problem space and say "Guys, why don't we have unsigned integers in Python?" or "Let's have a generic builtin modular arithmetics module". None of those needed here. > operating system provided timing functionality, rather than accessing > timer hardware directly. Folks tend to think that the os and time > modules are low level, but there's a wonderful saying that asks "How > do you tell the difference between a software developer and a computer > systems engineer?" Answer: > > Software developer: "In a *low* level language like C..." > Computer systems engineer: "In a *high* level language like C..." > > Paul, what do you think of the idea of trying to come up with a > "hwclock" module for MicroPython that aims to expose very, very low Well, so on MicroPython side, having extra modules is expensive (defining a module costs 100+ bytes, OMG! ;-)) That's why we have catch-all "pyb" module so far, and see ways to put sensible extra stuff into existing modules. So, my concern is function, not module, names and sensible semantics of those functions. To come up with general-purpose "hwclock" module, there would need to be bigger cooperation from various parties and stakeholders. Neither myself nor other MicroPython developers can lead that effort, unfortunately. But it's my hope that if someone starts that effort, they will grep Python lists first for prior art, and maybe fall into arguments presented here, and select compatible API, then for us, compatibility will be easy: --- hwclock.py --- from utime import * ------------------ And even if someone selects other API, we'll know that ours is the most efficient building blocks we can have on our side, and can implement compatibility layer in their terms. > level timing functionality as you describe, and reserving that name > for independent distribution on PyPI? Such a module could even > eventually grow a plugin system to provide access to various real time > clock modules in addition to the basic counter based clocks you're > interested in right now. > > Cheers, > Nick. 
[] -- Best regards, Paul mailto:pmiscml at gmail.com From pmiscml at gmail.com Fri Jun 26 22:48:30 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Fri, 26 Jun 2015 23:48:30 +0300 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> References: <1435255598.379244.307754081.065343C2@webmail.messagingengine.com> Message-ID: <20150626234830.02f74c60@x230> Hello, On Thu, 25 Jun 2015 14:06:38 -0400 random832 at fastmail.us wrote: > > Hello from MicroPython, a lean Python implementation > > scaling down to run even on microcontrollers > > (https://github.com/micropython/micropython). > > > > Our target hardware base oftentimes lacks floating point support, > > and using software emulation is expensive. So, we would like to have > > versions of some timing functions, taking/returning millisecond > > and/or microsecond values as integers. > > What about having a fixed-point decimal numeric type to be used for > this purpose? The problem is actually not even in floating point per se. For example, reference hardware board for MicroPython is built on a relatively powerful microcontroller which has hardware floating point. But only single-precision floating point (IEEE 32-bit). And you won't read it at https://docs.python.org/3/library/time.html or PEP418 - it's just implied - that you don't just need floating point for those functions, it should be floating point of specific mantissa requirements. Let's count: time.time() returns the time in seconds since the epoch (1 Jan 1970) as a floating point number. That means that mantissa already should be at least 32 bits. Single-precision FP has 23 mantissa bits, so it's already ruled out from suitable representation of time.time() value. Then, we need 10 extra mantissa bits for each SI decimal subunit (2^10 == 1024). Double-precision FP has 52 mantissa bits. We can store millisecond precision there, we can store microsecond precision there. But - oops - going further, we hit the same problem as MicroPython hit right away: it's not possible to represent the same calendar data range as Unix time() call, but with higher resolution than microsecond. Of course, PEP418 provides one direction to work that around - by using other epochs than Jan 1, 1970 with all these new functions like monotonic() (which is specified as "reference point of the returned value is undefined"). It still implicitly assumes there's enough bits so wrap-arounds can be ignored. My proposal works around issue in another direction - by embracing the fact that any fixed-size counter wraps around and preparing to deal with that. All that in turn enables to use just integer values for representing times. And why it's useful (implementation-wise) to be able to use integer values, I elaborated in another recent mail. 
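The mantissa arithmetic above is easy to verify from ordinary CPython (the timestamp is an arbitrary mid-2015 value picked only for illustration):

    import struct

    t = 1435000000.0   # an arbitrary Unix timestamp from mid-2015

    def as_float32(x):
        # Round-trip through IEEE single precision (23-bit mantissa),
        # i.e. what single-precision hardware FP would actually store.
        return struct.unpack('<f', struct.pack('<f', x))[0]

    print(as_float32(t) == as_float32(t + 1.0))  # True: singles can't even tell whole seconds apart
    print((t + 1e-6) - t)   # ~9.5e-07: doubles still (barely) resolve microseconds
    print((t + 1e-9) - t)   # 0.0: anything finer is lost entirely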
[] -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Sat Jun 27 04:27:55 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Jun 2015 12:27:55 +1000 Subject: [Python-ideas] millisecond and microsecond times without floats In-Reply-To: <20150626221456.211042b4@x230> References: <20150623021530.74ce1ebe@x230> <20150623232500.45efccdf@x230> <20150624091955.2efef148@fsol> <20150624131349.01ee7634@x230> <20150624124010.24cd3613@fsol> <20150624135908.4f85d415@x230> <20150624130338.0b222ca3@fsol> <20150624143808.29844019@x230> <20150626221456.211042b4@x230> Message-ID: On 27 June 2015 at 05:14, Paul Sokolovsky wrote: > I however was thinking about our exchange with Antoine, and his > surprise that we don't want to use 64-bit value. I guess I nailed the > issue: I selected "monotonic()" because it seemed the closest to what we > need, and in my list, our stuff is still "monotonic" in a sense that it > goes only forward at constant pace. It just wraps around because so is > the physical nature of the underlying fixes-size counter. Apparently, > such "extended" treatment of "monotonic" is confusing for people who > know time.monotonic() and PEP418. > > So, looks like we'll need to call our stuff different, I'm going to > propose ticks_ms() and ticks_us() for MicroPython (hopefully "ticks" > it's a well-known embedded term, and intuitive enough for other folks, > at the very least, it's better than Linux kernel's jiffies ;-) ). I like it - as you say, ticks is already a common term for this, and it's clearly distinct from anything else in the time module if we ever decide to standardise it. It also doesn't hurt that "tick" is the term both LabVIEW (http://zone.ni.com/reference/en-XX/help/371361J-01/glang/tick_count_ms/) and Simulink (http://au.mathworks.com/help/stateflow/examples/using-absolute-time-temporal-logic.html) use for the concept. As a terminology/API suggestion, you may want to go with: tick_ms() - get the current tick with 1 millisecond between ticks tick_overflow_ms() - get the overflow period of the millisecond tick counter ticks_elapsed_ms(start, end) - get the number of millisecond ticks elapsed between two points in time (assuming at most one tick counter overflow between the start and end of the measurement) tick_us() - get the current tick with 1 microsecond between ticks tick_overflow_us() - get the overflow period of the microsecond tick counter ticks_elapsed_us(start, end) - get the number of microsecond ticks elapsed between two points in time (assuming at most one tick counter overflow between the start and end of the measurement) The problem I see with "ticks_ms()" and "ticks_us()" specifically is that the plural in the name implies "ticks elapsed since a given reference time". Since the tick counter can wrap around, there's no reference time - the current tick count is merely an opaque token allowing you to measure elapsed times up to the duration of the tick counter's overflow period. I also don't think you want to assume the overflow periods of the millisecond timer and the microsecond timer are going to be the same, hence the duplication of the other APIs as well. 
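In pure Python terms, the wrap-around handling implied by those elapsed-time helpers is just modular arithmetic; a minimal sketch, assuming a hypothetical 30-bit counter period (the real period would be whatever width the underlying counter actually has):

    TICKS_PERIOD = 1 << 30   # hypothetical counter width, stand-in only

    def ticks_elapsed(start, end):
        # Correct as long as the real elapsed time spans at most one
        # counter overflow, per the assumption stated above.
        return (end - start) % TICKS_PERIOD

    # A counter that wrapped around between the two samples still gives
    # the right answer:
    assert ticks_elapsed(TICKS_PERIOD - 5, 10) == 15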
Something else you may want to consider is the idea of a "system tick", distinct from the fixed duration millisecond and microsecond ticks: tick() - get the current tick in system ticks tick_overflow() - get the overflow period of the system tick counter ticks_elapsed(start, end) - get the number of system ticks elapsed between two points in time (assuming at most one tick counter overflow between the start and end of the measurement) tick_duration() - get the system tick duration in seconds as a floating point number On platforms without a real time clock, the millisecond and microsecond ticks may then be approximations based on the system tick counter - that's actually the origin of my suggestion to expose completely separate APIs for the millisecond and microsecond versions, as if those are derived by dividing a fast system tick counter appropriately, they may wrap more frequently than every 2**32 or 2**64 ticks. Depending on use case, there may also be value in exposing the potential degree of jitter in the *_ms() and *_us() tick counters. I'm not sure if that would be best expressed in absolute or relative terms, though, so I'd suggest leaving that aspect undefined unless/until you have a specific use case in mind. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From drekin at gmail.com Sun Jun 28 12:02:01 2015 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Sun, 28 Jun 2015 12:02:01 +0200 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: Is there a way for a producer to say that there will be no more items put, so consumers get something like StopIteration when there are no more items left afterwards? There is also the problem that one cannot easily feed a queue, asynchronous generator, or any asynchronous iterator to a simple synchronous consumer like sum() or list() or "".join(). It would be nice if there was a way to wrap them to asynchronous ones when needed ? something like (async sum)(asynchronously_produced_numbers()). On Wed, Jun 24, 2015 at 1:54 PM, Jonathan Slenders wrote: > In my experience, it's much easier to use asyncio Queues for this. > Instead of yielding, push to a queue. The consumer can then use "await > queue.get()". > > I think the semantics of the generator become too complicated otherwise, > or maybe impossible. > Maybe have a look at this article: > http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return > > Jonathan > > > > > 2015-06-24 12:13 GMT+02:00 Andrew Svetlov : > >> Your idea is clean and maybe we will allow `yield` inside `async def` >> in Python 3.6. >> For PEP 492 it was too big change. >> >> On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? wrote: >> > Hello, >> > >> > I had a generator producing pairs of values and wanted to feed all the >> first >> > members of the pairs to one consumer and all the second members to >> another >> > consumer. For example: >> > >> > def pairs(): >> > for i in range(4): >> > yield (i, i ** 2) >> > >> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) >> > >> > The point is I wanted the consumers to be suspended and resumed in a >> > coordinated manner: The first producer is invoked, it wants the first >> > element. The coordinator implemented by biconsumer function invokes >> pairs(), >> > gets the first pair and yields its first member to the first consumer. 
>> Then >> > it wants the next element, but now it's the second consumer's turn, so >> the >> > first consumer is suspended and the second consumer is invoked and fed >> with >> > the second member of the first pair. Then the second producer wants the >> next >> > element, but it's the first consumer's turn? and so on. In the end, >> when the >> > stream of pairs is exhausted, StopIteration is thrown to both consumers >> and >> > their results are combined. >> > >> > The cooperative asynchronous nature of the execution reminded me >> asyncio and >> > coroutines, so I thought that biconsumer may be implemented using them. >> > However, it seems that it is imposible to write an "asynchronous >> generator" >> > since the "yielding pipe" is already used for the communication with the >> > scheduler. And even if it was possible to make an asynchronous >> generator, it >> > is not clear how to feed it to a synchronous consumer like sum() or >> list() >> > function. >> > >> > With PEP 492 the concepts of generators and coroutines were separated, >> so >> > asyncronous generators may be possible in theory. An ordinary function >> has >> > just the returning pipe ? for returning the result to the caller. A >> > generator has also a yielding pipe ? used for yielding the values during >> > iteration, and its return pipe is used to finish the iteration. A native >> > coroutine has a returning pipe ? to return the result to a caller just >> like >> > an ordinary function, and also an async pipe ? used for communication >> with a >> > scheduler and execution suspension. An asynchronous generator would just >> > have both yieling pipe and async pipe. >> > >> > So my question is: was the code like the following considered? Does it >> make >> > sense? Or are there not enough uses cases for such code? I found only a >> > short mention in >> > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so >> possibly >> > these coroutine-generators are the same idea. >> > >> > async def f(): >> > number_string = await fetch_data() >> > for n in number_string.split(): >> > yield int(n) >> > >> > async def g(): >> > result = async/await? sum(f()) >> > return result >> > >> > async def h(): >> > the_sum = await g() >> > >> > As for explanation about the execution of h() by an event loop: h is a >> > native coroutine called by the event loop, having both returning pipe >> and >> > async pipe. The returning pipe leads to the end of the task, the async >> pipe >> > is used for cummunication with the scheduler. Then, g() is called >> > asynchronously ? using the await keyword means the the access to the >> async >> > pipe is given to the callee. Then g() invokes the asyncronous generator >> f() >> > and gives it the access to its async pipe, so when f() is yielding >> values to >> > sum, it can also yield a future to the scheduler via the async pipe and >> > suspend the whole task. >> > >> > Regards, Adam Barto? >> > >> > >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> >> >> >> -- >> Thanks, >> Andrew Svetlov >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew.svetlov at gmail.com Sun Jun 28 12:07:32 2015 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sun, 28 Jun 2015 13:07:32 +0300 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: I afraid the last will never possible -- you cannot push async coroutines into synchronous convention call. Your example should be converted into `await async_sum(asynchronously_produced_numbers())` which is possible right now. (asynchronously_produced_numbers should be *iterator* with __aiter__/__anext__ methods, not generator with yield expressions inside. On Sun, Jun 28, 2015 at 1:02 PM, Adam Barto? wrote: > Is there a way for a producer to say that there will be no more items put, > so consumers get something like StopIteration when there are no more items > left afterwards? > > There is also the problem that one cannot easily feed a queue, asynchronous > generator, or any asynchronous iterator to a simple synchronous consumer > like sum() or list() or "".join(). It would be nice if there was a way to > wrap them to asynchronous ones when needed ? something like (async > sum)(asynchronously_produced_numbers()). > > > > On Wed, Jun 24, 2015 at 1:54 PM, Jonathan Slenders > wrote: >> >> In my experience, it's much easier to use asyncio Queues for this. >> Instead of yielding, push to a queue. The consumer can then use "await >> queue.get()". >> >> I think the semantics of the generator become too complicated otherwise, >> or maybe impossible. >> Maybe have a look at this article: >> http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return >> >> Jonathan >> >> >> >> >> 2015-06-24 12:13 GMT+02:00 Andrew Svetlov : >>> >>> Your idea is clean and maybe we will allow `yield` inside `async def` >>> in Python 3.6. >>> For PEP 492 it was too big change. >>> >>> On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? wrote: >>> > Hello, >>> > >>> > I had a generator producing pairs of values and wanted to feed all the >>> > first >>> > members of the pairs to one consumer and all the second members to >>> > another >>> > consumer. For example: >>> > >>> > def pairs(): >>> > for i in range(4): >>> > yield (i, i ** 2) >>> > >>> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) >>> > >>> > The point is I wanted the consumers to be suspended and resumed in a >>> > coordinated manner: The first producer is invoked, it wants the first >>> > element. The coordinator implemented by biconsumer function invokes >>> > pairs(), >>> > gets the first pair and yields its first member to the first consumer. >>> > Then >>> > it wants the next element, but now it's the second consumer's turn, so >>> > the >>> > first consumer is suspended and the second consumer is invoked and fed >>> > with >>> > the second member of the first pair. Then the second producer wants the >>> > next >>> > element, but it's the first consumer's turn? and so on. In the end, >>> > when the >>> > stream of pairs is exhausted, StopIteration is thrown to both consumers >>> > and >>> > their results are combined. >>> > >>> > The cooperative asynchronous nature of the execution reminded me >>> > asyncio and >>> > coroutines, so I thought that biconsumer may be implemented using them. >>> > However, it seems that it is imposible to write an "asynchronous >>> > generator" >>> > since the "yielding pipe" is already used for the communication with >>> > the >>> > scheduler. 
And even if it was possible to make an asynchronous >>> > generator, it >>> > is not clear how to feed it to a synchronous consumer like sum() or >>> > list() >>> > function. >>> > >>> > With PEP 492 the concepts of generators and coroutines were separated, >>> > so >>> > asyncronous generators may be possible in theory. An ordinary function >>> > has >>> > just the returning pipe ? for returning the result to the caller. A >>> > generator has also a yielding pipe ? used for yielding the values >>> > during >>> > iteration, and its return pipe is used to finish the iteration. A >>> > native >>> > coroutine has a returning pipe ? to return the result to a caller just >>> > like >>> > an ordinary function, and also an async pipe ? used for communication >>> > with a >>> > scheduler and execution suspension. An asynchronous generator would >>> > just >>> > have both yieling pipe and async pipe. >>> > >>> > So my question is: was the code like the following considered? Does it >>> > make >>> > sense? Or are there not enough uses cases for such code? I found only a >>> > short mention in >>> > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so >>> > possibly >>> > these coroutine-generators are the same idea. >>> > >>> > async def f(): >>> > number_string = await fetch_data() >>> > for n in number_string.split(): >>> > yield int(n) >>> > >>> > async def g(): >>> > result = async/await? sum(f()) >>> > return result >>> > >>> > async def h(): >>> > the_sum = await g() >>> > >>> > As for explanation about the execution of h() by an event loop: h is a >>> > native coroutine called by the event loop, having both returning pipe >>> > and >>> > async pipe. The returning pipe leads to the end of the task, the async >>> > pipe >>> > is used for cummunication with the scheduler. Then, g() is called >>> > asynchronously ? using the await keyword means the the access to the >>> > async >>> > pipe is given to the callee. Then g() invokes the asyncronous generator >>> > f() >>> > and gives it the access to its async pipe, so when f() is yielding >>> > values to >>> > sum, it can also yield a future to the scheduler via the async pipe and >>> > suspend the whole task. >>> > >>> > Regards, Adam Barto? >>> > >>> > >>> > _______________________________________________ >>> > Python-ideas mailing list >>> > Python-ideas at python.org >>> > https://mail.python.org/mailman/listinfo/python-ideas >>> > Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >>> >>> -- >>> Thanks, >>> Andrew Svetlov >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > -- Thanks, Andrew Svetlov From drekin at gmail.com Sun Jun 28 12:30:20 2015 From: drekin at gmail.com (=?UTF-8?B?QWRhbSBCYXJ0b8Wh?=) Date: Sun, 28 Jun 2015 12:30:20 +0200 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: I understand that it's impossible today, but I thought that if asynchronous generators were going to be added, some kind of generalized generator mechanism allowing yielding to multiple different places would be needed anyway. So in theory no special change to synchronous consumers would be needed ? 
when the asynschronous generator object is created, it gets a link to the scheduler from the caller, then it's given as an argument to sum(); when sum wants next item it calls next() and the asynchronous generator can either yield the next value to sum or it can yield a future to the scheduler and suspend execution of whole task. But since it's a good idea to be explicit and mark each asyncronous call, some wrapper like (async sum) would be used. On Sun, Jun 28, 2015 at 12:07 PM, Andrew Svetlov wrote: > I afraid the last will never possible -- you cannot push async > coroutines into synchronous convention call. > Your example should be converted into `await > async_sum(asynchronously_produced_numbers())` which is possible right > now. (asynchronously_produced_numbers should be *iterator* with > __aiter__/__anext__ methods, not generator with yield expressions > inside. > > On Sun, Jun 28, 2015 at 1:02 PM, Adam Barto? wrote: > > Is there a way for a producer to say that there will be no more items > put, > > so consumers get something like StopIteration when there are no more > items > > left afterwards? > > > > There is also the problem that one cannot easily feed a queue, > asynchronous > > generator, or any asynchronous iterator to a simple synchronous consumer > > like sum() or list() or "".join(). It would be nice if there was a way to > > wrap them to asynchronous ones when needed ? something like (async > > sum)(asynchronously_produced_numbers()). > > > > > > > > On Wed, Jun 24, 2015 at 1:54 PM, Jonathan Slenders > > > wrote: > >> > >> In my experience, it's much easier to use asyncio Queues for this. > >> Instead of yielding, push to a queue. The consumer can then use "await > >> queue.get()". > >> > >> I think the semantics of the generator become too complicated otherwise, > >> or maybe impossible. > >> Maybe have a look at this article: > >> http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return > >> > >> Jonathan > >> > >> > >> > >> > >> 2015-06-24 12:13 GMT+02:00 Andrew Svetlov : > >>> > >>> Your idea is clean and maybe we will allow `yield` inside `async def` > >>> in Python 3.6. > >>> For PEP 492 it was too big change. > >>> > >>> On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? > wrote: > >>> > Hello, > >>> > > >>> > I had a generator producing pairs of values and wanted to feed all > the > >>> > first > >>> > members of the pairs to one consumer and all the second members to > >>> > another > >>> > consumer. For example: > >>> > > >>> > def pairs(): > >>> > for i in range(4): > >>> > yield (i, i ** 2) > >>> > > >>> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) > >>> > > >>> > The point is I wanted the consumers to be suspended and resumed in a > >>> > coordinated manner: The first producer is invoked, it wants the first > >>> > element. The coordinator implemented by biconsumer function invokes > >>> > pairs(), > >>> > gets the first pair and yields its first member to the first > consumer. > >>> > Then > >>> > it wants the next element, but now it's the second consumer's turn, > so > >>> > the > >>> > first consumer is suspended and the second consumer is invoked and > fed > >>> > with > >>> > the second member of the first pair. Then the second producer wants > the > >>> > next > >>> > element, but it's the first consumer's turn? and so on. In the end, > >>> > when the > >>> > stream of pairs is exhausted, StopIteration is thrown to both > consumers > >>> > and > >>> > their results are combined. 
> >>> > > >>> > The cooperative asynchronous nature of the execution reminded me > >>> > asyncio and > >>> > coroutines, so I thought that biconsumer may be implemented using > them. > >>> > However, it seems that it is imposible to write an "asynchronous > >>> > generator" > >>> > since the "yielding pipe" is already used for the communication with > >>> > the > >>> > scheduler. And even if it was possible to make an asynchronous > >>> > generator, it > >>> > is not clear how to feed it to a synchronous consumer like sum() or > >>> > list() > >>> > function. > >>> > > >>> > With PEP 492 the concepts of generators and coroutines were > separated, > >>> > so > >>> > asyncronous generators may be possible in theory. An ordinary > function > >>> > has > >>> > just the returning pipe ? for returning the result to the caller. A > >>> > generator has also a yielding pipe ? used for yielding the values > >>> > during > >>> > iteration, and its return pipe is used to finish the iteration. A > >>> > native > >>> > coroutine has a returning pipe ? to return the result to a caller > just > >>> > like > >>> > an ordinary function, and also an async pipe ? used for communication > >>> > with a > >>> > scheduler and execution suspension. An asynchronous generator would > >>> > just > >>> > have both yieling pipe and async pipe. > >>> > > >>> > So my question is: was the code like the following considered? Does > it > >>> > make > >>> > sense? Or are there not enough uses cases for such code? I found > only a > >>> > short mention in > >>> > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, so > >>> > possibly > >>> > these coroutine-generators are the same idea. > >>> > > >>> > async def f(): > >>> > number_string = await fetch_data() > >>> > for n in number_string.split(): > >>> > yield int(n) > >>> > > >>> > async def g(): > >>> > result = async/await? sum(f()) > >>> > return result > >>> > > >>> > async def h(): > >>> > the_sum = await g() > >>> > > >>> > As for explanation about the execution of h() by an event loop: h is > a > >>> > native coroutine called by the event loop, having both returning pipe > >>> > and > >>> > async pipe. The returning pipe leads to the end of the task, the > async > >>> > pipe > >>> > is used for cummunication with the scheduler. Then, g() is called > >>> > asynchronously ? using the await keyword means the the access to the > >>> > async > >>> > pipe is given to the callee. Then g() invokes the asyncronous > generator > >>> > f() > >>> > and gives it the access to its async pipe, so when f() is yielding > >>> > values to > >>> > sum, it can also yield a future to the scheduler via the async pipe > and > >>> > suspend the whole task. > >>> > > >>> > Regards, Adam Barto? > >>> > > >>> > > >>> > _______________________________________________ > >>> > Python-ideas mailing list > >>> > Python-ideas at python.org > >>> > https://mail.python.org/mailman/listinfo/python-ideas > >>> > Code of Conduct: http://python.org/psf/codeofconduct/ > >>> > >>> > >>> > >>> -- > >>> Thanks, > >>> Andrew Svetlov > >>> _______________________________________________ > >>> Python-ideas mailing list > >>> Python-ideas at python.org > >>> https://mail.python.org/mailman/listinfo/python-ideas > >>> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > >> > > > > > > -- > Thanks, > Andrew Svetlov > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan_ml at behnel.de Sun Jun 28 12:58:32 2015 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 28 Jun 2015 12:58:32 +0200 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: [Fixing the messed-up reply quoting order] Adam Barto? schrieb am 28.06.2015 um 12:30: > On Sun, Jun 28, 2015 at 12:07 PM, Andrew Svetlov wrote: >> On Sun, Jun 28, 2015 at 1:02 PM, Adam Barto? wrote: >>> There is also the problem that one cannot easily feed a queue, >>> asynchronous >>> generator, or any asynchronous iterator to a simple synchronous consumer >>> like sum() or list() or "".join(). It would be nice if there was a way to >>> wrap them to asynchronous ones when needed ? something like (async >>> sum)(asynchronously_produced_numbers()). >> >> I afraid the last will never possible -- you cannot push async >> coroutines into synchronous convention call. >> Your example should be converted into `await >> async_sum(asynchronously_produced_numbers())` which is possible right >> now. (asynchronously_produced_numbers should be *iterator* with >> __aiter__/__anext__ methods, not generator with yield expressions >> inside. > > I understand that it's impossible today, but I thought that if asynchronous > generators were going to be added, some kind of generalized generator > mechanism allowing yielding to multiple different places would be needed > anyway. So in theory no special change to synchronous consumers would be > needed ? when the asynschronous generator object is created, it gets a link > to the scheduler from the caller, then it's given as an argument to sum(); > when sum wants next item it calls next() and the asynchronous generator can > either yield the next value to sum or it can yield a future to the > scheduler and suspend execution of whole task. But since it's a good idea > to be explicit and mark each asyncronous call, some wrapper like (async > sum) would be used. Stackless might eventually support something like that. That being said, note that by design, the scheduler (or I/O loop, if that's what you're using) always lives *outside* of the whole asynchronous call chain, at its very end, but can otherwise be controlled by arbitrary code itself, and that is usually synchronous code. In your example, it could simply be moved between the first async function and its synchronous consumer ("sum" in your example). Doing that is entirely possible. What is not possible (unless you're using a design like Stackless) is that this scheduler controls its own controller, e.g. that it starts interrupting the execution of the synchronous code that called it. Stefan From gmludo at gmail.com Sun Jun 28 17:49:56 2015 From: gmludo at gmail.com (Ludovic Gasc) Date: Sun, 28 Jun 2015 17:49:56 +0200 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com> References: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com> Message-ID: 2015-05-25 22:08 GMT+02:00 Andrew Barnert : > On Monday, May 25, 2015 6:57 AM, Ludovic Gasc wrote: > > >2015-05-25 4:19 GMT+02:00 Steven D'Aprano : > > >>At the other extreme, there is the structlog module: > >> > >>https://structlog.readthedocs.org/en/stable/ > > > >Thank you for the link, it's an interesting project, it's like "logging" > module but on steroids, some good logging ideas inside. 
> > > >However, in fact, if I understand correctly, it's the same approach that > the previous recipe: Generate a log file with JSON content, use > logstash-forwarder to reparse the JSON content, to finally send the > structure to logstash, for the query part: > https://structlog.readthedocs.org/en/stable/standard-library.html#suggested-configuration > > >>How does your change compare to those? > >> > > > > > >In the use case of structlog, drop the logstash-forwarder step to > interconnect directly Python daemon with structured log daemon. > > >Even if logstash-forwarder should be efficient, why to have an additional > step to rebuild a structure you have at the beginning ? > > Sorry for the delay, I was very busy since one month. > You can't send a Python dictionary over the wire, or store a Python > dictionary in a database.You need to encode it to some transmission and/or > storage format; there's no way around that. And what's wrong with using > JSON as that format? > Maybe I should be more clear about my objective: I'm trying to build the simplest architecture for logging, based on the existing python logging features because all Python libraries use that, and with similar features that with ELK (Elasticsearch, Logstash, Kinbana). On the paper, the features ELK are very interesting for a sysadmin and, with my sysadmin hat, I'm strongly agree with that. The issue is that, based on my experience where I work, (sorry eventual ELK-lovers on this ML), but, it's very complicated to setup, to maintain and to keep scalable when you have a lot of logs: We passed a lot of time to have a working ELK, and finally we dropped that because the cost of maintenance was too important for us compare to use grep in rsyslog logs. Maybe we aren't enough smart to maintain ELK, it's possible. However, if we're not too smart to do that, certainly some people have the same issue as us. In fact, the issue shouldn't be our brains, but it was clearly a time consuming task, and we have too much directly paid-work to take care. Don't be wrong: I don't say that ELK doesn't work, only it's time consuming with a high level of logs. I'm pretty sure that a lot of people are happy with ELK, it's cool for them ;-) It's like Oracle and PostgreSQL databases: Where with Oracle you need a full-time DBA, with PostgreSQL: apt-get install postgresql With this last sentence, I'm totally caricatural, but only to show where I see an issue that should be fixed, at least for us. (FYI, in a previous professional life, I've maintained Oracle, MySQL and PostgreSQL servers for several clients, I know a little bit the subject). >From my point of view, the features in journald are enough to replace most usages of ELK, at least for us, and, contrary to ELK, journald is already installed in all latest Linux distributions, even in Debian Jessie. You have almost no maintenance cost. More importantly, when you drop logstash-forwarder, how are you intending > to get the messages to the upstream server? You don't want to make your log > calls synchronously wait for acknowledgement before returning. So you need > some kind of buffering. And just buffering in memory doesn't work: if your > service shuts down unexpectedly, you've lost the last batch of log messages > which would tell you why it went down (plus, if the network goes down > temporarily, your memory use becomes unbounded). 
You can of course buffer > to disk, but then you've just reintroduced the same need for some kind of > intermediate storage format you were trying to eliminate?and it doesn't > really solve the problem, because if your service shuts down, the last > messages won't get sent until it starts up again. So you could write a > separate simple store-and-forward daemon that either reads those file > buffers or listens on localhost UDP? but then you've just recreated > logstash-forwarder. > In the past, we used directly a local rsyslog to play this role on each VM, connected directly with the Python daemons via a datagram UNIX socket. See a logging config file example: https://github.com/Eyepea/cookiecutter-API-Hour/blob/master/%7B%7Bcookiecutter.app_name%7D%7D/etc/%7B%7Bcookiecutter.app_name%7D%7D/api_hour/logging.ini#L21 Now, it's journald that plays this role, also via a datagram UNIX socket. > And even if you wanted to do all that, I don't see why you couldn't do it > all with structlog. They recommend using an already-working workflow > instead of designing a different one from scratch, but it's just a > recommendation. > You're right: Don't reinvent the wheel. However, if I follow your argument in another context: instead of to create AsyncIO, Guido should integrate Twisted in Python ? As an end-user of Twisted and AsyncIO, it isn't for the pleasure or to be fancy that we migrated from Twisted to AsyncIO ;-) To me, the expression should be: "Don't reinvent the wheel, except if you can provide a more efficient wheel" Now, in the context of logging: Please let me to try another approach, maybe I'll waste my time, or maybe I'll find an interesting gold nugget, who knows before to dig ? You can think I'm trying to be only different from the "ELK" standard, and it's possible, who knows ? If I revive this thread, it isn't to troll you, but because I'm interested in by your opinion. I may found a better approach that doesn't need a CPython patch and it's more powerful. In the source code of logging package, I've found this: https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L269 BTW, this approach should have more promotion: I didn't know you can use a dict to replace text in a log message, I thought only strings. Now, instead of to use extra parameter, I use directly this feature. For the developer, instead to write this: logger.debug('Receive a create_or_update request from "%s" account', account_id) he writes this: logger.debug('Receive a create_or_update request from "%(account_id)s" account', {'request_id': request.request_id, 'account_id': account_id, 'aiohttp_request': request, 'payload': payload}) With that, you can write logs as usual in your source code, and use the handler you want. However, if you use the systemDream handler, all metadata with your log will be sent to journald: https://github.com/Eyepea/systemDream/blob/master/src/systemdream/journal/handler.py#L79 The another bonus of this approach is that you can use an element of your dict to improve your log message. With my previous approach with extra parameter, you must pass two times the values. The cherry on the cake is that extra can be used for something else. And the bonus of bonus, for the developers who already use this logging feature, they are already journald compliant without to know. I see no drawbacks of this approach, except that the developers who already use this feature: he must be consistent in the key names of the dict to be useful with journald. 
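A minimal sketch of the pattern described above (this is not the systemDream handler itself, just an illustration of how a handler can recover the mapping from record.args and forward it as structured fields):

    import logging

    class StructuredHandler(logging.Handler):
        """Illustrative only: print the rendered message plus any dict args
        as key=value pairs, roughly what a journald handler could forward
        as extra journal fields."""
        def emit(self, record):
            fields = record.args if isinstance(record.args, dict) else {}
            extras = ' '.join('%s=%r' % item for item in sorted(fields.items()))
            print(record.getMessage(), '|', extras)

    logger = logging.getLogger('demo')
    logger.addHandler(StructuredHandler())
    logger.setLevel(logging.DEBUG)

    # A single dict argument is used both for %(...)s substitution in the
    # message and as the structured payload available to the handler.
    logger.debug('Receive a create_or_update request from "%(account_id)s" account',
                 {'request_id': 42, 'account_id': 'abc123'})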
I'm very interested in by your feedbacks, maybe I've missed something. If anybody doesn't find an issue, I'll push this pattern also in the official Python binding of journald, systemDream is only my laboratory to experiment around systemd/journald (and secondarily, it's impossible to setup the official Python binding of systemd/journald in a pyvenv, at least to me). I'll publish also a step-by-step tutorial for the new comers on my blog. Thanks for your attention. -- Ludovic Gasc (GMLudo) http://www.gmludo.eu/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Sun Jun 28 17:52:49 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Sun, 28 Jun 2015 18:52:49 +0300 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: Message-ID: <20150628185249.61624b82@x230> Hello, On Sun, 28 Jun 2015 12:02:01 +0200 Adam Barto? wrote: > Is there a way for a producer to say that there will be no more items > put, so consumers get something like StopIteration when there are no > more items left afterwards? Sure, just designate sentinel value of your likes (StopIteration class value seems an obvious choice) and use it for that purpose. > There is also the problem that one cannot easily feed a queue, > asynchronous generator, or any asynchronous iterator to a simple > synchronous consumer like sum() or list() or "".join(). It would be > nice if there was a way to wrap them to asynchronous ones when needed > ? something like (async sum)(asynchronously_produced_numbers()). All that is easily achievable with classical Python coroutines, not with asyncio garden variety of coroutines, which lately were casted into a language level with async/await disablers: def coro1(): yield 1 yield 2 yield 3 def coro2(): yield from coro1() yield 4 yield 5 print(sum(coro2())) And back to your starter question, it's also possible - and also only with classical Python coroutines. I mentioned not just possibility, but necessity of that in my independent "reverse engineering" of how yield from works https://dl.dropboxusercontent.com/u/44884329/yield-from.pdf (point 9 there). That's simplistic presentation, and in the presence of "syscall main loop", example there would need to be: class MyValueWrapper: def __init__(self, v): self.v = v def pump(ins, outs): for chunk in gen(ins): if isinstance(chunk, MyValueWrapper): # if value we got from a coro is of # type we expect, process it yield from outs.write(chunk.v) else: # anything else is simply not for us, # re-yield it to higher levels (ultimately, mainloop) yield chunk def gen(ins): yield MyValueWrapper("") # Assume read_in_chunks() already yields MyValueWrapper objects yield from ins.read_in_chunks(1000*1000*1000) yield MyValueWrapper("") > > > On Wed, Jun 24, 2015 at 1:54 PM, Jonathan Slenders > wrote: > > > In my experience, it's much easier to use asyncio Queues for this. > > Instead of yielding, push to a queue. The consumer can then use > > "await queue.get()". > > > > I think the semantics of the generator become too complicated > > otherwise, or maybe impossible. > > Maybe have a look at this article: > > http://www.interact-sw.co.uk/iangblog/2013/11/29/async-yield-return > > > > Jonathan > > > > > > > > > > 2015-06-24 12:13 GMT+02:00 Andrew Svetlov > > : > > > >> Your idea is clean and maybe we will allow `yield` inside `async > >> def` in Python 3.6. > >> For PEP 492 it was too big change. > >> > >> On Wed, Jun 24, 2015 at 12:00 PM, Adam Barto? 
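To make the sentinel suggestion concrete in asyncio terms, here is a minimal sketch (the helper names are made up for the example, and using the StopIteration class object as the end-of-stream marker is just one possible choice):

    import asyncio

    _DONE = StopIteration  # sentinel marking "no more items", per the suggestion above

    async def producer(queue):
        for i in range(3):
            await queue.put(i)
        await queue.put(_DONE)  # tell the consumer the stream is finished

    async def consumer(queue):
        total = 0
        while True:
            item = await queue.get()
            if item is _DONE:
                return total
            total += item

    async def main():
        queue = asyncio.Queue()
        _, total = await asyncio.gather(producer(queue), consumer(queue))
        print(total)  # 0 + 1 + 2 == 3

    asyncio.get_event_loop().run_until_complete(main())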
> >> wrote: > >> > Hello, > >> > > >> > I had a generator producing pairs of values and wanted to feed > >> > all the > >> first > >> > members of the pairs to one consumer and all the second members > >> > to > >> another > >> > consumer. For example: > >> > > >> > def pairs(): > >> > for i in range(4): > >> > yield (i, i ** 2) > >> > > >> > biconsumer(sum, list)(pairs()) -> (6, [0, 1, 4, 9]) > >> > > >> > The point is I wanted the consumers to be suspended and resumed > >> > in a coordinated manner: The first producer is invoked, it wants > >> > the first element. The coordinator implemented by biconsumer > >> > function invokes > >> pairs(), > >> > gets the first pair and yields its first member to the first > >> > consumer. > >> Then > >> > it wants the next element, but now it's the second consumer's > >> > turn, so > >> the > >> > first consumer is suspended and the second consumer is invoked > >> > and fed > >> with > >> > the second member of the first pair. Then the second producer > >> > wants the > >> next > >> > element, but it's the first consumer's turn? and so on. In the > >> > end, > >> when the > >> > stream of pairs is exhausted, StopIteration is thrown to both > >> > consumers > >> and > >> > their results are combined. > >> > > >> > The cooperative asynchronous nature of the execution reminded me > >> asyncio and > >> > coroutines, so I thought that biconsumer may be implemented > >> > using them. However, it seems that it is imposible to write an > >> > "asynchronous > >> generator" > >> > since the "yielding pipe" is already used for the communication > >> > with the scheduler. And even if it was possible to make an > >> > asynchronous > >> generator, it > >> > is not clear how to feed it to a synchronous consumer like sum() > >> > or > >> list() > >> > function. > >> > > >> > With PEP 492 the concepts of generators and coroutines were > >> > separated, > >> so > >> > asyncronous generators may be possible in theory. An ordinary > >> > function > >> has > >> > just the returning pipe ? for returning the result to the > >> > caller. A generator has also a yielding pipe ? used for yielding > >> > the values during iteration, and its return pipe is used to > >> > finish the iteration. A native coroutine has a returning pipe ? > >> > to return the result to a caller just > >> like > >> > an ordinary function, and also an async pipe ? used for > >> > communication > >> with a > >> > scheduler and execution suspension. An asynchronous generator > >> > would just have both yieling pipe and async pipe. > >> > > >> > So my question is: was the code like the following considered? > >> > Does it > >> make > >> > sense? Or are there not enough uses cases for such code? I found > >> > only a short mention in > >> > https://www.python.org/dev/peps/pep-0492/#coroutine-generators, > >> > so > >> possibly > >> > these coroutine-generators are the same idea. > >> > > >> > async def f(): > >> > number_string = await fetch_data() > >> > for n in number_string.split(): > >> > yield int(n) > >> > > >> > async def g(): > >> > result = async/await? sum(f()) > >> > return result > >> > > >> > async def h(): > >> > the_sum = await g() > >> > > >> > As for explanation about the execution of h() by an event loop: > >> > h is a native coroutine called by the event loop, having both > >> > returning pipe > >> and > >> > async pipe. The returning pipe leads to the end of the task, the > >> > async > >> pipe > >> > is used for cummunication with the scheduler. 
Then, g() is called > >> > asynchronously ? using the await keyword means the the access to > >> > the > >> async > >> > pipe is given to the callee. Then g() invokes the asyncronous > >> > generator > >> f() > >> > and gives it the access to its async pipe, so when f() is > >> > yielding > >> values to > >> > sum, it can also yield a future to the scheduler via the async > >> > pipe and suspend the whole task. > >> > > >> > Regards, Adam Barto? > >> > > >> > > >> > _______________________________________________ > >> > Python-ideas mailing list > >> > Python-ideas at python.org > >> > https://mail.python.org/mailman/listinfo/python-ideas > >> > Code of Conduct: http://python.org/psf/codeofconduct/ > >> > >> > >> > >> -- > >> Thanks, > >> Andrew Svetlov > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > -- Best regards, Paul mailto:pmiscml at gmail.com From yselivanov.ml at gmail.com Mon Jun 29 00:14:20 2015 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 28 Jun 2015 18:14:20 -0400 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: <20150628185249.61624b82@x230> References: <20150628185249.61624b82@x230> Message-ID: <559071BC.8090603@gmail.com> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote: >> There is also the problem that one cannot easily feed a queue, >> >asynchronous generator, or any asynchronous iterator to a simple >> >synchronous consumer like sum() or list() or "".join(). It would be >> >nice if there was a way to wrap them to asynchronous ones when needed >> >? something like (async sum)(asynchronously_produced_numbers()). > All that is easily achievable with classical Python coroutines, not > with asyncio garden variety of coroutines, which lately were casted > into a language level with async/await disablers: > > def coro1(): > yield 1 > yield 2 > yield 3 > > def coro2(): > yield from coro1() > yield 4 > yield 5 > > print(sum(coro2())) You have easily achieved combining two generators with 'yield from' and feeding that to 'sum' builtin. There is no way to combine synchronous loops with asynchronous coroutines; by definition, the entire process will block while you are iterating trough them. Yury From ncoghlan at gmail.com Mon Jun 29 03:09:14 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Jun 2015 11:09:14 +1000 Subject: [Python-ideas] Fwd: [Python-Dev] An yocto change proposal in logging module to simplify structured logs support In-Reply-To: References: <1110534133.1837705.1432584526401.JavaMail.yahoo@mail.yahoo.com> Message-ID: On 29 Jun 2015 1:50 am, "Ludovic Gasc" wrote: > In fact, the issue shouldn't be our brains, but it was clearly a time consuming task, and we have too much directly paid-work to take care. > > Don't be wrong: I don't say that ELK doesn't work, only it's time consuming with a high level of logs. > I'm pretty sure that a lot of people are happy with ELK, it's cool for them ;-) > > It's like Oracle and PostgreSQL databases: Where with Oracle you need a full-time DBA, with PostgreSQL: apt-get install postgresql > With this last sentence, I'm totally caricatural, but only to show where I see an issue that should be fixed, at least for us. > (FYI, in a previous professional life, I've maintained Oracle, MySQL and PostgreSQL servers for several clients, I know a little bit the subject). 
This discrepancy in manageability between services like PostgreSQL & more complex setups like the ELK stack is why Red Hat started working on Nulecule as part of Project Atomic: http://rhelblog.redhat.com/2015/06/23/announcing-yum-rpm-for-containerized-applications-nulecule-atomic-app/ There's still some work to be done making sure the related tools support Debian and derivatives properly, but "the ELK stack is too hard to install & maintain" is a distro level software management problem to be solved, rather than something to try to work around at the language level. Cheers, Nick. > > From my point of view, the features in journald are enough to replace most usages of ELK, at least for us, and, contrary to ELK, journald is already installed in all latest Linux distributions, even in Debian Jessie. You have almost no maintenance cost. > >> More importantly, when you drop logstash-forwarder, how are you intending to get the messages to the upstream server? You don't want to make your log calls synchronously wait for acknowledgement before returning. So you need some kind of buffering. And just buffering in memory doesn't work: if your service shuts down unexpectedly, you've lost the last batch of log messages which would tell you why it went down (plus, if the network goes down temporarily, your memory use becomes unbounded). You can of course buffer to disk, but then you've just reintroduced the same need for some kind of intermediate storage format you were trying to eliminate?and it doesn't really solve the problem, because if your service shuts down, the last messages won't get sent until it starts up again. So you could write a separate simple store-and-forward daemon that either reads those file buffers or listens on localhost UDP? but then you've just recreated logstash-forwarder. > > > In the past, we used directly a local rsyslog to play this role on each VM, connected directly with the Python daemons via a datagram UNIX socket. > See a logging config file example: > https://github.com/Eyepea/cookiecutter-API-Hour/blob/master/%7B%7Bcookiecutter.app_name%7D%7D/etc/%7B%7Bcookiecutter.app_name%7D%7D/api_hour/logging.ini#L21 > > Now, it's journald that plays this role, also via a datagram UNIX socket. > >> >> And even if you wanted to do all that, I don't see why you couldn't do it all with structlog. They recommend using an already-working workflow instead of designing a different one from scratch, but it's just a recommendation. > > > You're right: Don't reinvent the wheel. > However, if I follow your argument in another context: instead of to create AsyncIO, Guido should integrate Twisted in Python ? > As an end-user of Twisted and AsyncIO, it isn't for the pleasure or to be fancy that we migrated from Twisted to AsyncIO ;-) > To me, the expression should be: "Don't reinvent the wheel, except if you can provide a more efficient wheel" > > Now, in the context of logging: Please let me to try another approach, maybe I'll waste my time, or maybe I'll find an interesting gold nugget, who knows before to dig ? > You can think I'm trying to be only different from the "ELK" standard, and it's possible, who knows ? > > If I revive this thread, it isn't to troll you, but because I'm interested in by your opinion. > I may found a better approach that doesn't need a CPython patch and it's more powerful. 
> > In the source code of logging package, I've found this: > https://github.com/python/cpython/blob/master/Lib/logging/__init__.py#L269 > BTW, this approach should have more promotion: I didn't know you can use a dict to replace text in a log message, I thought only strings. > > Now, instead of to use extra parameter, I use directly this feature. > > For the developer, instead to write this: > > logger.debug('Receive a create_or_update request from "%s" account', account_id) > > he writes this: > > logger.debug('Receive a create_or_update request from "%(account_id)s" account', > {'request_id': request.request_id, > 'account_id': account_id, > 'aiohttp_request': request, > 'payload': payload}) > > With that, you can write logs as usual in your source code, and use the handler you want. > > However, if you use the systemDream handler, all metadata with your log will be sent to journald: > https://github.com/Eyepea/systemDream/blob/master/src/systemdream/journal/handler.py#L79 > > The another bonus of this approach is that you can use an element of your dict to improve your log message. > With my previous approach with extra parameter, you must pass two times the values. > The cherry on the cake is that extra can be used for something else. > And the bonus of bonus, for the developers who already use this logging feature, they are already journald compliant without to know. > > I see no drawbacks of this approach, except that the developers who already use this feature: he must be consistent in the key names of the dict to be useful with journald. > > I'm very interested in by your feedbacks, maybe I've missed something. > > If anybody doesn't find an issue, I'll push this pattern also in the official Python binding of journald, systemDream is only my laboratory to experiment around systemd/journald (and secondarily, it's impossible to setup the official Python binding of systemd/journald in a pyvenv, at least to me). > > I'll publish also a step-by-step tutorial for the new comers on my blog. > > Thanks for your attention. > -- > Ludovic Gasc (GMLudo) > http://www.gmludo.eu/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmiscml at gmail.com Mon Jun 29 08:44:39 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Mon, 29 Jun 2015 09:44:39 +0300 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: <559071BC.8090603@gmail.com> References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> Message-ID: <20150629094439.4f8a8efa@x230> Hello, On Sun, 28 Jun 2015 18:14:20 -0400 Yury Selivanov wrote: > > On 2015-06-28 11:52 AM, Paul Sokolovsky wrote: > >> There is also the problem that one cannot easily feed a queue, > >> >asynchronous generator, or any asynchronous iterator to a simple > >> >synchronous consumer like sum() or list() or "".join(). It would > >> >be nice if there was a way to wrap them to asynchronous ones when > >> >needed ? something like (async > >> >sum)(asynchronously_produced_numbers()). 
> > All that is easily achievable with classical Python coroutines, not > > with asyncio garden variety of coroutines, which lately were casted > > into a language level with async/await disablers: > > > > def coro1(): > > yield 1 > > yield 2 > > yield 3 > > > > def coro2(): > > yield from coro1() > > yield 4 > > yield 5 > > > > print(sum(coro2())) > > > You have easily achieved combining two generators with 'yield from' > and feeding that to 'sum' builtin. Right, the point here was that PEP492, banning usage of "yield" in coroutines, doesn't help with such simple and basic usage of them. And then I again can say what I said during initial discussion of PEP492: I have dual feeling about it: promise of making coroutines easier and more user friendly is worth all support, but step of limiting basic language usage in them doesn't seem good. What me and other people can do then is just trust that you guys know what you do and PEP492 will be just first step. But bottom line is that I personally don't find async/await worthy to use for now - it's better to stick to old good yield from, until the promise of truly better coroutines is delivered. > There is no way to combine synchronous loops with asynchronous > coroutines; by definition, the entire process will block while you > are iterating trough them. Indeed, to solve this issue, it requires to use "inversion of inversion of control" pattern. Typical real-world example is that someone has got their (unwise) main loop and wants us to do callback mess programming with it, but we don't want them to call us, we want to call them, at controlled intervals, to do controlled amount of work. The solution would be to pass a callback which looks like a normal function, but which is actually a coroutine. Then foreign main loop, calling it, would suspend it and pass control to "us", and us can let another iteration of foreign main loop by resuming that coroutine. The essence of this approach lies in having a coroutine "look like" a usual function, or more exactly, in being able to resume a coroutine from a context of normal function. And that's explicitly not what Python coroutines are - they require lexical marking of each site where coroutine suspension may happen (for good reasons which were described here on the list many times). During previous phase of discussion, I gave classification of different types of coroutines to graps/structure all this stuff better: http://code.activestate.com/lists/python-dev/136046/ -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Mon Jun 29 10:32:56 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Jun 2015 18:32:56 +1000 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: <20150629094439.4f8a8efa@x230> References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: On 29 June 2015 at 16:44, Paul Sokolovsky wrote: > Hello, > > On Sun, 28 Jun 2015 18:14:20 -0400 > Yury Selivanov wrote: > >> >> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote: >> >> There is also the problem that one cannot easily feed a queue, >> >> >asynchronous generator, or any asynchronous iterator to a simple >> >> >synchronous consumer like sum() or list() or "".join(). It would >> >> >be nice if there was a way to wrap them to asynchronous ones when >> >> >needed ? something like (async >> >> >sum)(asynchronously_produced_numbers()). 
>> > All that is easily achievable with classical Python coroutines, not >> > with asyncio garden variety of coroutines, which lately were casted >> > into a language level with async/await disablers: >> > >> > def coro1(): >> > yield 1 >> > yield 2 >> > yield 3 >> > >> > def coro2(): >> > yield from coro1() >> > yield 4 >> > yield 5 >> > >> > print(sum(coro2())) >> >> >> You have easily achieved combining two generators with 'yield from' >> and feeding that to 'sum' builtin. > > Right, the point here was that PEP492, banning usage of "yield" in > coroutines, doesn't help with such simple and basic usage of them. And > then I again can say what I said during initial discussion of PEP492: > I have dual feeling about it: promise of making coroutines easier and > more user friendly is worth all support, but step of limiting basic > language usage in them doesn't seem good. What me and other people can > do then is just trust that you guys know what you do and PEP492 will > be just first step. But bottom line is that I personally don't find > async/await worthy to use for now - it's better to stick to old good > yield from, until the promise of truly better coroutines is delivered. The purpose of PEP 492 is to fundamentally split the asynchronous IO use case away from traditional generators. If you're using native coroutines, you MUST have an event loop, or at least be using something like asyncio.run_until_complete() (which spins up a scheduler for the duration). If you're using generators without @types.coroutine or @asyncio.coroutine (or the equivalent for tulip, Twisted, etc), then you're expecting a synchronous driver rather than an asynchronous one. This isn't an accident, or something that will change at some point in the future, it's the entire point of the exercise: having it be obvious both how you're meant to interact with something based on the way it's defined, and how you factor outside subcomponents of the algorithm. Asynchronous driver? Use a coroutine. Synchronous driver? Use a generator. What we *don't* have are consumption functions that have an implied "async for" inside them - functions like sum(), any(), all(), etc are all synchronous drivers. The other key thing we don't have yet? Asynchronous comprehensions. A peak at the various options for parallel execution described in https://docs.python.org/3/library/concurrent.futures.html documentation helps illustrate why: once we're talking about applying reduction functions to asynchronous iterables we're getting into full-blown language-level-support-for-MapReduce territory. Do the substeps still need to be executed in series? Or can the substeps be executed in parallel, and either accumulated in iteration order or as they become available? Does it perhaps make sense to *require* that the steps be executable in parallel, such that we could write the following: result = sum(x*x for async x in coro) Where the reduction step remains synchronous, but we can mark the comprehension/map step as asynchronous, and have that change the generated code to create an implied lambda for the "lambda x: x*x" calculation, dispatch all of those to the scheduler at once, and then produce the results one at a time? The answer to that is "quite possibly, but we don't really know yet". PEP 492 is enough to address some major comprehensibility challenges that exist around generators-as-coroutines. 
It *doesn't* bring language level support for parallel MapReduce to Python, but it *does* bring some interesting new building blocks for folks to play around with in that regard (in particular, figuring out what we want the comprehension level semantics of "async for" to be). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Jun 29 11:33:21 2015 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Jun 2015 11:33:21 +0200 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: Not following this in detail, but want to note that async isn't a good model for parallelization (except I/O) because the expectation of coroutines is single threading. The event loop serializes callbacks. Changing this would break expectations and code. On Jun 29, 2015 10:33 AM, "Nick Coghlan" wrote: > On 29 June 2015 at 16:44, Paul Sokolovsky wrote: > > Hello, > > > > On Sun, 28 Jun 2015 18:14:20 -0400 > > Yury Selivanov wrote: > > > >> > >> On 2015-06-28 11:52 AM, Paul Sokolovsky wrote: > >> >> There is also the problem that one cannot easily feed a queue, > >> >> >asynchronous generator, or any asynchronous iterator to a simple > >> >> >synchronous consumer like sum() or list() or "".join(). It would > >> >> >be nice if there was a way to wrap them to asynchronous ones when > >> >> >needed ? something like (async > >> >> >sum)(asynchronously_produced_numbers()). > >> > All that is easily achievable with classical Python coroutines, not > >> > with asyncio garden variety of coroutines, which lately were casted > >> > into a language level with async/await disablers: > >> > > >> > def coro1(): > >> > yield 1 > >> > yield 2 > >> > yield 3 > >> > > >> > def coro2(): > >> > yield from coro1() > >> > yield 4 > >> > yield 5 > >> > > >> > print(sum(coro2())) > >> > >> > >> You have easily achieved combining two generators with 'yield from' > >> and feeding that to 'sum' builtin. > > > > Right, the point here was that PEP492, banning usage of "yield" in > > coroutines, doesn't help with such simple and basic usage of them. And > > then I again can say what I said during initial discussion of PEP492: > > I have dual feeling about it: promise of making coroutines easier and > > more user friendly is worth all support, but step of limiting basic > > language usage in them doesn't seem good. What me and other people can > > do then is just trust that you guys know what you do and PEP492 will > > be just first step. But bottom line is that I personally don't find > > async/await worthy to use for now - it's better to stick to old good > > yield from, until the promise of truly better coroutines is delivered. > > The purpose of PEP 492 is to fundamentally split the asynchronous IO > use case away from traditional generators. If you're using native > coroutines, you MUST have an event loop, or at least be using > something like asyncio.run_until_complete() (which spins up a > scheduler for the duration). If you're using generators without > @types.coroutine or @asyncio.coroutine (or the equivalent for tulip, > Twisted, etc), then you're expecting a synchronous driver rather than > an asynchronous one. 
> > This isn't an accident, or something that will change at some point in > the future, it's the entire point of the exercise: having it be > obvious both how you're meant to interact with something based on the > way it's defined, and how you factor outside subcomponents of the > algorithm. Asynchronous driver? Use a coroutine. Synchronous driver? > Use a generator. > > What we *don't* have are consumption functions that have an implied > "async for" inside them - functions like sum(), any(), all(), etc are > all synchronous drivers. > > The other key thing we don't have yet? Asynchronous comprehensions. > > A peak at the various options for parallel execution described in > https://docs.python.org/3/library/concurrent.futures.html > documentation helps illustrate why: once we're talking about applying > reduction functions to asynchronous iterables we're getting into > full-blown language-level-support-for-MapReduce territory. Do the > substeps still need to be executed in series? Or can the substeps be > executed in parallel, and either accumulated in iteration order or as > they become available? Does it perhaps make sense to *require* that > the steps be executable in parallel, such that we could write the > following: > > result = sum(x*x for async x in coro) > > Where the reduction step remains synchronous, but we can mark the > comprehension/map step as asynchronous, and have that change the > generated code to create an implied lambda for the "lambda x: x*x" > calculation, dispatch all of those to the scheduler at once, and then > produce the results one at a time? > > The answer to that is "quite possibly, but we don't really know yet". > PEP 492 is enough to address some major comprehensibility challenges > that exist around generators-as-coroutines. It *doesn't* bring > language level support for parallel MapReduce to Python, but it *does* > bring some interesting new building blocks for folks to play around > with in that regard (in particular, figuring out what we want the > comprehension level semantics of "async for" to be). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Jun 29 12:57:58 2015 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 29 Jun 2015 11:57:58 +0100 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: On 29 June 2015 at 09:32, Nick Coghlan wrote: > What we *don't* have are consumption functions that have an implied > "async for" inside them - functions like sum(), any(), all(), etc are > all synchronous drivers. Note that this requirement to duplicate big chunks of functionality in sync and async forms is a fundamental aspect of the design. It's not easy to swallow (hence the fact that threads like this keep coming up) as it seems to badly violate DRY principles, but it is deliberate. There are a number of blog posts that discuss this "two separate worlds" approach, some positive, some negative. Links have been posted recently in one of these threads, but I'm afraid I don't have them to hand right now. 
Paul From pmiscml at gmail.com Mon Jun 29 13:09:28 2015 From: pmiscml at gmail.com (Paul Sokolovsky) Date: Mon, 29 Jun 2015 14:09:28 +0300 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: <20150629140928.7e767753@x230> Hello, On Mon, 29 Jun 2015 11:57:58 +0100 Paul Moore wrote: > On 29 June 2015 at 09:32, Nick Coghlan wrote: > > What we *don't* have are consumption functions that have an implied > > "async for" inside them - functions like sum(), any(), all(), etc > > are all synchronous drivers. > > Note that this requirement to duplicate big chunks of functionality in > sync and async forms is a fundamental aspect of the design. It's not > easy to swallow (hence the fact that threads like this keep coming up) > as it seems to badly violate DRY principles, but it is deliberate. > > There are a number of blog posts that discuss this "two separate > worlds" approach, some positive, some negative. Links have been posted > recently in one of these threads, but I'm afraid I don't have them to > hand right now. Maybe not the links you meant, but definitely discussing a split-world problem designers of other languages and APIs face: What Color is Your Function? http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ Red and Green Callbacks http://joearms.github.io/2013/04/02/Red-and-Green-Callbacks.html > > Paul -- Best regards, Paul mailto:pmiscml at gmail.com From ncoghlan at gmail.com Mon Jun 29 13:23:52 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Jun 2015 21:23:52 +1000 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: On 29 Jun 2015 7:33 pm, "Guido van Rossum" wrote: > > Not following this in detail, but want to note that async isn't a good model for parallelization (except I/O) because the expectation of coroutines is single threading. The event loop serializes callbacks. Changing this would break expectations and code. Yeah, it's a bad idea - I realised after reading your post that because submission for scheduling and waiting for a result can already be separated it should be possible in Py 3.5 to write a "parallel" asynchronous iterator that eagerly consumes the awaitables produced by another asynchronous iterator, schedules them all, then produces the awaitables in order. (That idea is probably as clear as mud without code to show what I mean...) Regards, Nick. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at slenders.be Mon Jun 29 16:51:26 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Mon, 29 Jun 2015 16:51:26 +0200 Subject: [Python-ideas] Make os.pipe() return a namedtuple. Message-ID: Could we do that? Is there is reason it's not already a namedtuple? I always forget what the read-end and what the write-end of the pipe is, and I use it quite regularly. Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From joejev at gmail.com Mon Jun 29 20:00:24 2015 From: joejev at gmail.com (Joseph Jevnik) Date: Mon, 29 Jun 2015 14:00:24 -0400 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: Message-ID: Maybe make it a structsequence? On Mon, Jun 29, 2015 at 10:51 AM, Jonathan Slenders wrote: > Could we do that? 
Is there is reason it's not already a namedtuple? > > I always forget what the read-end and what the write-end of the pipe is, > and I use it quite regularly. > > Jonathan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ron3200 at gmail.com Mon Jun 29 23:51:10 2015 From: ron3200 at gmail.com (Ron Adam) Date: Mon, 29 Jun 2015 17:51:10 -0400 Subject: [Python-ideas] Are there asynchronous generators? In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: On 06/29/2015 07:23 AM, Nick Coghlan wrote: > > On 29 Jun 2015 7:33 pm, "Guido van Rossum" > > wrote: > > > > Not following this in detail, but want to note that async isn't a good > model for parallelization (except I/O) because the expectation of > coroutines is single threading. The event loop serializes callbacks. > Changing this would break expectations and code. > > Yeah, it's a bad idea - I realised after reading your post that because > submission for scheduling and waiting for a result can already be separated > it should be possible in Py 3.5 to write a "parallel" asynchronous iterator > that eagerly consumes the awaitables produced by another asynchronous > iterator, schedules them all, then produces the awaitables in order. > > (That idea is probably as clear as mud without code to show what I mean...) Only the parts concerning "schedules them all", and "produces awaitables in order". ;-) Async IO is mainly about recapturing idle cpu time while waiting for relatively slow io. But it could also be a way to organise asynchronous code. In the earlier example with circles, and each object having it's own thread... And that running into the thousands, it can be rearranged a bit if each scheduler has it's own thread. Then objects can be assigned to schedulers instead of threads. (or something like that.) Of course that's still clear as mud at this point, but maybe a different colour of mud. ;-) Cheers, Ron From cs at zip.com.au Tue Jun 30 02:12:42 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 30 Jun 2015 10:12:42 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: Message-ID: <20150630001242.GA53393@cskk.homeip.net> On 29Jun2015 16:51, Jonathan Slenders wrote: >Could we do that? Is there is reason it's not already a namedtuple? > >I always forget what the read-end and what the write-end of the pipe is, >and I use it quite regularly. The ordering is the same as for the default process file descriptors. A normal process has stdin as fd 0 and stdout as fd 1. So the return from pipe() has the read end as index 0 and the write end as fd 1. Cheers, Cameron Simpson From njs at pobox.com Tue Jun 30 02:45:47 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 29 Jun 2015 17:45:47 -0700 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: Message-ID: On Mon, Jun 29, 2015 at 7:51 AM, Jonathan Slenders wrote: > Could we do that? Is there is reason it's not already a namedtuple? > > I always forget what the read-end and what the write-end of the pipe is, and > I use it quite regularly. Sounds like a good idea to me. -n -- Nathaniel J. 
Smith -- http://vorpus.org From steve at pearwood.info Tue Jun 30 03:50:04 2015 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 30 Jun 2015 11:50:04 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: <20150630001242.GA53393@cskk.homeip.net> References: <20150630001242.GA53393@cskk.homeip.net> Message-ID: <20150630015003.GA10773@ando.pearwood.info> On Tue, Jun 30, 2015 at 10:12:42AM +1000, Cameron Simpson wrote: > On 29Jun2015 16:51, Jonathan Slenders wrote: > >Could we do that? Is there is reason it's not already a namedtuple? > > > >I always forget what the read-end and what the write-end of the pipe is, > >and I use it quite regularly. > > The ordering is the same as for the default process file descriptors. A > normal process has stdin as fd 0 and stdout as fd 1. So the return from > pipe() has the read end as index 0 and the write end as fd 1. Yeah, I always forget which is fd 0 and which is fd 1 too. Having nice descriptive names rather than using numbered indexes is generally better practice, and I don't think there is any serious downside to using a namedtuple. A minor enhancement like this shouldn't require an extended discussion here on python-ideas. Jonathan, would you be so kind as to raise an enhancement request on the bug tracker? I don't think it's too late for 3.5. -- Steve From zachary.ware+pyideas at gmail.com Tue Jun 30 04:00:22 2015 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Mon, 29 Jun 2015 21:00:22 -0500 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: <20150630015003.GA10773@ando.pearwood.info> References: <20150630001242.GA53393@cskk.homeip.net> <20150630015003.GA10773@ando.pearwood.info> Message-ID: On Mon, Jun 29, 2015 at 8:50 PM, Steven D'Aprano wrote: > Jonathan, would you be so kind as to raise an enhancement request on the > bug tracker? I don't think it's too late for 3.5. 3.5 is feature-frozen without a special exemption from Larry Hastings, and I think the window for those is already past too. 3.6 is open for development already, though. -- Zach From cs at zip.com.au Tue Jun 30 04:05:29 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 30 Jun 2015 12:05:29 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: <20150630015003.GA10773@ando.pearwood.info> References: <20150630015003.GA10773@ando.pearwood.info> Message-ID: <20150630020529.GA55849@cskk.homeip.net> On 30Jun2015 11:50, Steven D'Aprano wrote: >On Tue, Jun 30, 2015 at 10:12:42AM +1000, Cameron Simpson wrote: >> On 29Jun2015 16:51, Jonathan Slenders wrote: >> >Could we do that? Is there is reason it's not already a namedtuple? >> > >> >I always forget what the read-end and what the write-end of the pipe is, >> >and I use it quite regularly. >> >> The ordering is the same as for the default process file descriptors. A >> normal process has stdin as fd 0 and stdout as fd 1. So the return from >> pipe() has the read end as index 0 and the write end as fd 1. > >Yeah, I always forget which is fd 0 and which is fd 1 too. Shrug. I use the rationale above. stdin==0 is extremely easy to remember. However, I have no cogent objection to a named tuple myself. >Having nice descriptive names rather than using numbered indexes is >generally better practice, and I don't think there is any serious >downside to using a namedtuple. A minor enhancement like this shouldn't >require an extended discussion here on python-ideas. But... what shall we call the attributes? 
Sure an extended bikeshed is required here:-) Cheers, Cameron Simpson I will be a speed bump on the information super-highway. - jvogel at math.rutgers.edu (jeff vogel) From ben+python at benfinney.id.au Tue Jun 30 04:10:22 2015 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 30 Jun 2015 12:10:22 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. References: <20150630001242.GA53393@cskk.homeip.net> <20150630015003.GA10773@ando.pearwood.info> Message-ID: <85mvzirlo1.fsf@benfinney.id.au> Steven D'Aprano writes: > Yeah, I always forget which is fd 0 and which is fd 1 too. > > Having nice descriptive names rather than using numbered indexes is > generally better practice I definitely prefer to use, and promote, the explicit names 'stdin', 'stdout', and 'stderr' rather than the file descriptor numbers. On the point of confusing them though: I find it easy enough to remember that the two streams for output stay together, and the input one comes first at 0. > and I don't think there is any serious downside to using a namedtuple. > A minor enhancement like this shouldn't require an extended discussion > here on python-ideas. +1, let's just get the standard names there as attributes of a namedtuple. One more set of magic numbers to relegate to implementation detail, encapsulated where they belong! -- \ Fry: "Take that, poor people!" Leela: "But Fry, you're not | `\ rich." Fry: "No, but I will be someday, and then people like | _o__) me better watch out!" --Futurama | Ben Finney From rosuav at gmail.com Tue Jun 30 04:14:17 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 30 Jun 2015 12:14:17 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: <85mvzirlo1.fsf@benfinney.id.au> References: <20150630001242.GA53393@cskk.homeip.net> <20150630015003.GA10773@ando.pearwood.info> <85mvzirlo1.fsf@benfinney.id.au> Message-ID: On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney wrote: > Steven D'Aprano writes: > >> Yeah, I always forget which is fd 0 and which is fd 1 too. >> >> Having nice descriptive names rather than using numbered indexes is >> generally better practice > > I definitely prefer to use, and promote, the explicit names 'stdin', > 'stdout', and 'stderr' rather than the file descriptor numbers. > > On the point of confusing them though: I find it easy enough to remember > that the two streams for output stay together, and the input one comes > first at 0. > >> and I don't think there is any serious downside to using a namedtuple. >> A minor enhancement like this shouldn't require an extended discussion >> here on python-ideas. > > +1, let's just get the standard names there as attributes of a > namedtuple. Except that this isn't about stdin/stdout - that just happens to make a neat mnemonic. This is about a pipe, which has a reading end and a writing end. If you pass one of those to another process to use as its stdout, you'll be reading from the reading end; calling it "stdin" would be confusing, since you're getting what the process wrote to stdout. How about just "read" and "write"? Yep, Cameron was right... ChrisA From ncoghlan at gmail.com Tue Jun 30 06:08:19 2015 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Jun 2015 14:08:19 +1000 Subject: [Python-ideas] Are there asynchronous generators? 
In-Reply-To: References: <20150628185249.61624b82@x230> <559071BC.8090603@gmail.com> <20150629094439.4f8a8efa@x230> Message-ID: On 30 June 2015 at 07:51, Ron Adam wrote: > > On 06/29/2015 07:23 AM, Nick Coghlan wrote: >> >> >> On 29 Jun 2015 7:33 pm, "Guido van Rossum" >> > > wrote: >> > >> > Not following this in detail, but want to note that async isn't a good >> model for parallelization (except I/O) because the expectation of >> coroutines is single threading. The event loop serializes callbacks. >> Changing this would break expectations and code. >> >> Yeah, it's a bad idea - I realised after reading your post that because >> submission for scheduling and waiting for a result can already be >> separated >> it should be possible in Py 3.5 to write a "parallel" asynchronous >> iterator >> that eagerly consumes the awaitables produced by another asynchronous >> iterator, schedules them all, then produces the awaitables in order. >> >> (That idea is probably as clear as mud without code to show what I >> mean...) > > > Only the parts concerning "schedules them all", and "produces awaitables in > order". ;-) Some completely untested conceptual code that may not even compile, let alone run, but hopefully conveys what I mean better than English does:

import asyncio

def get_awaitables(async_iterable):
    """Gets a list of awaitables from an asynchronous iterator"""
    asynciter = async_iterable.__aiter__()
    awaitables = []
    while True:
        try:
            awaitables.append(asynciter.__anext__())
        except StopAsyncIteration:
            break
    return awaitables

async def wait_for_result(awaitable):
    """Simple coroutine to wait for a single result"""
    return await awaitable

def iter_coroutines(async_iterable):
    """Produces coroutines to wait for each result from an asynchronous iterator"""
    for awaitable in get_awaitables(async_iterable):
        yield wait_for_result(awaitable)

def iter_tasks(async_iterable, eventloop=None):
    """Schedules event loop tasks to wait for each result from an asynchronous iterator"""
    if eventloop is None:
        eventloop = asyncio.get_event_loop()
    for coroutine in iter_coroutines(async_iterable):
        yield eventloop.create_task(coroutine)

class aiter_parallel:
    """Asynchronous iterator to wait for several asynchronous operations in parallel"""
    def __init__(self, async_iterable):
        # Concurrent evaluation of future results is launched immediately
        self._tasks = tasks = list(iter_tasks(async_iterable))
        self._taskiter = iter(tasks)

    def __aiter__(self):
        return self

    def __anext__(self):
        try:
            return next(self._taskiter)
        except StopIteration:
            raise StopAsyncIteration

# Example reduction function
async def sum_async(async_iterable, start=0):
    tally = start
    async for x in aiter_parallel(async_iterable):
        tally += x
    return tally

# Parallel sum from synchronous code:
result = asyncio.get_event_loop().run_until_complete(sum_async(async_iterable))

# Parallel sum from asynchronous code:
result = await sum_async(async_iterable)

As the definition of "aiter_parallel" shows, we don't offer any nice syntactic sugar for defining asynchronous iterators yet (hence the question that started this thread). Hopefully the above helps illustrate the complexity hidden behind such a deceptively simple question :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From cs at zip.com.au Tue Jun 30 08:07:56 2015 From: cs at zip.com.au (Cameron Simpson) Date: Tue, 30 Jun 2015 16:07:56 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. 
In-Reply-To: References: Message-ID: <20150630060756.GA41465@cskk.homeip.net> On 30Jun2015 12:14, Chris Angelico wrote: >On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney wrote: >> Steven D'Aprano writes: >>> and I don't think there is any serious downside to using a namedtuple. >>> A minor enhancement like this shouldn't require an extended discussion >>> here on python-ideas. >> >> +1, let's just get the standard names there as attributes of a >> namedtuple. > >Except that this isn't about stdin/stdout - that just happens to make >a neat mnemonic. This is about a pipe, which has a reading end and a >writing end. If you pass one of those to another process to use as its >stdout, you'll be reading from the reading end; calling it "stdin" >would be confusing, since you're getting what the process wrote to >stdout. > >How about just "read" and "write"? +1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the same reason as outlined above. Cheers, Cameron Simpson From jonathan at slenders.be Tue Jun 30 09:02:05 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Tue, 30 Jun 2015 09:02:05 +0200 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: <20150630060756.GA41465@cskk.homeip.net> References: <20150630060756.GA41465@cskk.homeip.net> Message-ID: If we use "read" and write as names. It means that often we end up writing code like this: os.write(our_pipe.write, data) os.read(our_pipe.read) Is that ok? I mean, it's not confusing that the os.read is a method, while pip.read is an attribute. Jonathan 2015-06-30 8:07 GMT+02:00 Cameron Simpson : > On 30Jun2015 12:14, Chris Angelico wrote: > >> On Tue, Jun 30, 2015 at 12:10 PM, Ben Finney >> wrote: >> >>> Steven D'Aprano writes: >>> >>>> and I don't think there is any serious downside to using a namedtuple. >>>> A minor enhancement like this shouldn't require an extended discussion >>>> here on python-ideas. >>>> >>> >>> +1, let's just get the standard names there as attributes of a >>> namedtuple. >>> >> >> Except that this isn't about stdin/stdout - that just happens to make >> a neat mnemonic. This is about a pipe, which has a reading end and a >> writing end. If you pass one of those to another process to use as its >> stdout, you'll be reading from the reading end; calling it "stdin" >> would be confusing, since you're getting what the process wrote to >> stdout. >> >> How about just "read" and "write"? >> > > +1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the > same reason as outlined above. > > Cheers, > Cameron Simpson > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Jun 30 09:03:23 2015 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 30 Jun 2015 17:03:23 +1000 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: <20150630060756.GA41465@cskk.homeip.net> Message-ID: On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders wrote: > If we use "read" and write as names. It means that often we end up writing > code like this: > > os.write(our_pipe.write, data) > os.read(our_pipe.read) > > Is that ok? I mean, it's not confusing that the os.read is a method, while > pip.read is an attribute. I'd much rather that than the converse. 
You always put read with read, you always put write with write. ChrisA From njs at pobox.com Tue Jun 30 09:11:39 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 30 Jun 2015 00:11:39 -0700 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: <20150630060756.GA41465@cskk.homeip.net> Message-ID: On Tue, Jun 30, 2015 at 12:03 AM, Chris Angelico wrote: > On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders wrote: >> If we use "read" and write as names. It means that often we end up writing >> code like this: >> >> os.write(our_pipe.write, data) >> os.read(our_pipe.read) >> >> Is that ok? I mean, it's not confusing that the os.read is a method, while >> pip.read is an attribute. > > I'd much rather that than the converse. You always put read with read, > you always put write with write. It also appears to be the way that everyone is already naming their variables: https://codesearch.debian.net/perpackage-results/os%5C.pipe%20filetype%3Apython/2/page_0 I see returns named "r, w", "rout, wout", "rfd, wfd", "rfd, self.writepipe", "readfd, writefd", "p2cread, p2cwrite", etc. Maybe readfd/writefd or read_fileno/write_fileno would be a little better than plain read/write, both to remind the user that these are fds rather than file objects and to make the names nouns instead of verbs. But really read/write is fine too. -n -- Nathaniel J. Smith -- http://vorpus.org From jonathan at slenders.be Tue Jun 30 09:22:24 2015 From: jonathan at slenders.be (Jonathan Slenders) Date: Tue, 30 Jun 2015 09:22:24 +0200 Subject: [Python-ideas] Make os.pipe() return a namedtuple. In-Reply-To: References: <20150630060756.GA41465@cskk.homeip.net> Message-ID: I created an issue: http://bugs.python.org/issue24536 readfd/writefd sounds like a good choice, but it's still open for discussion. 2015-06-30 9:11 GMT+02:00 Nathaniel Smith : > On Tue, Jun 30, 2015 at 12:03 AM, Chris Angelico wrote: > > On Tue, Jun 30, 2015 at 5:02 PM, Jonathan Slenders > wrote: > >> If we use "read" and write as names. It means that often we end up > writing > >> code like this: > >> > >> os.write(our_pipe.write, data) > >> os.read(our_pipe.read) > >> > >> Is that ok? I mean, it's not confusing that the os.read is a method, > while > >> pip.read is an attribute. > > > > I'd much rather that than the converse. You always put read with read, > > you always put write with write. > > It also appears to be the way that everyone is already naming their > variables: > > > https://codesearch.debian.net/perpackage-results/os%5C.pipe%20filetype%3Apython/2/page_0 > > I see returns named "r, w", "rout, wout", "rfd, wfd", "rfd, > self.writepipe", "readfd, writefd", "p2cread, p2cwrite", etc. > > Maybe readfd/writefd or read_fileno/write_fileno would be a little > better than plain read/write, both to remind the user that these are > fds rather than file objects and to make the names nouns instead of > verbs. But really read/write is fine too. > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From niki.spahiev at gmail.com Tue Jun 30 09:32:22 2015 From: niki.spahiev at gmail.com (Niki Spahiev) Date: Tue, 30 Jun 2015 10:32:22 +0300 Subject: [Python-ideas] Make os.pipe() return a namedtuple. 
In-Reply-To: <20150630060756.GA41465@cskk.homeip.net> References: <20150630060756.GA41465@cskk.homeip.net> Message-ID: <55924606.8070805@gmail.com> On 30.06.2015 09:07, Cameron Simpson wrote: > > +1 for "read" and "write" for me. And -1 on "stdin" and "stdout" for the > same reason as outlined above. This is commonly found code:

    if not hasattr(src, 'read'):
        src = open(src)

and the same for write. I think read_fd and write_fd is better. Niki
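
For illustration only, a sketch of what the proposed return value might look like using the read_fd/write_fd spelling suggested above. The names PipeFDs and pipe() are invented for this mock-up; the actual field names were still being discussed on the tracker issue, and this is not what os.pipe() currently returns.

    import os
    from collections import namedtuple

    # Hypothetical result type using one of the suggested field spellings.
    PipeFDs = namedtuple('PipeFDs', ['read_fd', 'write_fd'])

    def pipe():
        # Wrapper sketch: the same two file descriptors os.pipe() already
        # returns, but addressable by name as well as by index, so existing
        # "r, w = os.pipe()" call sites keep working unchanged.
        read_fd, write_fd = os.pipe()
        return PipeFDs(read_fd, write_fd)

    fds = pipe()
    os.write(fds.write_fd, b'hello')
    print(os.read(fds.read_fd, 5))  # b'hello'
    r, w = fds                      # tuple unpacking still works
    os.close(r)
    os.close(w)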