Mailman 3 September 2015 - Python-ideas

Bring line continuation to multi-level dictionary lookup
by John Wong Sept. 18, 2015


Hi everyone. I work with APIs whose responses are deeply nested dictionaries. Imagine a simplified case:

    foo = {1: {2: {3: {4: {5: 6}}}}}

Now imagine I need to get to 6:

    foo[1][2][3][4][5]

This looks manageable, but if the key names are long, then I will certainly end up splitting this across lines to respect my style guide. To make it concrete, let's use something realistic, a response from the AWS API:

    response = {'DescribeDBSnapshotsResponse': {'ResponseMetadata': {'RequestId': '…
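For context, here is a rough sketch of two ways a deep lookup can already be spread over several lines in current Python. It is not taken from the post. The helper name deep_get is hypothetical, and the dictionary uses integer keys as in the simplified case above.

```python
from functools import reduce
from operator import getitem

foo = {1: {2: {3: {4: {5: 6}}}}}

# Option 1: wrap the expression in parentheses. Newlines inside the
# parentheses are ignored, so each subscript can sit on its own line.
value = (
    foo[1][2]
       [3][4]
       [5]
)


# Option 2: a small helper (hypothetical name) that walks a sequence of keys.
def deep_get(mapping, *keys):
    """Return mapping[k1][k2]...[kn] for the given keys."""
    return reduce(getitem, keys, mapping)


same_value = deep_get(foo, 1, 2, 3, 4, 5)
```

Both forms raise KeyError on a missing key, just like the one-line lookup.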

5 5

Non-English names in the turtle module.
by Al Sweigart Sept. 16, 2015


I've opened an issue for adding non-English names to the turtle module's function names: https://bugs.python.org/issue24990

This would effectively take this code:

    import turtle
    t = turtle.Pen()
    t.pencolor('green')
    t.forward(100)

...and have this code in French be completely equivalent:

    import turtle
    t = turtle.Plume()
    t.couleurplume('vert')
    t.avant(100)

(Pardon my Google Translate French.) This, of course, is a terrible way for a software module to implement …

25 92

Add inspect.getenclosed to return/yield source code for nested classes and functions
by Un Do Sept. 16, 2015


I propose adding a function to the inspect module that will retrieve definitions of classes and functions (standard and lambdas) located inside another function/method. In my opinion this would be a small but nice and useful addition to the standard library. It can be implemented using a couple of undocumented functions from that module (findsource and getblock) without any performance drawbacks. Example:

    In [9]: print(getsource(function))
    def function():
        class inner_class(): …

1 0

Should our default random number generator be secure?
by Guido van Rossum Sept. 16, 2015


I've received several long emails from Theo de Raadt (OpenBSD founder) about Python's default random number generator. This is the random module, and it defaults to a Mersenne Twister (MT) seeded by 2500 bytes of entropy taken from os.urandom(). Theo's worry is that while the starting seed is fine, MT is not good when random numbers are used for crypto and other security purposes. I've countered that it's not meant for that (you should use random.SystemRandom() or os.urandom() for that) but he …

28 125

Globally configurable random number generation
by Nick Coghlan Sept. 15, 2015


This is an expansion of the random module enhancement idea I previously posted to Donald's thread: https://mail.python.org/pipermail/python-ideas/2015-September/035969.html

I'll write it up as a full PEP later, but I think it's just as useful in this form for now.

= Defining the problem =

We're moving into an era where the easiest way to publish software is as a web application, with "deployment" to client systems done at runtime via a web browser. It's regularly the case that "learn to program" classes (especially those aimed at adults picking up programming for the first time) will introduce folks to both a web development framework and how to deploy web applications on a developer-focused service with a free hosting tier, like Heroku or OpenShift.

It's also the case that we live in an era where there's a lot of well-intentioned-but-actually-bad advice on the internet when it comes to generating security sensitive tokens, and the folks receiving that advice through forums like Stack Overflow aren't necessarily ever going to see the "don't do that" guidance in the standard library's random module documentation, or the docs for the cryptography library, or the docs for a web framework like Flask, Django or Pyramid.

One of the ways we know that many of the folks doing web development don't take admonitions in documentation seriously is that one of the most popular web servers for Python on these kinds of services is Django's "runserver", even though Django's docs specifically say only to use that for local development.

It isn't OK to say "the developers deserve the consequences that come to them" as in many cases, it isn't the developers that suffer the consequences, but the users of their applications.

One reason we know weak RNGs can be a problem in practice is that the same kind of concern exists in PHP web applications, and https://media.blackhat.com/bh-us-12/Briefings/Argyros/BH_US_12_Argyros_PRNG… shows how the relative predictability of password reset tokens can be used to compromise administrator accounts.

Rather than playing whack-a-mole with individual web applications (many of which will be written by inexperienced developers), or attempting to demonstrate that a deterministic PRNG is "secure enough" for these use cases (when the research on PHP and deterministic PRNGs in general indicates that it isn't), it is proposed to migrate Python to a default random implementation that *is* known to be secure enough for these kinds of use cases.

At the same time, deterministic random number generation is still desirable in many situations, and we also don't want folks learning Python in the future to be required to take a crash course in web application security theory first. Thus, it is also proposed that the abstraction used to present these differences to end users minimise the references to the underlying security concepts.

A key outcome of this proposal is that it will retroactively upgrade a lot of existing instructions on the internet for generating default passwords and other sensitive tokens in Python from "actively harmful" to "not necessarily ideal, but at least not wrong if you're using Python 3.6+".

This *is* a compatibility break for the sake of correcting default behaviours that are fine when developing applications for local use, but problematic from a network service security perspective, just as happened with the introduction of hash randomisation. Unlike the hash randomisation change, this one is readily addressed in old versions on a case by case basis, so it is only proposed to make the change in a future feature release of Python, not in any current maintenance releases.

= Core abstraction =

The core concept of this proposal involves classifying random number generators in Python as follows:

* seedable
* seedless
* system

These terms are chosen to make sense to folks that have *no idea* about the way different kinds of random number generator work and how that affects their security properties, but do know whether or not they need to be able to pass in a particular fixed seed in order to regenerate the same series of outputs.

The guidance to Python users is then:

* we use the seedless RNG by default as it provides the best balance of speed and security
* if you need to be able to exactly reproduce output sequences, use the seedable RNG
* if you know you're doing security sensitive work, use the system RNG directly to eliminate Python's seedless RNG as a potential source of vulnerabilities

Importantly, there are relatively simple answers to the following two questions (which could be added to the Design FAQ):

Q: Why isn't the seedable RNG the default random implementation (any more)?

A: The same properties that make it possible to provide an explicit seed to the seedable RNG and get a predictable series of outputs make it inappropriate for tasks like generating session IDs and password reset tokens in web applications. Since folks continued to use the default RNG for those cases, even after years of the core development team, web framework developers and security engineers saying "Don't do that, use the system RNG instead", we eventually changed the default behaviour to just make those cases OK.

Q: Why isn't the system RNG the default implementation?

A: Due to the way operating systems work, calling into the kernel to get a random number is always going to be slower than generating one within the Python runtime. The default seedless generator provides most of the same benefits as using the system RNG directly, but is an order of magnitude faster as it doesn't need to call into the kernel as often.
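As an illustration only (not part of the proposal text), the following sketch shows how two of the three categories look with classes that already exist in the random module. random.Random plays the "seedable" role and random.SystemRandom the "system" role. The "seedless" generator is only proposed here, so nothing stands in for it, and the variable names are made up for this example.

```python
import random

# "Seedable": the Mersenne Twister generator. Two instances created with
# the same fixed seed produce the same sequence of outputs.
seedable_a = random.Random(12345)
seedable_b = random.Random(12345)
first_run = [seedable_a.random() for _ in range(3)]
second_run = [seedable_b.random() for _ in range(3)]
sequences_match = first_run == second_run

# "System": SystemRandom takes its randomness from os.urandom().
system = random.SystemRandom()
system.seed(12345)            # accepted, but has no effect on the output
digit = system.randrange(10)  # not reproducible from the "seed" above

# SystemRandom has no internal state to save or restore.
try:
    system.getstate()
    state_available = True
except NotImplementedError:
    state_available = False
```

The seed() call on SystemRandom being silently ignored is the kind of API difference the "seedless" name is meant to make explicit.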

= Proposed change for Python 3.6 =

* add a random.SeedlessRandom API that omits the seed(), getstate() and setstate() methods and uses a cryptographically secure PRNG internally (such as the ChaCha20 algorithm implemented by OpenBSD)
* rename random.Random to random.SeedableRandom
* make random.Random a subclass of SeedableRandom that deprecates seed(), getstate() and setstate()
* deprecate the seed(), getstate() and setstate() methods on SystemRandom
* expose the global SeedableRandom instance as random.seedable_random
* expose a global SeedlessRandom instance as random.seedless_random
* expose a global SystemRandom instance as random.system_random
* provide a random.set_default_instance() API that makes it possible to specify the instance used by the module level methods
* the module level seed(), getstate(), and setstate() functions will throw RuntimeError if the corresponding method is missing from the default instance

In 3.6, "random.set_default_instance(random.seedless_random)" will opt in to the CSPRNG when using the module level functions process wide, while "from random import seedless_random as random" will do so on a module by module basis. "from random import system_random as random" also becomes available as a simple upgrade path for security sensitive modules.

Appropriate helpers would be added to the six and future projects to allow single source Python 2/3 projects to easily cope with the change in behaviour when using the seeded RNG for its intended purposes.

For many projects, compatibility code will consist of the following lines in a compatibility module:

    try:
        from random import seedable_random as random
    except ImportError:
        import random

It would also be desirable for the seedless random number generator to be made available as a PyPI package for use on older Python versions.
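A minimal sketch of how such a compatibility module might be used, assuming the proposed name random.seedable_random. That name comes only from this proposal, so on a Python without it the import falls back to the plain random module. The function shuffled_deck is a hypothetical example of code that relies on deterministic seeding.

```python
# Contents of a hypothetical compat module.
try:
    # Name proposed in this message; it may not exist in the running Python.
    from random import seedable_random as random
except ImportError:
    # Fallback: the module level functions, backed by the seedable generator.
    import random


def shuffled_deck(seed):
    """Return a reproducible shuffle of ten cards for the given seed."""
    random.seed(seed)
    deck = list(range(10))
    random.shuffle(deck)
    return deck
```

Calling shuffled_deck twice with the same seed gives the same order, which is the behaviour the seedable generator exists to preserve.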

= Proposed change for Python 3.7 =

* random.Random becomes an alias for random.SeedlessRandom
* the default instance changes to be random.seedless_random

In 3.7, "random.set_default_instance(random.seedable_random)" will opt back in to the deterministic PRNG when using the module level functions process wide, while "from random import seedable_random as random" will do so on a module by module basis.

= Seedable random number generation =

This is what we have today. The MT random implementation supports explicit seeding, state retrieval, and state restoration. It doesn't automatically mix in additional system entropy as it operates.

This is the right choice for use cases like computer games, map generation, and randomising the order of test execution, as in these situations, it's desirable to be able to reproduce a past sequence exactly.

= Seedless random number generators =

This is the key proposed new addition: a cryptographically secure, non-deterministic, userspace PRNG. It's faster than the system RNG as it avoids the need to make a system API call.

The "seedless" name comes from the fact that the inability to feed in a fixed seed is the most obvious API difference relative to deterministic RNGs, and hence provides a mental hook for people to remember which is which, without needing to know the relevant background security theory (which is arcane enough to be opaque even to developers with decades of experience and hence isn't something we want to be inflicting on folks in the process of learning to program).

= System random number generator =

The only proposed change here is providing a default instance to enable the "from random import system_random as random" pattern.

Regards,
Nick.

-- 
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia

15 33

Python's Source of Randomness and the random.py module Redux
by Donald Stufft Sept. 15, 2015


Ok, I reached out to Theo de Raadt to talk to him about what he was suggesting without Guido having to play messenger and forward fragments of the email conversation. I'm starting a new thread because this email is rather long, and I'm hoping to divorce it a bit from the back and forth in the other thread about a proposal that wasn't exactly what Theo was suggesting.

Essentially, there are three basic types of uses of random (the concept, not the module). Those are:

1. People/usecases who absolutely need deterministic output given a seed and for whom security properties don't matter.
2. People/usecases who absolutely need a cryptographically random output and for whom having a deterministic output is a downside.
3. People/usecases that fall somewhere in between where it may or may not be security sensitive or it may not be known if it's security sensitive.

The people in group #1 are currently, in the Python standard library, best served using the MT random source as it provides exactly the kind of determinism they need.

The people in group #2 are currently, in the Python standard library, best served using os.urandom (either directly or via random.SystemRandom).

However, the third case is the one that Theo's suggestion is attempting to solve. In the current landscape, the security minded folks will tell these people to use os.urandom/random.SystemRandom and the performance or otherwise less security minded folks will likely tell them to just use random.py, leaving these people with a random that is not cryptographically safe.

The question then is, does it matter if #3 are using a cryptographically safe source of randomness? The answer is obviously that we don't know, and it's possible that the user doesn't know. In these cases it's typically best if we default to the more secure option and expect people to opt in to insecurity.

In the case of randomness, a lot of languages (Python included) don't do that and instead they opt to pick the more performant option first, often with the argument (as seen in the other thread) that if people need a cryptographically secure source of random, they'll know how to look for it and if they don't know how to look for it, then it's likely they'll have some other security problem.

I think (and I believe Theo thinks) this sort of thinking is short sighted. Let's take an example of a web application: it's going to need session identifiers to put into a cookie, you'll want these to be random and it's not obvious on the tin for a non-expert that you can't just use the module level functions in the random module to do this. Other examples are generating API keys or a password.

Looking on Google, the first result for "python random password" is StackOverflow which suggests:

    ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))

However, it was later edited to, after that, include:

    ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N))

So it wasn't obvious to the person who answered that question that the random module's module scoped functions were not appropriate for this use. It appears that the original answer lasted for roughly 4 years before it was corrected, so who knows how many people used that in those 4 years.

The second result has someone asking if there is a better way to generate a random password in Python than:

    import os, random, string
    length = 13
    chars = string.ascii_letters + string.digits + '!@#$%^&*()'
    random.seed = (os.urandom(1024))
    print ''.join(random.choice(chars) for i in range(length))

This person obviously knew that os.urandom existed and that he should use it, but failed to correctly identify that the random module's module scoped functions were not what he wanted to use here.
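For comparison, here is a small sketch of the pattern the edited StackOverflow answer points toward, using random.SystemRandom so that every character comes from os.urandom(). The names ALPHABET and random_password are made up for this illustration, and the default length of 13 simply mirrors the snippet quoted above.

```python
import random
import string

# Characters to draw from; chosen here only for illustration.
ALPHABET = string.ascii_letters + string.digits


def random_password(length=13):
    """Build a password whose characters are drawn via os.urandom()."""
    rng = random.SystemRandom()
    return ''.join(rng.choice(ALPHABET) for _ in range(length))


# Note on the second quoted snippet: `random.seed = (os.urandom(1024))`
# rebinds the name `seed` on the module instead of calling seed(), so the
# module level generator is never actually reseeded by that line.
example = random_password()
```

Even if that line had called random.seed() correctly, the characters would still come from the deterministic generator rather than from the system source.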

The third result has this code:

    import string
    import random

    def randompassword():
        chars=string.ascii_uppercase + string.ascii_lowercase + string.digits
        size=8
        return ''.join(random.choice(chars) for x in range(size,12))

I'm not going to keep pasting snippets, but going through the results it is clear that in the bulk of cases, this search turns up code snippets that suggest there is likely to be a lot of code out there that is unknowingly using the random module in a very insecure way. I think this is a failing of the random.py module to provide an API that guides users to be safe, which was attempted to be papered over by adding a warning to the documentation; however, as has been said before, you can't solve a UX problem with documentation.

Then we come to why we might not want to provide a safe random by default for the folks in the #3 group. As we've seen in the other thread, this basically boils down to the fact that a lot of users don't care about the security properties and just want a fast random-esque value. This particular case is made stronger by the fact that there is a lot of code out there using Python's random module in a completely safe way that would regress in a meaningful way if the random module slowed down.

The fact that speed is the primary reason not to give people in #3 a cryptographically secure source of random by default is where we come back to the meat of Theo's suggestion. His claim is that invoking os.urandom through any of the interfaces imposes a performance penalty because it has to round trip through the kernel crypto sub system for every request. His suggestion is essentially that we provide an interface to a modern, good, userland cryptographically secure source of random that is running within the same process as Python itself.

One such example of this is the arc4random function (which doesn't actually provide ARC4 on OpenBSD; it provides ChaCha, and isn't tied to one specific algorithm) which comes from libc on many platforms. According to Theo, modern userland CSPRNGs can create random bytes faster than memcpy, which eliminates the argument of speed for why a CSPRNG shouldn't be the "default" source of randomness.

Thus the proposal is essentially:

* Provide an API to access a modern userland CSPRNG.
* Provide an implementation of random.SomeKindOfRandom that utilizes this.
* Move the MT based implementation of the random module to random.DeterministicRandom.
* Deprecate the module scoped functions, instructing people to use the new random.SomeKindOfRandom unless they need deterministic random, in which case use random.DeterministicRandom.

This can of course be tweaked one way or the other, but that's the general idea translated into something actionable for Python. I'm not sure exactly how I feel about it, but I certainly do think that the current situation is confusing to end users and leaving them in an insecure state, and that at a minimum we should move MT to something like random.DeterministicRandom and deprecate the module scoped functions because it seems obvious to me that the idea of a "default" random function that isn't safe is a footgun for users.

As an additional consideration, there are security experts who believe that userland CSPRNGs should not be used at all. One of those is Thomas Ptacek who wrote a blog post [1] on the subject. In this, Thomas makes the case that a userland CSPRNG pretty much always depends on the cryptographic security of the system random, but that it itself may be broken, which means you're adding a second, single point of failure where a mistake can cause you to get non-random data out of the system.

I had asked Theo about this, and he stated that he disagreed with Thomas about never using a userland CSPRNG and in his opinion that blog post was mostly warning people away from using something like MT in the userland and away from /dev/random (which is often the cause of people reaching for MT because /dev/random blocks which makes programs even slower).

It seems to boil down to these questions: Do we want to try to protect users by default, or at least make it more obvious in the API which one they want to use (I think yes)? If so, do we think that /dev/urandom is "fast enough" for most people in group #3? If not, do we agree with Theo that a modern userland CSPRNG is safe enough to use, or do we agree with Thomas that it's not? And if we think that it is, do we use arc4random, and what do we do on systems that don't have a modern userland CSPRNG in their libc?

[1] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

24 100

Re: [Python-ideas] Should our default random number generator be secure?
by Tim Peters Sept. 13, 2015


[Nathaniel Smith <njs(a)vorpus.org>]
> Yeah, the independent-seed-for-each-thread approach works for any RNG, but
> just like people feel better if they have a 100% certified guarantee that
> the RNG output in a single thread will pass through every combination of
> possible values (if you wait some cosmological time), they also feel better
> if there is some 100% certified guarantee that the RNG values in two threads
> will also be uncorrelated with each other.
> …

3 4

Round division
by Serhiy Storchaka Sept. 12, 2015


In Python there is an operation for floor division: a // b. Ceiling division can easily be expressed via floor division: -((-a) // b). But round division is more complicated. This operation is needed in Fraction.__round__ and in a number of methods in the datetime module (see _divide_and_round). Due to the complexity of the correct Python implementation, it is slower than plain division. I propose to add a special function in the math module. This not only will speed up Python implementation of the …
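To make the three operations concrete, here is a rough pure-Python sketch for integers. The function names ceil_div and round_div are hypothetical and are not the API being proposed. The rounding rule shown (ties go to the even quotient) is an assumption chosen to match the behaviour of the built-in round().

```python
def ceil_div(a, b):
    """Ceiling division expressed via floor division."""
    return -((-a) // b)


def round_div(a, b):
    """Divide a by b and round to the nearest integer, ties to even."""
    q, r = divmod(a, b)
    # Compare twice the remainder with the divisor to see whether the
    # fractional part is above, below, or exactly one half.
    r *= 2
    greater_than_half = r > b if b > 0 else r < b
    if greater_than_half or (r == b and q % 2 == 1):
        q += 1
    return q
```

For example, round_div(7, 2) gives 4 and round_div(5, 2) gives 2, since both ties are resolved toward the even neighbour, while floor division gives 3 and 2.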

9 10

Re: [Python-ideas] Should our default random number generator be secure?
by M.-A. Lemburg Sept. 12, 2015


On 10.09.2015 19:04, Xavier Combelle wrote:
>> I think this is the major misunderstanding here:
>>
>> The random module never suggested that it generates pseudo-random data
>> of crypto quality.
>>
>> I'm pretty sure people doing crypto will know and most others
>> simply don't care :-)
>>
>> Evidence: We used a Wichmann-Hill PRNG as default in random
>> for a decade and people still got their work done. Mersenne
>> was added in Python 2.3 and bumped the period from
>> 6,953,607,871,644 (13 digits) to 2**19937-1 (6002 digits).
>
> It is not a evidence, I have an evidence of the opposite:
> some people can and does use random.random() for generating session key or
> csrf tokens and it's an insecure default.

It all depends on what you consider "secure" or "secure enough" and points directly to another misunderstanding: that "secure" is a well-defined term :-)

The random module seeds its global Random instance using urandom (if available on the system), so while the generator itself is deterministic, the seed used to kick off the pseudo-random series is not. For many purposes, this is secure enough.

It's also easy to make the output of the random instance more secure by passing it through a crypto hash function.

But back to the original question: What is "secure" ?

In crypto terms, "secure" usually refers to "computationally infeasible to calculate before the sun goes dark" (to take one variant).

More realistically, it can be defined as:

Based on the public knowledge known today, it's impossible to run a program which allows converting the output of a crypto function back to its inputs within a reasonable time span. And this property will - based on today's knowledge - hold for at least the next 5-10 years.

You may notice the many parameters in these definition attempts. It all depends on who you ask.

With the advent of new technologies like quantum computers, it's not at all clear that any of those definitions will still hold in a couple of years.
It's well possible that only quantum computers will be able to implement the necessary programs and it'll take a while for mobile phones to catch up and come with chips implementing those ;-)

Now, leaving aside this bright future, what's reasonable today ?

If you look at tools like untwister:

https://github.com/bishopfox/untwister

you can get a feeling for how long it takes to deduce the seed from an output sequence. Bear in mind that in order to be reasonably sure that the seed is correct, the available output sequence has to be long enough.

That's a known-plaintext attack, so you need access to lots of session keys to begin with.

The tool is still running on an example set of 1000 32-bit numbers and it says it'll be done in 1.5 hours, i.e. before the sun goes down in my timezone. I'll leave it running to see whether it can find my secret key.

Untwister is only slightly smarter than brute force. Given that MT has a seed size of 32 bits, it's not surprising that a tool can find the seed within a day.

Perhaps it's time to switch to a better version of MT, e.g. a 64-bit version (with 64-bit internal state):

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html

or an even faster SIMD variant with better properties and 128 bit internal state:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

Esp. the latter will help make brute force attacks practically impossible.

Tim ?

BTW: Looking at the sources of the _random module, I found that the seed function uses the hash of non-integers (e.g. strings) passed to it as seeds. Given the hash randomization for strings this will create non-deterministic results, so it's probably wise to only use 32-bit integers as seed values for portability, if you need to rely on seeding the global Python RNG.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 11 2015)
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
2015-09-18: PyCon UK 2015 ... 7 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/

4 13

Re: [Python-ideas] Python's Source of Randomness and the random.py module Redux
by random832＠fastmail.us Sept. 11, 2015


On Fri, Sep 11, 2015, at 09:36, Steven D'Aprano wrote:
> Yes, calling `random.choice` is *significantly better* than calling
> `random.SomethingRandom().choice`. It's better for beginners, it's even
> better for expert users whose random needs are small, and those whose
> needs are greater shouldn't be using the later anyway.

Why is it that people who need deterministic/seed based random aren't considered to be "those whose needs are greater"?

2 2