Any chance on (slowly) deprecating `eval` and `exec` as builtins?

Hi, The dangers of eval and exec are obvious and well known to advanced users, but the availability as built-in functions makes it too tempting for beginners or even medium-level programmers. You can see questions about these function pretty often in stackoverflow (roughly once a day <https://stackoverflow.com/search?tab=newest&q=eval%20python>, though sometimes the uses are legitimate). Maybe we could start a ten-year process of deprecating the use of `builtins.eval` (in the docs, and then with warnings)? `builtins.eval` will be a wrapper to the real evaluation function, moved to `unsafe.eval` or something obvious like that, so all you need to do to port your code is to add `from unsafe import unsafe_eval as eval, unsafe_exec as exec` at the top of the file; it will be a nice warning to the reader. The fact that it is a wrapper will slightly slow it down and make the stack traces noisier - both are good things, IMO. Also, it is unfortunate that `ast.literal_eval` is less accessible than `builtins.eval`. Giving it an alias in builtins might make it easier for programmers (and less scary - "ast" might sound like I need a PhD to use it). What do you think? Elazar

07.11.17 12:29, אלעזר пише:
ast.literal_eval is not so safe as you think. Malicious input can cause a stack overflow in your program. [1] [1] https://bugs.python.org/issue31113

On Tue, Nov 07, 2017 at 03:35:58PM +0200, Serhiy Storchaka wrote:
I don't see anything about literal_eval in that bug report. In any case, I think that securing literal_eval is much simpler than securing eval: try: # a thousand character expression ought to be enough for # any legitimate purpose... value = literal_eval(tainted_string[:1000]) # untested except MemoryError: value = None versus, well, there's no obvious or simple way to secure eval. You might get something reasonable with: if '_' in tainted_string: raise ValueError("no underscores allowed!") globalns = {'__builtins__': None} # I think? eval(tainted_string[:1000], globalns, globalns) but even that is probably vulnerable to a clever enough attacker. I tried this on my computer with Python 3.5: # Build a deeply nested dict {'x':{'x':{...}}} as a string s = "{'x':0}" for i in range(1665): s = "{'x':%s}" % s assert len(s) < 10000 d = ast.literal_eval(s) and it failed with MemoryError rather than a segfault, so that's something. -- Steve

On Tue, Nov 7, 2017 at 6:41 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I agree -- literal_eval is a really nice utility, but folks are not likely to find it...or think it's as simple a tool as it is... In fact, a couple weeks ago, one of the students in my intro to python class used it -- good for him!), When my TA was reviewing the code (and she's a pretty experienced developer), she asked me: what is the ast module? I know it's the "Abstract Syntax Tree" module, so was very surprised that an intro student was messing about the the AST! when I looked at the code, I saw, oh that's literal_eval -- a simple (seeming, anyway) function that a beginner may just want to use (good old stack overflow...) I can see why liter_eval is needed in the AST module, but it's also a useful tool by itself,a nd that seems like a very strange place to store it... In any case, I think that securing literal_eval is much simpler than
sure -- though I'd use a lot more than 1000 characters -- not much these days, and you might want to unpack something like a JSON data package... And I'd raise an exception if it's too big, rather than trying to evaluate the subset... Maybe something like this should be patched into it? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Wed, Nov 8, 2017 at 7:33 AM, Chris Barker <chris.barker@noaa.gov> wrote:
That's the trouble, though. It's perfectly safe to literal_eval a large amount of well-formed data (say, a dict display with simple keys and good-sized strings as values), but you can cause major problems by literal_evalling a relatively small amount of malicious data (eg "["*100 bombs out with MemoryError, and I wouldn't trust that there isn't something far worse). If you're working with untrusted data, you probably should be using json.loads rather than ast.literal_eval. -1 on hiding eval/exec; these features exist in many languages, and they're identically dangerous everywhere. Basically, use eval only with text from the owner of the system, not from anyone untrusted. ChrisA

On Wed, Nov 8, 2017 at 8:39 AM, אלעזר <elazarg@gmail.com> wrote:
If someone is using eval/exec with untrusted code, no amount of hiding-behind-imports is going to change that. A quick glance at the Stack Overflow search you linked to (just the search results themselves - I didn't dive deeper) shows only a few that would be affected by this change, and most of them are from people who seem to at least broadly understand what's going on. So the benefit isn't going to be huge, and a backward compatibility break is extremely annoying (even obscure functions like reduce incurred some backlash when they were "hidden" behind an import). Hence I'm -1 on changing this. Had Python always had eval off in some module, I wouldn't push for its promotion to builtin, but IMO the cost of moving it is greater than any benefit of protection. The dangers of eval/exec should be well known. ChrisA

07.11.17 16:41, Steven D'Aprano пише:
Sorry, this particular issue isn't related to literal_eval. There was other recently fixed issue, but I forgot its number. But the point is that the compiler is recursive, and processing nested constructs consumes the C stack. There are some guards against too deep recursion (2.7 has less guards and more vulnerable), but it is hard to prove that all vulnerabilities are fixed. Your method (limiting the size of the input) helps against some attacks. Other methods -- restricting the set of characters and the number of parenthesis, braces and brackets.

But the point is that the compiler is recursive, and processing nested constructs consumes the C stack. There are some guards against too deep recursion (2.7 has less guards and more vulnerable), but it is hard to prove that all vulnerabilities are fixed. Your method (limiting the size of the input) helps against some attacks. Other methods -- restricting the set of characters and the number of parenthesis, braces and brackets. Hmm — I’d never really thought about it, bust presumably ast.literal_eval was designed for use in the compiler— or at least uses the compiler to do its real work. So maybe what we really need is a literal-eval that is DESIGNED to be a safe Python literal parser. Like a JSON parser but supporting the richer Python literal set. -CHB

On Tue, Nov 7, 2017 at 2:29 AM, אלעזר <elazarg@gmail.com> wrote:
I find it dubious to claim that these functions are dangerous to beginners. The dangers are related to attacks on servers that are exposed to the internet and beginners have no business running servers. Once you start exposing your code to attackers there are a lot of other things you have to worry about, and exec/eval are at least easily found using grep, unlike some other unsafe patterns. -- --Guido van Rossum (python.org/~guido)

On Tue, Nov 07, 2017 at 01:53:00PM -0800, Guido van Rossum wrote:
I don't think its so much that eval/exec are in themselves dangerous to beginners as that their easy availability as builtins encourages bad habits that can last long after the programmer is no longer a beginner. I know the Python ecosystem is not quite the wild west as PHP and Javascript sometimes is, but code injection attacks do exist: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9807 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9802 Sometimes they're written by beginners whose code isn't being reviewed carefully enough, and sometimes they're written by experienced coders who have simply learned bad habits and haven't learned better. I don't want to scare people away from using eval/exec, but it would be great if we could gently encourage them to think before using them, and to prefer literal_eval instead. -- Steve

On Tue, Nov 7, 2017 at 6:26 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Sure, I'm all for making sure the documentation is clear. But the proposal at hand is to remove them from the builtins, and I don't see the situation as grave as needing that. -- --Guido van Rossum (python.org/~guido)

07.11.17 12:29, אלעזר пише:
ast.literal_eval is not so safe as you think. Malicious input can cause a stack overflow in your program. [1] [1] https://bugs.python.org/issue31113

On Tue, Nov 07, 2017 at 03:35:58PM +0200, Serhiy Storchaka wrote:
I don't see anything about literal_eval in that bug report. In any case, I think that securing literal_eval is much simpler than securing eval: try: # a thousand character expression ought to be enough for # any legitimate purpose... value = literal_eval(tainted_string[:1000]) # untested except MemoryError: value = None versus, well, there's no obvious or simple way to secure eval. You might get something reasonable with: if '_' in tainted_string: raise ValueError("no underscores allowed!") globalns = {'__builtins__': None} # I think? eval(tainted_string[:1000], globalns, globalns) but even that is probably vulnerable to a clever enough attacker. I tried this on my computer with Python 3.5: # Build a deeply nested dict {'x':{'x':{...}}} as a string s = "{'x':0}" for i in range(1665): s = "{'x':%s}" % s assert len(s) < 10000 d = ast.literal_eval(s) and it failed with MemoryError rather than a segfault, so that's something. -- Steve

On Tue, Nov 7, 2017 at 6:41 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I agree -- literal_eval is a really nice utility, but folks are not likely to find it...or think it's as simple a tool as it is... In fact, a couple weeks ago, one of the students in my intro to python class used it -- good for him!), When my TA was reviewing the code (and she's a pretty experienced developer), she asked me: what is the ast module? I know it's the "Abstract Syntax Tree" module, so was very surprised that an intro student was messing about the the AST! when I looked at the code, I saw, oh that's literal_eval -- a simple (seeming, anyway) function that a beginner may just want to use (good old stack overflow...) I can see why liter_eval is needed in the AST module, but it's also a useful tool by itself,a nd that seems like a very strange place to store it... In any case, I think that securing literal_eval is much simpler than
sure -- though I'd use a lot more than 1000 characters -- not much these days, and you might want to unpack something like a JSON data package... And I'd raise an exception if it's too big, rather than trying to evaluate the subset... Maybe something like this should be patched into it? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Wed, Nov 8, 2017 at 7:33 AM, Chris Barker <chris.barker@noaa.gov> wrote:
That's the trouble, though. It's perfectly safe to literal_eval a large amount of well-formed data (say, a dict display with simple keys and good-sized strings as values), but you can cause major problems by literal_evalling a relatively small amount of malicious data (eg "["*100 bombs out with MemoryError, and I wouldn't trust that there isn't something far worse). If you're working with untrusted data, you probably should be using json.loads rather than ast.literal_eval. -1 on hiding eval/exec; these features exist in many languages, and they're identically dangerous everywhere. Basically, use eval only with text from the owner of the system, not from anyone untrusted. ChrisA

On Wed, Nov 8, 2017 at 8:39 AM, אלעזר <elazarg@gmail.com> wrote:
If someone is using eval/exec with untrusted code, no amount of hiding-behind-imports is going to change that. A quick glance at the Stack Overflow search you linked to (just the search results themselves - I didn't dive deeper) shows only a few that would be affected by this change, and most of them are from people who seem to at least broadly understand what's going on. So the benefit isn't going to be huge, and a backward compatibility break is extremely annoying (even obscure functions like reduce incurred some backlash when they were "hidden" behind an import). Hence I'm -1 on changing this. Had Python always had eval off in some module, I wouldn't push for its promotion to builtin, but IMO the cost of moving it is greater than any benefit of protection. The dangers of eval/exec should be well known. ChrisA

07.11.17 16:41, Steven D'Aprano пише:
Sorry, this particular issue isn't related to literal_eval. There was other recently fixed issue, but I forgot its number. But the point is that the compiler is recursive, and processing nested constructs consumes the C stack. There are some guards against too deep recursion (2.7 has less guards and more vulnerable), but it is hard to prove that all vulnerabilities are fixed. Your method (limiting the size of the input) helps against some attacks. Other methods -- restricting the set of characters and the number of parenthesis, braces and brackets.

But the point is that the compiler is recursive, and processing nested constructs consumes the C stack. There are some guards against too deep recursion (2.7 has less guards and more vulnerable), but it is hard to prove that all vulnerabilities are fixed. Your method (limiting the size of the input) helps against some attacks. Other methods -- restricting the set of characters and the number of parenthesis, braces and brackets. Hmm — I’d never really thought about it, bust presumably ast.literal_eval was designed for use in the compiler— or at least uses the compiler to do its real work. So maybe what we really need is a literal-eval that is DESIGNED to be a safe Python literal parser. Like a JSON parser but supporting the richer Python literal set. -CHB

On Tue, Nov 7, 2017 at 2:29 AM, אלעזר <elazarg@gmail.com> wrote:
I find it dubious to claim that these functions are dangerous to beginners. The dangers are related to attacks on servers that are exposed to the internet and beginners have no business running servers. Once you start exposing your code to attackers there are a lot of other things you have to worry about, and exec/eval are at least easily found using grep, unlike some other unsafe patterns. -- --Guido van Rossum (python.org/~guido)

On Tue, Nov 07, 2017 at 01:53:00PM -0800, Guido van Rossum wrote:
I don't think its so much that eval/exec are in themselves dangerous to beginners as that their easy availability as builtins encourages bad habits that can last long after the programmer is no longer a beginner. I know the Python ecosystem is not quite the wild west as PHP and Javascript sometimes is, but code injection attacks do exist: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9807 https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-9802 Sometimes they're written by beginners whose code isn't being reviewed carefully enough, and sometimes they're written by experienced coders who have simply learned bad habits and haven't learned better. I don't want to scare people away from using eval/exec, but it would be great if we could gently encourage them to think before using them, and to prefer literal_eval instead. -- Steve

On Tue, Nov 7, 2017 at 6:26 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Sure, I'm all for making sure the documentation is clear. But the proposal at hand is to remove them from the builtins, and I don't see the situation as grave as needing that. -- --Guido van Rossum (python.org/~guido)
participants (10)
-
Chris Angelico
-
Chris Barker
-
Chris Barker - NOAA Federal
-
Ethan Furman
-
Guido van Rossum
-
Michel Desmoulin
-
Serhiy Storchaka
-
Soni L.
-
Steven D'Aprano
-
אלעזר