Re: [Python-Dev] Fuzzing bugs: most bugs are closed

On Saturday 19 July 2008 21:52:09, A.M. Kuchling wrote:
> Excellent work! Another fruitful area for fuzzing might be the miniature virtual machine used by the re module. It's possible to import _sre and call the compile() function directly (see the end of Lib/sre_compile.py for how it's invoked); I wonder how the regex VM copes with random strings of bytecode.
Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
For information, it's also very easy to crash CPython with a fuzzed .pyc file. It's hard to check bytecode without executing it. It may be better to add checks directly in the VM.
-- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Victor Stinner wrote:
> On Saturday 19 July 2008 21:52:09, A.M. Kuchling wrote:
>> Excellent work! Another fruitful area for fuzzing might be the miniature virtual machine used by the re module. It's possible to import _sre and call the compile() function directly (see the end of Lib/sre_compile.py for how it's invoked); I wonder how the regex VM copes with random strings of bytecode.
> Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
> For information, it's also very easy to crash CPython with a fuzzed .pyc file.
> It's hard to check bytecode without executing it. It may be better to add checks directly in the VM.
I think you'll find most developers (and many users too, come to that) reluctant to add any checking that would slow down ceval.c, the heart of the virtual machine. So unless you can find a way to add the checks without slowing it down, an external checker might be better.
regards
Steve
-- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
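
For illustration, here is a minimal sketch of what such an external checker could look like. It is not code from this thread. It assumes CPython 3.8 or later (two-byte wordcode and `code.replace()`), the helper names `iter_ops` and `bad_const_refs` are hypothetical, and it checks only one property: that every constant-loading instruction indexes inside `co_consts`. A real verifier would need far more (stack depth, jump targets, name and local indices).

```python
import dis


def iter_ops(raw):
    """Yield (offset, opcode, arg) from CPython wordcode, folding EXTENDED_ARG."""
    ext = 0
    for offset in range(0, len(raw) - 1, 2):
        op, arg = raw[offset], raw[offset + 1] | ext
        if op == dis.EXTENDED_ARG:
            # The prefix supplies the high bits of the next instruction's arg.
            ext = arg << 8
            continue
        ext = 0
        yield offset, op, arg


def bad_const_refs(code):
    """Return offsets of instructions whose constant index is out of range."""
    limit = len(code.co_consts)
    return [
        offset
        for offset, op, arg in iter_ops(code.co_code)
        if op in dis.hasconst and arg >= limit
    ]


good = compile("x = 'hello'", "<sketch>", "exec")
# Simulate a corrupted .pyc: same instructions, but the constants are gone.
broken = good.replace(co_consts=())

print(bad_const_refs(good))    # no problems expected for compiler output
print(bad_const_refs(broken))  # offsets of the dangling constant loads
```

Because the check runs once, before execution, it adds nothing to the interpreter loop, which is the trade-off raised above.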

On 2008-07-20 22:45, Victor Stinner wrote:
> On Saturday 19 July 2008 21:52:09, A.M. Kuchling wrote:
>> Excellent work! Another fruitful area for fuzzing might be the miniature virtual machine used by the re module. It's possible to import _sre and call the compile() function directly (see the end of Lib/sre_compile.py for how it's invoked); I wonder how the regex VM copes with random strings of bytecode.
> Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
> For information, it's also very easy to crash CPython with a fuzzed .pyc file.
> It's hard to check bytecode without executing it. It may be better to add checks directly in the VM.
I don't see that as a big problem: if you execute untrusted byte code, you are on your own anyway... whether that's byte code for the re engine or for ceval.
-- Marc-Andre Lemburg, eGenix.com (Jul 21 2008)

On Sun, Jul 20, 2008 at 10:45:39PM +0200, Victor Stinner wrote:
> Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
We should certainly try to fix those issues, then; people usually assume the re module is safe for use inside a sandbox and probably aren't careful enough to block importing of the _sre module.
--amk

On Monday 21 July 2008 15:33:19, A.M. Kuchling wrote:
> On Sun, Jul 20, 2008 at 10:45:39PM +0200, Victor Stinner wrote:
>> Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
> We should certainly try to fix those issues, then; people usually assume the re module is safe for use inside a sandbox and probably aren't careful enough to block importing of the _sre module.
Why is this function public? Is it used by the re module? Only the _sre module should be allowed to generate "regex bytecode".
-- Victor Stinner aka haypo http://www.haypocalc.com/blog/

Victor Stinner <victor.stinner <at> haypocalc.com> writes:
> On Monday 21 July 2008 15:33:19, A.M. Kuchling wrote:
>> On Sun, Jul 20, 2008 at 10:45:39PM +0200, Victor Stinner wrote:
>>> Hum... how can I say it? It's trivial to crash _sre :-) So I blacklisted _sre.compile() in my fuzzer.
>> We should certainly try to fix those issues, then; people usually assume the re module is safe for use inside a sandbox and probably aren't careful enough to block importing of the _sre module.
> Why is this function public? Is it used by the re module? Only the _sre module should be allowed to generate "regex bytecode".
The underscore at the beginning of _sre clearly indicates that the module is not recommended for direct consumption, IMO. Even the functions that don't themselves start with an underscore...

On Mon, Jul 21, 2008 at 03:53:18PM +0000, Antoine Pitrou wrote:
> The underscore at the beginning of _sre clearly indicates that the module is not recommended for direct consumption, IMO. Even the functions that don't themselves start with an underscore...
Sure, but if someone is trying to break in or DoS your application server, they don't care if the module starts with an underscore or not.
To answer Victor's original question: the parser and compiler that turn a regex into bytecode are written in Python. I can't think of a way to prevent other Python modules from importing _sre or accessing the compile() function; if nothing else, code could always do 'import re ; re.sre_compile._sre.compile(...)'.
--amk
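
As a small illustration of that point, the sketch below lists the attribute paths under `re` that lead to the private module. It is not from the thread and assumes CPython; the helper name `routes_to_sre` is hypothetical. The exact path differs by version (for example `re.sre_compile` in older releases versus a private submodule in newer ones), so the code discovers it instead of hard-coding one.

```python
import re
import sys
import types


def routes_to_sre():
    """List attribute paths under `re` that expose the private _sre module."""
    return sorted(
        f"re.{name}._sre"
        for name, value in vars(re).items()
        if isinstance(value, types.ModuleType) and hasattr(value, "_sre")
    )


routes = routes_to_sre()
# Importing re is enough to load the C module, so it is also in sys.modules.
sre_loaded = "_sre" in sys.modules

print(routes)
print(sre_loaded)
```

Blocking `import _sre` alone therefore does not hide `compile()` from code that is allowed to import `re`.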

On Mon, Jul 21, 2008 at 10:41 AM, A.M. Kuchling <amk@amk.ca> wrote:
> On Mon, Jul 21, 2008 at 03:53:18PM +0000, Antoine Pitrou wrote:
>> The underscore at the beginning of _sre clearly indicates that the module is not recommended for direct consumption, IMO. Even the functions that don't themselves start with an underscore...
> Sure, but if someone is trying to break in or DoS your application server, they don't care if the module starts with an underscore or not.
> To answer Victor's original question: the parser and compiler that turn a regex into bytecode are written in Python. I can't think of a way to prevent other Python modules from importing _sre or accessing the compile() function; if nothing else, code could always do 'import re ; re.sre_compile._sre.compile(...)'.
I've written a re-code verifier for the Google App Engine. I have permission to open source it; hopefully I will get to this before 2.6 beta 3.
--Guido van Rossum (home page: http://www.python.org/~guido/)
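
To make the idea of a code verifier concrete, here is a toy sketch. It is not the verifier mentioned above and does not use the real SRE opcode set: the three opcodes, the `OPERANDS` table, and the `verify` function are all hypothetical. It shows the general technique of one linear pass that rejects unknown opcodes, truncated operands, and jumps that do not land on an instruction boundary.

```python
# Hypothetical toy instruction set, NOT the real SRE opcodes.
LITERAL, JUMP, SUCCESS = 0, 1, 2

# Number of operand words that follow each opcode.
OPERANDS = {LITERAL: 1, JUMP: 1, SUCCESS: 0}


def verify(code):
    """Return True if the toy program is structurally safe to interpret."""
    starts = set()   # offsets where an instruction begins
    targets = []     # absolute jump targets seen so far
    last_op = None
    i = 0
    while i < len(code):
        op = code[i]
        if op not in OPERANDS:
            return False              # unknown opcode
        if i + OPERANDS[op] >= len(code):
            return False              # operand would run off the end
        starts.add(i)
        if op == JUMP:
            targets.append(code[i + 1])
        last_op = op
        i += 1 + OPERANDS[op]
    # The program must end with SUCCESS and every jump must hit a real
    # instruction start, never the middle of an operand or out of bounds.
    return last_op == SUCCESS and all(t in starts for t in targets)


print(verify([LITERAL, 97, SUCCESS]))
print(verify([JUMP, 99, SUCCESS]))
```

This sketch does not detect infinite loops; it only guarantees that an interpreter following the program never reads outside it.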

On Wed, Jul 30, 2008 at 11:17 AM, Guido van Rossum <guido@python.org> wrote:
> On Mon, Jul 21, 2008 at 10:41 AM, A.M. Kuchling <amk@amk.ca> wrote:
>> On Mon, Jul 21, 2008 at 03:53:18PM +0000, Antoine Pitrou wrote:
>>> The underscore at the beginning of _sre clearly indicates that the module is not recommended for direct consumption, IMO. Even the functions that don't themselves start with an underscore...
>> Sure, but if someone is trying to break in or DoS your application server, they don't care if the module starts with an underscore or not.
>> To answer Victor's original question: the parser and compiler that turn a regex into bytecode are written in Python. I can't think of a way to prevent other Python modules from importing _sre or accessing the compile() function; if nothing else, code could always do 'import re ; re.sre_compile._sre.compile(...)'.
> I've written a re-code verifier for the Google App Engine. I have permission to open source it; hopefully I will get to this before 2.6 beta 3.
The code is now in the bug tracker: http://bugs.python.org/issue3487
I'll hold off submitting for a while until Barry has had the time to veto it (or hopefully not :-).
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
>> The underscore at the beginning of _sre clearly indicates that the module is not recommended for direct consumption, IMO. Even the functions that don't themselves start with an underscore...
> I've written a re-code verifier for the Google App Engine
... which means that a protection against "evil _sre bytecode" was needed :-) Thanks to Google and Guido for releasing this validator.
Victor

participants (6)
- A.M. Kuchling
- Antoine Pitrou
- Guido van Rossum
- M.-A. Lemburg
- Steve Holden
- Victor Stinner