Fuzzing the Python standard library
Hi,

I have been fuzzing[1] various parts of the Python standard library for Python 3.7 with python-afl[2] to find internal implementation issues in the library. I have mainly been looking for the following:

* Exceptions other than the documented ones. These usually indicate an internal implementation issue. For example, one would not expect a UnicodeDecodeError from the netrc.netrc() function when the documentation[3] promises netrc.NetrcParseError and there is no way to pass a properly sanitized file object to netrc.netrc().
* Differences between the values returned by the C and Python implementations of some functions. The quopri module may have these.
* Unexpected performance and memory allocation issues. Whether these should be fixed at all can be somewhat controversial, but at least in some cases it can be really nasty from the end-user perspective if, for example, fractions.Fraction("1.64E6646466664") allocates hundreds of megabytes of memory and takes very long to compute. I gave up waiting for that function call to finish after 5 minutes.

As this is going to result in a decent number of bug reports (currently I have filed only one[4], although the audio processing area alone has many more issues to file), I would like to ask your opinion on filing them. Should I report all issues regarding a specific module in one bug report, or try to split them further into more fine-grained reports that may be related? These different types of errors are especially noticeable in the zipfile module, which exhibits many different exception and behavioral types on invalid data: https://github.com/Barro/python-stdlib-fuzzers/tree/master/zipfile/crashes . And in the case of the sndhdr module, there are multiple modules with issues (aifc, sunau, wave) that then also show up in sndhdr when they are used. Or are some of you willing to go through the crashes that pop up and help with the report filing?
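The netrc exception-contract point can be checked directly. This is a small illustrative script of mine (not from the fuzzer repository); the sample credentials are made up, and the exact exception that escapes netrc.netrc() depends on the platform's locale encoding, so the second part only reports what leaks rather than asserting a specific type:

```python
import netrc
import os
import tempfile

# A caller that handles the documented netrc.NetrcParseError will not
# catch UnicodeDecodeError -- the two exceptions are unrelated.
print(issubclass(UnicodeDecodeError, netrc.NetrcParseError))  # False

# Feed netrc bytes that are invalid in most locale encodings.
with tempfile.NamedTemporaryFile(suffix=".netrc", delete=False) as f:
    f.write(b"machine example.com login \xff\xfe password x\n")
    path = f.name
try:
    netrc.netrc(path)
    print("parsed without error")
except Exception as exc:  # deliberately broad: we want to see what escapes
    print(type(exc).__name__)
finally:
    os.unlink(path)
```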
The code and a more verbose description are available from https://github.com/Barro/python-stdlib-fuzzers. By default it works only on some GNU/Linux systems (I use Debian testing), as it relies on /dev/shm/ being available and uses shell script wrappers that depend on various tools that may not be installed on all systems by default.

As a bonus, since this uses coverage-based fuzzing, it also opens up the possibility of automatically creating a regression test suite for each of the fuzzed modules: the input files under the <fuzz-target>/corpus/ directory can ensure that existing functionality does not suddenly start raising additional exceptions, and the crash-inducing files under the <fuzz-target>/crashes/ directory make it easier to test potential bug fixes.

As a downside, this uses two quite specific tools (afl, python-afl) that have further dependencies of their own (Cython), so I doubt the viability of integrating this type of testing into the normal Python verification process. Unlike the libFuzzer-based fuzzing that is already integrated into Python[5], this instruments the actual Python code (and only the Python code), not the actions that the interpreter performs in the background. This should result in better fuzzer coverage of the Python code that is exercised, with the downside that called C functions are complete black boxes to the fuzzer.

I have mainly run these fuzzers for at most several hours per module with 4 instances, and stopped running modules with no issues once no new coverage had been discovered for more than 10 minutes. Also, I have not really created high-quality initial input files, so I wouldn't be surprised if more issues are lurking that could be found by throwing more CPU time and higher-quality fuzzers at the problem.
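The regression-suite idea above can be sketched as a simple corpus replay: feed every file under the corpus directory to the fuzz target and tolerate only the documented exception, so anything else fails the run. The choice of zipfile as the target and the on-the-fly demonstration corpus are my own illustrative assumptions, not the actual repository layout:

```python
import io
import pathlib
import tempfile
import zipfile

def replay_corpus(corpus_dir: pathlib.Path) -> int:
    """Replay every corpus file through the fuzz target. Only the
    documented failure mode (BadZipFile) is tolerated; any other
    exception propagates and counts as a regression."""
    replayed = 0
    for sample in sorted(corpus_dir.iterdir()):
        try:
            with zipfile.ZipFile(io.BytesIO(sample.read_bytes())) as zf:
                zf.namelist()
        except zipfile.BadZipFile:
            pass  # documented exception for malformed archives
        replayed += 1
    return replayed

# Tiny demonstration corpus: one valid archive, one garbage file.
with tempfile.TemporaryDirectory() as tmp:
    corpus = pathlib.Path(tmp)
    with zipfile.ZipFile(corpus / "ok.zip", "w") as zf:
        zf.writestr("hello.txt", "hi")
    (corpus / "bad.bin").write_bytes(b"\x00\x01not a zip")
    print(replay_corpus(corpus))  # 2
```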
[1]: https://en.wikipedia.org/wiki/Fuzzing
[2]: https://github.com/jwilk/python-afl
[3]: https://docs.python.org/3/library/netrc.html
[4]: https://bugs.python.org/issue34088
[5]: https://github.com/python/cpython/tree/3.7/Modules/_xxtestfuzz

--
Jussi Judin
https://jjudin.iki.fi/
I'm not a core Python dev, but a quick question: why would you expect fractions.Fraction("1.64E6646466664") not to take hundreds of megabytes and hours to run?
Simply evaluating 164 * 10**664646666 will take hundreds of megabytes by definition.
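A quick back-of-the-envelope check (my arithmetic, not from the thread): an arbitrary-precision integer with d decimal digits needs roughly d * log2(10) / 8 bytes, so this particular exponent actually lands in the low gigabytes:

```python
import math

def int_size_bytes(decimal_digits: int) -> int:
    """Rough storage estimate for an arbitrary-precision integer with
    the given number of decimal digits (ignores per-object overhead)."""
    return int(decimal_digits * math.log2(10) / 8)

# 164 * 10**6646466664 has about 6646466664 + 3 decimal digits.
print(int_size_bytes(6_646_466_664 + 3) / 1e9)  # roughly 2.76 (GB)
```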
Regards
Damian
On Tue, Jul 17, 2018, 12:54 Jussi Judin wrote:
> * Unexpected performance and memory allocation issues. These can be somewhat controversial to fix, if at all, but at least in some cases from end-user perspective it can be really nasty if for example fractions.Fraction("1.64E6646466664") results in hundreds of megabytes of memory allocated and takes very long to calculate. I gave up waiting for that function call to finish after 5 minutes.
Quick answer: this is an undocumented billion laughs/exponential entity expansion type of attack that is reachable over the web through any library that uses the fractions module to parse user input (such libraries actually exist on GitHub). It could be mitigated by explicitly mentioning this in the documentation, by exposing a setting for engineering notation exponent limits, by using a non-naive way of storing numbers, or by limiting the total size that a number representation can take by default to some large but bounded value (for example 1 megabyte). The preferred mitigation approach should be discussed in more detail in a bug report.

On Tue, Jul 17, 2018, at 20:26, Damian Shaw wrote:
> I'm not a core Python Dev, but quick question, why would you expect fractions.Fraction("1.64E6646466664") not to take 100s of megabytes and hours to run? Simply evaluating 164 * 10**664646666 will take hundreds of megabytes by definition.
-- Jussi Judin https://jjudin.iki.fi/
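One of the mitigations listed above (an exponent limit for engineering notation input) could look roughly like the sketch below. The helper name, the regular expression, and the particular limit are all my own illustrative choices, not an existing or proposed fractions API:

```python
import re
from fractions import Fraction

MAX_EXPONENT = 10_000  # hypothetical application-chosen limit

_EXP_RE = re.compile(r"[eE]([+-]?\d+)\s*$")

def bounded_fraction(text: str, max_exponent: int = MAX_EXPONENT) -> Fraction:
    """Reject decimal strings whose exponent exceeds a limit *before*
    Fraction() allocates a potentially huge integer."""
    match = _EXP_RE.search(text.strip())
    if match and abs(int(match.group(1))) > max_exponent:
        raise ValueError(f"exponent out of allowed range: {text!r}")
    return Fraction(text)

print(bounded_fraction("1.64E6"))  # 1640000
# bounded_fraction("1.64E6646466664") raises ValueError instead of
# allocating gigabytes of memory.
```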
On Tue, Jul 17, 2018 at 4:57 PM Jussi Judin wrote:
> Quick answer: undocumented billion laughs/exponential entity expansion type of an attack that is accessible through web through any library that uses fractions module to parse user input (that are actually available on Github).
Are you suggesting a warning in the fractions documentation to mention that large numbers require large amounts of memory?
In many languages numeric types can't hold arbitrarily large values, and I for one hadn't really recognized before now that if you read in a numeric value with an exponent, it will be represented *exactly* in memory (and thus one object with a very compact string representation can take up huge amounts of memory). It's also not *inconceivable* that under the hood Python could represent fractions.Fraction("1.64E6646466664") "lazily" in some fashion so that it did not consume all that memory up front.

It seems to me that "hey, by the way, the size of this thing is unbounded, and because of exponents small strings can expand to huge objects" is a good tip.

On 07/17/2018 06:15 PM, Michael Selik wrote:
> Are you suggesting a warning in the fractions documentation to mention that large numbers require large amounts of memory?
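Paul's "lazy representation" idea could be sketched roughly as follows. The class, its name, and its API are purely illustrative (mine, not an actual or proposed Fraction implementation):

```python
from fractions import Fraction

class LazyDecimalLiteral:
    """Keep a parsed decimal literal as (mantissa, exponent) and only
    expand it into a full Fraction on demand."""

    def __init__(self, mantissa: int, exponent: int):
        self.mantissa = mantissa
        self.exponent = exponent  # power of ten; may be huge

    def expand(self) -> Fraction:
        # This is where the memory cost is actually paid.
        if self.exponent >= 0:
            return Fraction(self.mantissa * 10 ** self.exponent)
        return Fraction(self.mantissa, 10 ** -self.exponent)

lit = LazyDecimalLiteral(164, 4)  # "1.64E6" == 164 * 10**4
print(lit.expand())               # 1640000
```

Of course, as the replies below point out, this only defers the cost until the digits are actually needed; it does not avoid it.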
On Tue, Jul 17, 2018 at 11:44 PM, Paul G wrote:
> It's also not *inconceivable* that under the hood Python would represent fractions.Fraction("1.64E6646466664") "lazily" in some fashion so that it did not consume all the memory on disk.
Sooner or later you are going to need the digits of the number to perform a computation. Exactly when would you propose the deferred evaluation should take place?
There are already occasional inquiries about the effects of creation of such large numbers and their unexpected effects, so they aren't completely unknown. At the same time, this isn't exactly a mainstream "bug", as evidenced by the fact that such issues are relatively rare.
> It seems to me that "Hey by the way the size of this thing is unbounded and because of exponents small strings can expand to huge objects" is a good tip.
Not an unreasonable suggestion. Perhaps you could draft a documentation change - personally I'm not even sure where the best place for the warning would be.
On 18 July 2018 at 17:49, Steve Holden wrote:
> Sooner or later you are going to need the digits of the number to perform a computation. Exactly when would you propose the deferred evaluation should take place?
>
> There are already occasional inquiries about the effects of creation of such large numbers and their unexpected effects, so they aren't completely unknown. At the same time, this isn't exactly a mainstream "bug", as evidenced by the fact that such issues are relatively rare.
It does mean that if Fraction is being used with untrusted inputs though, it *does* make sense to put a reasonable upper bound on permitted exponents. The default decimal context caps expression results at an exponent of less than 1 million for example:
>>> +decimal.Decimal("1e999_999")
Decimal('1E+999999')
>>> +decimal.Decimal("1e1_000_000")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
decimal.Overflow: [<class 'decimal.Overflow'>]
That's already large enough to result in a ~415 kB integer that takes a minute or so for my machine to create if I call int() on it. So I think it does make sense to at least describe how to use the decimal module to do some initial sanity checking on potentially exponential inputs, even if the fractions module itself never gains native support for processing untrusted inputs.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
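Nick's decimal-based sanity check could look roughly like this sketch. The helper name and the tighter Emax/Emin bounds are my own illustrative choices (the default context would still admit exponents just below one million):

```python
import decimal
from fractions import Fraction

# Hypothetical application-chosen bounds, much tighter than the default
# context's (which still allows the ~415 kB integers discussed above).
CTX = decimal.Context(Emax=10_000, Emin=-10_000)

def checked_fraction(text: str) -> Fraction:
    """Vet an untrusted numeric string with the decimal module before
    handing it to Fraction, so oversized exponents fail fast instead of
    allocating huge integers."""
    try:
        # create_decimal() applies the context, so Overflow and
        # InvalidOperation (both trapped by default) are raised here.
        value = CTX.create_decimal(text)
    except decimal.DecimalException as exc:
        raise ValueError(f"rejected numeric input: {text!r}") from exc
    return Fraction(value)

print(checked_fraction("1.64E6"))  # 1640000
```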
On Tue, Jul 17, 2018 at 9:44 AM, Jussi Judin wrote:
> * Exceptions that are something else than the documented ones. These usually indicate an internal implementation issue. For example one would not expect an UnicodeDecodeError from netrc.netrc() function when the documentation[3] promises netrc.NetrcParseError and there is no way to pass properly sanitized file object to the netrc.netrc().
My main advice would be: before mass-filing bugs, make sure that you and the maintainers agree on what a bug is :-). For example, I can see the argument that invalid encodings in netrc should be reported as NetrcParseError, but there are also many APIs where it's normal to get something like a TypeError even if that's not a documented exception, and it's very annoying as a maintainer to suddenly get 20 bugs where you don't even agree that they're bugs and just have to wade through and close them all.

Any assistance you can give with triaging and prioritizing the bugs is also super helpful, since volunteer maintainers aren't necessarily prepared for sudden influxes of issues.

In general this sounds like cool work, and help improving Python's quality is always welcome. Just be careful that it's actually helpful :-).

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Tue, 17 Jul 2018 at 15:41 Nathaniel Smith wrote:
> On Tue, Jul 17, 2018 at 9:44 AM, Jussi Judin wrote:
> > * Exceptions that are something else than the documented ones. These usually indicate an internal implementation issue. For example one would not expect an UnicodeDecodeError from netrc.netrc() function when the documentation[3] promises netrc.NetrcParseError and there is no way to pass properly sanitized file object to the netrc.netrc().
>
> My main advice would be, before mass-filing bugs make sure that you and the maintainers agree on what a bug is :-). For example, I can see the argument that invalid encodings in netrc should be reported as NetrcParseError, but there are also many APIs where it's normal to get something like a TypeError even if that's not a documented exception, and it's very annoying as a maintainer to suddenly get 20 bugs where you don't even agree that they're bugs and just have to wade through and close them all. Any assistance you can give with triaging and prioritizing the bugs is also super helpful, since volunteer maintainers aren't necessarily prepared for sudden influxes of issues.
That was my initial reaction to that first bullet point as well. If the exception isn't at least explicitly raised then it shouldn't be considered a documentation problem, and even then I don't know if I would expect an explicit mentioning of ValueError if the docs say e.g. "only values within this range are expected" as that implicitly suggests ValueError will be used.
> In general this sounds like cool work, and help improving Python's quality is always welcome. Just be careful that it's actually helpful :-).
It's definitely a balancing act. :)
On 17.07.2018 19:44, Jussi Judin wrote:
> As this is going to result in a decent amount of bug reports [...], I would like to ask your opinion on filing these bug reports. Should I report all issues regarding some specific module in one bug report, or try to further split them into more fine grained reports that may be related?
I'm not from the core team, so I'll recite best practices from my own experience. Bugs should be reported "one per root cause", i.e. one bug report = one fix. It's permissible to report separately, especially if you're not sure whether two crashes are the same bug (then add a prominent link), but since this is a volunteer project, you really should be doing the duplicate checks _before_ reporting. Since you'll be checking existing tickets before reporting each new one anyway, that will automatically include _your own_ previous tickets ;-)

For ditto bugs in multiple places, it's better to err on the side of fewer tickets -- this will both be less work for everyone and give a more complete picture. If something proves to warrant a separate ticket, it can be split off later.
--
Regards,
Ivan
participants (9)

- Brett Cannon
- Damian Shaw
- Ivan Pozdeev
- Jussi Judin
- Michael Selik
- Nathaniel Smith
- Nick Coghlan
- Paul G
- Steve Holden