webmaster has already heard from 4 people who cannot install it.
I sent them to the bug tracker or to python-list, but they seem
not to have gone to either place. Is there some guide I should be
sending them to, such as 'how to debug installation problems'?
Laura
Hi,
tl;dr The summary is that I have a patch that improves CPython
performance up to 5-10% on macro benchmarks. Benchmarks results on
Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
at [1]. There are no slowdowns that I could reproduce consistently.
There are two different optimizations that yield this speedup:
LOAD_METHOD/CALL_METHOD opcodes and per-opcode cache in ceval loop.
LOAD_METHOD & CALL_METHOD
-------------------------
We had a lot of conversations with Victor about his PEP 509, and he sent
me a link to his amazing compilation of notes about CPython performance
[2]. One optimization that he pointed out to me was LOAD/CALL_METHOD
opcodes, an idea that originated in PyPy.
There is a patch that implements this optimization; it's tracked here:
[3]. There are some low level details that I explained in the issue,
but I'll go over the high level design in this email as well.
Every time you access a method attribute on an object, a BoundMethod
object is created. It is a fairly expensive operation, despite a
freelist of BoundMethods (so that memory allocation is generally
avoided). The idea is to detect what looks like a method call in the
compiler, and emit a pair of specialized bytecodes for that.
So instead of LOAD_GLOBAL/LOAD_ATTR/CALL_FUNCTION we will have
LOAD_GLOBAL/LOAD_METHOD/CALL_METHOD.
LOAD_METHOD looks at the object on top of the stack, and checks if the
name resolves to a method or to a regular attribute. If it's a method,
then we push the unbound method object and the object to the stack. If
it's an attribute, we push the resolved attribute and NULL.
When CALL_METHOD looks at the stack it knows how to call the unbound
method properly (pushing the object as a first arg), or how to call a
regular callable.
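The dispatch described above can be sketched in pure Python. The helper names
`load_method`/`call_method` are illustrative, not actual CPython internals:

```python
import types

def load_method(obj, name):
    # Mimic LOAD_METHOD: if the name resolves to a plain function on the
    # type (i.e. a method), return the unbound function plus obj; otherwise
    # return the resolved attribute and None (the "push NULL" case).
    if name not in getattr(obj, '__dict__', {}):
        for klass in type(obj).__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                if isinstance(attr, types.FunctionType):
                    return attr, obj
                break
    return getattr(obj, name), None

def call_method(func, self_arg, *args):
    # Mimic CALL_METHOD: pass obj as the first argument when we have one.
    return func(self_arg, *args) if self_arg is not None else func(*args)
```

The point of the pair is that for the common case no BoundMethod object is
ever created; the (function, obj) pair lives only on the stack.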
This idea makes CPython around 2-4% faster. And it surely doesn't
make it slower. I think it's a safe bet to at least implement this
optimization in CPython 3.6.
So far, the patch only optimizes positional-only method calls. It's
possible to optimize all kinds of calls, but this will necessitate 3 more
opcodes (explained in the issue). We'll need to do some careful
benchmarking to see if it's really needed.
Per-opcode cache in ceval
-------------------------
While reading PEP 509, I was thinking about how we can use
dict->ma_version in ceval to speed up globals lookups. One of the key
assumptions (and this is what makes JITs possible) is that real-life
programs don't modify globals and rebind builtins (often), and that most
code paths operate on objects of the same type.
In CPython, all pure Python functions have code objects. When you call
a function, ceval executes its code object in a frame. Frames contain
contextual information, including pointers to the globals and builtins
dict. The key observation here is that almost all code objects always
have the same pointers to the globals (the module they were defined in) and
to the builtins. And it's not a good programming practice to mutate
globals or rebind builtins.
Let's look at this function:
def spam():
    print(ham)
Here are its opcodes:
2 0 LOAD_GLOBAL 0 (print)
3 LOAD_GLOBAL 1 (ham)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE
The opcodes we want to optimize are the LOAD_GLOBALs at offsets 0 and 3.
Let's look at
the first one, that loads the 'print' function from builtins. The
opcode knows the following bits of information:
- its offset (0),
- its argument (0 -> 'print'),
- its type (LOAD_GLOBAL).
And these bits of information will *never* change. So if this opcode
could resolve the 'print' name (from globals or builtins, likely the
latter) and save the pointer to it somewhere, along with
globals->ma_version and builtins->ma_version, it could, on its second
call, just load this cached info back, check that the globals and
builtins dict haven't changed and push the cached ref to the stack.
That would save it from doing two dict lookups.
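Here is a toy model of that fast path. Since ma_version is not exposed at
the Python level, explicit version counters stand in for it, and all names
are hypothetical:

```python
class LoadGlobalCache:
    """Per-opcode cache entry for a single LOAD_GLOBAL site."""

    def __init__(self):
        self.globals_version = None
        self.builtins_version = None
        self.ptr = None

    def lookup(self, name, globals_d, gver, builtins_d, bver):
        # Fast path: both dict versions unchanged -> zero dict lookups.
        if (self.ptr is not None and self.globals_version == gver
                and self.builtins_version == bver):
            return self.ptr
        # Slow path: up to two dict lookups, then refill the cache.
        try:
            value = globals_d[name]
        except KeyError:
            value = builtins_d[name]
        self.ptr = value
        self.globals_version = gver
        self.builtins_version = bver
        return value
```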
We can also optimize LOAD_METHOD. There is a high chance that 'obj' in
'obj.method()' will be of the same type every time we execute the code
object. So if we had an opcode cache, LOAD_METHOD could then cache
a pointer to the resolved unbound method, a pointer to obj.__class__,
and tp_version_tag of obj.__class__. Then it would only need to check
if the cached object type is the same (and that it wasn't modified) and
that obj.__dict__ doesn't override 'method'. Long story short, this
caching really speeds up method calls on types implemented in C.
list.append becomes very fast, because list doesn't have a __dict__, so
the check is very cheap (with cache).
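A similar toy model of the LOAD_METHOD guard, with a per-class counter
standing in for tp_version_tag (all names here are hypothetical):

```python
class MethodCacheEntry:
    """Per-opcode cache entry for a single LOAD_METHOD site."""

    def __init__(self):
        self.klass = None
        self.version = None
        self.method = None

def load_method_cached(entry, obj, name, class_versions):
    klass = type(obj)
    # Guard: same class, class unmodified, and not shadowed in obj.__dict__.
    if (entry.klass is klass
            and entry.version == class_versions.get(klass)
            and name not in getattr(obj, '__dict__', {})):
        return entry.method, obj            # cache hit: no attribute lookup
    method = getattr(klass, name)           # slow path, then refill
    entry.klass = klass
    entry.version = class_versions.get(klass)
    entry.method = method
    return method, obj
```

For a type without a `__dict__` on instances (like list), the shadowing
check disappears entirely, which is why C types benefit the most.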
A straightforward implementation of such a cache is simple but consumes
a lot of memory that would just be wasted, since we only need the cache
for the LOAD_GLOBAL and LOAD_METHOD opcodes. So we have to be creative
about the cache design. Here's what I came up with:
1. We add a few fields to the code object.
2. ceval will count how many times each code object is executed.
3. When the code object is executed over ~900 times, we mark it as
"hot". We also create an 'unsigned char' array "MAPPING", with length
set to match the length of the code object. So we have a 1-to-1 mapping
between opcodes and MAPPING array.
4. For the next ~100 calls, while the code object is "hot", LOAD_GLOBAL and
LOAD_METHOD do "MAPPING[opcode_offset()]++".
5. After 1024 calls to the code object, the ceval loop will iterate through
the MAPPING, counting all opcodes that were executed more than 50 times.
6. We then create an array of cache structs "CACHE" (here's a link to
the updated code.h file: [6]). We update MAPPING to be a mapping
between opcode position and position in the CACHE. The code object is
now "optimized".
7. When the code object is "optimized", LOAD_METHOD and LOAD_GLOBAL use
the CACHE array for fast path.
8. When there is a cache miss, i.e. the builtins/globals/obj.__dict__
were mutated, the opcode marks its entry in 'CACHE' as deoptimized, and
it will never try to use the cache again.
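The warm-up bookkeeping in steps 2-6 can be sketched like this (thresholds
and names are illustrative, not taken from the patch):

```python
class OpcodeCacheState:
    HOT, OPTIMIZE = 900, 1024

    def __init__(self, n_offsets):
        self.calls = 0
        self.mapping = bytearray(n_offsets)  # offset -> hit count, later -> slot
        self.cache = None                    # becomes the CACHE array

    def enter_call(self):
        self.calls += 1
        if self.calls == self.OPTIMIZE and self.cache is None:
            # Collect opcodes executed more than 50 times during the hot phase.
            hot = [o for o, hits in enumerate(self.mapping) if hits > 50]
            self.cache = [None] * len(hot)   # one cache struct per hot opcode
            for slot, offset in enumerate(hot):
                self.mapping[offset] = slot  # now maps offset -> CACHE index

    def record(self, offset):
        # Called by LOAD_GLOBAL/LOAD_METHOD while the code object is "hot".
        if self.cache is None and self.calls >= self.HOT:
            self.mapping[offset] = min(self.mapping[offset] + 1, 255)
```

Because MAPPING is a byte array the sizing cost before optimization is one
byte per opcode, and CACHE is only as long as the number of hot sites.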
Here's a link to the issue tracker with the first version of the patch:
[5]. I'm working on the patch in a github repo here: [4].
Summary
-------
There are many things about this algorithm that we can improve/tweak.
Perhaps we should profile code objects longer, or account for the time they
spend executing. Maybe we shouldn't deoptimize opcodes on their first
cache miss. Maybe we can come up with better data structures. We also
need to profile the memory and see how much more this cache will require.
One thing I'm certain about is that we can get a 5-10% speedup of
CPython with relatively low memory impact. And I think it's worth
exploring that!
If you're interested in these kinds of optimizations, please help with
code reviews, ideas, profiling and benchmarks. The latter is especially
important; I'd never have imagined how hard it is to come up with a good macro
benchmark.
I also want to thank my company MagicStack (magic.io) for sponsoring
this work.
Thanks,
Yury
[1] https://gist.github.com/1st1/aed69d63a2ff4de4c7be
[2] http://faster-cpython.readthedocs.org/index.html
[3] http://bugs.python.org/issue26110
[4] https://github.com/1st1/cpython/tree/opcache2
[5] http://bugs.python.org/issue26219
[6]
https://github.com/python/cpython/compare/master...1st1:opcache2?expand=1...
Saw recent discussion:
https://mail.python.org/pipermail/python-dev/2016-February/143013.html
I remember trying WPython; it was fast. Unfortunately, it feels like it came
at the wrong time, when development effort was invested in getting py3k out
the door.
It also had a lot of other ideas, like *_INT instructions, which allowed the
oparg to be a constant int rather than needing to LOAD_CONST one.
Anyway, I'll stop reminiscing.
abarnert has started an experiment with wordcode:
https://github.com/abarnert/cpython/blob/c095a32f2a68ac708466b9c64906cc4d...
I've personally benchmarked this fork with positive results. This
experiment seeks to be conservative-- it doesn't seek to introduce new
opcodes or combine BINARY_OP's all into a single op where the currently
unused-in-wordcode arg then states the kind of binary op (à la COMPARE_OP).
I've submitted a pull request which is working on fixing tests & updating
peephole.c
Bringing this up on the list to figure out if there's interest in a basic
wordcode change. It feels like there are no downsides: faster code, smaller
bytecode, simpler interpretation of bytecode (the Nth instruction starts at
the 2Nth byte if you count EXTENDED_ARG as an instruction). The only
downside is the transitional cost.
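The fixed-width decoding that the parenthetical describes can be sketched as
follows (the EXTENDED_ARG opcode number here is illustrative):

```python
EXTENDED_ARG = 144  # CPython's opcode number for EXTENDED_ARG; shown for illustration

def decode_wordcode(code):
    # Fixed-width decoding: instruction N starts at byte 2*N
    # (opcode byte, then one argument byte), with EXTENDED_ARG
    # widening the following instruction's argument.
    instructions = []
    prefix = 0
    for i in range(0, len(code), 2):
        op, arg = code[i], code[i + 1]
        if op == EXTENDED_ARG:
            prefix = (prefix | arg) << 8
            continue
        instructions.append((op, prefix | arg))
        prefix = 0
    return instructions
```

Compare this with the current variable-width scheme, where an argumented
opcode takes three bytes and instruction boundaries can't be computed
without scanning.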
What'd be necessary for this to be pulled upstream?
Hi all,
after talking to Guido and Serhiy we present the next revision
of this PEP. It is a compromise that we are all happy with,
and a relatively restricted rule that makes additions to PEP 8
basically unnecessary.
I think the discussion has shown that supporting underscores in
the from-string constructors is valuable, therefore this is now
added to the specification section.
The remaining open question is about the reverse direction: do
we want a string formatting modifier that adds underscores as
thousands separators?
cheers,
Georg
-----------------------------------------------------------------
PEP: 515
Title: Underscores in Numeric Literals
Version: $Revision$
Last-Modified: $Date$
Author: Georg Brandl, Serhiy Storchaka
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 10-Feb-2016
Python-Version: 3.6
Post-History: 10-Feb-2016, 11-Feb-2016
Abstract and Rationale
======================
This PEP proposes to extend Python's syntax and number-from-string
constructors so that underscores can be used as visual separators for
digit grouping purposes in integral, floating-point and complex number
literals.
This is a common feature of other modern languages, and can aid
readability of long literals, or literals whose value should clearly
separate into parts, such as bytes or words in hexadecimal notation.
Examples::
# grouping decimal numbers by thousands
amount = 10_000_000.0
# grouping hexadecimal addresses by words
addr = 0xDEAD_BEEF
# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110
# same, for string conversions
flags = int('0b_1111_0000', 2)
Specification
=============
The current proposal is to allow one underscore between digits, and
after base specifiers in numeric literals. The underscores have no
semantic meaning, and literals are parsed as if the underscores were
absent.
Literal Grammar
---------------
The production list for integer literals would therefore look like
this::
integer: decinteger | bininteger | octinteger | hexinteger
decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
bininteger: "0" ("b" | "B") (["_"] bindigit)+
octinteger: "0" ("o" | "O") (["_"] octdigit)+
hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit: "1"..."9"
digit: "0"..."9"
bindigit: "0" | "1"
octdigit: "0"..."7"
hexdigit: digit | "a"..."f" | "A"..."F"
For floating-point and complex literals::
floatnumber: pointfloat | exponentfloat
pointfloat: [digitpart] fraction | digitpart "."
exponentfloat: (digitpart | pointfloat) exponent
digitpart: digit (["_"] digit)*
fraction: "." digitpart
exponent: ("e" | "E") ["+" | "-"] digitpart
imagnumber: (floatnumber | digitpart) ("j" | "J")
Constructors
------------
Following the same rules for placement, underscores will be allowed in
the following constructors:
- ``int()`` (with any base)
- ``float()``
- ``complex()``
- ``Decimal()``
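On an interpreter implementing this PEP, the from-string constructors would
accept, for example:

```python
from decimal import Decimal

# Underscores in from-string constructors, mirroring the literal rules.
assert int('1_000_000') == 1000000
assert int('CAFE_F00D', 16) == 0xCAFEF00D
assert float('1_000.25') == 1000.25
assert Decimal('1_000') == Decimal(1000)
```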
Prior Art
=========
Those languages that do allow underscore grouping implement a large
variety of rules for allowed placement of underscores. In cases where
the language spec contradicts the actual behavior, the actual behavior
is listed. ("single" or "multiple" refer to allowing runs of
consecutive underscores.)
* Ada: single, only between digits [8]_
* C# (open proposal for 7.0): multiple, only between digits [6]_
* C++14: single, between digits (different separator chosen) [1]_
* D: multiple, anywhere, including trailing [2]_
* Java: multiple, only between digits [7]_
* Julia: single, only between digits (but not in float exponent parts)
[9]_
* Perl 5: multiple, basically anywhere, although docs say it's
restricted to one underscore between digits [3]_
* Ruby: single, only between digits (although docs say "anywhere")
[10]_
* Rust: multiple, anywhere, except for between exponent "e" and digits
[4]_
* Swift: multiple, between digits and trailing (although textual
description says only "between digits") [5]_
Alternative Syntax
==================
Underscore Placement Rules
--------------------------
Instead of the relatively strict rule specified above, the use of
underscores could be limited. As we have seen from other languages, common
rules include:
* Only one consecutive underscore allowed, and only between digits.
* Multiple consecutive underscores allowed, but only between digits.
* Multiple consecutive underscores allowed, in most positions except
for the start of the literal, or special positions like after a
decimal point.
The syntax in this PEP has ultimately been selected because it covers
the common use cases, and does not allow for syntax that would have to
be discouraged in style guides anyway.
A less common rule would be to allow underscores only every N digits
(where N could be 3 for decimal literals, or 4 for hexadecimal ones).
This is unnecessarily restrictive, especially considering the
separator placement is different in different cultures.
Different Separators
--------------------
A proposed alternate syntax was to use whitespace for grouping.
Although strings are a precedent for combining adjoining literals, the
behavior can lead to unexpected effects which are not possible with
underscores. Also, no other language is known to use this rule,
except for languages that generally disregard any whitespace.
C++14 introduces apostrophes for grouping (because underscores
introduce ambiguity with user-defined literals), which is not
considered because of the use in Python's string literals. [1]_
Open Proposals
==============
It has been proposed [11]_ to extend the number-to-string formatting
language to allow ``_`` as a thousands separator, where currently only
``,`` is supported. This could be used to easily generate code with
more readable literals.
Implementation
==============
A preliminary patch that implements the specification given above has
been posted to the issue tracker. [12]_
References
==========
.. [1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3499.html
.. [2] http://dlang.org/spec/lex.html#integerliteral
.. [3] http://perldoc.perl.org/perldata.html#Scalar-value-constructors
.. [4] http://doc.rust-lang.org/reference.html#number-literals
.. [5]
https://developer.apple.com/library/ios/documentation/Swift/Conceptual/Sw...
.. [6] https://github.com/dotnet/roslyn/issues/216
.. [7]
https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscor...
.. [8] http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#2.4
.. [9]
http://docs.julialang.org/en/release-0.4/manual/integers-and-floating-poi...
.. [10] http://ruby-doc.org/core-2.3.0/doc/syntax/literals_rdoc.html#label-Numbers
.. [11] https://mail.python.org/pipermail/python-dev/2016-February/143283.html
.. [12] http://bugs.python.org/issue26331
Copyright
=========
This document has been placed in the public domain.
Hi,
There is an old discussion about the performance of PyMem_Malloc()
memory allocator. CPython stresses memory allocators a lot. The last
time I gathered statistics, it was for PEP 454:
"For example, the Python test suites calls malloc() , realloc() or
free() 270,000 times per second in average."
https://www.python.org/dev/peps/pep-0454/#log-calls-to-the-memory-allocator
I proposed a simple change: modify PyMem_Malloc() to use the pymalloc
allocator which is faster for allocation smaller than 512 bytes, or
fallback to malloc() (which is the current internal allocator of
PyMem_Malloc()).
This tiny change makes Python up to 6% faster on some specific (macro)
benchmarks, and it doesn't seem to make Python slower on any
benchmark:
http://bugs.python.org/issue26249#msg259445
Do you see any drawback of using pymalloc for PyMem_Malloc()?
Does anyone recall the rationale for having two families of memory allocators?
FYI Python has 3 families since 3.4: PyMem, PyObject but also PyMem_Raw!
https://www.python.org/dev/peps/pep-0445/
--
Since pymalloc is only used for small memory allocations, I understand
that small objects will no longer be allocated on the heap, but
only in pymalloc arenas which are allocated by mmap. The advantage of
arenas is that it's possible to "punch holes" in the memory when a
whole arena is freed, whereas the heap memory has the famous
"fragmentation" issue because the heap is a single contiguous memory
block.
The libc malloc() uses mmap() for allocations larger than a threshold
which is now dynamic, and initialized to 128 kB or 256 kB by default
(I don't recall the exact default value).
Is there a risk of *higher* memory fragmentation if we start to use
pymalloc for PyMem_Malloc()? Does someone know how to test it?
Victor
I was throwing around some ideas with colleagues about how we detect
Python installations on Windows from within Visual Studio, and it came
up that there are many Python distros that install into different
locations but write the same registry entries. (I knew about this, of
course, but this time I decided to do something.)
Apart from not being detected properly by all IDEs/tools/installers,
non-standard distros that register themselves in the official keys may
also mess with the default sys.path values. For example, at one point
(possibly still true) if you installed both Canopy and Anaconda, you
would break the first one because they tried to load the other's stdlib.
Other implementations have different structures or do not register
themselves at all, which also makes it more complicated for tools to
discover them.
So here is a rough proposal to standardise the registry keys that can be
set on Windows in a way that (a) lets other installers besides the
official ones have equal footing, (b) provides consistent search and
resolution semantics for tools, and (c) includes slightly more rich
metadata (such as display names and URLs). Presented in PEP-like form
here, but if feedback suggests just putting it in the docs I'm okay with
that too. It is fully backwards compatible with official releases of
Python (at least back to 2.5, possibly further) and does not require
modifications to Python or the official installer - it is purely
codifying a superset of what we already do.
Any and all feedback welcomed, especially from the owners of other
distros, Python implementations or tools on the list.
Cheers,
Steve
-----
PEP: ???
Title: Python environment registration in the Windows Registry
Version: $Revision$
Last-Modified: $Date$
Author: Steve Dower <steve.dower(a)python.org>
Status: Draft
Type: ???
Content-Type: text/x-rst
Created: 02-Feb-2016
Abstract
========
When installed on Windows, the official Python installer creates a
registry key for discovery and detection by other applications.
Unofficial installers, such as those used by distributions, typically
create identical keys for the same purpose. However, these may conflict
with the official installer or other distributions.
This PEP defines a schema for the Python registry key to allow
unofficial installers to separately register their installation, and to
allow applications to detect and correctly display all Python
environments on a user's machine. No implementation changes to Python
are proposed with this PEP.
The schema matches the registry values that have been used by the
official installer since at least Python 2.5, and the resolution
behaviour matches the behaviour of the official Python releases.
Specification
=============
We consider there to be a single collection of Python environments on a
machine, where the collection may be different for each user of the
machine. There are three potential registry locations where the
collection may be stored based on the installation options of each
environment. These are::
HKEY_CURRENT_USER\Software\Python\<Company>\<Tag>
HKEY_LOCAL_MACHINE\Software\Python\<Company>\<Tag>
HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\<Company>\<Tag>
On a given machine, an environment is uniquely identified by its
Company-Tag pair. Keys should be searched in the order shown, and if the
same Company-Tag pair appears in more than one of the above locations,
only the first occurrence is offered.
Official Python releases use ``PythonCore`` for Company, and the value
of ``sys.winver`` for Tag. Other registered environments may use any
values for Company and Tag. Recommendations are made in the following
sections.
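The search order and first-occurrence rule can be modeled in pure Python
(no real registry access; the data structures here are hypothetical):

```python
SEARCH_ORDER = [
    r"HKEY_CURRENT_USER\Software\Python",
    r"HKEY_LOCAL_MACHINE\Software\Python",
    r"HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python",
]

def resolve_environments(registry):
    """registry maps a hive path to {(company, tag): install_path}."""
    offered = {}
    for hive in SEARCH_ORDER:
        for pair, prefix in registry.get(hive, {}).items():
            # Only the first occurrence of a Company-Tag pair is offered.
            offered.setdefault(pair, prefix)
    return offered
```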
Backwards Compatibility
-----------------------
Python 3.4 and earlier did not distinguish between 32-bit and 64-bit
builds in ``sys.winver``. As a result, it is possible to have valid
side-by-side installations of both 32-bit and 64-bit interpreters.
To ensure backwards compatibility, applications should treat
environments listed under the following two registry keys as distinct,
even if Tag matches::
HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<Tag>
HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\PythonCore\<Tag>
Note that this does not apply to Python 3.5 and later, which uses
different Tags. Environments registered under other Company names must
use distinct Tags for side-by-side installations.
1. Environments in ``HKEY_CURRENT_USER`` are always preferred
2. Environments in ``HKEY_LOCAL_MACHINE\Software\Wow6432Node`` are
preferred if the interpreter is known to be 32-bit
Company
-------
The Company part of the key is intended to group related environments
and to ensure that Tags are namespaced appropriately. The key name
should be alphanumeric without spaces and likely to be unique. For
example, a trademarked name, a UUID, or a hostname would be appropriate::
HKEY_CURRENT_USER\Software\Python\ExampleCorp
HKEY_CURRENT_USER\Software\Python\6C465E66-5A8C-4942-9E6A-D29159480C60
HKEY_CURRENT_USER\Software\Python\www.example.com
If a string value named ``DisplayName`` exists, it should be used to
identify the environment category to users. Otherwise, the name of the
key should be used.
If a string value named ``SupportUrl`` exists, it may be displayed or
otherwise used to direct users to a web site related to the environment.
A complete example may look like::
HKEY_CURRENT_USER\Software\Python\ExampleCorp
(Default) = (value not set)
DisplayName = "Example Corp"
SupportUrl = "http://www.example.com"
Tag
---
The Tag part of the key is intended to uniquely identify an environment
within those provided by a single company. The key name should be
alphanumeric without spaces and stable across installations. For
example, the Python language version, a UUID or a partial/complete hash
would be appropriate; an integer counter that increases for each new
environment may not::
HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6
HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66
If a string value named ``DisplayName`` exists, it should be used to
identify the environment to users. Otherwise, the name of the key should
be used.
If a string value named ``SupportUrl`` exists, it may be displayed or
otherwise used to direct users to a web site related to the environment.
If a string value named ``Version`` exists, it should be used to
identify the version of the environment. This is independent from the
version of Python implemented by the environment.
If a string value named ``SysVersion`` exists, it must be in ``x.y`` or
``x.y.z`` format matching the version returned by ``sys.version_info``
in the interpreter. Otherwise, if the Tag matches this format it is
used. If not, the Python version is unknown.
Note that each of these values is recommended, but optional. A complete
example may look like this::
HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66
(Default) = (value not set)
DisplayName = "Distro 3"
SupportUrl = "http://www.example.com/distro-3"
Version = "3.0.12345.0"
SysVersion = "3.6.0"
InstallPath
-----------
Beneath the environment key, an ``InstallPath`` key must be created.
This key is always named ``InstallPath``, and the default value must
match ``sys.prefix``::
HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6\InstallPath
(Default) = "C:\ExampleCorpPy36"
If a string value named ``ExecutablePath`` exists, it must be a path to
the ``python.exe`` (or equivalent) executable. Otherwise, the
interpreter executable is assumed to be called ``python.exe`` and exist
in the directory referenced by the default value.
If a string value named ``WindowedExecutablePath`` exists, it must be a
path to the ``pythonw.exe`` (or equivalent) executable. Otherwise, the
windowed interpreter executable is assumed to be called ``pythonw.exe``
and exist in the directory referenced by the default value.
A complete example may look like::
HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\InstallPath
(Default) = "C:\ExampleDistro30"
ExecutablePath = "C:\ExampleDistro30\ex_python.exe"
WindowedExecutablePath = "C:\ExampleDistro30\ex_pythonw.exe"
Help
----
Beneath the environment key, a ``Help`` key may be created. This key is
always named ``Help`` if present and has no default value.
Each subkey of ``Help`` specifies a documentation file, tool, or URL
associated with the environment. The subkey may have any name, and the
default value is a string appropriate for passing to ``os.startfile`` or
equivalent.
If a string value named ``DisplayName`` exists, it should be used to
identify the help file to users. Otherwise, the key name should be used.
A complete example may look like::
HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\Help
Python\
(Default) = "C:\ExampleDistro30\python36.chm"
DisplayName = "Python Documentation"
Extras\
(Default) = "http://www.example.com/tutorial"
DisplayName = "Example Distro Online Tutorial"
Hi all,
I’ve been working on developing Python builds for mobile platforms, and I’m
looking for some help resolving a bug in Python’s build system.
The problem affects cross-platform builds - builds where you are compiling
python for a CPU architecture other than the one on the machine that is
doing the compilation. This requirement stems from supporting mobile
platforms (iOS, Android etc) where you compile on your laptop, then ship
the compiled binary to the device.
In the Python 3.5 dev cycle, Issue 22359 [1] was addressed, fixing parallel
builds. However, as a side effect, this patch broke (as far as I can tell)
*all* cross platform builds. This was reported in issue 22625 [2].
Since that time, the problem has gotten slightly worse; the addition of
changeset 95566 [3] and 95854 [4] has cemented the problem. I’ve been able
to hack together a fix that enables me to get a set of binaries, but the
patch is essentially reverting 22359, and making some (very dubious)
assumptions about the order in which things are built.
Autoconf et al aren’t my strong suit; I was hoping someone might be able to
help me resolve this issue.
Yours,
Russ Magee %-)
[1] http://bugs.python.org/issue22359
[2] http://bugs.python.org/issue22625
[3] https://hg.python.org/cpython/rev/565b96093ec8
[4] https://hg.python.org/cpython/rev/02e3bf65b2f8
##################################################################
*---------------------------------------------------*
* fuzzpy: CPython fuzz tester is now available *
* *
* Version 0.8 *
* https://bitbucket.org/ebadf/fuzzpy/ *
*---------------------------------------------------*
I am pleased to announce the creation of a coverage-guided fuzz tester for
CPython. It's a pretty small wrapper around LLVM's libFuzzer that enables
some powerful testing logic. AFL (American Fuzzy Lop) is another popular
fuzzer lately -- libFuzzer is very similar in concept to AFL. From what
I've read on list archives, Victor Stinner had previously done some good
fuzz testing on CPython using fusil. This project should expand on that
concept.
I'd love to get feedback, suggestions, patches and anything else the list
can offer.
Q: What is fuzzpy for?
A: It's primarily for testing CPython itself, but it can also be used for
individual Python projects. Pure-Python projects will be the simplest
to integrate at this point. Also, interesting test cases output by fuzzpy
may end up being useful for testing others such as PyPy, Pyston, etc.
Q: What is a fuzz tester?
A: It modifies inputs to a test case in order to find unique/rare failures.
Q: What does "coverage-guided" mean?
A: It means that libFuzzer is able to witness the specific code executed as
a result of a given test case. It feeds this information back into an
engine to modify the test cases to optimize for coverage.
Q: How can I help?
A1: donate cycles: build the project and crank away on one of the existing
tests. Relative to other common fuzzing, it's awfully slow, so consider
throwing as many cycles as you can afford to.
A2: contribute tests: write a ~10-line python script that exercises a
feature that you think could benefit from fuzz testing.
A3: if there's interest, I can accept cryptocoin donations to purchase
cycles on a cloud server.
##################################################################
--
-Brian