Following my talk "Exploring our Python Interpreter", I think this VIM plugin
can be useful to the community. It is a syntax highlighter for the C API
of CPython 3.5 and 3.6. I used Clang to parse the headers and automatically
generate the keywords for VIM.
PyObject and the other CPython typedefs will be highlighted with the colors
defined by your favourite editor, and the same goes for the enums, the
typedefs, the functions and the macros.
Where can you use this VIM plugin? Whenever you want to write a CPython
extension or hack on the CPython code itself.
Check this screenshot: http://i.imgur.com/0k13KOU.png
Here is the repository:
If you run into any problems, please report them via an issue on GitHub.
Thank you so much,
Stéphane Wirtel - http://wirtel.be - @matrixise
My name is Kevin and I am a staff member of HackIllinois, a 36-hour
hackathon at the University of Illinois Urbana-Champaign where students
from across the nation come to build some of the most innovative hardware
and software projects. For highlights from last year’s event, check out
From February 19 to 21, 2016, HackIllinois returns, and we are introducing a
new initiative called OpenSource@HackIllinois to promote Open Source
development during the event. This program is designed to provide students
with the opportunity to meet and collaborate with experienced developers,
like you all, who serve as guides and mentors into the open source world.
Over the course of the event, you and your group of hackers will build
features for an open source project of your choosing. Please see
http://www.hackillinois.org/opensource for more details!
If you or any other open source developers you work with are interested in
learning more about OpenSource@HackIllinois, feel free to email me at
kevin.hong(a)hackillinois.org. I look forward to speaking with you soon!
I would like to propose the FAT Python project as a subject for the Google
Summer of Code:
I have a long list of optimization ideas for fatoptimizer:
The fatoptimizer project is written in pure Python and has a simple design.
I implemented quite simple optimizations of the kind taught at school. IMHO
such a project is a good fit for a student.
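To give a flavour of what "simple optimizations" means here, a toy
constant-folding pass over the AST could look like the sketch below (an
illustration only, not fatoptimizer's actual code; it targets today's
ast.Constant nodes):

    import ast
    import operator

    # Binary operators this toy pass knows how to fold.
    _OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}

    class ConstantFolder(ast.NodeTransformer):
        """Replace BinOps over two constants with their result."""
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first (bottom-up)
            fold = _OPS.get(type(node.op))
            if (fold and isinstance(node.left, ast.Constant)
                    and isinstance(node.right, ast.Constant)):
                try:
                    value = fold(node.left.value, node.right.value)
                except Exception:
                    return node  # e.g. ZeroDivisionError: leave it to runtime
                return ast.copy_location(ast.Constant(value), node)
            return node

    tree = ConstantFolder().visit(ast.parse("x = 2 * 3 + 4"))
    print(ast.unparse(ast.fix_missing_locations(tree)))  # -> x = 10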
Does the PSF already plan to apply to the GSoC? Are there other projects?
Since we're all talking about making Python faster, I thought I'd drop
some previous ideas I've had here in case (1) someone wants to actually
do them, and (2) they really are new ideas that haven't failed in the
past. Mostly I was thinking about startup time.
Here is the list of modules imported on a clean startup of my Windows,
US-English machine (from -v and cleaned up a bit):
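The list itself is easy to regenerate, since -v writes one trace line per
import to stderr; a rough sketch (module names will differ across platforms
and versions):

    import subprocess
    import sys

    # Start a fresh interpreter with -v; its stderr contains lines such
    # as "import 'encodings' # <...>" for every module imported.
    trace = subprocess.run(
        [sys.executable, "-v", "-c", "pass"],
        capture_output=True, text=True,
    ).stderr

    for line in trace.splitlines():
        if line.startswith("import "):
            print(line.split()[1].strip("'"))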
Obviously the easiest first thing is to remove or delay unnecessary
imports. But a while ago I used a native profiler to trace through this
and the most impactful modules were the encodings:
While I don't doubt that we need all of these for *some* reason,
aliases, cp437 and cp1252 are relatively expensive modules to import,
mostly due to large static dictionaries or data structures being
generated on startup.
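The cost is easy to measure in rough terms; the module just has to be
evicted from sys.modules so that each run really re-executes the import:

    import importlib
    import sys
    import timeit

    def import_cp437():
        # Drop the cached module so the import below re-runs the module body.
        sys.modules.pop("encodings.cp437", None)
        importlib.import_module("encodings.cp437")

    print(timeit.timeit(import_cp437, number=1000), "seconds per 1000 imports")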
Given this is static and mostly read-only information, I see no
reason why we couldn't either generate completely static versions of
them, or better yet compile the resulting data structures into the core.
(If being able to write to some of the encoding data is something people
rely on, I vote for breaking that in 3.6 and making it read-only.)
This is probably the code snippet that bothered me the most:
### Encoding table
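In the charmap-based codecs such as Lib/encodings/cp1252.py, the line under
that comment is a single call that rebuilds the encoding table from the
decoding table on every import:

    encoding_table = codecs.charmap_build(decoding_table)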
It shows up in many of the encodings modules, and while it is not a bad
function in itself, we are obviously generating a known data structure
on every startup. Storing these in static data is a tradeoff between
disk space and startup performance, and one I think is likely to be worth it.
Anyway, just an idea if someone wants to try it and see what
improvements we can get. I'd love to do it myself, but when it actually
comes to finding time I keep coming up short.
P.S. If you just want to discuss optimisation techniques or benchmarking
in general, without specific application to CPython 3.6, there's a whole
internet out there. Please don't make me the cause of a pointless
off-topic thread.
This is the second email thread I've started regarding the implementation
of an opcode cache in the ceval loop. Since my first post on this topic:
- I've implemented another optimization (LOAD_ATTR);
- I've added detailed statistics mode so that I can "see" how the cache
performs and tune it;
- some macro benchmarks are now 10-20% faster; 2to3 (a real application)
is 7-8% faster;
- and I have some good insights on the memory footprint.
** The purpose of this email is to get a general approval from
python-dev, so that I can start polishing the patches and getting them
merged.
Summary of optimizations
When a code object is executed more than ~1000 times, it's considered
"hot". It gets its opcodes analyzed to initialize caches for
LOAD_METHOD (a new opcode I propose to add in issue 26110), LOAD_ATTR,
and LOAD_GLOBAL.
It's important to only optimize code objects that were executed "enough"
times, to avoid optimizing code objects for modules, classes, and
functions that were imported but never used.
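In pure-Python pseudocode, the heuristic amounts to roughly the following
(the real patch does the counting in C inside the eval loop; the helper
names here are made up for illustration):

    HOT_THRESHOLD = 1000  # executions before a code object counts as "hot"
    _run_counts = {}

    def note_execution(code):
        # Called whenever a code object is executed; once it crosses the
        # threshold, set up its caches exactly once.
        n = _run_counts.get(code, 0) + 1
        _run_counts[code] = n
        if n == HOT_THRESHOLD:
            allocate_caches(code)

    def allocate_caches(code):
        # Stand-in for the C-level work: build the per-opcode offset
        # table and the array of cache structs described below.
        print("optimizing", code.co_name)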
The cache struct is defined in code.h and is 32 bytes long. When a
code object becomes hot, it gets a cache offset table allocated for it
(+1 byte for each opcode) + an array of cache structs.
To measure the max/average memory impact, I tuned my code to optimize
*every* code object on *first* run. Then I ran the entire Python test
suite. Python test suite + standard library both contain around 72395
code objects, which required 20Mb of memory for caches. The test
process consumed around 400Mb of memory. Thus, in the absolute worst-case
scenario, the overhead is about 5%.
Then I ran the test suite with the patch unmodified. This means that
only code objects that are called frequently enough are optimized. In
this mode, only 2072 code objects were optimized, using less than 1Mb
of memory for the cache.
Damien George mentioned that they optimize a lot of dict lookups in
MicroPython by memorizing last key/value offset in the dict object, thus
eliminating lots of hash lookups. I've implemented this optimization in
my patch. The results are quite good. A simple micro-benchmark
shows ~30% speed improvement. Here are some debug stats generated by
the 2to3 benchmark:
-- Opcode cache LOAD_ATTR hits = 14778415 (83%)
-- Opcode cache LOAD_ATTR misses = 750 (0%)
-- Opcode cache LOAD_ATTR opts = 282
-- Opcode cache LOAD_ATTR deopts = 60
-- Opcode cache LOAD_ATTR total = 17777912
Each "hit" makes LOAD_ATTR about 30% faster.
This turned out to be a very stable optimization. Here is the debug
output of the 2to3 test:
-- Opcode cache LOAD_GLOBAL hits = 3940647 (100%)
-- Opcode cache LOAD_GLOBAL misses = 0 (0%)
-- Opcode cache LOAD_GLOBAL opts = 252
All benchmarks (and real code) have stats like that. Globals and
builtins are very rarely modified, so the cache works really well. With
the LOAD_GLOBAL opcode cache, a global lookup is very cheap; there is no
hash lookup for it at all. It makes optimizations like "def foo(len=len)"
obsolete.
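For reference, that idiom binds the builtin as a default argument so that
every call pays only a fast local lookup; with the cache, the plain spelling
is just as cheap:

    # The classic hack: "len" becomes a local, bound once at definition time.
    def total_length_hack(items, len=len):
        return sum(len(x) for x in items)

    # What you can simply write once LOAD_GLOBAL lookups are cached:
    def total_length(items):
        return sum(len(x) for x in items)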
LOAD_METHOD is a new opcode I propose to add in issue 26110. The idea is
to substitute LOAD_ATTR with it, and avoid instantiation of BoundMethod
objects.
With the cache, we can store a reference to the method descriptor (I use
type->tp_version_tag for cache invalidation, the same thing
_PyType_Lookup is built around).
The cache makes LOAD_METHOD really efficient. A simple micro-benchmark
shows that with the cache and LOAD_METHOD,
"s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'".
LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than
LOAD_ATTR/CALL_FUNCTION. With the cache, it's about 30% faster.
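That comparison is easy to reproduce (absolute numbers vary by build;
without the caches the method call pays for a LOAD_ATTR plus a bound-method
allocation on every iteration):

    import timeit

    setup = "s = 'abcdef'"
    print("startswith:", timeit.timeit("s.startswith('abc')", setup=setup))
    print("slice + eq:", timeit.timeit("s[:3] == 'abc'", setup=setup))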
Here's the debug output of the 2to3 benchmark:
-- Opcode cache LOAD_METHOD hits = 5164848 (64%)
-- Opcode cache LOAD_METHOD misses = 12 (0%)
-- Opcode cache LOAD_METHOD opts = 94
-- Opcode cache LOAD_METHOD deopts = 12
-- Opcode cache LOAD_METHOD dct-chk= 1614801
-- Opcode cache LOAD_METHOD total = 7945954
First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110.
It's a very straightforward optimization; the patch is small and
easy to review.
Second, I'd like to merge the new opcode cache, see issue 26219.
All unittests pass. Memory usage increase is very moderate (<1mb for
the entire test suite), and the performance increase is significant.
The only potential blocker for this is PEP 509 approval (which I'd be
happy to assist with).
What do you think?
With the upcoming move to Git, I thought people might be
interested in some thoughts that I wrote down when learning Git
for the first time as a long-time Mercurial user:
Comments are welcome (but probably more appropriate off-list).
On 2 February 2016 at 05:21, raymond.hettinger wrote:
> changeset: 100142:0731f097157b
> parent: 100140:c7f1acdd8be1
> user: Raymond Hettinger <python(a)rcn.com>
> date: Mon Feb 01 21:21:19 2016 -0800
> Doc/library/collections.rst | 4 ++--
> Lib/test/test_deque.py | 23 ++++++++++++-----------
> Modules/_collectionsmodule.c | 7 ++-----
> 3 files changed, 16 insertions(+), 18 deletions(-)
This wasn’t actually a merge (there is only one parent). Hopefully I
fixed it up with <https://hg.python.org/cpython/rev/03708c680eca>. But
it looks like the original NEWS entry didn’t get merged in your
earlier merge <https://hg.python.org/cpython/rev/58266f5101cc>, so
there was nothing for me to merge the NEWS changes into in the default
branch.
That's great news about the speed improvements with the dict offset cache!
> The cache struct is defined in code.h and is 32 bytes long. When a
> code object becomes hot, it gets a cache offset table allocated for it
> (+1 byte for each opcode) + an array of cache structs.
Ok, so each opcode has a 1-byte cache entry that sits separately from
the actual bytecode. But a lot of opcodes don't use it, so that leads to
some wasted memory, correct?
But then how do you index the cache? Do you keep a count of the
current opcode number? If I remember correctly, CPython has some
opcodes taking 1 byte and some taking 3 bytes, so the offset into the
bytecode cannot be easily mapped to an instruction number.
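For what it's worth, that variable-width layout can be walked like this
(a sketch of the pre-3.6 format the question refers to; from 3.6 on,
"wordcode" makes every instruction 2 bytes, so this would not decode newer
code objects correctly):

    import dis

    def instruction_index_by_offset(code):
        # Walk the raw bytecode: opcodes below HAVE_ARGUMENT occupy
        # 1 byte, anything else carries a 2-byte argument (3 bytes total).
        mapping, offset, index = {}, 0, 0
        raw = code.co_code
        while offset < len(raw):
            mapping[offset] = index
            offset += 3 if raw[offset] >= dis.HAVE_ARGUMENT else 1
            index += 1
        return mapping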