I've posted on Discourse under the core-workflow category
<https://discuss.python.org/t/using-cla-assistant-for-python/990>, and this
has been previously discussed on the core-workflow mailing list,
but I feel this affects the wider community of Python contributors, so I
wanted to share it here for more visibility.
We'd like to start using CLA assistant for contributions to Python
(including CPython, the devguide, PEPs, all the bots, etc.). Ernest has set up
our own instance of CLA assistant, and it has been tested by several core
developers. We've also consulted the PSF and Van Lindberg for legal advice.
Unless I hear strong opposition (with reasons) from the Python Steering
Council, Python core developers, and active core contributors, I plan to
switch us over to CLA assistant in the coming week.
How this will affect all contributors to Python, old and new:
- you will need to sign the CLA again, even if you've signed it before (in
bpo). It will take you several clicks, but you'll do this only once,
and it takes effect immediately (instead of waiting for PSF staff to
check for it)
- bpo username will no longer be required when signing the CLA
- the CLA will be accepted under the Apache License 2.0 only (no more Academic Free License)
For even more details, please follow the Discourse post and the
core-workflow mailing list thread linked above, as well as the "CLA" section
of my blog post about the Core Python sprint 2018.
I came here from https://bugs.python.org/issue35838.
Since there is no "expert" for configparser in the
Experts Index, I'm asking here for a design decision.
The default behavior of ConfigParser.optionxform
is str.lower(). It is used to canonicalize
option key names.
The documentation of optionxform shows an example that
overrides it with the identity function `lambda option: option`.
BPO-35838 is about optionxform being called twice in some code paths.
If optionxform is not idempotent, calling it twice creates unexpected
option names.
But even if all APIs called optionxform exactly once, a user may
read an option name and its value, then write an updated value back under
the same name. In this case, the option name the user read is already
optionxform-ed (canonicalized), so a non-idempotent optionxform will
still break.
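A minimal sketch of the problem (the `x_` prefix transform here is a made-up example of a non-idempotent optionxform, not anything from the stdlib):

```python
from configparser import ConfigParser

# A made-up non-idempotent transform: applying it twice to "foo"
# yields "x_x_foo" instead of "x_foo".
parser = ConfigParser()
parser.optionxform = lambda name: "x_" + name

parser.read_string("[sec]\nfoo = 1\n")
name = next(iter(parser["sec"]))   # already canonicalized: "x_foo"

# Writing back under the name we just read runs optionxform again,
# silently creating a second option instead of updating the first:
parser["sec"][name] = "2"
print(list(parser["sec"]))         # ['x_foo', 'x_x_foo']
```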
So what should we do about optionxform?
a) Document that "optionxform must be idempotent".
b) Ensure all APIs call optionxform exactly once, and document:
"When you get an option name from a section object, it is already
optionxform-ed. You cannot reuse that option name if
optionxform is not idempotent, because optionxform will be
applied to the name again."
I prefer (a) over (b) because it is the simple and easy solution.
But (b) works for some use cases (e.g. read only, write only, or using only
predefined option names and reading their values). At least the issue
reporter tried such a use case and was trapped by it.
What do you think?
Inada Naoki <songofacandy(a)gmail.com>
On 07/03/2019 19.08, Mariatta wrote:
> I'd like to formally present to Python-dev PEP 581: Using GitHub Issues
> for CPython
> Full text: https://www.python.org/dev/peps/pep-0581/
> This is my first PEP, and in my opinion it is ready for wider discussion.
One part of this PEP stands out to me:
| We should not be moving all open issues to GitHub. Issues with little
| or no activity should just be closed. Issues with no decision made for
| years should just be closed.
I strongly advise against closing bug reports just because they're old.
I know that the Python developers value trying to be a welcoming
community. To many people, having a bug report that they put some effort
into closed for no reason other than the passage of time feels like a
slap in the face which stings harder than, for example, intemperate
words on a mailing list.
This is even more true if there won't be an option to re-open the bug,
which seems to be what the PEP is saying will be the case.
If a bug has been around for a long time and hasn't been fixed, the most
useful information for the bug tracker to contain is "this bug has been
around for a long time and it hasn't been fixed". Leaving the bug open
is the simplest way to achieve that.
(I think the above only goes for issues which are actually reporting
bugs. Wishlist items are a different matter.)
On Thu, Mar 7, 2019 at 12:36 PM Manuel Cerón <ceronman(a)gmail.com> wrote:
> After some frustration with bpo, I decided to file some issues in the
> meta tracker, just to find out that the link provided by the Python
> Developer's Guide is broken, giving a connection timeout.
Some time ago we started experimenting with moving the meta tracker to
GitHub: https://github.com/python/bugs.python.org. I don't know whether this
is now the "official" place for it, but I've definitely been referring people
to this repo if they need to file issues about bpo.
Again, I don't know if this is now official or not; if it is, should we
start updating all the documentation accordingly?
I'm working on a compact and ordered set implementation.
Its internal data structure is similar to the new dict from Python 3.6.
It is still a work in progress. Comments, tests, and documentation
still need to be updated, but it passes the existing tests excluding
test_sys and test_gdb (both check implementation details).
Before completing this work, I want to evaluate it.
Following are my current thoughts about the compact ordered set.
## Preserving insertion order
Order is not fundamental for sets; there is no order in sets in the
mathematical sense. But it is sometimes convenient in the real world.
For example, it makes doctests easy. When writing sets to logs, we can use
the "grep" command if the print order is stable. And .pyc files become
stable without the PYTHONHASHSEED=0 hack.
Additionally, consistency with dict is desirable; it removes one pitfall for
new Python users. The "remove duplicated items from a list" idiom becomes
`list(set(duplicated))` instead of `list(dict.fromkeys(duplicated))`.
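A small illustration of the idiom pair (not from the patch itself):

```python
items = ["b", "a", "b", "c", "a"]

# Today: dict preserves insertion order, so this dedups in order.
print(list(dict.fromkeys(items)))   # ['b', 'a', 'c']

# Today: set dedups too, but the resulting order is unspecified.
# With an insertion-ordered set, this would also give ['b', 'a', 'c'].
print(list(set(items)))
```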
## Memory efficiency
A hash table faces a dilemma: to reduce the collision rate, the table
should be sparse, but sparseness wastes memory.
Since the current set is optimized for both the hit and miss cases,
it is more sparse than dict. (It is a bit of a surprise that a set typically
uses more memory than a dict of the same size!)
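You can see this with sys.getsizeof (exact numbers vary by CPython version and platform, so the sketch below only relies on the inequality):

```python
import sys

s = set(range(100))
d = dict.fromkeys(range(100))

# On current CPython the set's hash table is kept more sparse than
# the dict's, so the set is noticeably larger despite equal contents.
print(sys.getsizeof(s), sys.getsizeof(d))
print(sys.getsizeof(s) > sys.getsizeof(d))   # True
```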
The new implementation partially solves this dilemma. It has a sparse
"index table" whose items are small (1 byte when the table size is <= 256,
2 bytes when the table size is <= 65536), and a dense entry table (each
entry holds a key and a hash, which is 16 bytes on a 64-bit system).
I use 1/2 as the capacity rate for now. So the new implementation is
memory efficient when len(s) <= 32768, roughly equal to the current
implementation when 32768 < len(s) <= 2**31,
and worse than the current implementation when len(s) > 2**31.
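A back-of-envelope sketch of that arithmetic (the 1/2 capacity rate and per-item sizes are taken from the description above; the helper names and the assumed load factor of the current set are mine, not measured from CPython):

```python
def new_set_table_bytes(n):
    """Estimated table bytes for the new layout, ignoring object headers."""
    table = 2 * n                  # index table kept half full (capacity rate 1/2)
    if table <= 256:
        index_item = 1             # 1-byte indices
    elif table <= 65536:
        index_item = 2             # 2-byte indices
    elif table <= 2**32:
        index_item = 4
    else:
        index_item = 8
    return table * index_item + n * 16   # sparse indices + dense 16-byte entries

def current_set_table_bytes(n, load=0.6):
    """Current layout: one sparse table of 16-byte slots, assumed ~3/5 max load."""
    return int(n / load) * 16

for n in (128, 32768, 2**20):
    print(n, new_set_table_bytes(n), current_set_table_bytes(n))
```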
Here are quick benchmark results comparing the new implementation against master:
$ ./python -m perf compare_to master.json oset2.json -G --min-speed=2
- unpickle_list: 8.48 us +- 0.09 us -> 12.8 us +- 0.5 us: 1.52x slower (+52%)
- unpickle: 29.6 us +- 2.5 us -> 44.1 us +- 2.5 us: 1.49x slower (+49%)
- regex_dna: 448 ms +- 3 ms -> 462 ms +- 2 ms: 1.03x slower (+3%)
- meteor_contest: 189 ms +- 1 ms -> 165 ms +- 1 ms: 1.15x faster (-13%)
- telco: 15.8 ms +- 0.2 ms -> 15.3 ms +- 0.2 ms: 1.03x faster (-3%)
- django_template: 266 ms +- 6 ms -> 259 ms +- 3 ms: 1.03x faster (-3%)
- unpickle_pure_python: 818 us +- 6 us -> 801 us +- 9 us: 1.02x faster (-2%)
Benchmark hidden because not significant (49)
unpickle and unpickle_list show a massive slowdown. I suspect this slowdown
is not caused by the set change: Linux perf shows many page faults happening
in pymalloc_malloc. I think the change in memory usage accidentally hits a
weak point of pymalloc. I will try to investigate it.
On the other hand, meteor_contest shows a 13% speedup; it uses sets.
The others don't show significant performance changes.
I need to write more benchmarks for various set workloads.
I expect the new set to be faster on simple creation, iteration, and
destruction. In particular, sequential iteration and deletion will reduce
cache misses (e.g. https://bugs.python.org/issue32846).
On the other hand, the new implementation will be slower in complex
(heavy random add & del) cases.
Any comments are welcome, and any benchmarks for set workloads
are very welcome.
INADA Naoki <songofacandy(a)gmail.com>