Hi all --
I am interested in making some serious ongoing contributions around
multiprocessing.
My inspiration, first and foremost, comes from the current documentation
for multiprocessing. There is great material there but I believe it is
being presented in a way that hinders adoption and understanding. I've
taken some initial baby-steps to propose specific changes:
http://bugs.python.org/issue22952
http://bugs.python.org/issue23100
The first, issue22952, can reasonably be tackled with a patch like I've
submitted. Continuing with patches for issue23100 can also be made to
work. I realize that reviewing such patches takes non-trivial time from
volunteers yet I'm interested in submitting a series of patches to
hopefully make the documentation for multiprocessing much more consistent
with other module docs and much more accessible to end users. I don't want
to simply create more work for other volunteers -- I'd like to volunteer to
reduce / share some of their work as well.
Beyond the documentation, there is currently a backlog of 186 issues
mentioning multiprocessing, some with patches on offer, some without. I'd
like to volunteer my time reviewing and triaging these issues. Hopefully
you can already get a sense of my voice on issues from what I wrote in
those two issues above.
Rather than me simply walking through that backlog, offering comments or
encouragement here and there on issues, it makes more sense for me to ask:
what is the right way for me to proceed? What is the next step towards me
helping triage issues? Is there a bridge-keeper with at least three, no
more than five questions for me?
Thanks,
Davin
Newbie first post on this list; apologies if what follows is out of context ...
Hi all,
I'm struggling with the issue in the subject line. I've read different threads
and issue http://bugs.python.org/issue15443, which was opened in 2012 and is
still open as of today.
Isn't there a legitimate case for nanosecond support? It's all over the place
in 'struct timespec', and, maybe wrongly, I have always found Python and C to
be good neighbors. That's the notional aspect.
More practically, aren't we close enough with current hardware, PTP and the
like, that this deserves more consideration?
Maybe this has been mentioned before, but the limiting factor isn't just
getting nanoseconds: anything sub-microsecond won't work with the current
format. OPC UA, which I was looking at just now, has tenth-of-a-microsecond
resolution, so it really cares about 100 ns, but datetime's 1 µs simply
won't cut it.
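As a minimal illustration of the limitation (the 0.1 µs step below just stands
in for the OPC UA tick):

from datetime import datetime, timedelta

t0 = datetime(2015, 1, 1)
step = timedelta(microseconds=0.1)   # intended: one 100 ns tick

print(step)             # 0:00:00 -- rounded away; timedelta only carries whole microseconds
print(t0 + step == t0)  # True: the sub-microsecond step is silently lost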
Regards,
Matthieu
This is a bit long, as I wrote it as if it were a blog post to try to give
background info on my thinking, etc. The TL;DR folks should start at the
"Ideal Scenario" section and read to the end.
P.S.: This is in Markdown and I have put it up at
https://gist.github.com/brettcannon/a9c9a5989dc383ed73b4 if you want a
nicer formatted version for reading.
# History lesson
Since I signed up for the python-dev mailing list way back in June 2002,
there seems to be a cycle where we as a group come to a realization that
our current software development process has not kept up with modern
practices and could stand an update. For me this was first shown when
we moved from SourceForge to our own infrastructure, then again when we
moved from Subversion to Mercurial (I led both of these initiatives, so
it's somewhat of a tradition/curse that I find myself in this position yet again).
And so we again find ourselves at the point of realizing that we are not
keeping up with current practices and thus need to evaluate how we can
improve our situation.
# Where we are now
Now it should be realized that we have two sets of users of our development
process: contributors and core developers (the latter of whom can play both
roles). If you take a rough outline of our current, recommended process, it
goes something like this:
1. Contributor clones a repository from hg.python.org
2. Contributor makes desired changes
3. Contributor generates a patch
4. Contributor creates account on bugs.python.org and signs the
[contributor agreement](https://www.python.org/psf/contrib/contrib-form/)
5. Contributor creates an issue on bugs.python.org (if one does not already
exist) and uploads a patch
6. Core developer evaluates patch, possibly leaving comments through our
[custom version of Rietveld](http://bugs.python.org/review/)
7. Contributor revises patch based on feedback and uploads new patch
8. Core developer downloads patch and applies it to a clean clone
9. Core developer runs the tests
10. Core developer does one last `hg pull -u` and then commits the changes
to various branches
I think we can all agree it works to some extent, but isn't exactly smooth.
There are multiple steps in there -- in full or in part -- that can be
automated. There is room to improve everyone's lives.
And we can't forget the people who help keep all of this running as well.
There are those that manage the SSH keys, the issue tracker, the review
tool, hg.python.org, and the email system that lets us know when stuff
happens on any of these other systems. The impact on them needs to also be
considered.
## Contributors
I see two scenarios for contributors to optimize for. There are the simple
spelling-mistake patches and then there are the code-change patches. The
former is the kind of thing that you can do in a browser without much
effort and should be a no-brainer commit/reject decision for a core
developer. This is what the GitHub/Bitbucket camps have been promoting
their solutions for, while leaving the cpython repo alone.
Unfortunately the bulk of our documentation is in the Doc/ directory of
cpython. While it's nice to think about moving the devguide, peps, and even
breaking out the tutorial to repos hosted on Bitbucket/GitHub, everything
else is in Doc/ (language reference, howtos, stdlib, C API, etc.). So
unless we want to completely break all of Doc/ out of the cpython repo and
have core developers willing to edit two separate repos when making changes
that impact code **and** docs, moving only a subset of docs feels like a
band-aid solution that ignores the big elephant in the room: the
cpython repo, which the bulk of patches target.
For the code change patches, contributors need an easy way to get a hold of
the code and get their changes to the core developers. After that it's
things like letting contributors know that their patch doesn't apply
cleanly, doesn't pass tests, etc. As of right now getting the patch into
the issue tracker is a bit manual but nothing crazy. The real issue in this
scenario is core developer response time.
## Core developers
There is a finite amount of time that core developers get to contribute to
Python and it fluctuates greatly. This means that if a process can be found
which allows core developers to spend less time doing mechanical work and
more time doing things that can't be automated -- namely code reviews --
then the throughput of patches being accepted/rejected will increase. This
also impacts any increased patch submission rate that comes from improving
the situation for contributors because if the throughput doesn't change
then there will simply be more patches sitting in the issue tracker and
that doesn't benefit anyone.
# My ideal scenario
If I had an infinite amount of resources (money, volunteers, time, etc.),
this would be my ideal scenario:
1. Contributor gets code from wherever; easiest to just say "fork on GitHub
or Bitbucket" as they would be official mirrors of hg.python.org and are
updated after every commit, but could clone hg.python.org/cpython if they
wanted
2. Contributor makes edits; if they cloned on Bitbucket or GitHub then they
have browser edit access already
3. Contributor creates an account at bugs.python.org and signs the CLA
4. The contributor creates an issue at bugs.python.org (probably the one
piece of infrastructure we all agree is better than the other options,
although its workflow could use an update)
5. If the contributor used Bitbucket or GitHub, they send a pull request
with the issue # in the PR message
6. bugs.python.org notices the PR, grabs a patch for it, and puts it on
bugs.python.org for code review (a rough sketch of this glue follows the list)
7. CI runs on the patch based on what Python versions are specified in the
issue tracker, letting everyone know if it applied cleanly, passed tests on
the OSs that would be affected, and also got a test coverage report
8. Core developer does a code review
9. Contributor updates their code based on the code review, the updated
patch gets pulled by bugs.python.org automatically, and CI runs again
10. Once the patch is acceptable and assuming the patch applies cleanly to
all versions to commit to, the core developer clicks a "Commit" button,
fills in a commit message and NEWS entry, and everything gets committed (if
the patch can't apply cleanly then the core developer does it the
old-fashioned way, or maybe auto-generates a new PR which can be manually
touched up so it does apply cleanly?)
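To make step 6 slightly more concrete, here is a rough, hypothetical sketch of
the PR-to-tracker glue. Fetching a pull request's patch through its `.patch`
URL is an existing GitHub feature; everything on the tracker side
(`upload_to_tracker`, the issue-number convention in the PR message) is made up
purely for illustration and would still have to be designed and written.

```python
import re
import urllib.request

def fetch_pr_patch(repo, pr_number):
    """Download the patch GitHub publishes for a pull request."""
    url = "https://github.com/{}/pull/{}.patch".format(repo, pr_number)
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

def issue_from_message(message):
    """Find an issue number like '#23100' or 'issue 23100' in the PR message."""
    match = re.search(r"(?:#|issue\s*)(\d+)", message, re.IGNORECASE)
    return int(match.group(1)) if match else None

def upload_to_tracker(issue_number, patch_text):
    """Placeholder: attach the patch to the tracker issue and trigger review/CI."""
    raise NotImplementedError("the bugs.python.org side of this does not exist yet")

def handle_pull_request(repo, pr_number, pr_message):
    issue = issue_from_message(pr_message)
    if issue is None:
        return  # no issue referenced; a human would have to follow up
    upload_to_tracker(issue, fetch_pr_patch(repo, pr_number))
```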
Basically the ideal scenario lets contributors use whatever tools and
platforms that they want and provides as much automated support as possible
to make sure their code is tip-top before and during code review while core
developers can review and commit patches so easily that they can do their
job from a beach with a tablet and some WiFi.
## Where the current proposed solutions seem to fall short
### GitHub/Bitbucket
Basically GitHub/Bitbucket is a win for contributors but doesn't buy core
developers that much. GitHub/Bitbucket gives contributors the easy cloning,
drive-by patches, CI, and PRs. Core developers get a code review tool --
I'm counting Rietveld as deprecated after Guido's comments about the code's
maintenance issues -- and push-button commits **only for single branch
changes**. But for any patch that crosses branches we don't really gain
anything. At best core developers tell a contributor "please send your PR
against 3.4", push-button merge it, update a local clone, merge from 3.4 to
default, do the usual stuff, commit, and then push; that still keeps me off
the beach, though, so that doesn't get us the whole way. You could force
people to submit two PRs, but I don't see that flying. Maybe some tool
could be written that automatically handles the merge/commit across
branches once the initial PR is in? Or automatically create a PR that core
developers can touch up as necessary and then accept that as well?
Regardless, some solution is necessary to handle branch-crossing PRs.
As for GitHub vs. Bitbucket, I personally don't care. I like GitHub's
interface more, but that's personal taste. I like hg more than git, but
that's also personal taste (and I consider a transition from hg to git a
hassle but not a deal-breaker but also not a win). It is unfortunate,
though, that under this scenario we would have to choose only one platform.
It's also unfortunate both are closed-source, but that's not a
deal-breaker, just a knock against them if the decision is close.
### Our own infrastructure
The shortcoming here is the need for developers, developers, developers!
Everything outlined in the ideal scenario is totally doable on our own
infrastructure with enough code and time (donated/paid-for infrastructure
shouldn't be an issue). But historically that code and time has not
materialized. Our code review tool is a fork that probably should be
replaced as only Martin von Löwis can maintain it. Basically Ezio Melotti
maintains the issue tracker's code. We don't exactly have a ton of people
constantly going "I'm so bored because everything for Python's development
infrastructure gets sorted so quickly!" A perfect example is that R. David
Murray came up with a nice update for our workflow after PyCon but then ran
out of time after mostly defining it and nothing ever became of it (maybe
we can rectify that at PyCon?). Eric Snow has pointed out how he has
written similar code for pulling PRs from I think GitHub to another code
review tool, but that doesn't magically make it work in our infrastructure
or get someone to write it and help maintain it (no offense, Eric).
IOW our infrastructure can do anything, but it can't run on hopes and
dreams. Commitments from many people to making this happen by a certain
deadline will be needed so as to not allow it to drag on forever. People
would also have to commit to continued maintenance to make this viable
long-term.
# Next steps
I'm thinking first draft PEPs by February 1 to know who's all-in (8 weeks
away), all details worked out in final PEPs and whatever is required to
prove to me it will work by the PyCon language summit (4 months away). I
make a decision by May 1, and
then implementation aims to be done by the time 3.5.0 is cut so we can
switch over shortly thereafter (9 months away). Sound like a reasonable
timeline?
Greetings.
I'm sorry if I'm too insistent, but it's not truly rewarding to
constantly improve a patch that no one appears to need. Again, I
understand people are busy working and/or reviewing critical patches,
but 2 months of inactivity is not right. Yes, I posted a message
yesterday, but no one seemed to be bothered. In any case, I'll respect
your decision about this patch and will never ask for a review of this
patch again.
Regards, Dmitry.
Help with finding tutors for Python, Linux, R, Perl, Octave, MATLAB and/or
Cytoscape for yeast microarray analysis, next generation sequencing and
constructing gene interaction networks
Hi
I am a visually impaired bioinformatics graduate student using microarray
data for my master’s thesis aimed at deciphering the mechanism by which the
yeast wild type can suppress the rise of free reactive oxygen species (ROS)
induced by caloric restriction (CR) but the Atg15 and Erg6 knockout mutant
cannot.
Since my remaining vision is very limited I need very high magnification.
But that makes my visual field very small. Therefore I need somebody to
teach me how to use these programming environments, especially for
microarray analysis, next generation sequencing and constructing gene and
pathway interaction networks. This is very difficult for me to figure out
without assistance because Zoomtext, my magnification and text to speech
software, on which I am depending because I am almost blind, has problems
reading many programming-related websites aloud to me. And even those
websites it can read, it can only read sequentially from left to right and
then from top to bottom. Unfortunately, this way of acquiring, finding,
selecting and processing new information and answering questions is too
tiresome, exhausting, ineffective and especially way too time consuming for
graduating with a PhD in bioinformatics before my funding runs out despite
being severely limited by my visual disability. I would also need help
with writing a good literature review and applying the described techniques
to my own yeast Affymetrix microarray dataset because I cannot see well
enough to find all relevant publications on my own.
Some examples for specific tasks I urgently need help with are:
1. Analyzing and comparing the three publicly available microarray
datasets that can be accessed at:
A. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE41860
B. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE38635
C. http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9217
2. Learning how to use the Affymetrix microarray analysis software for
the Yeast 2 chip, which can be found at
http://www.affymetrix.com/support/technical/libraryfilesmain.affx
3. For Cytoscape I need somebody who can teach me how to execute the
tutorials at the following links, because due to my very limited visual
field I cannot see the tutorial and the program interface simultaneously.
A.
http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Introduction_to_Cytosc…
B.
http://opentutorials.cgl.ucsf.edu/index.php/Tutorial:Filtering_and_Editing_…
C.
http://cytoscape.org/manual/Cytoscape2_8Manual.html#Import%20Fixed-Format%2…
D. http://wiki.cytoscape.org/Cytoscape_User_Manual/Network_Formats
4. Learning how to use the TopGo R package to perform statistical
analysis on GO enrichments.
Since I am legally blind the rehab agency is giving me money to pay tutors
for this purpose. Could you please help me get in touch regarding this
with anybody, who could potentially be interested in teaching me one on one
thus saving me time for acquiring new information and skills, which I need
to finish my thesis on time, so that I can remain eligible for funding to
continue in my bioinformatics PhD program despite being almost blind? The
tutoring can be done remotely via TeamViewer 5 and Skype. Hence, it does
not matter where my tutors are physically located. Currently I have tutors
in Croatia and UK. But since they both work full time jobs while working
on their PhD dissertation they only have very limited time to teach me
online. Could you therefore please forward this request for help to
anybody, who could potentially be interested or, who could connect me to
somebody, who might be, because my graduation and career depend on it? Who
else would you recommend I contact regarding this? Where else could I
post this, since I am in urgent need of help?
Could you please contact me directly via email at Thomas.F.Hahn2(a)gmail.com
and/or Skype at tfh002, because my text-to-speech software has problems
reading this website aloud to me?
I thank you very much in advance for your thoughts, ideas, suggestions,
recommendations, time, help, efforts and support.
With very warm regards,
Thomas Hahn
1) Graduate student in the Joint Bioinformatics Program at the
University of Arkansas at Little Rock (UALR) and the University of Arkansas
Medical Sciences (UAMS) &
2) Research & Industry Advocate, Founder and Board Member of RADISH
MEDICAL SOLUTIONS, INC. (http://www.radishmedical.com/thomas-hahn/)
Primary email: Thomas.F.Hahn2(a)gmail.com
Cell phone: 318 243 3940
Office phone: 501 682 1440
Office location: EIT 535
Skype ID: tfh002
Virtual Google Voice phone to reach me while logged into my email (i.e.
Thomas.F.Hahn2(a)gmail.com), even when having no cell phone reception,
e.g. in big massive buildings: (501) 301-4890
Web links:
1) https://ualr.academia.edu/ThomasHahn
2) https://www.linkedin.com/pub/thomas-hahn/42/b29/42
3) http://facebook.com/Thomas.F.Hahn
4) https://twitter.com/Thomas_F_Hahn
The current memory layout for dictionaries is
unnecessarily inefficient. It has a sparse table of
24-byte entries containing the hash value, key pointer,
and value pointer.
Instead, the 24-byte entries should be stored in a
dense table referenced by a sparse table of indices.
For example, the dictionary:
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}
is currently stored as:
entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]
Instead, the data should be organized as follows:
indices = [None, 1, None, None, None, 0, None, 2]
entries = [[-9092791511155847987, 'timmy', 'red'],
           [-8522787127447073495, 'barry', 'green'],
           [-6480567542315338377, 'guido', 'blue']]
Only the data layout needs to change. The hash table
algorithms would stay the same. All of the current
optimizations would be kept, including key-sharing
dicts and custom lookup functions for string-only
dicts. There is no change to the hash functions, the
table search order, or collision statistics.
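To make that concrete, here is a rough pure-Python sketch of a lookup against
the two-table layout above. The probing is simplified to linear probing purely
for illustration, and deleted slots (which would need a dummy marker) are
ignored; the actual perturbation-based probe sequence and collision behavior
are unchanged by this proposal.

FREE = None   # marker for an unused slot in the sparse index table

def lookup(indices, entries, key):
    # indices: sparse table, probed exactly like today's table
    # entries: dense list of [hash, key, value] triples
    mask = len(indices) - 1
    h = hash(key)
    i = h & mask
    while True:
        idx = indices[i]
        if idx is FREE:
            raise KeyError(key)
        entry_hash, entry_key, value = entries[idx]
        if entry_hash == h and entry_key == key:
            return value
        i = (i + 1) & mask   # simplified probing for illustration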
The memory savings are significant (from 30% to 95%
compression depending on how full the table is).
Small dicts (size 0, 1, or 2) get the most benefit.
For a sparse table of size t with n entries, the sizes are:
curr_size = 24 * t
new_size = 24 * n + sizeof(index) * t
In the above timmy/barry/guido example, the current
size is 192 bytes (eight 24-byte entries) and the new
size is 80 bytes (three 24-byte entries plus eight
1-byte indices). That gives 58% compression.
Note, sizeof(index) can be as small as a single
byte for small dicts, two bytes for bigger dicts, and
up to sizeof(Py_ssize_t) for huge dicts.
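Here is a tiny calculation following the formulas above. The width cutoffs are
only an assumption about how an implementation might pick the index size; the
exact thresholds are not part of this proposal.

def index_width(table_size):
    # assumed cutoffs, for illustration only
    if table_size <= 0xFF:
        return 1
    if table_size <= 0xFFFF:
        return 2
    if table_size <= 0xFFFFFFFF:
        return 4
    return 8   # sizeof(Py_ssize_t) on a 64-bit build

def current_size(t):
    return 24 * t

def new_size(n, t):
    return 24 * n + index_width(t) * t

print(current_size(8))   # 192 bytes for the timmy/barry/guido example
print(new_size(3, 8))    # 80 bytes, about 58% smaller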
In addition to space savings, the new memory layout
makes iteration faster. Currently, keys(), values(), and
items() loop over the sparse table, skipping over free
slots in the hash table. Now, keys/values/items can
loop directly over the dense table, using fewer memory
accesses.
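Sketched out, iteration becomes a plain walk over the dense list with no
free-slot checks:

def iter_items(entries):
    # every slot in the dense table holds a live [hash, key, value] triple
    for entry_hash, key, value in entries:
        yield key, value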
Another benefit is that resizing is faster and
touches fewer pieces of memory. Currently, every
hash/key/value entry is moved or copied during a
resize. In the new layout, only the indices are
updated. For the most part, the hash/key/value entries
never move (except for an occasional swap to fill a
hole left by a deletion).
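A sketch of that resize, again with simplified probing: the dense entries
list is left untouched and only the sparse index table is rebuilt.

def rebuild_indices(entries, new_table_size):
    indices = [None] * new_table_size
    mask = new_table_size - 1
    for dense_pos, (entry_hash, key, value) in enumerate(entries):
        i = entry_hash & mask
        while indices[i] is not None:   # simplified probing for illustration
            i = (i + 1) & mask
        indices[i] = dense_pos
    return indices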
With the reduced memory footprint, we can also expect
better cache utilization.
For those wanting to experiment with the design,
there is a pure Python proof-of-concept here:
http://code.activestate.com/recipes/578375
YMMV: Keep in mind that the above size statistics assume a
build with 64-bit Py_ssize_t and 64-bit pointers. The
space savings percentages are a bit different on other
builds. Also, note that in many applications, the size
of the data dominates the size of the container (i.e.
the weight of a bucket of water is mostly the water,
not the bucket).
Raymond