Mailman 3 October 2009 - stdlib-sig

ctypes or struct from an h file
by Yuvgoog Greenle Oct. 18, 2009


Is there a way that Python and C can have a shared definition for a binary data structure?

It could be nice if:

1. struct or ctypes had a function that could parse a .h/.c/.cpp file to auto-generate constructors, or
2. a ctypes definition could be exported to a .h file.

So my question is: is there a way to do this in the std-lib or even PyPI?

--yuv

PS: If this doesn't exist, then I'm probably going to open a project and would like some tips/ideas.
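
As a rough illustration of the second idea (exporting a ctypes definition to a .h file), here is a minimal sketch in modern Python 3. The `PacketHeader` structure, the `to_c_header` helper and the `_C_NAMES` table are all hypothetical names invented for this example, not an existing stdlib or PyPI API, and the sketch only handles plain scalar fields (no bitfields, arrays, nested structs or packing).

```python
import ctypes
import struct

# Assumed mapping from a few ctypes scalar types to C type names.
# Illustrative only; a real tool would need a much larger table.
_C_NAMES = {
    ctypes.c_uint8: "uint8_t",
    ctypes.c_uint16: "uint16_t",
    ctypes.c_uint32: "uint32_t",
    ctypes.c_float: "float",
    ctypes.c_double: "double",
}


class PacketHeader(ctypes.Structure):
    """Hypothetical binary header shared between Python and C."""
    _fields_ = [
        ("version", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),
        ("length", ctypes.c_uint16),
        ("timestamp", ctypes.c_uint32),
    ]


def to_c_header(struct_cls):
    """Render a ctypes.Structure subclass as a C typedef (scalar fields only)."""
    lines = ["#include <stdint.h>", "", "typedef struct {"]
    for name, ctype in struct_cls._fields_:  # assumes 2-tuples, i.e. no bitfields
        lines.append("    %s %s;" % (_C_NAMES[ctype], name))
    lines.append("} %s;" % struct_cls.__name__)
    return "\n".join(lines)


# The same layout expressed as a struct-module format string (little-endian).
WIRE_FORMAT = "<BBHI"

print(to_c_header(PacketHeader))
print(ctypes.sizeof(PacketHeader), struct.calcsize(WIRE_FORMAT))
```

The fields here are naturally aligned, so the ctypes layout and the struct format agree on size; with less tidy layouts, padding and byte order would have to be handled explicitly on both sides.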


Evolving the Standard Library
by Armin Ronacher Oct. 12, 2009


Hi everybody,

I'm known for my dislike of the standard library. In the past I wrote some blog posts about this topic on my personal blog. However, as many people pointed out earlier, a blog is not the place for this kind of criticism. Not only that, just ranting about a topic does not help at all.

Yesterday I subscribed to the stdlib-sig and immediately tons of mails ended up in my inbox. A quick look at the mail archives confirms what I was afraid of: this list is really high traffic. I tried to read up on some of the discussions I missed, but it's nearly impossible to do that.

I would love to sum up my thoughts about the standard library here and my ideas to improve it. This list of ideas and improvements does not include any unrealistic plans such as rewriting the standard library, an approach I was a big fan of.

I can see a couple of problems with the standard library currently, and some reasons why that is the case. If we look back on the history of Python it's obvious that a large number of modules in the standard library appeared out of the need of a single developer or company a while ago. Many of these libraries finally disappeared or were renamed in the big standard library reorganization in Python 3, and I'm very happy that this happened. However, at the same time a large number of the modules still continue to show their age.

Python is currently heading in a new direction many people would not have thought about a few years ago: web applications. For web applications different rules apply than for desktop applications. Command line scripts or GUI applications are mostly fine with shared state on module level; web applications are not. It is true that Python currently has some issues with high concurrency, and people try to fix that by forking and spawning new processes, which certainly hides the problem of shared state but does not solve it.
In fact, very recently Facebook open sourced the Tornado framework, which does very well at high concurrency by using async IO. This recent interest in Tornado will probably also motivate Twisted developers to improve their project's documentation and performance, because competition is often what causes projects to improve.

Now if we look at the standard library, we can see many modules that just do not work in such environments because they have some sort of shared state. The most obvious ones are certainly the `locale` module and all the other modules that change behavior based on the locale settings. Did you know that every major Python framework reimplements time formatting even for something as simple as HTTP headers, because Python does not provide a way to format the time to English strings reliably? But there are certainly more modules that have this sort of problem.

Also, we have many modules in the standard library that in my opinion just do not belong there. From my point of view, stuff like XML does not belong in the standard library. But it appears that not many people agree with me on this one. And even if everybody did, backwards compatibility would still be a good reason to keep these modules around.

Besides modules that do not work in every environment or modules that were probably a mistake to include, we also have modules in the standard library with a hideous implementation or no reusability, forcing people to reinvent what's already there. For a long time, `urllib` was a module I would have listed there, but as of Python 2.6 the module has largely improved by exposing the underlying socket more, which finally allows us to set the timeout in a reliable way. But there are still a ton of modules in the library that cause trouble for people.

`dis` is one of them. The implementation of dis prints to stdout no matter what you do.
Of course you can replace sys.stdout with something else for a brief moment, but again: this is not something we should aim for or advertise, because it breaks for many people.

`Cookie` is a module people monkey patched for a while (badly) to support the HttpOnly flag. Not only does the code expose a weird API, it is also nearly impossible to extend and even ships cookie subclasses that use unsigned pickles and trust the client.

`cgi` has, again, shared state in the global namespace that alters the behavior of the library. Of course it was never intended to be used by anything but `cgi`, but that leaves people reimplementing it or abusing it.

So when the discussion started about replacing `optparse` with `argparse` because the former is unmaintained, I became alarmed. My wish has always been for the standard library to be a reliable fallback to be used if everything else fails: something I can rely on which will not change, except for maybe some additions or modules moved to different locations. As Python developers we became used to moving import locations a lot. Be it `cPickle` or any of the element tree implementations, you name it.

I wonder if the solution to this problem wouldn't be a largely improved packaging system and some sort of standardized reviewing process for the standard library. Currently there is not even an accepted style for modules ending up in the Python distribution. That, and a group of people dedicated to standard library refactoring. The majority of libraries in the standard library are small and easy to understand; I'm sure they are perfectly suited for students on projects like GSOC or GHOP to work on. They could even be used as some sort of "playground" for new Python developers.

Ubuntu recently started the "100 paper cuts" project. There, people work on tiny little patches to improve the system rather than replace components.
Even though a large part of the standard library appears to be broken by design, these modules could still be redesigned on a small scale without breaking backwards compatibility. Of course libraries like `locale` and `logging` are hard to change, but it would still be possible. For `locale` it would probably be a useful idea to go in the direction of datetime, where the timezone information is left to a 3rd party library. `locale` could provide some hooks for libraries like `babel` to fill the gap. On the other hand, `Cookie` would be very easy to fix by moving the parsing code into a separate function and refactoring the cookie objects.

We could probably also start a poll out there with well-selected questions about what users think of parts of the library. For that poll it would make a lot of sense to not just ask the questions and evaluate the results, but also track the area the user is coming from (small company, open / closed source, web development etc.), because we are all biased, and seeing results grouped by some of these factors could be enlightening. That said, it could tell us that I'm completely wrong with my ideas about the state of the standard library.

But how realistic is it to refactor the standard library? I don't know. For a long time people were pretty sure Python would not get any faster, and yet Unladen Swallow is making some really amazing progress. If we want to push Python forward into new areas, and the web is one of them, it is necessary to jump into the cold water and start things.

And maybe we should have some elected task forces for things like the standard library. Judging from the mailing list it appears that far too many people are discussing *every detail* of it. It is a good idea to ask as many people as possible, but I am not sure if the mailing list is the way to do that. It is currently very hard to see the direction in which development is heading.

Please think of this email just as a suggestion.
I don't have enough trust in myself to follow the discussions on this list calmly enough to become a real part of a solution, but I would love to help shift the development in a better direction, no matter which one it will be.

Regards,
Armin
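
Two of the concrete complaints in the message above (locale-dependent time formatting for HTTP headers, and `dis` writing to stdout) can be illustrated with a short sketch. It is written for current Python 3 rather than the 2.6-era library under discussion, and the helper names `http_date` and `disassemble_to_string` are made up for this example. It assumes the `file` argument of `dis.dis` (available since Python 3.4) and `email.utils.formatdate` with `usegmt=True`.

```python
import dis
import io
import time
from email.utils import formatdate

# English day and month names are hard-coded so the result never depends
# on the process-wide locale (unlike time.strftime with %a or %b).
_DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def http_date(timestamp):
    """Format a POSIX timestamp as an HTTP-style date string in GMT."""
    t = time.gmtime(timestamp)
    return "%s, %02d %s %04d %02d:%02d:%02d GMT" % (
        _DAYS[t.tm_wday], t.tm_mday, _MONTHS[t.tm_mon - 1],
        t.tm_year, t.tm_hour, t.tm_min, t.tm_sec)


def disassemble_to_string(func):
    """Return the disassembly of func as text without replacing sys.stdout."""
    buffer = io.StringIO()
    dis.dis(func, file=buffer)  # explicit output stream, no global state touched
    return buffer.getvalue()


def add(a, b):
    return a + b


print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
print(http_date(0) == formatdate(0, usegmt=True))
print(disassemble_to_string(add))
```

Both helpers take their inputs and outputs as arguments and return values, so nothing is shared at module level between concurrent requests; that is the property the message asks the standard library to offer by default.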


standard metadata for the standard library (Was: Re: [Python-Dev] sharing stdlib across python implementations)
by Kevin Teague Oct. 1, 2009


On Sep 30, 2009, at 7:28 AM, Chris Withers wrote:

> Frank Wierzbicki wrote:
>> Talk has started up again on the stdlib-sig list about finding a core
>> stdlib + tests that can be shared by all implementations, potentially
>> living apart from CPython. I have volunteered to put together a PEP
>> on the subject, with Jesse Noller and Brett Cannon helping me
>> out.
>> When I have something worth showing, I'll start the real PEP process.
>
> I'm on on stdlib-sig and I'm afraid I don't have the bandwidth to
> start on it, but I'd just like to throw in (yet again) that it would
> be great if the stdlib was actually a set of separate python
> packages with their own version metadata so that packaging tools
> could manage them, and upgrade them independently of python packages
> when there are bug fixes.

Amen! Currently there are a number of penalties for package-savvy developers who use packages in the standard library, since they can't use their normal tool chains to work with the standard library. Instead it has to be treated as a special case.

Aside from the annoying if-else statements used to build up install_requires fields, the lack of metadata for the standard library poses a few other problems:

* Install tools work differently with 3rd party packages that have been added to the standard library. For example, simplejson and easy_install. easy_install is not supposed to upgrade a distribution if it's already installed unless the -U switch is supplied. However, do an "easy_install simplejson" (no -U switch) with Python 2.6 and the distribution is unexpectedly upgraded. This is because the metadata was tossed out once the distribution was incorporated into the standard lib.

* Bug fixes are harder. If I'm working on a project which depends upon another project, and I find a bug in that dependent project, then the preferred route to solve that problem is to contact the dependent project's author(s) and see if they'll provide a fix and do a new release.
  Then I just update the project so that its install_requires field specifies the minimum bug-free version. If it's in the standard library though, I file a bug report, but then instead of asking for a release of the package in question, I have to put a work-around into the project, even if the bug has been fixed, since there is no way to specify that I just need a fix for one particular package. That work-around needs to stay in place in the project I was working on until the minimum required version of Python for that project is equal to the Python release which provides the fix. Bleh!

* What metadata does exist about the standard library is buried in non-standard formats and isn't programmatically accessible. The maintainers field is stored in Misc/maintainers.rst; author and version are stored as module attributes (__author__ and __version__).

Ideally this metadata could be collected into setup.cfg files, and when installed would live in PEP 376 .egg-info directories, and you would replace __version__ attributes with something such as:

    import distutils
    __version__ = distutils.get_distribution('packagename').version

> The big changes I can see from here would be moving the tests to the
> packages from the central tests directory, and adding a setup.py
> file or some other form of metadata provision for each package. Not
> that big now that I've written it ;-)
>

Yeah, this is what I was thinking. It doesn't sound big, until you count up the number of packages in the standard library ... there's more distributions in there than Zope 3! :P

However, if you are relying on Distutils to write out the metadata, you run into a bootstrapping issue, where you need to use the Python interpreter you're installing to install the standard library, but the installation requires the standard library. Maybe there are some clever ways to solve this, by fiddling with PATHs and installing Distutils first or something ...
But perhaps another way to solve the problem is to not use Distutils for installation of the working set of distributions that ships with a given release of a Python interpreter. You only need to ensure that the end result is the same and complies with the .egg-info metadata format. It really doesn't matter if a package is installed with Distutils or not. If the metadata consumed by setup.py files is in setup.cfg files (or perhaps some kind of .egg-info templated format that the standard lib setup.py files read), then those files could be munged by some shell commands, and written out as part of the makefile during "make install". (The only tricky bits in the new .egg-info format are computing the full path to all installed files and computing the MD5 hash.)

Speaking of which, there is one .egg-info file in the standard library in the old-style format ... if PEP 376 is accepted then this line in CPython's Makefile will become a bug:

    @for i in $(srcdir)/Lib/*.py $(srcdir)/Lib/*.doc $(srcdir)/Lib/*.egg-info ; \

wsgiref is the only project in the standard library with metadata, though, so it'd be easy enough to fix this by just removing its metadata. But if the only package with standard metadata in the standard library had its metadata removed, it would make me sad :(
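
To make the two metadata tasks mentioned above more concrete (looking up a distribution's version from installed metadata, and computing the full path plus MD5 hash of each installed file), here is a minimal sketch in modern Python 3. It uses `importlib.metadata`, which exists in Python 3.8 and later, rather than the `distutils.get_distribution` call floated in the message. The helper names `installed_version` and `record_rows`, and the `(path, md5, size)` row layout, are assumptions made for this example and are not taken from the PEP text.

```python
import hashlib
import os
import tempfile
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if no metadata is found."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


def record_rows(install_root):
    """Collect (absolute path, MD5 hex digest, size in bytes) for every file."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(install_root):
        dirnames.sort()  # walk subdirectories in a stable order
        for filename in sorted(filenames):
            path = os.path.abspath(os.path.join(dirpath, filename))
            with open(path, "rb") as handle:
                data = handle.read()
            rows.append((path, hashlib.md5(data).hexdigest(), len(data)))
    return rows


# Demonstration on a throwaway directory standing in for an install location.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "example.py"), "wb") as handle:
        handle.write(b"hello")
    demo_rows = record_rows(root)
    for row in demo_rows:
        print(row)
```

Neither helper needs Distutils at run time, which is in the spirit of the suggestion that the build could write the metadata itself as long as the end result has the expected format.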
