[stdlib-sig] Evolving the Standard Library

Wed Sep 16 15:25:25 CEST 2009

Hello,

I'll just comment on some specific points:

> A quick look at the
> mail archives confirms what I was afraid of: this list is really high
> traffic.

Actually, it was very low traffic until those recent threads were
spawned.
I'm probably guilty of some of the traffic :-)

> It is true
> that Python currently has some issues with high concurrency and people
> try to fix that by forking and spawning new processes which certainly
> hides away the problem of shared state, but that does not solve it.  In
> fact, very recently Facebook open sourced the Tornado framework which
> does very well at high concurrency by using async IO.  Also this recent
> interest in Tornado will probably also motivate Twisted developers to
> improve their project's documentation and performance, because
> competition is often what causes projects to improve.

First, I'm not sure what this has to do with the stdlib.
Second, if you look at the HTTP implementation in Tornado, it does not
handle 1/10th of the spec. Basically, it parses headers and handles a
couple of them (Content-Length, perhaps another one). It's not difficult
to write a fast HTTP server if you only need to support one smallish
part of the spec, and then to show impressive "Hello, World" benchmarks.
(Besides, Tornado seems platform-specific, since it explicitly uses
epoll.)
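
To make the portability point concrete, here is a rough sketch of how code
can check which readiness API the `select` module offers on the current
platform, rather than assuming epoll (which is Linux-only). The helper name
`pick_poller_name` and the preference order are my own assumptions for
illustration; this is not how Tornado does it.

```python
import select


def pick_poller_name():
    """Return the name of the first readiness API found in `select`."""
    # The preference order below is an assumption made for this sketch.
    for name in ("epoll", "kqueue", "poll"):
        if hasattr(select, name):
            return name
    # select.select() is the lowest common denominator.
    return "select"


print(pick_poller_name())
```

A server that calls `select.epoll()` unconditionally will simply fail on
platforms where that attribute does not exist.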

The way Tornado was promoted looks like a marketing stunt. Glyph
Lefkowitz had a very reasonable answer to it on his blog.

(and, in any case, if you need speedy HTTP, just use mod_wsgi. There's
no need to try and look fancy by using a pure-Python async server, which
will always be much less tested, supported and documented than Apache
is; not to mention the wealth of plugins which are available to
customize Apache behaviour)
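
For reference, the application side of this is tiny. Below is a minimal
sketch of a WSGI callable of the kind mod_wsgi can serve. As far as I know,
mod_wsgi looks for a callable named `application` by default; the response
body is just an example.

```python
def application(environ, start_response):
    """Minimal WSGI application returning a fixed plain-text body."""
    body = b"Hello, World\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```

All the HTTP parsing, connection handling and process management is then
left to Apache.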

> The most obvious ones are certainly the `locale` module
> and all the other modules that change behavior based on the locale
> settings.  Did you know that every major Python framework reimplements
> time formatting even for something as simple as HTTP headers, because
> Python does not provide a way to format the time to english strings
> reliably?

Yes, it is very annoying.
Please note, however, that the locale module addresses a specific need,
which is to interface with the system-level locale mechanism. The global
state comes from that mechanism and is not caused by the design of the
module itself. While this does limit its usefulness a lot (and makes it
fragile because of system variations), the module is still useful
precisely when what you want is to rely on the system's locale mechanism.
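
For the specific case of HTTP headers, one way to stay clear of the locale
machinery is `email.utils.formatdate`, which uses fixed English day and
month names. A small sketch follows; the wrapper name `http_date` is
hypothetical and only there for illustration.

```python
from email.utils import formatdate


def http_date(timestamp=None):
    """Format a POSIX timestamp (default: now) as an HTTP-style date."""
    # usegmt=True gives the "GMT" suffix instead of "-0000".
    return formatdate(timestamp, usegmt=True)


print(http_date(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```

This does not help with general-purpose, localized formatting, of course.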

I don't know if including something like Babel in the stdlib would be a
good thing. It depends on its size and on the required maintenance
(I suppose there is a continuous flow of patches, as long as new
languages/cultures get supported?).
Making locale able to delegate to Babel sounds awkward. Just tell
people to use Babel if they need to (whether it is in the stdlib or
not).

> Also we have many modules in the standard library that in my opinion
> just do not belong there.  From my point of view, stuff like XML does
> not belong in the standard library.  But it appears that not many
> people agree with me on this one.

I would disagree indeed :)
Things like XML and JSON in the standard library are very useful,
because they provide a proven and reliable way to parse standardized
formats without having to install any third-party library.
Being able to do this kind of thing without installing additional stuff
is especially useful when writing small scripts.
(moreover, those libraries often have C accelerators, which might be
non-trivial to package properly or install manually on Windows
platforms)
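
As an illustration of the "small script" case, here is a short sketch that
parses both formats using nothing but the stdlib (`json` and
`xml.etree.ElementTree`). The sample documents are made up.

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample documents, for illustration only.
json_text = '{"name": "stdlib-sig", "threads": 3}'
xml_text = '<list name="stdlib-sig"><thread id="1"/><thread id="2"/></list>'

data = json.loads(json_text)
root = ET.fromstring(xml_text)
# Collect the "id" attribute of every direct <thread> child.
thread_ids = [thread.get("id") for thread in root.findall("thread")]

print(data["threads"], root.get("name"), thread_ids)
```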

> `dis` is
> one of them.  The implementation of dis prints to stdout no matter what
> you do.  Of course you can replace sys.stdout with something else for a
> brief moment, but again: this is not something we should aim for or
> advertise because it breaks for many people.

Sure, but `dis` is used mainly by the core developers themselves, for
testing and development purposes, and for these uses it is fine.
Besides, it is certainly possible to propose an extension of the API so
as to direct the output to another file-like object.
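
To give an idea of what such an extension could look like from the caller's
side, here is a sketch of a wrapper that sends the output of `dis.dis` to
any file-like object. The name `disassemble_to` is hypothetical, and the
sketch assumes a Python version that provides `contextlib.redirect_stdout`.
Note that it still swaps `sys.stdout` under the hood, so it is only a
stopgap for a proper API parameter.

```python
import contextlib
import dis
import io


def disassemble_to(obj, out):
    """Write the disassembly of `obj` to the file-like object `out`."""
    # redirect_stdout restores the previous sys.stdout on exit,
    # even if dis.dis() raises.
    with contextlib.redirect_stdout(out):
        dis.dis(obj)


def add(a, b):
    return a + b


buffer = io.StringIO()
disassemble_to(add, buffer)
listing = buffer.getvalue()
print(listing)
```

Since the redirection is global to the process, this remains unsafe when
other threads print at the same time.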

> Ubuntu recently started the "100 paper cuts" project.  There people work
> on tiny little patches to improve the system, rather than replacing
> components.  Even though a large part of the standard library appears
> to be broken by design, it could still be redesigned on the small
> scale, without breaking backwards compatibility.

This "call to arms" can be a good idea. But we have to be able to
channel it and appropriately review / validate the submitted changes.

> But how realistic is it to refactor the standard library?  I don't know.

It depends on what you mean by "refactor". It doesn't sound very precise :)
I think it's better to discuss proposed changes case by case rather than
trying to reach a consensus on such vague terms.

> It is a good idea to
> ask as many people as possible, but I am not sure if the mailing list is
> the way to do that.

If you have precise feature requests or bugs to report, the bug tracker
might indeed be a better place.
Especially if you have patches ready :-)

Regards

Antoine.