Mailman 3 August 2009 - Python-Dev

Summary of Python tracker Issues
by Python tracker Aug. 21, 2009

ACTIVITY SUMMARY (08/14/09 - 08/21/09)
Python tracker at http://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue number. Do NOT respond to this message.

2353 open (+40) / 16226 closed (+15) / 18579 total (+55)

Open issues with patches: 928

Average duration of open issues: 656 days.
Median duration of open issues: 411 days.

Open Issues Breakdown
   open  2322 (+40)
pending    30 ( +0)

Issues Created Or Reopened (57) …

Two laments about CPython's AST Nodes
by Frank Wierzbicki Aug. 21, 2009

Before I start complaining, I want to mention what a huge help it has been to be able to directly compare the AST exposed by ast.py in making Jython a better Python. Thanks for that!

Now on to the complaints: Though I recently added support for this in Jython, I don't like that nodes can be defined without required attributes, for example:

    node = ast.Assign()

is valid, even though it requires "node.targets" and "node.value" (I'm less concerned about the required lineno and col_offset, as I …
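
The complaint about incomplete nodes can be reproduced with a small sketch like the one below. The helper name `missing_fields` is made up for illustration, and the warning filter is an assumption: recent CPython releases have started to deprecate omitting required fields, so the exact behaviour depends on the interpreter version.

```python
import ast
import warnings


def missing_fields(node):
    """Return the names listed in node._fields that are not set on the node."""
    return [name for name in node._fields if not hasattr(node, name)]


with warnings.catch_warnings():
    # Newer interpreters may warn when required fields are omitted.
    warnings.simplefilter("ignore", DeprecationWarning)
    node = ast.Assign()  # accepted even though no value was supplied

print(missing_fields(node))

# A fully specified node, for comparison.
complete = ast.Assign(
    targets=[ast.Name(id="x", ctx=ast.Store())],
    value=ast.Constant(value=1),
)
print(missing_fields(complete))
```

The incomplete node only fails later, when something such as `compile()` actually needs the missing attribute.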

standard library mimetypes module pathologically broken?
by Jacob Rus Aug. 20, 2009

Hi all,

In an attempt to figure out some twisted.web code, I was reading through the Python Standard Library’s mimetypes module today, and was shocked at the poor quality of the code. I wonder how the mimetypes code made it into the standard library, and whether anyone has ever bothered to read it or update it: it is an embarrassment. Much of the code is redundant, portions fail to execute, control flow is routed through a horribly confusing mess of spaghetti, and most of the complexity has no clear benefit as far as I can tell. I probably should drop the subject and get back to work, but as a good citizen, it’s hard to just ignore this sort of thing.

mimetypes.py stores its types in a pair of dictionaries, one for "strict" use, and the other for "non-standard types". It creates the strict dictionary by default out of apache's mime.types file, and then overrides the entries it finds with a set of exceptions. Then it creates the non-standard dictionary, which is set to match if the strict parameter is set to False when guessing types. Just in this basic design, and in the list of types in the file, there are several problems:

* Various apache mime types files are read, if found, but the ordering of the files is such that older versions of apache are sometimes read after newer ones, overriding updated mime types with out-of-date versions if multiple versions of apache are installed on the system.

* The vast majority of types declared in mimetypes.py are duplicates of types already declared by Apache. In a few cases this is to change the apache default (make an exception, that is), but in most cases the mime type and extension are completely identical. This huge number of redundant types makes the file substantially harder to follow.

* No comments are provided to explain why various sets of exceptions are made to Apache's default mime types, and in several cases mimetypes.py seems to just be out of date as compared to recent versions of Apache, for instance not knowing about the 'text/troff' type which was registered in January 2006 in RFC 4263.

* The 'non-standard' type dictionary is nearly useless, because all of the types it declares are already in apache's mime.types file, meaning that types are, as far as I can tell trying to follow ugly program flow, *never* drawn from the non-strict dictionary, except in the improbable situation where the mimetypes module is initialized with a custom set of apache-mime.types–like files, which does not include those 'non-standard' types. I personally cannot see a use case for initializing the module with a custom set of mime types, but then leaving the very few types included as non-strict to the defaults: this seems like a fragile and pathological use case. Given this, I don’t see any benefit to dragging the 'strict' parameter along all the way through the code, and would advise getting rid of it altogether. Does anyone know of any code that uses the mimetypes module with strict set to False, where the non-strict code path ever *actually* is executed?

But though these problems, which affect actual use of the code and are therefore probably most important, are significant, they really pale in comparison to the awful quality of implementation. I'll try to briefly outline my understanding of how code flows in mimetypes.py, and what the problems are. I haven't stepped through the code in a debugger, this is just from reading it, so I apologize in advance if I get something wrong. This is, however, some of the worst code I’ve seen in the standard library or anywhere else.

* It defines __all__: I didn’t even realize __all__ could be used for single-file modules (w/o submodules), but it definitely shouldn’t be here. This specific __all__ oddly does not include all of the documented variables and functions in the mimetypes module. It’s not clear why someone calling import * here wouldn’t want the bits not included.

* It creates a _default_mime_types() function which declares a bunch of global variables, and then immediately calls _default_mime_types() below the definition. There is literally no difference in result between this and just putting those variables at the top level of the file, so I have no idea why this function exists, except to make the code more confusing.

* It allows command line usage: I don’t think this is necessary for a part of the standard library like this. There are better tools for finding mime types from the command line which ship with most operating systems.

* Its API is pretty poorly designed. It offers 6 functions when about 3 are needed, and it takes a couple of read-throughs of the code to figure out exactly what any of them are supposed to do.

* The operation is crazy: It defines a MimeTypes class which actually stores the type mappings, but this class is designed to be a singleton. The way that such a design is enforced is through the use of the module-global 'init' function, which makes an instance of the class, and then maps all of the functions in the module global namespace to instance methods. But confusingly, all such functions are also defined independently of the init function, with definitions such as:

      def guess_type(url, strict=True):
          if not inited:
              init()
          return guess_type(url, strict)

  I’d be amazed if anyone could guess what that code was trying to do. I did a double-take when I saw it. Of course, that return call is only ever reached the first time this function is called, if init() has not happened yet. This was all presumably done for lazy initialization, so that the type information would only be loaded when needed. Needless to say, there are more pythonic ways to accomplish such a goal. Oh, also, the other good one here is that it means that someone who writes `from mimetypes import guess_type` gets something different than someone who writes: `import mimetypes; guess_type = mimetypes.guess_type`. In the former case, this wrapper function is saved as guess_type, which each time just calls the (changed after init()) mimetypes.guess_type function. This caused a performance nightmare before March of this year, when there was no check for `if not inited` before running init() (amazing!?).

* Because the type datastore is set up to be a singleton, any time init() is called in one section of code, it resets any types which have been added manually: this means that if init() is called by different pieces of code in the same python program, they will interfere with each other's type databases, and break each other. This is extremely fragile and, in my opinion, crazy. It is hard for me to imagine any use case that would benefit from this ability to clobber custom type mappings, and I very much doubt that any code calling the mimetypes module realizes that the contract of the API is so flimsy by definition. In practice, I would not advise consumers of this API to ever call init() manually, or to ever add custom mime type mappings, because they are setting themselves up for hard-to-track bugs down the line.

* The 'inited' flag is a documented part of the interface, in the standard library documentation. I cannot imagine any reason to set this flag manually: setting it to false when it was true will have no effect, because the top-level functions have already been replaced by instance methods of the 'db' MimeTypes instance. Setting it to true when it was false will make the code just break outright.

* In python 3, this has been changed a bit. There’s still an inited flag, and it is still in the docs, but now the awful code from above has been changed slightly, to:

      def guess_type(url, strict=True):
          if _db is None:
              init()
          return _db.guess_type(url, strict)

  Which is still embarrassingly confusing. On the upside, the inited flag now does literally nothing, but remains defined, and in the docs.

* The 'types_map' and 'common_types' (for 'strict' and 'common' types, respectively) dictionaries are also a documented part of the interface. When init() is called, a new MimeTypes instance makes a (different) types_map which is a tuple of two dictionaries, for 'strict' and 'common' types. Then this instance reads the apache mime.types files and adds the types to its pair of self.types_map dictionaries, and then after that looks at the global types_map and common_types dictionaries and adds *those* types to its self.types_map. Then at the end it replaces the global types_map with self.types_map[True] and replaces common_types with self.types_map[False]. Unfortunately, while changing these dictionaries will have an effect on the operation of the library, it will not update the types_map_inv mapping, so inverse lookups will not behave as the changer expects. If these dictionaries are going to remain documented, the documentation should clearly describe them as read only to avoid very confusing bugs.

* Speaking of these dictionaries, .copy() is called on those two and a few others inside MimeTypes.__init__(), which happens every time the global init() function is called, but then init() puts the copies back in the global namespace, meaning that the original is discarded. Basically the only reason for the .copy() is to make sure that the correct updates are applied to the apache mimetype defaults, but the code will gladly re-read all of the apache files even after its mapped types are already in these dictionaries, essentially making re-initializing a (very expensive) no-op. All we’re doing is a lot of unnecessary extra disk reads and memory allocations and deallocations. The only time this has any effect is when a non-singleton MimeTypes instance is created, as in the read_mime_types function.

* And that read_mime_types function is a doozy. It tries to open a filename, spits back None if there’s an IOError (instead of raising the exception as it should), and then creates a new MimeTypes instance (remember, this is identical to the singleton MimeTypes instance because it starts itself from that one’s mappings), adds any new types it finds in the file with that name, and then returns the 'strict' types_map from it. I’m not sure whether any sane user of this API would expect it to return the existing type mappings *plus* the extra ones in the provided filename, but I really can’t imagine this function ever being particularly useful: it requires that you are reading mime types in apache format, but not the apache mime type files you already looked at, and then the only way to find out what new mappings were defined is to take the difference of the default mappings with the result of the function.

* The code itself, on a line-by-line basis, is unpythonic and unnecessarily verbose, confusing, and slow. The code should be rewritten to use python 2.3–2.6 features: even leaving its functionality identical it could be cut to about half the number of lines, and made clearer.

In case the above doesn’t make this clear: this code is extremely confusing. Trying to read it has caused all the people around me to look up as I shout "what the fuck??!" at the screen every few minutes, as each new revelation gives another surprise. I’m not convinced that I completely understand what the code does, because it has been quite effectively obfuscated, but I understand enough to want to throw the whole thing out, and start essentially from scratch.

So the question is, what should be done about this? I’d like to hear how people use the mimetypes module, and how they expect it to work, to figure out the sanest possible mostly-backwards-compatible replacement which could be dropped in (ideally this would just allow the use of default mimetypes and rip out the ability to alter the default datastore: or is there some easy way to change this away from a singleton without breaking code which calls these methods?), and then extend that replacement to support a somewhat saner model for anyone who actually wants to extend the set of mappings.

My guess is that replacement code could actually fix subtle bugs in existing uses of this module, by people who had a sane expectation of how it was supposed to work.

At the very least, the parts about figuring out exactly which exceptions to Apache’s set of default types are useful would be a good idea, and I’d maybe even recommend including an up-to-date copy of Apache’s mime.types file in the Python Standard Library, and then only overriding its definitions for future versions of Apache (and then overriding the combination of both of those with further exceptions deemed useful for python, with comments explaining why each exception), so that we’re not bothering to look up horribly out-of-date types in multiple locations from Apache 1, 1.2, 1.3, etc. I’d also recommend making the API for overriding definitions be the same as the code used to declare the default overrides, because as it is there are three ways to define types: a) in a mime.types formatted file, b) in a python dictionary that gets initialized with a confusing bit of code, and c) through the add_type function.

Does anyone else have thoughts about this, or maybe some good (it had better be *really* good) explanations why this code is the way it is? I'd be happy to try to rewrite it, but I think I’d need a bit of help figuring out how to make the rewrite backwards-compatible.

Note: someone else has had fun with this module:
<http://lucumr.pocoo.org/2009/3/1/the-1000-speedup-or-the-stdlib-sucks>
<http://lucumr.pocoo.org/2009/7/24/singletons-and-their-problems-in-python>

Cheers,
Jacob Rus
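
The import difference described in the post can be illustrated with a minimal sketch. It does not use the real mimetypes source: the throwaway module `fake_mimetypes`, the `calls` log and `_real_guess_type` are all hypothetical names, chosen only to mimic the rebinding pattern the post quotes.

```python
import types

# Source of a throwaway module that imitates the lazy-init pattern from the
# post. None of these names come from the real mimetypes module.
source = '''
inited = False
calls = []

def init():
    global inited, guess_type
    inited = True
    guess_type = _real_guess_type   # rebind the module-level name

def _real_guess_type(url):
    calls.append("real")
    return "text/plain"

def guess_type(url):
    calls.append("wrapper")
    if not inited:
        init()
    return guess_type(url)   # looks recursive, resolves to the rebound global
'''

fake = types.ModuleType("fake_mimetypes")
exec(source, fake.__dict__)

early = fake.guess_type    # equivalent to `from fake_mimetypes import guess_type`
early("a.txt")             # wrapper, then init(), then the real function
early("b.txt")             # still goes through the wrapper first
fake.guess_type("c.txt")   # attribute lookup now finds the real function

print(fake.calls)
print(early is fake.guess_type)
```

A name captured before the first call keeps pointing at the wrapper forever, while attribute access on the module picks up the replacement, which is the asymmetry the post complains about.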

Microsoft MSDN
by Steve Holden Aug. 20, 2009

I sent fourteen requests for licenses in to Microsoft. I've asked them to let me know which they grant (since they may choose to limit the number) and will inform you all personally when I hear their decision. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Watch PyCon on video now! http://pycon.blip.tv/

request for comments - standardization of python's purelib and platlib
by Jan Matejek Aug. 20, 2009

Hello,

I'm cross-posting this to distributions@freedesktop and python-dev, because the topic is relevant to both groups and should be solved in cooperation.

The issue:
In Python's default configuration (on linux), both purelib (location for pure python modules) and platlib (location for platform-dependent binary extensions) point to $prefix/lib/pythonX.Y/site-packages. That is no good for two main reasons.

One, python depends on the "lib" directory. (from distro's point of view, prefix is /usr, so let's talk /usr/lib) Due to this, it's impossible to install python under /usr/lib64 without heavy patching. Repeated attempts to bring python developers to acknowledge and rectify the situation have all failed (common argument here is "that would mean redesign of distutils and huge parts of whatnot"). Conversely, that also means that multiarch setup (/usr/lib or lib32 with 32bit python and /usr/lib64 with 64bit python) is not possible with stock python.

Two, the default configuration makes purelib and platlib identical, which somewhat defeats the purpose of the distinction in the first place. You either need to patch the default, or supply some alternate configuration to take advantage of this feature. And that's not the end of it - the next step is to make python aware of two different locations on sys.path, one for purelib and one for platlib, which is a different story altogether.

As distributors, we like to take advantage of purelib/platlib separation to package pure python modules as platform-independent (noarch for rpm-speakers). And that's not easy to do properly.

The proposal:
Let's put our heads together and choose good default locations for purelib and platlib. Then add support to python for recognizing the locations by default, and possibly leave a note in FHS that "this is the place". This is IMO a good first step to making python multiarch-aware, and it would also help a bit with LSB integration [1].

I've come up with three basic options for the configuration (substitute "/usr" with "$prefix" if you're not a distributor). This list is by no means comprehensive, it's just what looked reasonable at the time of writing.

1 - the traditional way
purelib = /usr/lib/pythonX.Y/site-packages
platlib = /usr/lib(64)/pythonX.Y/site-packages
pros:
+ this is already the default for 32bit systems
+ major distributions (including Fedora, Mandriva and now finally openSUSE too) do this
cons:
- 32bit systems have no separation, poor they!
- with multiarch setup, /usr/lib is "cluttered" by both platform-dependent files for 32bit and platform-independent files shared by the platforms. Also, 64bit python can pick up 32bit modules. That doesn't cause problems in practice, but doesn't feel like a clean design.

2 - the sharedir way
purelib = /usr/share/python/X.Y
platlib = /usr/lib(64)/pythonX.Y/site-packages
pros:
+ clean separation of purelib - nice!
+ unheard of - a good place to start anew
cons:
- FHS states that /usr/share is for data. But OTOH, they don't say much about platform-independent bytecode. We could probably get an exception for this.
- unheard of - everyone will be surprised

3 - the perl way
purelib = /usr/lib/pythonX.Y
platlib = /usr/lib/pythonX.Y/lib-dynload-(platform-identifier)/site-packages
pros:
+ possibility of multiarch packages that would install pure python parts into purelib and extensions or accelerators for more platforms at once - and therefore, possibility to split large modules into platform-dependent and platform-independent parts and save space on installation media
+ "idea compatibility" with perl and ruby, one less install layout to learn
cons:
- completely different from what we have now
- would require the most work from both python developers and distributions

comments?

regards
jan matejek
python packager for SUSE Linux

[1] http://www.linuxfoundation.org/en/LsbPython
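
To see where a given interpreter puts the two locations discussed above, a short check along these lines may help. It assumes a current interpreter with the standard `sysconfig` module, which the post itself does not mention (it speaks in terms of distutils), so treat it as an illustration rather than part of the proposal.

```python
import sysconfig

# Installation paths for the default scheme of the running interpreter.
paths = sysconfig.get_paths()
purelib = paths["purelib"]   # pure-Python modules
platlib = paths["platlib"]   # platform-dependent extension modules

print("purelib:", purelib)
print("platlib:", platlib)
print("identical:", purelib == platlib)
```

On a stock build the last line typically confirms the post's second complaint, while distributions that patch the defaults will show two different directories.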

VC++ versions to match python versions?
by Chris Withers Aug. 18, 2009

Hi All, Is the Express Edition of Visual C++ 2008 suitable for compiling packages for Python 2.6 on Windows? (And Python 2.6 itself for that matter...) Ditto for 2.5, 3.1 and the trunk (which I guess becomes 3.2?) cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk

functools.compose to chain functions together
by Jason R. Coombs Aug. 18, 2009

I'd like to express additional interest in python patch 1660179, discussed here:

http://mail.python.org/pipermail/patches/2007-February/021687.html

On several occasions, I've had the desire for something like this. I've made do with lambda functions, but as was mentioned, the lambda is clumsy and harder to read than functools.compose would be.

A potentially common use-case is when a library has a multi-decorator use case in which they want to compose a meta decorator out of one or more individual decorators.

Consider the hypothetical library.

    # we have three decorators we use commonly
    def dec_register_function_for_x(func):
        # do something with func
        return func

    def dec_alter_docstring(func):
        # do something to func.__doc__
        return func

    def dec_inject_some_data(data):
        def dec_inject_data(func):
            func.data = data  # this may not be legal, but assume it does something useful
            return func
        return dec_inject_data

    # we could use these decorators explicitly throughout our project
    @dec_register_function_for_x
    @dec_alter_docstring
    @dec_inject_some_data('foo data 1')
    def our_func_1(params):
        pass

    @dec_register_function_for_x
    @dec_alter_docstring
    @dec_inject_some_data('foo data 2')
    def our_func_2(params):
        pass

For two functions, that's not too onerous, but if it's used throughout the application, it would be nice to abstract the collection of decorators. One could do this with lambdas.

    def meta_decorator(data):
        return lambda func: dec_register_function_for_x(dec_alter_docstring(dec_inject_some_data(data)(func)))

But to me, a compose function is much easier to read and much more consistent with the decorator usage syntax itself.

    def meta_decorator(data):
        return compose(dec_register_function_for_x, dec_alter_docstring, dec_inject_some_data(data))

The latter implementation seems much more readable and elegant. One doesn't even need to know the decorator signature to effectively compose meta_decorators.

I've heard it said that Python is not a functional language, but if that were really the case, then functools would not exist. In addition to the example described above, I've had multiple occasions where having a general purpose function composition function would have simplified the implementation by providing a basic functional construct. While Python isn't primarily a functional language, it does have some functional constructs, and this is one of the features that makes Python so versatile; one can program functionally, procedurally, or in an object-oriented way, all within the same language.

I admit, I may be a bit biased; my first formal programming course was taught in Scheme.

Nevertheless, I believe functools is the ideal location for a very basic and general capability such as composition.

I realize this patch was rejected, but I'd like to propose reviving the patch and incorporating it into functools.

Regards,
Jason
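
For readers who want to try the idea, here is one possible `compose`, written as a sketch rather than as the code from patch 1660179 (which is not shown in the post). The `registry` list and the docstring suffix are invented so that the decorators have a visible effect; the right-to-left application order is an assumption chosen to match how stacked decorators apply.

```python
from functools import reduce


def compose(*funcs):
    """Return a function such that compose(f, g, h)(x) == f(g(h(x)))."""
    def composed(arg):
        # Apply the right-most function first, as stacked decorators do.
        return reduce(lambda acc, func: func(acc), reversed(funcs), arg)
    return composed


registry = []  # illustrative stand-in for "register function for x"


def dec_register_function_for_x(func):
    registry.append(func)
    return func


def dec_alter_docstring(func):
    func.__doc__ = (func.__doc__ or "") + " [altered]"
    return func


def dec_inject_some_data(data):
    def dec_inject_data(func):
        func.data = data
        return func
    return dec_inject_data


def meta_decorator(data):
    return compose(dec_register_function_for_x,
                   dec_alter_docstring,
                   dec_inject_some_data(data))


@meta_decorator("foo data 1")
def our_func_1(params):
    """Example."""


print(our_func_1.data, repr(our_func_1.__doc__), len(registry))
```

Calling `compose()` with no arguments yields the identity function here, which is a design choice of this sketch and not something the post specifies.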

another Py_TPFLAGS_HEAPTYPE question
by Joshua Haberman Aug. 17, 2009

I wrote to this list a few weeks ago asking about Py_TPFLAGS_HEAPTYPE (http://thread.gmane.org/gmane.comp.python.devel/105648). It occurred to me today that I could probably make object instances INCREF and DECREF my type appropriately, without setting Py_TPFLAGS_HEAPTYPE, by writing my own tp_alloc and tp_dealloc functions. My tp_alloc function could be:

    PyObject *my_tp_alloc(PyTypeObject *type, Py_ssize_t nitems) {
        PyObject *obj = PyType_GenericAlloc(type, nitems);
        if(obj) Py_INCREF(…

PEP Submission
by Eric Pruitt Aug. 17, 2009

Several days ago, around the time the python.org servers went down, I submitted a PEP to editor(a)python.org. Once things appeared to be working again, I submitted the PEP a second time. I have not seen any activity on the PEP in Python-Dev or any reply acknowledging that it was received. Did I misunderstand the process of submitting a PEP?

FAO John Arbash Meinel
by Chris Withers Aug. 17, 2009

Mail Delivery System wrote:
> This is the mail system at host server1.simplistix.co.uk.
>
> I'm sorry to have to inform you that your message could not
> be delivered to one or more recipients. It's attached below.
>
> For further assistance, please send mail to postmaster.
>
> If you do so, please include this problem report. You can
> delete your own text from the attached returned message.
>
> The mail system
>
> <john(a)…
