I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
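
As a rough sketch of what I mean (Python standing in for the C
that would really be involved, and with made-up names):

IMMORTAL = -1          # sentinel refcount marking a permanent object

class Obj(object):
    def __init__(self, refcount=1):
        self.refcount = refcount

def deallocate(obj):
    print("deallocating", obj)     # stand-in for real deallocation

def incref(obj):
    if obj.refcount == IMMORTAL:
        return                     # no-op: the object is never written to
    obj.refcount += 1              # normal path (needs locking without a GIL)

def decref(obj):
    if obj.refcount == IMMORTAL:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        deallocate(obj)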
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When a decrement brings the local refcount
to zero, the global refcount can be checked without
locking, since a zero will never be written to it until
there are truly no non-local references remaining.
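
A rough sketch of the decref path under this scheme (again
made-up names, Python standing in for what would really be C,
and the per-object lock standing in for whatever locking the
real implementation would use):

import threading

class Obj(object):
    def __init__(self):
        self.owner = threading.get_ident()   # owning thread id
        self.local_rc = 1                    # local reference count
        self.global_rc = 0                   # global reference count
        self.lock = threading.Lock()         # per-object lock stand-in

def deallocate(obj):
    print("collecting", obj)                 # stand-in for real deallocation

def decref(obj):
    if threading.get_ident() == obj.owner:
        obj.local_rc -= 1                    # no locking: we own this object
        if obj.local_rc == 0 and obj.global_rc == 0:
            # reading global_rc unlocked is safe: other threads never
            # write zero to it while non-local references remain
            deallocate(obj)
    else:
        with obj.lock:                       # foreign thread: locked path
            obj.global_rc -= 1
            if obj.global_rc == 0 and obj.local_rc == 0:
                deallocate(obj)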
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Some of you might remember a discussion that took place on this list
about not being able to execute a script contained in a package that
used relative imports (read the PEP if you don't quite get what I am
talking about). The PEP below proposes a solution (along with a
counter-solution).
Let me know what you think. I especially want to hear which proposal
people prefer; the one in the PEP or the one in the Open Issues
section. Plus I wouldn't mind suggestions on a title for this PEP.
=)
-------------------------------------------
PEP: XXX
Title: XXX
Version: $Revision: 52916 $
Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $
Author: Brett Cannon
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: XXX-Apr-2007
Abstract
========
Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, it ceases to be possible to execute
modules within a package. This failing stems from the fact that
the module being executed as the "main" module replaces its
``__name__`` attribute with ``"__main__"`` instead of leaving it as
the actual, absolute name of the module. This breaks import's ability
to resolve relative imports from the main module into absolute names.
In order to resolve this issue, this PEP proposes to change how a
module is delineated as the module that is being executed as the main
module. By leaving the ``__name__`` attribute in a module alone and
setting a module attribute named ``__main__`` to a true value for the
main module (and thus false in all others), proper relative name
resolution can occur while still having a clear way for a module to
know if it is being executed as the main module.
The Problem
===========
With the introduction of PEP 328, relative imports became dependent on
the ``__name__`` attribute of the module performing the import. This
is because the dots in a relative import are used to strip away
parts of the calling module's name to calculate where in the package
hierarchy a relative import should fall (prior to PEP 328 relative
imports could fail and would fall back on absolute imports which had a
chance of succeeding).
For instance, consider the import ``from .. import spam`` made from the
``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
itself, i.e., does not define ``__path__``). Name resolution of the
relative import takes the caller's name (``bacon.ham.beans``), splits
on dots, and then slices off the last n parts based on the level
(which is 2). In this example both ``ham`` and ``beans`` are dropped
and ``spam`` is joined with what is left (``bacon``). This leads to
the proper import of the module ``bacon.spam``.
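The resolution step just described can be written out directly (an
illustrative re-implementation only; the real logic lives in
``importlib.Import._resolve_name()``)::

    def resolve_name(caller_name, target, level):
        # caller_name is the importing module's __name__, e.g.
        # 'bacon.ham.beans'; level is the number of leading dots.
        parts = caller_name.split('.')
        base = '.'.join(parts[:-level])
        return base + '.' + target if base else target

    resolve_name('bacon.ham.beans', 'spam', 2)   # -> 'bacon.spam'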
This reliance on the ``__name__`` attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the executing script's ``__name__`` is set to ``'__main__'``,
import cannot resolve any relative imports. This leads to an
``ImportError`` if you try to execute a script in a package that uses
any relative import.
For example, assume we have a package named ``bacon`` with an
``__init__.py`` file containing::
    from . import spam
Also create a module named ``spam`` within the ``bacon`` package (it
can be an empty file). Now if you try to execute the ``bacon``
package (either through ``python bacon/__init__.py`` or
``python -m bacon``) you will get an ``ImportError`` about trying to
do a relative import from within a non-package. Obviously the import
is valid, but because of the setting of ``__name__`` to ``'__main__'``
import thinks that ``bacon/__init__.py`` is not in a package since no
dots exist in ``__name__``. To see how the algorithm works, see
``importlib.Import._resolve_name()`` in the sandbox [#importlib]_.
Currently a work-around is to remove all relative imports in the
module being executed and make them absolute. This is unfortunate,
though, as one should not be required to use a specific style of
import just to make a module in a package executable.
The Solution
============
The solution to the problem is to not change the value of ``__name__``
in modules. But there still needs to be a way to let executing code
know it is being executed as a script. This is handled with a new
module attribute named ``__main__``.
When a module is being executed as a script, ``__main__`` will be set
to a true value. For all other modules, ``__main__`` will be set to a
false value. This changes the current idiom of::

    if __name__ == '__main__':
        ...

to::

    if __main__:
        ...
The current idiom is not as obvious and could cause confusion for new
programmers. The proposed idiom, though, does not require explaining
why ``__name__`` is set as it is.
With the proposed solution the convenience of finding out what module
is being executed by examining ``sys.modules['__main__']`` is lost.
To make up for this, the ``sys`` module will gain the ``main``
attribute. It will contain a string of the name of the module that is
considered the executing module.
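For example, code that today looks up ``sys.modules['__main__']`` to
get at the main module would instead do something like the following
(``sys.main`` being the attribute proposed here, not one that exists
now)::

    import sys

    main_module = sys.modules[sys.main]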
A competing solution is discussed in `Open Issues`_.
Transition Plan
===============
Using this solution will not work directly in Python 2.6. Code is
dependent upon the semantics of having ``__name__`` set to
``'__main__'``. There is also the issue of pre-existing global
variables in a module named ``__main__``. To deal with these issues,
a two-step solution is needed.
First, a Py3K deprecation warning will be raised during AST generation
when a global variable named ``__main__`` is defined. This will help
with the detection of code that would reset the value of ``__main__``
for a module. It is not fool-proof, though, since no warning is
raised when a global variable is injected into a module from the
outside. But this solution should cover the vast majority of variable
rebinding problems.
Second, 2to3 [#2to3]_ will gain a rule to transform the current ``if
__name__ == '__main__': ...`` idiom to the new one. While it will not
help with code that checks ``__name__`` outside of the idiom, that
specific line of code makes up a large proportion of the code that ever
looks for ``__name__`` being set to ``'__main__'``.
Open Issues
===========
A counter-proposal to introducing the ``__main__`` attribute on
modules was to introduce a built-in with the same name. The value of
the built-in would be the name of the module being executed (just like
the proposed ``sys.main``). This would lead to a new idiom of::

    if __name__ == __main__:
        ...
The perk of this idiom over the one proposed earlier is that the
general semantics does not differ greatly from the current idiom.
The drawback is that the syntactic difference is subtle; the dropping
of quotes around "__main__". Some believe that for existing Python
programmers bugs will be introduced where the quotation marks will be
put on by accident. But one could argue that the bug would be
discovered quickly through testing as it is a very shallow bug.
The other pro of this proposal over the earlier one is that import
would no longer have to set the value of ``__main__``. By making it a
built-in variable, import does not have to care about ``__main__`` at
all; executing code picks up the built-in on its own. This simplifies
the implementation of the proposal, as it only requires setting a
single built-in instead of changing import to set an attribute on
every module, with exactly one module getting a different value (much
like the current implementation has to do to set ``__name__`` in one
module to ``'__main__'``).
References
==========
.. [#2to3] 2to3 tool
(http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC]
.. [#importlib] importlib
(http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=mar…)
[ViewVC]
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
If A.M. Kuchling's list of Python Warts is any indication, Python has
removed many of the warts it once had. However, the behavior of mutable
default argument values is still a frequent stumbling-block for newbies.
It is also present on at least 3 different lists of Python's
deficiencies ([0][1][2]).
Example of current, unintuitive behavior (snipped from [0]):
>>> def popo(x=[]):
...     x.append(666)
...     print x
...
>>> popo()
[666]
>>> popo()
[666, 666]
>>> popo()
[666, 666, 666]
Whereas a newbie with experience with immutable default argument values
would, by analogy, expect:
>>> popo()
[666]
>>> popo()
[666]
>>> popo()
[666]
In scanning [0], [1], [2], and other similar lists, I have only found
one mediocre use-case for this behavior: Using the default argument
value to retain state between calls. However, as [2] comments, this
purpose is much better served by decorators, classes, or (though less
preferred) global variables. Other uses that are alluded to seem equally
esoteric and unpythonic.
To work around this behavior, the following idiom is used:
def popo(x=None):
    if x is None:
        x = []
    x.append(666)
    print x
However, why should the programmer have to write this extra boilerplate
code when the current, unusual behavior is only relied on by 1% of
Python code?
Therefore, I propose that default arguments be handled as follows in Py3K:
1. The initial default value is evaluated at definition-time (as in the
current behavior).
2. In a function call where the caller has not specified a value
for an optional argument, Python calls
copy.deepcopy(initial_default_value) and fills in the optional argument
with the resulting value.
This is fully backwards-compatible with the aforementioned workaround,
and removes the need for it, allowing one to write the first, simpler
definition of popo().
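
To illustrate (not to propose an implementation), the desired semantics
can be approximated today with a decorator; "fresh_defaults" below is a
made-up name, and the sketch uses the modern inspect module:

import copy
import functools
import inspect

def fresh_defaults(func):
    # Deep-copy the declared defaults on every call where the caller
    # did not supply a value, instead of reusing the single instance.
    spec = inspect.getfullargspec(func)
    defaults = spec.defaults or ()
    names = spec.args[len(spec.args) - len(defaults):]
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        supplied = dict(zip(spec.args, args))
        for name, default in zip(names, defaults):
            if name not in supplied and name not in kwargs:
                kwargs[name] = copy.deepcopy(default)
        return func(*args, **kwargs)
    return wrapper

@fresh_defaults
def popo(x=[]):
    x.append(666)
    print(x)

popo()   # [666]
popo()   # [666] -- each call gets a fresh copy of the default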
Comments?
- Chris Rebert
[0] 10 Python pitfalls (http://zephyrfalcon.org/labs/python_pitfalls.html)
[1] Python Gotchas
(http://www.ferg.org/projects/python_gotchas.html#contents_item_6)
[2] When Pythons Attack
(http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2)
Some kind of ordered dictionary would be nice to have in the
standard library, e.g. an AVL tree or something like that.
It would be nice so we could do things like this:

for value in tree[:end_key]:
    do_something_with(value)

del tree[:end_key]
An alternative would be just to sort the keys of a dict, but
that's O(n log n) for each sort. Depending on which case occurs
more often (lookup, insert, get key-range, etc.), another kind
of dict object would make sense.
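
Just to illustrate the kind of interface I mean, here's a rough fake of
key-range access built on bisect (re-sorting the keys on every call,
which is exactly the cost a real tree-backed dict would avoid):

import bisect

class SortedDict(dict):
    # Toy sketch only: a real ordered dict would keep the keys sorted
    # incrementally (AVL tree, B-tree, skip list, ...).
    def keys_until(self, end_key):
        ks = sorted(self)
        return ks[:bisect.bisect_left(ks, end_key)]

    def values_until(self, end_key):
        return [self[k] for k in self.keys_until(end_key)]

    def delete_until(self, end_key):
        for k in self.keys_until(end_key):
            del self[k]

tree = SortedDict({1: 'a', 2: 'b', 5: 'c', 9: 'd'})
for value in tree.values_until(5):   # roughly: for value in tree[:end_key]
    print(value)                     # a, b
tree.delete_until(5)                 # roughly: del tree[:end_key]
print(sorted(tree))                  # [5, 9]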
What do you think?
-panzi
I think it would be useful for Python to accept imports of standalone
files representing entire packages, maybe with the extension .pyp. A
package file would basically be a ZIP file, so it would follow fairly
easily from the current zipimport mechanism... its top-level
directory would be the contents of a package named by the outer ZIP
file. In other words, suppose we have a ZIP file called
"package.pyp", and at its top level, it contains "__init__.py" and
"blah.py". Anywhere this can be located, it would be equivalent to a
physical directory called "package" containing those two files. So
you can simply do "import package" as usual, regardless of whether
it's a directory or a .pyp.
A while ago I wrote a program called Squisher that does this (it
takes a ZIP file and turns it into an importable .pyc file), but it's
a huge hack. The hackishness mainly comes from my desire to not
require users of Squished packages to install Squisher itself; so
each module basically has to bootstrap itself, adding its own import
hook and then adding its own path to sys.path and shuffling around a
couple of things in sys.modules. All that could be avoided if this
were a core feature; I expect a straightforward import hook would
suffice.
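
Something along these lines is roughly what I have in mind; this is
only an illustrative sketch using the importlib-style hooks of newer
Pythons (none of it exists today; it just shows that zipfile,
zipimport and the PEP 302 machinery already provide the pieces):

import importlib.abc
import importlib.util
import os
import sys
import zipfile

class PypFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if '.' in fullname:              # only handle top-level packages
            return None
        for entry in sys.path:
            candidate = os.path.join(entry or '.', fullname + '.pyp')
            if os.path.isfile(candidate):
                spec = importlib.util.spec_from_loader(fullname, self,
                                                       is_package=True)
                spec.origin = candidate
                # Submodules are found inside the archive by the normal
                # zipimport path hook, because __path__ points at it.
                spec.submodule_search_locations = [candidate]
                return spec
        return None

    def create_module(self, spec):
        return None                      # use the default module creation

    def exec_module(self, module):
        archive = module.__spec__.origin
        with zipfile.ZipFile(archive) as zf:
            source = zf.read('__init__.py')
        exec(compile(source, archive + '/__init__.py', 'exec'),
             module.__dict__)

sys.meta_path.insert(0, PypFinder())
# After this, "import package" would pick up package.pyp from sys.path.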
As PEP 302 says, "Distributing lots of source or pyc files around is
not always appropriate, so there is a frequent desire to package all
needed modules in a single file." It's very useful to be able to
download a single file, plop it into a directory, and immediately be
able to import it like any .py or .pyc file. Eggs are nice, but
having to manually add them to sys.path or install them system-wide
with setuptools is not always ideal.
Hi Collin
Thanks for the reply.
| It sounds like what you're looking for is FunctionTestCase
| (http://docs.python.org/lib/unittest-contents.html). Using that, your
| loop above becomes something like
|
| for testFunc, expectedResult in MyTestData:
|     def tester():
|         self.assertEqual(testFunc(), expectedResult)
|     suite.addTest(FunctionTestCase(tester))
I had read about FunctionTestCase but it didn't seem to be what I was
looking for - though it's the closest. FunctionTestCase is intended to
allow people to easily bring a set of pre-existing tests under the umbrella
of unittest. It overrides setUp and tearDown, and doesn't result in the
test being a first-class test like those you get when you write tests for
unittest from scratch (using TestCase directly, or something you write
based on it).
I want to dynamically (i.e. at run time) add functions that are treated
equally with those that are added statically in python code. That could be
really simple (and I can hack around it to achieve it), but the way
unittest currently sets its _testMethodName attribute prevents me from doing
this in a nice way (because TestCase.__init__ immediately does a getattr to
look for the named method, and fails if it's absent).
I wonder if I'm being clear... it's pretty simple, but my explanation may
not be so good.
Regards,
Terry
(This is a fragment of an email that I sent to Guido earlier, I mention
this here so that Guido can skip reading it. Of course, I recognize that
most people here already know all this - but since it relates to the
recent discussion about the value of type checking, I'd like to post it
here as a kind of "manifesto" of why Python is the way it is.)
Statically typed languages such as C++ and Java require up-front
declaration of everything. It is the nature of such languages that there
is a lot of cross-checking in the compiler between the declaration of a
thing and its use. The idea is to prevent programmer errors by ensuring
internal consistency.
However, we find in practice that much of the programmer's effort is
spent in maintaining this cross-checking structure. To use a building
analogy, a statically-typed program is like a truss structure, where
there's a complex web of cross-braces, and a force applied at any given
point is spread out over the whole structure. Each time such a program
is modified, the programmer must partially dismantle and then
re-assemble the existing structure. This takes time.
It also clutters the code. Reading the source code of a program written
in a statically typed language reveals that a substantial part of the
text serves only to support the compile-time checking of definitions, or
provides visual redundancy to aid the programmer in connecting two
aspects of the program which are defined far apart.
An example of what I mean is the use of variable type declarations -
even in a statically typed language, it would be fairly easy for the
compiler to automatically infer most variable types if the language were
designed that way; The fact that the programmer is required to manually
specify these types serves as an additional consistency check on the code.
However, time spent serving the needs of these consistency checks is
time away from actually serving the functional purpose of the code.
Programmers in Python, on the other hand, not only need not worry about
type declarations, they also spend much less time worrying about
converting from one type to another in order to meet the constraints of
a particular API.
This is one of the reasons why I can generally write Python code about 4
times as fast as the C++ equivalent.
(Understand that this is coming from someone who loves working in C++
and Java and has used them daily for the last 15 years. At the same
time, however, I also enjoy programming in Python and I recognize that
each language has its strengths.)
There is also the question of how much static typing helps improve
program reliability.
In statically typed languages, there are two kinds of ways that types
are used. In languages such as C and Pascal, the type declarations serve
primarily as a consistency check. However, in C++ template
metaprogramming, and in languages like Haskell, there is a second use
for types, which is to provide a kind of type calculus or type
inferencing, which gives additional expressive power to the language.
C++ templates can act as powerful code generators, allowing the
programmer to program in ever higher levels of abstraction and express
the basic ideas even more succinctly and clearly than before.
In a rapid-prototyping environment, the second use of types can be a
major productivity win; However I would argue that the first use of
types, consistency checking, is less beneficial, and is often more of a
distraction to the programmer than a help. Yes, static type checking
does detect some errors; but it also causes errors by making the code
larger and more wordy, so that the programmer cannot hold large
portions of the program in their mind all at once, which can lead to
errors in overall design. It means the programmer spends more time
thinking about the behavior of individual variables and less about the
algorithm as a whole.
At this point, I want to talk about a related matter, another
fundamental design aspect of Python which I call "decriminalization of
minor errors".
An example of this is illustrated by the recent discussion over string
slicing. As you know, when you attempt to index a string with a slice
object that extends outside of the bounds of the string, the range is
silently truncated.
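For example, an out-of-range slice is quietly clipped, while an
out-of-range index is still very much an error:

>>> "spam"[2:100]
'am'
>>> "spam"[100]
Traceback (most recent call last):
  ...
IndexError: string index out of range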
Some argued that Python should be more strict, and report an error when
this occurs - but instead, it was reaffirmed that the current behavior
is correct. I would agree that this current behavior is the more
Pythonic, and is part of a general pattern, which I shall attempt to
describe:
To "decriminalize" an error means to find some alternative
interpretation of the programmer's intent that produces a non-error
result. That is, given a syntactical construct, and a choice of several
interpretations of what that construct should mean, attempt to pick an
interpretation that, when executed, does not produce an error.
In the design of the Python language, it is a regular practice to
decriminalize minor errors, provided that the alternative interpretation
can meet some fairly strict criteria: That it is useful, intuitive,
reasonable, and pedagogically sound.
Note that this is a much more conservative rule than that used by
languages such as Rexx, Javascript, and Perl, languages which make
"heroic efforts" to bend the interpretation of an operation to a
non-error result. Python does not do this.
Nor is decriminalizing errors the same as ignoring errors. Errors
are still, and should be, vigorously detected and reported. The
distinction is that decriminalizing an error results in code that
produces a useful, logical result, whereas ignoring errors results in
code that produces garbage or nothing. Decriminalization comes about
when we broaden our definitions of what is the correct result of a given
operation.
A couple of other examples of decriminalization:
1) there are languages in which the only valid argument for an 'if'
statement is a boolean. Attempts to say "if 0" are errors. In Python we
relax that rule, allowing any type to be used as the argument to an
if-statement. We do this by having a broader interpretation of what it
means to test an object for 'trueness', and allow 'trueness' to be
implied by 'non-emptiness'.
2) Duck-typing is a decriminalization of the error that polymorphic
types are required to inherit from a common interface. It also
decriminalizes "missing methods", as long as those methods are never
called. Again, this is due to having a broader interpretation of
'polymorphism'.
(In fact, this aspect of Python is so fundamental, that I think that it
deserves its own acronym alongside TOOWTDI and others, but I can't think
of a short, pithy description of it. Maybe IOANEIR - "Interpret
operations as non-errors if reasonable.")
Both static typing and decriminalization serve the same end - telling
the programmer "don't sweat the small stuff". Both are very helpful and
powerful, because they allow programmers to spend much less time
worrying about minor error cases, things that would have to be checked
for in C++. Python code is simply more *concise* than the C++
equivalent, yet it achieves this without being terse and cryptic,
because the text of a Python program more closely embodies the "essence"
of an algorithm, uncluttered by other agendas.
The price we pay for this, of course, is that sometimes errors show up
much later (like, after ship) than they would have otherwise. But unit
testing can catch a lot of the same errors.
And in many cases, the seriousness of such errors depends on what we
mean by "ship". It's one thing to discover a fatal error after you've
pressed thousands of CDs and shipped them all over the world; It's a
much different matter if the program has the ability to automatically
update itself, or is downloaded from some kind of subscription model
such as a package manager.
In many environments, it is far more important to get something done
quickly and validate the general concept, than it is to insure that the
code is 100% correct. In other words, if it would take you 6 months to
write it in a statically typed language, but only 2 months to write it
in a dynamic language - well, that's 4 extra months you have to write
unit tests and make sure it's right! And in the mean time, you can have
real users banging on the code and making sure of something that is far
more important, which is whether what you wrote is the right thing at all.
-- Talin
Hi,
I am wondering if in y'alls opinion the following is "just a dictionary" or
is it a different kind of object that has some of the characteristics of a
dictionary, and has order.
If y'all think that it is "just a dictionary", then how does one override the
notion of a "hash" for the key in a dictionary and make it some other
ordered structure (e.g. a B-Tree, AVL, etc.)? (Please no flame toss to
some other list -- this is a "use" of an "ordered dict".)
I don't know what such a critter would be called (in Python). It goes by the
name "array" in a language where it is central, but I don't want to go into that.
The object has the following characteristics:
- It is indexed by keys which are immutable (like dicts)
- Each key has a single value (like dicts)
- The keys are ordered (usually a B-Tree underneath)
- The keys are "sorted" yielding a hierarchy such that (using Python tuples
as an example and pseudo Python):
object = {
    (0,): "value of node",
    (0,"name"): "name of node",
    (0,"name",1): "some data about name",
    (1,): "value of another node",
    (1,2,3): "some data value",
    (2,): 2,
    (2,2,"somekey",1): 32,
    (3,): 28,
    ("abc",1,2): 14
    }
- Introspection of the object allows walking the keys by hierarchy,
  using the above:
    key = object.order(None) -> 0
    key = object.order(key) -> 1
    key = object.order(key) -> 2
    key = object.order(key) -> 3
    key = object.order(key) -> "abc"
    key = object.order(key) -> None
  The first key is fetched when None is the initial key (or the last
  key if the modifier is -1). Supplying a modifier in the call (1, the
  default, for forward; -1 for reverse) traverses the keys in the
  reverse order from that shown above.
- Introspection of the key results in:
    hasdata = object.data(key)
      =0   no subkeys, no data for 'key'   (in the above, (39) would have
           no subkeys and no data)
      =1   no subkeys, has data for 'key'  (in the above, (3) has no
           subkeys, but has data)
      =10  has subkeys, no data for 'key'  (in the above, (2,2) has
           subkeys but no data)
      =11  has subkeys, has data for 'key' (in the above, (2) has
           subkeys and has data)
- Introspection of object can yield "depth first" keys:
    key = object.query(None) -> (0,)
    key = object.query(key) -> (0,"name")
    key = object.query(key) -> (0,"name",1)
    key = object.query(key) -> (1,)
    key = object.query(key) -> (1,2,3)
    key = object.query(key) -> (2,)
    key = object.query(key) -> (2,2,"somekey",1)
    key = object.query(key) -> (3,)
    key = object.query(key) -> ("abc",1,2)
    key = object.query(key) -> None
  Like object.order(), object.query() has the same "reverse" (using -1)
  option to walk the keys in reverse order.
- Having an iterator over order/query:
    for key in object.ordered([start[, end]]):
    for key in object.queryed([start[, end]]): (spelling?? other alternative)
- Set/get of
    object[(0,"name")] = "new name of node"
    print object[(0,"name")]
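
For what it's worth, here is a rough sketch (a made-up class, not an
existing library) of how order()/query()-style traversal over tuple
keys could be faked with a plain dict plus bisect; a real
implementation would use a B-Tree or similar instead of re-sorting:

import bisect

class HierDict(dict):
    # Toy sketch: re-sorts the keys on every call; a real version would
    # keep them ordered (B-Tree, AVL, ...).  Keys mixing ints and
    # strings in the same position would need a custom sort key.
    def query(self, key=None, direction=1):
        """Depth-first successor (or predecessor, direction=-1) of key."""
        ks = sorted(self)
        if not ks:
            return None
        if key is None:
            return ks[0] if direction == 1 else ks[-1]
        i = (bisect.bisect_right(ks, key) if direction == 1
             else bisect.bisect_left(ks, key) - 1)
        return ks[i] if 0 <= i < len(ks) else None

    def order(self, key=None, direction=1):
        """Next (or previous) distinct first-level subscript."""
        firsts = sorted(set(k[0] for k in self))
        if not firsts:
            return None
        if key is None:
            return firsts[0] if direction == 1 else firsts[-1]
        i = (bisect.bisect_right(firsts, key) if direction == 1
             else bisect.bisect_left(firsts, key) - 1)
        return firsts[i] if 0 <= i < len(firsts) else None

d = HierDict({(0,): "value of node", (0, "name"): "name of node",
              (0, "name", 1): "some data about name",
              (1,): "value of another node", (1, 2, 3): "some data value"})
k = None
while True:
    k = d.query(k)       # walks (0,), (0,'name'), (0,'name',1), (1,), (1,2,3)
    if k is None:
        break
    print(k)
print(d.order(None), d.order(0))   # 0 1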
Cheers,
--ldl
--
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
651/340-4007 N32 21'48.28" W106 46'5.80"
"If a thing is worth doing, it is worth doing badly." –GK Chesterton.
An interpretation: For things worth doing: Doing them, even if badly,
is better than doing nothing perfectly (on them).
There's a simple change that could be made to unittest that would make it
easier to automate some forms of testing.
I want to be able to dynamically add tests to an instance of a class
derived from unittest.TestCase. There are occasions when I don't want to
write my tests upfront in a Python file. E.g., given a bunch of
test/expectedResult data sitting around (below in a variable named
MyTestData), it would be nice to be able to do the following (untested
here, but I did it earlier for real and it works fine):
import unittest

class Test(unittest.TestCase):
    def runTest(self): pass

suite = unittest.TestSuite()

for testFunc, expectedResult in MyTestData:
    newTestFuncName = 'dynamic-test-' + testFunc.__name__
    test = Test()
    # Bind testFunc/expectedResult/test via default arguments so each
    # tester keeps its own values from this loop iteration.
    def tester(testFunc=testFunc, expectedResult=expectedResult, test=test):
        test.assertEqual(testFunc(), expectedResult)
    setattr(test, newTestFuncName, tester)
    # Set the class instance up so that it will be the one run.
    test.__init__(newTestFuncName)  # ugh!
    suite.addTest(test)

suite.run(unittest.TestResult())
The explicit call to __init__ (marked ugh!) is ugly, dangerous, etc. You
could also say test._testMethodName = newTestFuncName (and set
_testMethodDoc too), but that's also ugly.
This would all be very simple though if instead of starting out like:
class TestCase:
    def __init__(self, methodName='runTest'):
        try:
            self._testMethodName = methodName
            testMethod = getattr(self, methodName)
            self._testMethodDoc = testMethod.__doc__
        except AttributeError:
            raise ValueError, "no such test method in %s: %s" % \
                  (self.__class__, methodName)
unittest.TestCase started out like this:
class TestCase:
    def __init__(self, methodName='runTest'):
        self.setTestMethod(methodName)

    def setTestMethod(self, methodName):
        try:
            self._testMethodName = methodName
            testMethod = getattr(self, methodName)
            self._testMethodDoc = testMethod.__doc__
        except AttributeError:
            raise ValueError, "no such test method in %s: %s" % \
                  (self.__class__, methodName)
That would allow people to create an instance of their Test class, add a
method to it using setattr, and then use setTestMethod to set the method
to be run.
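
With setTestMethod available, the loop from my earlier example would
collapse to something like this (a sketch only -- setTestMethod is the
proposed addition, not something unittest has today):

for testFunc, expectedResult in MyTestData:
    newTestFuncName = 'dynamic-test-' + testFunc.__name__
    test = Test()
    def tester(testFunc=testFunc, expectedResult=expectedResult, test=test):
        test.assertEqual(testFunc(), expectedResult)
    setattr(test, newTestFuncName, tester)
    test.setTestMethod(newTestFuncName)   # no more explicit __init__ call
    suite.addTest(test)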
A further improvement would be to have _testMethodName be None or left
undefined (and accessed via __getattr__) for as long as possible rather
than being set to runTest (and looked up with getattr) immediately. That
would allow the removal of the do-nothing runTest method in the above. No
old code need be broken as runTest would still be the default. You'd just
have a chance to get in there earlier so it never saw the light of day.
Programmers like to automate things, especially testing. These changes
don't break any existing code but they allow additional test automation.
Of course you _could_ achieve the above by writing out a brand new temp.py
file, running it, and so on, but that's not very Pythonic, is a bunch more
work, needs cleanup (temp.py needs to go away), etc.
I have some further thoughts about how to make this a bit more flexible,
but I'll save those for later, supposing there's any interest in the above.
Terry Jones