I've been thinking about some ideas for reducing the
amount of refcount adjustment that needs to be done,
with a view to making GIL removal easier.
1) Permanent objects
In a typical Python program there are many objects
that are created at the beginning and exist for the
life of the program -- classes, functions, literals,
etc. Refcounting these is a waste of effort, since
they're never going to go away.
So perhaps there could be a way of marking such
objects as "permanent" or "immortal". Any refcount
operation on a permanent object would be a no-op,
so no locking would be needed. This would also have
the benefit of eliminating any need to write to the
object's memory at all when it's only being read.
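
As a rough sketch of what I mean (Python standing in for the C
that would really be involved, and with made-up names):

IMMORTAL = -1          # sentinel refcount marking a permanent object

class Obj(object):
    def __init__(self, refcount=1):
        self.refcount = refcount

def deallocate(obj):
    print("deallocating", obj)     # stand-in for real deallocation

def incref(obj):
    if obj.refcount == IMMORTAL:
        return                     # no-op: the object is never written to
    obj.refcount += 1              # normal path (needs locking without a GIL)

def decref(obj):
    if obj.refcount == IMMORTAL:
        return
    obj.refcount -= 1
    if obj.refcount == 0:
        deallocate(obj)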
2) Objects owned by a thread
Python code creates and destroys temporary objects
at a high rate -- stack frames, argument tuples,
intermediate results, etc. If the code is executed
by a thread, those objects are rarely if ever seen
outside of that thread. It would be beneficial if
refcount operations on such objects could be carried
out by the thread that created them without locking.
To achieve this, two extra fields could be added
to the object header: an "owning thread id" and a
"local reference count". (The existing refcount
field will be called the "global reference count"
in what follows.)
An object created by a thread has its owning thread
id set to that thread. When adjusting an object's
refcount, if the current thread is the object's owning
thread, the local refcount is updated without locking.
If the object has no owning thread, or belongs to
a different thread, the object is locked and the
global refcount is updated.
The object is considered garbage only when both
refcounts drop to zero. Thus, after a decref, both
refcounts would need to be checked to see if they
are zero. When a decrement brings the local refcount
to zero, the global refcount can be checked without
locking, since a zero will never be written to it until
there are truly no non-local references remaining.
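
A rough sketch of the decref path under this scheme (again
made-up names, Python standing in for what would really be C,
and the per-object lock standing in for whatever locking the
real implementation would use):

import threading

class Obj(object):
    def __init__(self):
        self.owner = threading.get_ident()   # owning thread id
        self.local_rc = 1                    # local reference count
        self.global_rc = 0                   # global reference count
        self.lock = threading.Lock()         # per-object lock stand-in

def deallocate(obj):
    print("collecting", obj)                 # stand-in for real deallocation

def decref(obj):
    if threading.get_ident() == obj.owner:
        obj.local_rc -= 1                    # no locking: we own this object
        if obj.local_rc == 0 and obj.global_rc == 0:
            # reading global_rc unlocked is safe: other threads never
            # write zero to it while non-local references remain
            deallocate(obj)
    else:
        with obj.lock:                       # foreign thread: locked path
            obj.global_rc -= 1
            if obj.global_rc == 0 and obj.local_rc == 0:
                deallocate(obj)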
I suspect that these two strategies together would
eliminate a very large proportion of refcount-related
activities requiring locking, perhaps to the point
where those remaining are infrequent enough to make
GIL removal practical.
--
Greg
Some of you might remember a discussion that took place on this list
about not being able to execute a script contained in a package that
used relative imports (read the PEP if you don't quite get what I am
talking about). The PEP below proposes a solution (along with a
counter-solution).
Let me know what you think. I especially want to hear which proposal
people prefer; the one in the PEP or the one in the Open Issues
section. Plus I wouldn't mind suggestions on a title for this PEP.
=)
-------------------------------------------
PEP: XXX
Title: XXX
Version: $Revision: 52916 $
Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $
Author: Brett Cannon
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: XXX-Apr-2007
Abstract
========
Because of how name resolution works for relative imports in a world
where PEP 328 is implemented, it ceases to be possible to execute
modules within a package. This failing stems from the fact that
the module being executed as the "main" module replaces its
``__name__`` attribute with ``"__main__"`` instead of leaving it as
the actual, absolute name of the module. This breaks import's ability
to resolve relative imports from the main module into absolute names.
In order to resolve this issue, this PEP proposes to change how a
module is delineated as the module that is being executed as the main
module. By leaving the ``__name__`` attribute in a module alone and
setting a module attribute named ``__main__`` to a true value for the
main module (and thus false in all others), proper relative name
resolution can occur while still having a clear way for a module to
know if it is being executed as the main module.
The Problem
===========
With the introduction of PEP 328, relative imports became dependent on
the ``__name__`` attribute of the module performing the import. This
is because the dots in a relative import are used to strip away
parts of the calling module's name to calculate where in the package
hierarchy a relative import should fall (prior to PEP 328 relative
imports could fail and would fall back on absolute imports which had a
chance of succeeding).
For instance, consider the import ``from .. import spam`` made from the
``bacon.ham.beans`` module (``bacon.ham.beans`` is not a package
itself, i.e., does not define ``__path__``). Name resolution of the
relative import takes the caller's name (``bacon.ham.beans``), splits
on dots, and then slices off the last n parts based on the level
(which is 2). In this example both ``ham`` and ``beans`` are dropped
and ``spam`` is joined with what is left (``bacon``). This leads to
the proper import of the module ``bacon.spam``.
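The resolution step just described can be written out directly (an
illustrative re-implementation only; the real logic lives in
``importlib.Import._resolve_name()``)::

    def resolve_name(caller_name, target, level):
        # caller_name is the importing module's __name__, e.g.
        # 'bacon.ham.beans'; level is the number of leading dots.
        parts = caller_name.split('.')
        base = '.'.join(parts[:-level])
        return base + '.' + target if base else target

    resolve_name('bacon.ham.beans', 'spam', 2)   # -> 'bacon.spam'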
This reliance on the ``__name__`` attribute of a module when handling
relative imports becomes an issue when executing a script within a
package. Because the executing script's ``__name__`` is set to ``'__main__'``,
import cannot resolve any relative imports. This leads to an
``ImportError`` if you try to execute a script in a package that uses
any relative import.
For example, assume we have a package named ``bacon`` with an
``__init__.py`` file containing::
    from . import spam
Also create a module named ``spam`` within the ``bacon`` package (it
can be an empty file). Now if you try to execute the ``bacon``
package (either through ``python bacon/__init__.py`` or
``python -m bacon``) you will get an ``ImportError`` about trying to
do a relative import from within a non-package. Obviously the import
is valid, but because of the setting of ``__name__`` to ``'__main__'``
import thinks that ``bacon/__init__.py`` is not in a package since no
dots exist in ``__name__``. To see how the algorithm works, see
``importlib.Import._resolve_name()`` in the sandbox [#importlib]_.
Currently a work-around is to remove all relative imports in the
module being executed and make them absolute. This is unfortunate,
though, as one should not be required to use a specific style of
import just to make a module in a package executable.
The Solution
============
The solution to the problem is to not change the value of ``__name__``
in modules. But there still needs to be a way to let executing code
know it is being executed as a script. This is handled with a new
module attribute named ``__main__``.
When a module is being executed as a script, ``__main__`` will be set
to a true value. For all other modules, ``__main__`` will be set to a
false value. This changes the current idiom of::

    if __name__ == '__main__':
        ...

to::

    if __main__:
        ...
The current idiom is not as obvious and could cause confusion for new
programmers. The proposed idiom, though, does not require explaining
why ``__name__`` is set as it is.
With the proposed solution the convenience of finding out what module
is being executed by examining ``sys.modules['__main__']`` is lost.
To make up for this, the ``sys`` module will gain the ``main``
attribute. It will contain a string of the name of the module that is
considered the executing module.
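For example, code that today looks up ``sys.modules['__main__']`` to
get at the main module would instead do something like the following
(``sys.main`` being the attribute proposed here, not one that exists
now)::

    import sys

    main_module = sys.modules[sys.main]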
A competing solution is discussed in `Open Issues`_.
Transition Plan
===============
Using this solution will not work directly in Python 2.6. Code is
dependent upon the semantics of having ``__name__`` set to
``'__main__'``. There is also the issue of pre-existing global
variables in a module named ``__main__``. To deal with these issues,
a two-step solution is needed.
First, a Py3K deprecation warning will be raised during AST generation
when a global variable named ``__main__`` is defined. This will help
with the detection of code that would reset the value of ``__main__``
for a module. It is not fool-proof, though, since no warning is
raised when a global variable is injected into a module from the
outside. But this solution should cover the vast majority of variable
rebinding problems.
Second, 2to3 [#2to3]_ will gain a rule to transform the current ``if
__name__ == '__main__': ...`` idiom to the new one. While it will not
help with code that checks ``__name__`` outside of the idiom, that
specific line of code makes up a large proportion of the code that ever
looks for ``__name__`` being set to ``'__main__'``.
Open Issues
===========
A counter-proposal to introducing the ``__main__`` attribute on
modules was to introduce a built-in with the same name. The value of
the built-in would be the name of the module being executed (just like
the proposed ``sys.main``). This would lead to a new idiom of::

    if __name__ == __main__:
        ...
The perk of this idiom over the one proposed earlier is that the
general semantics does not differ greatly from the current idiom.
The drawback is that the syntactic difference is subtle; the dropping
of quotes around "__main__". Some believe that for existing Python
programmers bugs will be introduced where the quotation marks will be
put on by accident. But one could argue that the bug would be
discovered quickly through testing as it is a very shallow bug.
The other pro of this proposal over the earlier one is that import
would no longer have to set the value of ``__main__``. By making it a
built-in variable, import does not have to care about ``__main__`` at
all; executing code picks up the built-in on its own. This simplifies
the implementation of the proposal, as it only requires setting a
single built-in instead of changing import to set an attribute on
every module, with exactly one module getting a different value (much
like the current implementation has to do to set ``__name__`` in one
module to ``'__main__'``).
References
==========
.. [#2to3] 2to3 tool
(http://svn.python.org/view/sandbox/trunk/2to3/) [ViewVC]
.. [#importlib] importlib
(http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=mar…)
[ViewVC]
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:
If A.M. Kuchling's list of Python Warts is any indication, Python has
removed many of the warts it once had. However, the behavior of mutable
default argument values is still a frequent stumbling-block for newbies.
It is also present on at least 3 different lists of Python's
deficiencies ([0][1][2]).
Example of current, unintuitive behavior (snipped from [0]):
>>> def popo(x=[]):
...     x.append(666)
...     print x
...
>>> popo()
[666]
>>> popo()
[666, 666]
>>> popo()
[666, 666, 666]
Whereas a newbie with experience with immutable default argument values
would, by analogy, expect:
>>> popo()
[666]
>>> popo()
[666]
>>> popo()
[666]
In scanning [0], [1], [2], and other similar lists, I have only found
one mediocre use-case for this behavior: Using the default argument
value to retain state between calls. However, as [2] comments, this
purpose is much better served by decorators, classes, or (though less
preferred) global variables. Other uses that are alluded to seem equally
esoteric and unpythonic.
To work around this behavior, the following idiom is used:
def popo(x=None):
    if x is None:
        x = []
    x.append(666)
    print x
However, why should the programmer have to write this extra boilerplate
code when the current, unusual behavior is only relied on by 1% of
Python code?
Therefore, I propose that default arguments be handled as follows in Py3K:
1. The initial default value is evaluated at definition-time (as in the
current behavior).
2. In a function call where the caller has not specified a value
for an optional argument, Python calls
copy.deepcopy(initial_default_value) and fills in the optional argument
with the resulting value.
This is fully backwards-compatible with the aforementioned workaround,
and removes the need for it, allowing one to write the first, simpler
definition of popo().
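
To illustrate (not to propose an implementation), the desired semantics
can be approximated today with a decorator; "fresh_defaults" below is a
made-up name, and the sketch uses the modern inspect module:

import copy
import functools
import inspect

def fresh_defaults(func):
    # Deep-copy the declared defaults on every call where the caller
    # did not supply a value, instead of reusing the single instance.
    spec = inspect.getfullargspec(func)
    defaults = spec.defaults or ()
    names = spec.args[len(spec.args) - len(defaults):]
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        supplied = dict(zip(spec.args, args))
        for name, default in zip(names, defaults):
            if name not in supplied and name not in kwargs:
                kwargs[name] = copy.deepcopy(default)
        return func(*args, **kwargs)
    return wrapper

@fresh_defaults
def popo(x=[]):
    x.append(666)
    print(x)

popo()   # [666]
popo()   # [666] -- each call gets a fresh copy of the default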
Comments?
- Chris Rebert
[0] 10 Python pitfalls (http://zephyrfalcon.org/labs/python_pitfalls.html)
[1] Python Gotchas
(http://www.ferg.org/projects/python_gotchas.html#contents_item_6)
[2] When Pythons Attack
(http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2)
Some kind of ordered dictionary would be nice to have in the
standard library, e.g. an AVL tree or something like that.
It would be nice so we could do things like this:

for value in tree[:end_key]:
    do_something_with(value)

del tree[:end_key]
An alternative would be just to sort the keys of a dict, but
that's O(n log n) for each sort. Depending on which case occurs
more often (lookup, insert, get key-range, etc.), another kind
of dict object would make sense.
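
Just to illustrate the kind of interface I mean, here's a rough fake of
key-range access built on bisect (re-sorting the keys on every call,
which is exactly the cost a real tree-backed dict would avoid):

import bisect

class SortedDict(dict):
    # Toy sketch only: a real ordered dict would keep the keys sorted
    # incrementally (AVL tree, B-tree, skip list, ...).
    def keys_until(self, end_key):
        ks = sorted(self)
        return ks[:bisect.bisect_left(ks, end_key)]

    def values_until(self, end_key):
        return [self[k] for k in self.keys_until(end_key)]

    def delete_until(self, end_key):
        for k in self.keys_until(end_key):
            del self[k]

tree = SortedDict({1: 'a', 2: 'b', 5: 'c', 9: 'd'})
for value in tree.values_until(5):   # roughly: for value in tree[:end_key]
    print(value)                     # a, b
tree.delete_until(5)                 # roughly: del tree[:end_key]
print(sorted(tree))                  # [5, 9]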
What do you think?
-panzi
I think it would be useful for Python to accept imports of standalone
files representing entire packages, maybe with the extension .pyp. A
package file would basically be a ZIP file, so it would follow fairly
easily from the current zipimport mechanism... its top-level
directory would be the contents of a package named by the outer ZIP
file. In other words, suppose we have a ZIP file called
"package.pyp", and at its top level, it contains "__init__.py" and
"blah.py". Anywhere this can be located, it would be equivalent to a
physical directory called "package" containing those two files. So
you can simply do "import package" as usual, regardless of whether
it's a directory or a .pyp.
A while ago I wrote a program called Squisher that does this (it
takes a ZIP file and turns it into an importable .pyc file), but it's
a huge hack. The hackishness mainly comes from my desire to not
require users of Squished packages to install Squisher itself; so
each module basically has to bootstrap itself, adding its own import
hook and then adding its own path to sys.path and shuffling around a
couple of things in sys.modules. All that could be avoided if this
were a core feature; I expect a straightforward import hook would
suffice.
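
Something along these lines is roughly what I have in mind; this is
only an illustrative sketch using the importlib-style hooks of newer
Pythons (none of it exists today; it just shows that zipfile,
zipimport and the PEP 302 machinery already provide the pieces):

import importlib.abc
import importlib.util
import os
import sys
import zipfile

class PypFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, fullname, path=None, target=None):
        if '.' in fullname:              # only handle top-level packages
            return None
        for entry in sys.path:
            candidate = os.path.join(entry or '.', fullname + '.pyp')
            if os.path.isfile(candidate):
                spec = importlib.util.spec_from_loader(fullname, self,
                                                       is_package=True)
                spec.origin = candidate
                # Submodules are found inside the archive by the normal
                # zipimport path hook, because __path__ points at it.
                spec.submodule_search_locations = [candidate]
                return spec
        return None

    def create_module(self, spec):
        return None                      # use the default module creation

    def exec_module(self, module):
        archive = module.__spec__.origin
        with zipfile.ZipFile(archive) as zf:
            source = zf.read('__init__.py')
        exec(compile(source, archive + '/__init__.py', 'exec'),
             module.__dict__)

sys.meta_path.insert(0, PypFinder())
# After this, "import package" would pick up package.pyp from sys.path.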
As PEP 302 says, "Distributing lots of source or pyc files around is
not always appropriate, so there is a frequent desire to package all
needed modules in a single file." It's very useful to be able to
download a single file, plop it into a directory, and immediately be
able to import it like any .py or .pyc file. Eggs are nice, but
having to manually add them to sys.path or install them system-wide
with setuptools is not always ideal.
Hi Collin
Thanks for the reply.
| It sounds like what you're looking for is FunctionTestCase
| (http://docs.python.org/lib/unittest-contents.html). Using that, your
| loop above becomes something like
|
| for testFunc, expectedResult in MyTestData:
|     def tester():
|         self.assertEqual(testFunc(), expectedResult)
|     suite.addTest(FunctionTestCase(tester))
I had read about FunctionTestCase but it didn't seem to be what I was
looking for - though it's the closest. FunctionTestCase is intended to
allow people to easily bring a set of pre-existing tests under the umbrella
of unittest. It overrides setUp and tearDown, and doesn't result in the
test being a first-class test like those you get when you write tests for
unittest from scratch (using TestCase directly, or something you write
based on it).
I want to dynamically (i.e. at run time) add functions that are treated
equally with those that are added statically in python code. That could be
really simple (and I can hack around it to achieve it), but the way
unittest currently sets its _testMethodName attribute prevents me from doing
this in a nice way (because TestCase.__init__ immediately does a getattr to
look for the named method, and fails if it's absent).
I wonder if I'm being clear... it's pretty simple, but my explanation may
not be so good.
Regards,
Terry
(This is a fragment of an email that I sent to Guido earlier, I mention
this here so that Guido can skip reading it. Of course, I recognize that
most people here already know all this - but since it relates to the
recent discussion about the value of type checking, I'd like to post it
here as a kind of "manifesto" of why Python is the way it is.)
Statically typed languages such as C++ and Java require up-front
declaration of everything. It is the nature of such languages that there
is a lot of cross-checking in the compiler between the declaration of a
thing and its use. The idea is to prevent programmer errors by ensuring
internal consistency.
However, we find in practice that much of the programmer's effort is
spent in maintaining this cross-checking structure. To use a building
analogy, a statically-typed program is like a truss structure, where
there's a complex web of cross-braces, and a force applied at any given
point is spread out over the whole structure. Each time such a program
is modified, the programmer must partially dismantle and then
re-assemble the existing structure. This takes time.
It also clutters the code. Reading the source code of a program written
in a statically typed language reveals that a substantial part of the
text serves only to support the compile-time checking of definitions, or
provides visual redundancy to aid the programmer in connecting two
aspects of the program which are defined far apart.
An example of what I mean is the use of variable type declarations -
even in a statically typed language, it would be fairly easy for the
compiler to automatically infer most variable types if the language were
designed that way; The fact that the programmer is required to manually
specify these types serves as an additional consistency check on the code.
However, time spent serving the needs of these consistency checks is
time away from actually serving the functional purpose of the code.
Programmers in Python, on the other hand, not only need not worry about
type declarations, they also spend much less time worrying about
converting from one type to another in order to meet the constraints of
a particular API.
This is one of the reasons why I can generally write Python code about 4
times as fast as the C++ equivalent.
(Understand that this is coming from someone who loves working in C++
and Java and has used them daily for the last 15 years. At the same
time, however, I also enjoy programming in Python and I recognize that
each language has its strengths.)
There is also the question of how much static typing helps improve
program reliability.
In statically typed languages, there are two kinds of ways that types
are used. In languages such as C and Pascal, the type declarations serve
primarily as a consistency check. However, in C++ template
metaprogramming, and in languages like Haskell, there is a second use
for types, which is to provide a kind of type calculus or type
inferencing, which gives additional expressive power to the language.
C++ templates can act as powerful code generators, allowing the
programmer to program in ever higher levels of abstraction and express
the basic ideas even more succinctly and clearly than before.
In a rapid-prototyping environment, the second use of types can be a
major productivity win; However I would argue that the first use of
types, consistency checking, is less beneficial, and is often more of a
distraction to the programmer than a help. Yes, static type checking
does detect some errors; but it also causes errors by making the code
larger and more wordy, so that the programmer cannot hold large
portions of the program in their mind all at once, which can lead to
errors in overall design. It means the programmer spends more time
thinking about the behavior of individual variables and less about the
algorithm as a whole.
At this point, I want to talk about a related matter, another
fundamental design aspect of Python which I call "decriminalization of
minor errors".
An example of this is illustrated by the recent discussion over string
slicing. As you know, when you attempt to index a string with a slice
object that extends outside of the bounds of the string, the range is
silently truncated.
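For example, an out-of-range slice is quietly clipped, while an
out-of-range index is still very much an error:

>>> "spam"[2:100]
'am'
>>> "spam"[100]
Traceback (most recent call last):
  ...
IndexError: string index out of range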
Some argued that Python should be more strict, and report an error when
this occurs - but instead, it was reaffirmed that the current behavior
is correct. I would agree that this current behavior is the more
Pythonic, and is part of a general pattern, which I shall attempt to
describe:
To "decriminalize" an error means to find some alternative
interpretation of the programmer's intent that produces a non-error
result. That is, given a syntactical construct, and a choice of several
interpretations of what that construct should mean, attempt to pick an
interpretation that, when executed, does not produce an error.
In the design of the Python language, it is a regular practice to
decriminalize minor errors, provided that the alternative interpretation
can meet some fairly strict criteria: That it is useful, intuitive,
reasonable, and pedagogically sound.
Note that this is a much more conservative rule than that used by
languages such as Rexx, Javascript, and Perl, languages which make
"heroic efforts" to bend the interpretation of an operation to a
non-error result. Python does not do this.
Nor is decriminalizing errors the same as ignoring errors. Errors
are still, and should be, vigorously detected and reported. The
distinction is that decriminalizing an error results in code that
produces a useful, logical result, whereas ignoring errors results in
code that produces garbage or nothing. Decriminalization comes about
when we broaden our definitions of what is the correct result of a given
operation.
A couple of other examples of decriminalization:
1) there are languages in which the only valid argument for an 'if'
statement is a boolean. Attempts to say "if 0" are errors. In Python we
relax that rule, allowing any type to be used as the argument to an
if-statement. We do this by having a broader interpretation of what it
means to test an object for 'trueness', and allow 'trueness' to be
implied by 'non-emptiness'.
2) Duck-typing is a decriminalization of the error that polymorphic
types are required to inherit from a common interface. It also
decriminalizes "missing methods", as long as those methods are never
called. Again, this is due to having a broader interpretation of
'polymorphism'.
(In fact, this aspect of Python is so fundamental, that I think that it
deserves its own acronym alongside TOOWTDI and others, but I can't think
of a short, pithy description of it. Maybe IOANEIR - "Interpret
operations as non-errors if reasonable.")
Both static typing and decriminalization serve the same end - telling
the programmer "don't sweat the small stuff". Both are very helpful and
powerful, because they allow programmers to spend much less time
worrying about minor error cases, things that would have to be checked
for in C++. Python code is simply more *concise* than the C++
equivalent, yet it achieves this without being terse and cryptic,
because the text of a Python program more closely embodies the "essence"
of an algorithm, uncluttered by other agendas.
The price we pay for this, of course, is that sometimes errors show up
much later (like, after ship) than they would have otherwise. But unit
testing can catch a lot of the same errors.
And in many cases, the seriousness of such errors depends on what we
mean by "ship". It's one thing to discover a fatal error after you've
pressed thousands of CDs and shipped them all over the world; It's a
much different matter if the program has the ability to automatically
update itself, or is downloaded from some kind of subscription model
such as a package manager.
In many environments, it is far more important to get something done
quickly and validate the general concept, than it is to insure that the
code is 100% correct. In other words, if it would take you 6 months to
write it in a statically typed language, but only 2 months to write it
in a dynamic language - well, that's 4 extra months you have to write
unit tests and make sure it's right! And in the mean time, you can have
real users banging on the code and making sure of something that is far
more important, which is whether what you wrote is the right thing at all.
-- Talin
Hi,
I am wondering if in y'alls opinion the following is "just a dictionary" or
is it a different kind of object that has some of the characteristics of a
dictionary, and has order.
If y'all think that it is "just a dictionary", then how does one override the
notion of a "hash" for the key in a dictionary and make it some other
ordered structure (e.g. a B-Tree, AVL, etc.)? (Please no flame toss to
some other list -- this is a "use" of an "ordered dict".)
I don't know what such a critter would be called (in Python). It goes by the
name "array" in a language where it is central, but I don't want to go into that.
The object has the following characteristics:
- It is indexed by keys which are immutable (like dicts)
- Each key has a single value (like dicts)
- The keys are ordered (usually a B-Tree underneath)
- The keys are "sorted" yielding a hierarchy such that (using Python tuples
as an example and pseudo Python):
object = {
    (0,): "value of node",
    (0,"name"): "name of node",
    (0,"name",1): "some data about name",
    (1,): "value of another node",
    (1,2,3): "some data value",
    (2,): 2,
    (2,2,"somekey",1): 32,
    (3,): 28,
    ("abc",1,2): 14
    }
- Introspection of the object allows walking the keys by hierarchy,
  using the above:
    key = object.order(None) -> 0
    key = object.order(key) -> 1
    key = object.order(key) -> 2
    key = object.order(key) -> 3
    key = object.order(key) -> "abc"
    key = object.order(key) -> None
  The first key is fetched when None is the initial key (or the last
  key if the modifier is -1). Supplying a modifier in the call (1, the
  default, for forward; -1 for reverse) traverses the keys in the
  reverse order from that shown above.
- Introspection of the key results in:
    hasdata = object.data(key)
      =0   no subkeys, no data for 'key'   (in the above, (39) would have
           no subkeys and no data)
      =1   no subkeys, has data for 'key'  (in the above, (3) has no
           subkeys, but has data)
      =10  has subkeys, no data for 'key'  (in the above, (2,2) has
           subkeys but no data)
      =11  has subkeys, has data for 'key' (in the above, (2) has
           subkeys and has data)
- Introspection of object can yield "depth first" keys:
    key = object.query(None) -> (0,)
    key = object.query(key) -> (0,"name")
    key = object.query(key) -> (0,"name",1)
    key = object.query(key) -> (1,)
    key = object.query(key) -> (1,2,3)
    key = object.query(key) -> (2,)
    key = object.query(key) -> (2,2,"somekey",1)
    key = object.query(key) -> (3,)
    key = object.query(key) -> ("abc",1,2)
    key = object.query(key) -> None
  Like object.order(), object.query() has the same "reverse" (using -1)
  option to walk the keys in reverse order.
- Having an iterator over order/query:
    for key in object.ordered([start[, end]]):
    for key in object.queryed([start[, end]]): (spelling?? other alternative)
- Set/get of
    object[(0,"name")] = "new name of node"
    print object[(0,"name")]
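
For what it's worth, here is a rough sketch (a made-up class, not an
existing library) of how order()/query()-style traversal over tuple
keys could be faked with a plain dict plus bisect; a real
implementation would use a B-Tree or similar instead of re-sorting:

import bisect

class HierDict(dict):
    # Toy sketch: re-sorts the keys on every call; a real version would
    # keep them ordered (B-Tree, AVL, ...).  Keys mixing ints and
    # strings in the same position would need a custom sort key.
    def query(self, key=None, direction=1):
        """Depth-first successor (or predecessor, direction=-1) of key."""
        ks = sorted(self)
        if not ks:
            return None
        if key is None:
            return ks[0] if direction == 1 else ks[-1]
        i = (bisect.bisect_right(ks, key) if direction == 1
             else bisect.bisect_left(ks, key) - 1)
        return ks[i] if 0 <= i < len(ks) else None

    def order(self, key=None, direction=1):
        """Next (or previous) distinct first-level subscript."""
        firsts = sorted(set(k[0] for k in self))
        if not firsts:
            return None
        if key is None:
            return firsts[0] if direction == 1 else firsts[-1]
        i = (bisect.bisect_right(firsts, key) if direction == 1
             else bisect.bisect_left(firsts, key) - 1)
        return firsts[i] if 0 <= i < len(firsts) else None

d = HierDict({(0,): "value of node", (0, "name"): "name of node",
              (0, "name", 1): "some data about name",
              (1,): "value of another node", (1, 2, 3): "some data value"})
k = None
while True:
    k = d.query(k)       # walks (0,), (0,'name'), (0,'name',1), (1,), (1,2,3)
    if k is None:
        break
    print(k)
print(d.order(None), d.order(0))   # 0 1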
Cheers,
--ldl
--
LD Landis - N0YRQ - de la tierra del encanto
3960 Schooner Loop, Las Cruces, NM 88012
651/340-4007 N32 21'48.28" W106 46'5.80"
"If a thing is worth doing, it is worth doing badly." –GK Chesterton.
An interpretation: For things worth doing: Doing them, even if badly,
is better than doing nothing perfectly (on them).
There's a simple change that could be made to unittest that would make it
easier to automate some forms of testing.
I want to be able to dynamically add tests to an instance of a class
derived from unittest.TestCase. There are occasions when I don't want to
write my tests upfront in a Python file. E.g., given a bunch of
test/expectedResult data sitting around (below in a variable named
MyTestData), it would be nice to be able to do the following (untested
here, but I did it earlier for real and it works fine):
import unittest

class Test(unittest.TestCase):
    def runTest(self): pass

suite = unittest.TestSuite()

for testFunc, expectedResult in MyTestData:
    newTestFuncName = 'dynamic-test-' + testFunc.__name__
    test = Test()
    # Bind testFunc/expectedResult/test via default arguments so each
    # tester keeps its own values from this loop iteration.
    def tester(testFunc=testFunc, expectedResult=expectedResult, test=test):
        test.assertEqual(testFunc(), expectedResult)
    setattr(test, newTestFuncName, tester)
    # Set the class instance up so that it will be the one run.
    test.__init__(newTestFuncName)  # ugh!
    suite.addTest(test)

suite.run(unittest.TestResult())
The explicit call to __init__ (marked ugh!) is ugly, dangerous, etc. You
could also say test._testMethodName = newTestFuncName (and set
_testMethodDoc too), but that's also ugly.
This would all be very simple though if instead of starting out like:
class TestCase:
    def __init__(self, methodName='runTest'):
        try:
            self._testMethodName = methodName
            testMethod = getattr(self, methodName)
            self._testMethodDoc = testMethod.__doc__
        except AttributeError:
            raise ValueError, "no such test method in %s: %s" % \
                  (self.__class__, methodName)
unittest.TestCase started out like this:
class TestCase:
    def __init__(self, methodName='runTest'):
        self.setTestMethod(methodName)

    def setTestMethod(self, methodName):
        try:
            self._testMethodName = methodName
            testMethod = getattr(self, methodName)
            self._testMethodDoc = testMethod.__doc__
        except AttributeError:
            raise ValueError, "no such test method in %s: %s" % \
                  (self.__class__, methodName)
That would allow people to create an instance of their Test class, add a
method to it using setattr, and then use setTestMethod to set the method
to be run.
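
With setTestMethod available, the loop from my earlier example would
collapse to something like this (a sketch only -- setTestMethod is the
proposed addition, not something unittest has today):

for testFunc, expectedResult in MyTestData:
    newTestFuncName = 'dynamic-test-' + testFunc.__name__
    test = Test()
    def tester(testFunc=testFunc, expectedResult=expectedResult, test=test):
        test.assertEqual(testFunc(), expectedResult)
    setattr(test, newTestFuncName, tester)
    test.setTestMethod(newTestFuncName)   # no more explicit __init__ call
    suite.addTest(test)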
A further improvement would be to have _testMethodName be None or left
undefined (and accessed via __getattr__) for as long as possible rather
than being set to runTest (and looked up with getattr) immediately. That
would allow the removal of the do-nothing runTest method in the above. No
old code need be broken as runTest would still be the default. You'd just
have a chance to get in there earlier so it never saw the light of day.
Programmers like to automate things, especially testing. These changes
don't break any existing code but they allow additional test automation.
Of course you _could_ achieve the above by writing out a brand new temp.py
file, running it, and so on, but that's not very Pythonic, is a bunch more
work, needs cleanup (temp.py needs to go away), etc.
I have some further thoughts about how to make this a bit more flexible,
but I'll save those for later, supposing there's any interest in the above.
Terry Jones