Using itertools in modules that are part of the build chain (Re: r76264 - python/branches/py3k/Lib/tokenize.py)
benjamin.peterson wrote:
Modified:
   python/branches/py3k/Lib/tokenize.py
==============================================================================
--- python/branches/py3k/Lib/tokenize.py (original)
+++ python/branches/py3k/Lib/tokenize.py Sat Nov 14 17:27:26 2009
@@ -377,17 +377,12 @@
     The first token sequence will always be an ENCODING token
     which tells you which encoding was used to decode the bytes stream.
     """
+    # This import is here to avoid problems when the itertools module is not
+    # built yet and tokenize is imported.
+    from itertools import chain
This is probably a bad idea - calling tokenize.tokenize() from a thread started as a side effect of importing a module will now deadlock on the import lock if the module import waits for that thread to finish.

We tell people not to do that (starting and then waiting on threads as part of module import) for exactly this reason, but it is also the reason we avoid embedding import statements inside functions in the standard library (not to mention encouraging third party developers to also avoid embedding import statements inside functions).

This does constrain where we can use itertools - if we want carte blanche to use it anywhere in the standard library, even those parts that are imported as part of the build chain, we'll need to bite the bullet and make it a builtin module rather than a separately built extension module.

Cheers,
Nick.

P.S. The problem is easy to demonstrate on the current Py3k branch:

1. Put this in a module file in your py3k directory (e.g. "deadlock.py"):

-----------
import threading
import tokenize

f = open(__file__, 'rU')

def _deadlocks():
    tokenize.tokenize(f.readline)

t = threading.Thread(target=_deadlocks)
t.start()
t.join()
-----------

2. Then run: ./python -c "import deadlock"

It will, as advertised, deadlock and you'll need to use Ctrl-Brk or kill -9 to get rid of it. (Note that preventing this kind of thing is one of the major reasons why direct execution and even the -m switch *don't* hang onto the import lock while running the __main__ module)

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
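[Editor's sketch] One general way to keep the import at module level, while still tolerating a build stage where itertools has not been compiled yet, is a guarded import with a pure-Python fallback. This is not something proposed in the thread; it only illustrates the "no import statements inside functions" rule described above. The fallback `chain` and the helper `first_then_rest` are hypothetical names introduced for illustration.

```python
# Sketch only: resolve `chain` once, at module import time, so that no
# import statement runs later inside a function body (and therefore no
# worker thread has to acquire the import lock when the function is called).
try:
    from itertools import chain
except ImportError:
    # Hypothetical fallback for a build stage where the itertools
    # extension module is not available yet.
    def chain(*iterables):
        """Yield the items of each iterable in turn."""
        for iterable in iterables:
            for item in iterable:
                yield item


def first_then_rest(first, rest):
    """Hypothetical helper: put an already consumed item back in front of a stream."""
    return chain([first], rest)
```

The trade-off is a second, slower code path that has to be kept in step with the C implementation, which may be part of why the thread looks at other options instead.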
2009/11/14 Nick Coghlan <ncoghlan@gmail.com>:
> This does constrain where we can use itertools - if we want carte blanche to use it anywhere in the standard library, even those parts that are imported as part of the build chain, we'll need to bite the bullet and make it a builtin module rather than a separately built extension module.

I have another unpleasant but slightly less hacky solution. We put detect_encoding in linecache where it is actually used.

--
Regards,
Benjamin
On Sat, Nov 14, 2009 at 20:01, Benjamin Peterson <benjamin@python.org> wrote:
> 2009/11/14 Nick Coghlan <ncoghlan@gmail.com>:
>> This does constrain where we can use itertools - if we want carte blanche to use it anywhere in the standard library, even those parts that are imported as part of the build chain, we'll need to bite the bullet and make it a builtin module rather than a separately built extension module.
>
> I have another unpleasant but slightly less hacky solution. We put detect_encoding in linecache where it is actually used.

Well, it happens to be used by the standard library in linecache, but not all external uses of it necessarily tie into linecache (e.g. importlib uses detect_encoding() in some non-critical code). Might just have to live with sub-optimal code.

-Brett
2009/11/15 Brett Cannon <brett@python.org>:
> On Sat, Nov 14, 2009 at 20:01, Benjamin Peterson <benjamin@python.org> wrote:
>> 2009/11/14 Nick Coghlan <ncoghlan@gmail.com>:
>>> This does constrain where we can use itertools - if we want carte blanche to use it anywhere in the standard library, even those parts that are imported as part of the build chain, we'll need to bite the bullet and make it a builtin module rather than a separately built extension module.
>>
>> I have another unpleasant but slightly less hacky solution. We put detect_encoding in linecache where it is actually used.
>
> Well, it happens to be used by the standard library in linecache, but not all external uses of it necessarily tie into linecache (e.g. importlib uses detect_encoding() in some non-critical code). Might just have to live with sub-optimal code.

Well, what I mean is that we'd do: def _detect_encoding(): in linecache and then "from linecache import _detect_encoding as detect_encoding" in tokenize.py.

--
Regards,
Benjamin
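[Editor's sketch] The re-export idea in the message above can be illustrated in a single file by standing in a throwaway module for linecache. Everything here is an assumption made for illustration: the module name `linecache_sketch` is hypothetical, and the body of `_detect_encoding` is a placeholder rather than the real detection logic.

```python
import sys
import types


def _build_provider():
    """Register a stand-in for linecache that owns the private helper."""
    module = types.ModuleType("linecache_sketch")

    def _detect_encoding(readline):
        # Placeholder logic, NOT the real algorithm: always report UTF-8
        # and hand back the one line that was consumed while "detecting".
        first = readline()
        consumed = [first] if first else []
        return "utf-8", consumed

    module._detect_encoding = _detect_encoding
    sys.modules[module.__name__] = module


_build_provider()

# What tokenize.py would contain under the proposal: a public alias for the
# helper that now lives in the other module. Existing callers of
# tokenize.detect_encoding() would keep working unchanged.
from linecache_sketch import _detect_encoding as detect_encoding
```

With this layout the module that needs the helper early in the build owns it, and the public name is only an alias, so both names refer to the same function object.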
participants (3)
- Benjamin Peterson
- Brett Cannon
- Nick Coghlan