Hello everyone!
We have been encountering several deadlocks in a threaded Python
application which calls subprocess.Popen (and hence fork()) in some of
its threads.
This has occurred on Python 2.4.1 on a 2.4.27 Linux kernel.
Preliminary analysis of the hang shows that the child process blocks
upon entering the execvp function, in which the import_lock is acquired
due to the following line:
def _execvpe(file, args, env=None):
from errno import ENOENT, ENOTDIR
...
It is known that when a pthreaded application forks, only the forking
thread survives in the child; any lock that some other thread held at
the moment fork() was called therefore remains locked forever, and any
attempt to acquire it in the child deadlocks.
Given these semantics, we were wondering whether it would be better to
hoist the above import out of _execvpe, to prevent the child from
attempting to acquire the import lock at all.
Another workaround could be assigning a new lock object to import_lock
at PyOS_AfterFork or via pthread_atfork (something similar is already
done with the global interpreter lock).
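For reference, here is a minimal sketch of the failure mode, using
imp.acquire_lock() to stand in for a thread that is mid-import at the
moment another thread forks:

import imp, os, threading, time

def hold_import_lock():
    imp.acquire_lock()          # simulates a thread in the middle of an import
    time.sleep(10)
    imp.release_lock()

threading.Thread(target=hold_import_lock).start()
time.sleep(0.1)                 # make sure the lock is held before forking

pid = os.fork()
if pid == 0:
    # The child inherits the import lock in its locked state, and the
    # thread that would release it does not exist here, so this import
    # blocks forever:
    from errno import ENOENT
    os._exit(0)
os.waitpid(pid, 0)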
We'd appreciate any opinions you might have on the subject.
Thanks in advance,
Yair and Rotem
On Wed, 10 Nov 2004, John P Speno wrote:
Hi, sorry for the delayed response.
> While using subprocess (aka popen5), I came across one potential gotcha. I've had
> exceptions ending like this:
>
> File "test.py", line 5, in test
> cmd = popen5.Popen(args, stdout=PIPE)
> File "popen5.py", line 577, in __init__
> data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
> OSError: [Errno 4] Interrupted system call
>
> (on Solaris 9)
>
> Would it make sense for subprocess to use a more robust read() function
> which can handle these cases, i.e. when the parent's read on the pipe
> to the child's stderr is interrupted by a system call, and returns EINTR?
> I imagine it could catch EINTR and EAGAIN and retry the failed read().
I assume you are using signals in your application? The os.read above is
not the only system call that can fail with EINTR. subprocess.py is full
of other system calls that can fail, and I suspect that many other Python
modules are as well.
I've made a patch (attached) to subprocess.py (and test_subprocess.py)
that should guard against EINTR, but I haven't committed it yet. It's
quite large.
Are Python modules supposed to handle EINTR? Why not let the C code handle
this? Or, perhaps the signal module should provide a sigaction function,
so that users can use SA_RESTART.
Index: subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/subprocess.py,v
retrieving revision 1.8
diff -u -r1.8 subprocess.py
--- subprocess.py 7 Nov 2004 14:30:34 -0000 1.8
+++ subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -888,6 +888,50 @@
pass
+ def _read_no_intr(self, fd, buffersize):
+ """Like os.read, but retries on EINTR"""
+ while True:
+ try:
+ return os.read(fd, buffersize)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+
+ def _read_all(self, fd, buffersize):
+ """Like os.read, but retries on EINTR, and reads until EOF"""
+ all = ""
+ while True:
+ data = self._read_no_intr(fd, buffersize)
+ all += data
+ if data == "":
+ return all
+
+
+ def _write_no_intr(self, fd, s):
+ """Like os.write, but retries on EINTR"""
+ while True:
+ try:
+ return os.write(fd, s)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
+ def _waitpid_no_intr(self, pid, options):
+ """Like os.waitpid, but retries on EINTR"""
+ while True:
+ try:
+ return os.waitpid(pid, options)
+ except OSError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
def _execute_child(self, args, executable, preexec_fn, close_fds,
cwd, env, universal_newlines,
startupinfo, creationflags, shell,
@@ -963,7 +1007,7 @@
exc_value,
tb)
exc_value.child_traceback = ''.join(exc_lines)
- os.write(errpipe_write, pickle.dumps(exc_value))
+ self._write_no_intr(errpipe_write, pickle.dumps(exc_value))
# This exitcode won't be reported to applications, so it
# really doesn't matter what we return.
@@ -979,7 +1023,7 @@
os.close(errwrite)
# Wait for exec to fail or succeed; possibly raising exception
- data = os.read(errpipe_read, 1048576) # Exceptions limited to 1 MB
+ data = self._read_all(errpipe_read, 1048576) # Exceptions limited to 1 MB
os.close(errpipe_read)
if data != "":
child_exception = pickle.loads(data)
@@ -1003,7 +1047,7 @@
attribute."""
if self.returncode == None:
try:
- pid, sts = os.waitpid(self.pid, os.WNOHANG)
+ pid, sts = self._waitpid_no_intr(self.pid, os.WNOHANG)
if pid == self.pid:
self._handle_exitstatus(sts)
except os.error:
@@ -1015,7 +1059,7 @@
"""Wait for child process to terminate. Returns returncode
attribute."""
if self.returncode == None:
- pid, sts = os.waitpid(self.pid, 0)
+ pid, sts = self._waitpid_no_intr(self.pid, 0)
self._handle_exitstatus(sts)
return self.returncode
@@ -1049,27 +1093,33 @@
stderr = []
while read_set or write_set:
- rlist, wlist, xlist = select.select(read_set, write_set, [])
+ try:
+ rlist, wlist, xlist = select.select(read_set, write_set, [])
+ except select.error, e:
+ if e[0] == errno.EINTR:
+ continue
+ else:
+ raise
if self.stdin in wlist:
# When select has indicated that the file is writable,
# we can write up to PIPE_BUF bytes without risk
# blocking. POSIX defines PIPE_BUF >= 512
- bytes_written = os.write(self.stdin.fileno(), input[:512])
+ bytes_written = self._write_no_intr(self.stdin.fileno(), input[:512])
input = input[bytes_written:]
if not input:
self.stdin.close()
write_set.remove(self.stdin)
if self.stdout in rlist:
- data = os.read(self.stdout.fileno(), 1024)
+ data = self._read_no_intr(self.stdout.fileno(), 1024)
if data == "":
self.stdout.close()
read_set.remove(self.stdout)
stdout.append(data)
if self.stderr in rlist:
- data = os.read(self.stderr.fileno(), 1024)
+ data = self._read_no_intr(self.stderr.fileno(), 1024)
if data == "":
self.stderr.close()
read_set.remove(self.stderr)
Index: test/test_subprocess.py
===================================================================
RCS file: /cvsroot/python/python/dist/src/Lib/test/test_subprocess.py,v
retrieving revision 1.14
diff -u -r1.14 test_subprocess.py
--- test/test_subprocess.py 12 Nov 2004 15:51:48 -0000 1.14
+++ test/test_subprocess.py 17 Nov 2004 19:42:30 -0000
@@ -7,6 +7,7 @@
import tempfile
import time
import re
+import errno
mswindows = (sys.platform == "win32")
@@ -35,6 +36,16 @@
fname = tempfile.mktemp()
return os.open(fname, os.O_RDWR|os.O_CREAT), fname
+ def read_no_intr(self, obj):
+ while True:
+ try:
+ return obj.read()
+ except IOError, e:
+ if e.errno == errno.EINTR:
+ continue
+ else:
+ raise
+
#
# Generic tests
#
@@ -123,7 +134,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stdout.write("orange")'],
stdout=subprocess.PIPE)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_stdout_filedes(self):
# stdout is set to open file descriptor
@@ -151,7 +162,7 @@
p = subprocess.Popen([sys.executable, "-c",
'import sys; sys.stderr.write("strawberry")'],
stderr=subprocess.PIPE)
- self.assertEqual(remove_stderr_debug_decorations(p.stderr.read()),
+ self.assertEqual(remove_stderr_debug_decorations(self.read_no_intr(p.stderr)),
"strawberry")
def test_stderr_filedes(self):
@@ -186,7 +197,7 @@
'sys.stderr.write("orange")'],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT)
- output = p.stdout.read()
+ output = self.read_no_intr(p.stdout)
stripped = remove_stderr_debug_decorations(output)
self.assertEqual(stripped, "appleorange")
@@ -220,7 +231,7 @@
stdout=subprocess.PIPE,
cwd=tmpdir)
normcase = os.path.normcase
- self.assertEqual(normcase(p.stdout.read()), normcase(tmpdir))
+ self.assertEqual(normcase(self.read_no_intr(p.stdout)), normcase(tmpdir))
def test_env(self):
newenv = os.environ.copy()
@@ -230,7 +241,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read(), "orange")
+ self.assertEqual(self.read_no_intr(p.stdout), "orange")
def test_communicate(self):
p = subprocess.Popen([sys.executable, "-c",
@@ -305,7 +316,8 @@
'sys.stdout.write("\\nline6");'],
stdout=subprocess.PIPE,
universal_newlines=1)
- stdout = p.stdout.read()
+
+ stdout = self.read_no_intr(p.stdout)
if hasattr(open, 'newlines'):
# Interpreter with universal newline support
self.assertEqual(stdout,
@@ -343,7 +355,7 @@
def test_no_leaking(self):
# Make sure we leak no resources
- max_handles = 1026 # too much for most UNIX systems
+ max_handles = 10 # too much for most UNIX systems
if mswindows:
max_handles = 65 # a full test is too slow on Windows
for i in range(max_handles):
@@ -424,7 +436,7 @@
'sys.stdout.write(os.getenv("FRUIT"))'],
stdout=subprocess.PIPE,
preexec_fn=lambda: os.putenv("FRUIT", "apple"))
- self.assertEqual(p.stdout.read(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout), "apple")
def test_args_string(self):
# args is a string
@@ -457,7 +469,7 @@
p = subprocess.Popen(["echo $FRUIT"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_shell_string(self):
# Run command through the shell (string)
@@ -466,7 +478,7 @@
p = subprocess.Popen("echo $FRUIT", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertEqual(p.stdout.read().strip(), "apple")
+ self.assertEqual(self.read_no_intr(p.stdout).strip(), "apple")
def test_call_string(self):
# call() function with string argument on UNIX
@@ -525,7 +537,7 @@
p = subprocess.Popen(["set"], shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_shell_string(self):
# Run command through the shell (string)
@@ -534,7 +546,7 @@
p = subprocess.Popen("set", shell=1,
stdout=subprocess.PIPE,
env=newenv)
- self.assertNotEqual(p.stdout.read().find("physalis"), -1)
+ self.assertNotEqual(self.read_no_intr(p.stdout).find("physalis"), -1)
def test_call_string(self):
# call() function with string argument on Windows
/Peter Åstrand <astrand(a)lysator.liu.se>
This may seem like it's coming out of left field for a minute, but
bear with me.
There is no doubt that Ruby's success is a concern for anyone who
sees it as diminishing Python's status. One of the reasons for
Ruby's success is certainly the notion (originally advocated by Bruce
Tate, if I'm not mistaken) that it is the "next Java" -- the language
and environment that mainstream Java developers are, or soon will be,
looking to as a natural next step.
One thing that would help Python in this "debate" (or, perhaps simply
put it in the running, at least as a "next Java" candidate) would be
if Python had an easier migration path for Java developers that
currently rely upon various third-party libraries. The wealth of
third-party libraries available for Java has always been one of its
great strengths. Ergo, if Python had an easy-to-use, recommended way
to use those libraries within the Python environment, that would be a
significant advantage to present to Java developers and those who
would choose Ruby over Java. Platform compatibility is always a huge
motivator for those looking to migrate or upgrade.
In that vein, I would point to JPype (http://jpype.sourceforge.net).
JPype is a module that gives "python programs full access to java
class libraries". My suggestion would be to either:
(a) include JPype in the standard library, or barring that,
(b) make a very strong push to support JPype
(a) might be difficult or cumbersome technically, as JPype does need
to build against Java headers, which may or may not be possible given
the way that Python is distributed, etc.
However, (b) is very feasible. I can't really say what "supporting
JPype" means exactly -- maybe GvR and/or other heavyweights in the
Python community make public statements regarding its existence and
functionality, maybe JPype gets a strong mention or placement on
python.org... all those details are obviously not up to me, and I
don't know the workings of the "official" Python organizations enough
to make serious suggestions.
Regardless of the form of support, I think raising people's awareness
of JPype and what it adds to the Python environment would be a Good
Thing (tm).
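For readers who haven't seen JPype in action, its canonical hello-world
looks roughly like this (a sketch following the project's documented
example; the JVM path and flags vary by platform):

from jpype import *

startJVM(getDefaultJVMPath(), "-ea")    # boot an embedded JVM in-process
java.lang.System.out.println("hello from Java, called from Python")
shutdownJVM()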
For our part, we've used JPype to make PDFTextStream (our previously
Java-only PDF text extraction library) available and supported for
Python. You can read some about it here:
http://snowtide.com/PDFTextStream.Python
And I've blogged about how PDFTextStream.Python came about, and how
we worked with Steve Ménard, the maintainer of JPype, to make it all
happen (watch out for this URL wrapping):
http://blog.snowtide.com/2006/08/21/working-together-pythonjava-open-
sourcecommercial
Cheers,
Chas Emerick
Founder, Snowtide Informatics Systems
Enterprise-class PDF content extraction
cemerick(a)snowtide.com
http://snowtide.com | +1 413.519.6365
Phillip.eby wrote:
> Author: phillip.eby
> Date: Tue Apr 18 02:59:55 2006
> New Revision: 45510
>
> Modified:
> python/trunk/Lib/pkgutil.py
> python/trunk/Lib/pydoc.py
> Log:
> Second phase of refactoring for runpy, pkgutil, pydoc, and setuptools
> to share common PEP 302 support code, as described here:
>
> http://mail.python.org/pipermail/python-dev/2006-April/063724.html
Shouldn't this new module be named "pkglib" to be in line with
the naming scheme used for all the other utility modules, e.g. httplib,
imaplib, poplib, etc. ?
> pydoc now supports PEP 302 importers, by way of utility functions in
> pkgutil, such as 'walk_packages()'. It will properly document
> modules that are in zip files, and is backward compatible to Python
> 2.3 (setuptools installs for Python <2.5 will bundle it so pydoc
> doesn't break when used with eggs.)
Are you saying that the installation of setuptools in Python 2.3
and 2.4 will then overwrite the standard pydoc included with
those versions ?
I think that's the wrong way to go if not made an explicit
option in the installation process or a separate installation
altogether.
I'm bothered by the fact that installing setuptools actually changes
the standard Python installation by either overriding stdlib modules
or monkey-patching them at setuptools import time.
> What has not changed is that pydoc command line options do not support
> zip paths or other importer paths, and the webserver index does not
> support sys.meta_path. Those are probably okay as limitations.
>
> Tasks remaining: write docs and Misc/NEWS for pkgutil/pydoc changes,
> and update setuptools to use pkgutil wherever possible, then add it
> to the stdlib.
Add setuptools to the stdlib ? I'm still missing the PEP for this,
along with the needed discussion touching, among other things, on the
change of the distutils standard "python setup.py install" to install
an egg instead of a site package.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Apr 18 2006)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
I'm interested in how builtins could be more efficient. I've read over
some of the PEPs having to do with making global variables more
efficient (search for "global"):
http://www.python.org/doc/essays/pepparade.html
But I think the problem can be simplified by focusing strictly on
builtins.
One of my assumptions is that only a small fraction of modules override
the default builtins with something like:
import mybuiltins
__builtins__ = mybuiltins
As you probably know, each access of a builtin requires two hash table
lookups: the name is first looked up (and missed) in the module's
globals dictionary, and only then found in the builtins dictionary.
Why not have a means of referencing the default builtins with some sort
of index the way the LOAD_FAST op code currently works? In other words,
by default each module gets the default set of builtins indexed (where
the index indexes into an array) in a certain order. The version stored
in the pyc file would be bumped each time the set of default builtins
is changed.
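To make the contrast concrete, here is a small sketch with the dis
module (output abridged in the comments; exact bytecode varies by
Python version). The default-argument trick in g() is only a stand-in
to show the array-indexed access the proposal would give builtins:

import dis

def f():
    return len("")      # LOAD_GLOBAL: a globals-dict miss followed by
                        # a builtins-dict hit on every execution

def g(len=len):
    return len("")      # LOAD_FAST: a direct index into the fast-locals
                        # array, no dictionary lookups at all

dis.dis(f)              # ... LOAD_GLOBAL  0 (len) ...
dis.dis(g)              # ... LOAD_FAST    0 (len) ...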
I don't have very strong feelings whether things like True = (1 == 1)
would be a syntax error, but assigning to a builtin could just do the
equivalent of STORE_FAST. I also don't have very strong feelings about
whether the array of default builtins would be shared between modules.
To simulate the current behavior, where attempting to assign to a
builtin actually alters that module's globals hashtable, a separate
array of builtins could be used for each module.
As to assigning to __builtins__ (like I mentioned at the beginning of
this post) perhaps it could assign to the builtin array for those items
that have a name that matches a default builtin (such as "True" or
"len"). Those items that don't match a default builtin would just
create global variables.
Perhaps what I'm suggesting isn't feasible for reasons that have already
been discussed. But it seems like it should be possible to make "while
True" as efficient as "while 1".
--
-----------------------------------------------------------------------
| Steven Elliott | selliott4(a)austin.rr.com |
-----------------------------------------------------------------------
Should GeneratorExit inherit from Exception or BaseException?
Currently, a generator that catches Exception and continues on to yield
another value can't be closed properly (you get a runtime error pointing out
that the generator ignored GeneratorExit).
The only decent reference I could find to it in the old PEP 348/352
discussions is Guido writing [1]:
> when GeneratorExit or StopIteration
> reach the outer level of an app, it's a bug like all the others that
> bare 'except:' WANTS to catch.
(at that point in the conversation, I believe bare except was considered the
equivalent of "except Exception:")
While I agree with what Guido says about GeneratorExit being a bug if it
reaches the outer level of an app, it seems like a bit of a trap that a
correctly written generator can't write "except Exception:" without preceding
it with an "except GeneratorExit:" that reraises the exception. Isn't that
exactly the idiom we're trying to get rid of for SystemExit and KeyboardInterrupt?
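For illustration, the trap plays out like this (minimal sketch on
Python 2.5, where GeneratorExit inherits from Exception):

def gen():
    while True:
        try:
            yield 1
        except Exception:
            pass        # inadvertently swallows GeneratorExit as well

g = gen()
g.next()
g.close()               # RuntimeError: generator ignored GeneratorExit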
Regards,
Nick.
[1] http://mail.python.org/pipermail/python-dev/2005-August/055173.html
--
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
At 11:09 AM 11/22/2006 -0500, Ian Murdock wrote:
>The first question we have to answer is: What does it mean to "add
>Python to the LSB"? Is it enough to say that Python is present
>at a certain version and above, or do we need to do more than that
>(e.g., many distros ship numerous Python add-ons which apps
>may or may not rely on--do we need to specify some of these too)?
Just a suggestion, but one issue that I think needs addressing is the FHS
language that leads some Linux distros to believe that they should change
Python's normal installation layout (sometimes in bizarre ways) or that
they should remove and separately package different portions of the
standard library. Other vendors apparently also patch Python in various
ways to support their FHS-based theories of how Python should install
files. These changes are detrimental to compatibility.
Another issue is specifying dependencies. The existence of the Cheeseshop
as a central registry of Python project names has not been taken into
account in vendor packaging practices, for example. (Python 2.5 also
introduced the ability to install metadata alongside installed Python
packages, supporting runtime checking for package presence and versions.)
I don't know how closely these issues tie into what the LSB is trying to do,
as I've only observed these issues in the breach, where certain
distribution policies require e.g. that project names be replaced with
internal package names, demand separation of package data files from their
packages, or other procrustean chopping that makes mincemeat of any attempt
at multi-distribution compatibility for an application or multi-dependency
library. Some clarification at the LSB level of what is actually
considered standard for Python might perhaps be helpful in motivating
updates to some of these policies.
I've been looking once again over the docs for distutils and setuptools,
and thinking to myself "this seems a lot more complicated than it ought
to be".
Before I get into detail, however, I want to explain carefully the scope
of my critique - in particular, why I am talking about setuptools on the
python-dev list. You see, in my mind, the process of assembling,
distributing, and downloading a package is, or at least ought to be, a
unified process. It ought to be a fundamental part of the system, and
not split into separate tools with separate docs that have to be
mentally assembled in order to understand it.
Moreover, setuptools is the de facto standard these days - a novice
programmer who googles for 'python install tools' will encounter
setuptools long before they learn about distutils; and if you read the
various mailing lists and blogs, you'll sense a subtle aura of
deprecation and decay that surrounds distutils.
I would claim, then, that regardless of whether setuptools is officially
blessed or not, it is an intrinsic part of the "Python experience".
(I'd also like to put forward the disclaimer that there are probably
factual errors in this post, or errors of misunderstanding; All I can
claim as an excuse is that it's not for lack of trying, and corrections
are welcome as always.)
Think about the idea of module distribution from a pedagogical
standpoint - when does a newbie Python programmer start learning about
module distribution and what do they learn first? A novice Python user
will begin by writing scripts for themselves, and not thinking about
distribution at all. However, once they reach the point where they begin
to think about packaging up their module, the Python documentation ought
to be able to lead them, step by step, towards a goal of making a
distributable package:
-- It should teach them how to organize their code into packages and
modules
-- It should show them how to write the proper setup scripts (see the
minimal example after this list)
-- If there is C code involved, it should explain how that fits into
the picture.
-- It should explain how to write unit tests and where they should go.
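For concreteness, the goal the docs should build toward is a script
along these lines (a minimal distutils sketch; the project, package,
and file names are all made up):

# setup.py -- hypothetical minimal example of a distributable package
from distutils.core import setup, Extension

setup(
    name="example",
    version="0.1",
    description="A hypothetical package with one C extension",
    packages=["example"],                   # example/__init__.py, ...
    ext_modules=[Extension("example._speedups",
                           sources=["src/speedups.c"])],
)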
So how does the current system fail in this regard? The docs for each
component - distutils, setuptools, unit test frameworks, and so on, only
talk about that specific module - not how it all fits together.
For example, the docs for distutils start by telling you how to build a
setup script. It never explains why you need a setup script, or why
Python programs need to be "installed" in the first place. [1]
The distutils docs never describe how your directory structure ought to
look. In fact, they never tell you how to *write* a distributable
package; rather, it seems to be more oriented towards taking an
already-working package and modifying it to be distributable.
The setuptools docs are even worse in this regard. If you look carefully
at the docs for setuptools, you'll notice that each subsection is
effectively a 'diff', describing how setuptools is different from
distutils. One section talks about the "new and changed keywords",
without explaining what the old keywords were or how to find them.
Thus, for the novice programmer, learning how to write a setup script
ends up being a process of flipping back and forth between the distutils
and setuptools docs, trying to hold in their minds enough of each to be
able to achieve some sort of understanding.
What we have now does a good job of explaining how the individual tools
work, but it doesn't do a good job of answering the question "Starting
from an empty directory, how do I create a distributable Python
package?" A novice programmer wants to know what to create first, what
to create next, and so on.
This is especially true if the novice programmer is creating an
extension module. Suppose I have a C library that I need to wrap. In
order to even compile and test it, I'm going to need a setup script.
That means I need to understand distutils before I even think about
distribution, before I even begin writing the code!
(Sure, I could write a Makefile, but I'd only end up throwing it away
later -- so why not cut to the chase and *start* with a setup script?
Ans: Because it's too hard!)
But it isn't just the docs that are at fault here - otherwise, I'd be
posting this on a different mailing list. It seems like the whole
architecture is 'diff'-based, a series of patches on top of patches,
which are in need of some serious refactoring.
Except that nobody can do this refactoring, because there's no formal
list of requirements. I look at distutils, and while some parts are
obvious, there are other parts where I go "what problem were they trying
to solve here?" In my experience, you *don't* go mucking with someone's
code and trying to fix it unless you understand what problem they were
trying to solve - otherwise you'll botch it and make a mess. Since few
people ever bother to write down what problem they were trying to solve
(although they tend to be better at describing their clever solution),
usually this ends up being done through a process of reverse engineering
the requirements from the code, unless you are lucky enough to have
someone around who knows the history of the thing.
Admittedly, I'm somewhat in ignorance here. My perspective is that of an
'end-user developer', someone who uses these tools but does not write
them. I don't know the internals of these tools, nor do I particularly
want to - I've got bigger fish to fry.
I'm posting this here because what I'd like folks to think about is the
whole process of Python development, not just the documentation. What is
the smoothest path from empty directory to a finished package on PyPI?
What can be changed about the current standard libraries that will ease
this process?
[1] The answer, AFAICT, is that 'setup' is really a Makefile - in other
words, it's a platform-independent way of describing how to construct a
compiled module from sources, and making it available to all programs on
that system. Although this gets confusing when we start talking about
"pure python" modules that have no C component - because we have all
this language that talks about compiling and installing and such, when
all that is really going on underneath is a plain old file copy.
-- Talin
At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote:
>I've got a small tweak to tokenize.py that I'd like to run by folks here.
>
>I'm working on a refactoring tool for Python 2.x-to-3.x conversion,
>and my approach is to build a full parse tree with annotations that
>show where the whitespace and comments go. I use the tokenize module
>to scan the input. This is nearly perfect (I can render code from the
>parse tree and it will be an exact match of the input) except for
>continuation lines -- while the tokenize module gives me pseudo-tokens for
>comments and "ignored" newlines, it doesn't give me the backslashes at
>all (while it does give me the newline following the backslash).
The following routine will render a token stream, and it automatically
restores the missing \'s. I don't know if it'll work with your patch, but
perhaps you could use it instead of changing tokenize. For the
documentation and examples, see:
http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to…
from tokenize import INDENT, DEDENT, ENDMARKER, NL, NEWLINE

# flatten_stmt() and WHITESPACE are presumably defined alongside this
# routine in the scale.dsl module documented at the URL above.

def detokenize(tokens, indent=0):
    """Convert `tokens` iterable back to a string."""
    out = []; add = out.append
    lr, lc, last = 0, 0, ''
    baseindent = None
    for tok, val, (sr, sc), (er, ec), line in flatten_stmt(tokens):
        # Insert trailing line continuation and blanks for skipped lines
        lr = lr or sr   # first line of input is first line of output
        if sr > lr:
            if last:
                if len(last) > lc:
                    add(last[lc:])
                lr += 1
            if sr > lr:
                add(' '*indent + '\\\n'*(sr-lr))    # blank continuation lines
            lc = 0
        # Re-indent first token on line
        if lc == 0:
            if tok == INDENT:
                continue    # we want to dedent first actual token
            else:
                curindent = len(line[:sc].expandtabs())
                if baseindent is None and tok not in WHITESPACE:
                    baseindent = curindent
                elif baseindent is not None and curindent >= baseindent:
                    add(' ' * (curindent-baseindent))
                if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                    add(' ' * indent)
        # Not at start of line, handle intraline whitespace by retaining it
        elif sc > lc:
            add(line[lc:sc])
        if val:
            add(val)
        lr, lc, last = er, ec, line
    return ''.join(out)
On 09:34 am, jack.jansen(a)cwi.nl wrote:
>There's another standard place that is searched on MacOS: a per-user
>package directory ~/Library/Python/2.5/site-packages (the name "site-
>packages" is a misnomer, really). Standardising something here is
>less important than for vendor-packages (as the effect can easily be
>gotten by adding things to PYTHONPATH) but it has one advantage:
>distutils and such could be taught about it and provide an option to
>install either systemwide or for the current user only.
Yes, let's do that, please. I've long been annoyed that site.py sets up a local user installation directory, a very useful feature, but _only_ on OS X. I've long since promoted my personal hack to add a local user installation directory into a public project -- divmod's "Combinator" -- but it would definitely be preferable for Python to do something sane by default (and have setuptools et. al. support it).
I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the "official" UNIX installation location, since it's what we're already using, and ~/.local seems like a convention being slowly adopted by GNOME and the like. I don't know the cultural equivalent in Windows - "%USERPROFILE%\Application Data\PythonXX" maybe?
It would be nice if site.py would do this in the same place as it sets up the "darwin"-specific path, and to set that path as a module global, so packaging tools could use "site.userinstdir" or something. Right now, if it's present, it's just some random entry on sys.path.
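A sketch of what that could look like, applied from outside site.py for
now ('userinstdir' is the suggested name, not an existing attribute):

import os, site, sys

# Per-user install location, following the ~/.local convention above:
userinstdir = os.path.expanduser(
    '~/.local/lib/python%d.%d/site-packages' % sys.version_info[:2])
if os.path.isdir(userinstdir):
    site.addsitedir(userinstdir)    # honors .pth files, like site-packages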