On Thu, 2013-01-24 at 16:17 +0000, Mark Hackett wrote:
> > That said, what are your feelings on adding a CarefulDictReader?
> It's as good a solution to me as any.
> However, I'm not that good a programmer, and therefore what *I'd* do
> isn't necessarily a good idea, it's just one of the better ones out of the
> toolbox I have available.
> I'd prefer (for aesthetic reasons) some sort of stream converter. Much
> like freeze/thaw serialisation of data, it'd be a step between the raw csv
> and the reader that reads it.
I think my reason for wanting to have a CarefulDictReader (or a careful
DictReader), and why I think a stream converter isn't the best solution,
is that CSVs are very commonly used by people just starting to get their
feet wet with programming. Consider the use case: I've got my Excel
file, and I'm just getting to the point where Excel isn't cutting it
anymore. I want to start manipulating my data with python, and everyone
is telling me to use the csv library. DictReader sounds cool, because I
don't want to have to remember column numbers, and this is going to make my
code much more readable. But I can't make it read my headers simply
because I put some blank space at the top of my Excel file.
A stream converter is another layer of complexity that keeps this
potential new programmer from having a good experience with programming,
for what gain? So that the csv library can "properly" (?) treat a line
without data as a header? I think it would be fully reasonable (and add
little to no complexity to the code) to have a DictReader that treats
the first non-empty line as the header row.
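A minimal sketch of that behaviour, assuming a plain helper function rather than a new class (`careful_dict_reader` is a hypothetical name, not an existing csv API). It skips rows whose cells are all blank, such as the `,,,` rows a spreadsheet writes for empty lines, and then hands the rest of the file to the stock csv.DictReader:

```python
import csv
import io


def careful_dict_reader(f, **fmtparams):
    """Hypothetical helper: treat the first non-empty row as the header.

    Rows like "" or ",,," (blank spreadsheet rows) are skipped; the
    remaining lines of *f* are read by an ordinary csv.DictReader.
    """
    rows = csv.reader(f, **fmtparams)
    for header in rows:
        if any(cell.strip() for cell in header):
            break
    else:
        # Only blank rows (or no rows at all): no header, no records.
        return iter(())
    # csv.reader pulls lines from f one at a time, so f is now positioned
    # just after the header row.
    return csv.DictReader(f, fieldnames=header, **fmtparams)


sample = io.StringIO("\n,,\nname,age\nalice,30\nbob,25\n")
for record in careful_dict_reader(sample):
    print(record["name"], record["age"])
```

Dialect keywords such as `delimiter=";"` are passed through to both readers, so the helper behaves like DictReader apart from the header search.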
The csv module is one of the big gateways into python programming for a
lot of people. That's also one of the reasons I think the sockets
library is a poor analogue here. A new programmer is unlikely to reach
the sockets library until they've been through a few of the urllibs, the
httplibs, requests, some part of http or an external web framework,
smtplib, or some other higher-level networking-related libraries.
For the same reason, I think if the solution isn't something handled
automatically by the library, it needs to be accompanied by improvements
to the documentation. If we're going to provide a DictReader that is
this easy to break, we need to answer the question: How do I fix it?
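For comparison, the stream converter suggested in the quoted message would be one documented answer to that question. A minimal sketch, assuming comma-delimited input (`skip_blank_prefix` is a hypothetical name, not part of the csv module):

```python
import csv
import io
import itertools


def skip_blank_prefix(lines):
    """Hypothetical stream converter: drop leading lines that hold no data.

    A line counts as blank if it contains only whitespace and commas,
    which is what a spreadsheet typically writes for an empty row.
    Assumes the default comma delimiter.
    """
    return itertools.dropwhile(lambda line: not line.strip(" \t\r\n,"), lines)


raw = io.StringIO("\n,,,\nname,age\nalice,30\n")
reader = csv.DictReader(skip_blank_prefix(raw))
print(reader.fieldnames)
```

This works without changing DictReader, but it is exactly the extra layer a beginner would have to discover first.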
Currently, the multiprocessing.Queue put() and get() methods hold locks
for the entire duration of the writing/reading to the backing
Connection (which can be a pipe, unix domain socket, or whatever it's
called on Windows).
For example, here's what the feeder thread does:
# Delete references to object. See issue16284
Connection.send() and Connection.recv() have to (de)serialize the data
with pickle when writing it to or reading it from the underlying file
descriptor.
While the locking is necessary to guarantee atomic read/write (well,
it's not necessary if you're writing less than PIPE_BUF bytes to a pipe,
and writes seem to be atomic on Windows), the locks don't have to be held
while the data is serialized.
Although I haven't made any measurements, my gut feeling is that this
serialization can take a non-negligible part of the overall
sending/receiving time for large data items. If that's the case, then
holding the lock only for the duration of the read()/write() syscall
(and not during serialization) could reduce contention when sending or
receiving large data.
One way to do that would be to refactor the code a bit, perhaps by
providing a (private) AtomicConnection that would encapsulate the
necessary locking. Another advantage is that this would hide the
platform-dependent code inside Connection (right now, Queue only uses
a lock for sending on Unix platforms, since write is apparently atomic
on Windows).
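A minimal sketch of such a wrapper, assuming a Connection plus an externally supplied lock (`AtomicConnection` is only the hypothetical name floated above, not an existing class). It uses Connection.send_bytes()/recv_bytes() so that pickling and unpickling happen outside the critical section:

```python
import multiprocessing
import pickle
import threading


class AtomicConnection:
    """Hypothetical wrapper: serialize outside the lock and hold the lock
    only around the actual read/write on the underlying Connection."""

    def __init__(self, conn, lock):
        self._conn = conn  # a multiprocessing Connection (pipe end)
        self._lock = lock  # any lock; a multiprocessing.Lock for cross-process use

    def send(self, obj):
        # Pickling large objects can be slow, so do it before taking the lock.
        data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        with self._lock:
            # Only the write itself must not interleave with other senders.
            self._conn.send_bytes(data)

    def recv(self):
        with self._lock:
            # Only the read itself must not interleave with other receivers.
            data = self._conn.recv_bytes()
        # Unpickle after releasing the lock.
        return pickle.loads(data)


if __name__ == "__main__":
    reader, writer = multiprocessing.Pipe(duplex=False)
    # threading.Lock keeps this single-process demo simple; Queue itself
    # would share multiprocessing.Lock objects between processes.
    sender = AtomicConnection(writer, threading.Lock())
    receiver = AtomicConnection(reader, threading.Lock())
    sender.send({"payload": list(range(5))})
    print(receiver.recv())
```

The sketch always locks; the platform-specific shortcut (no lock for small pipe writes or on Windows) is the part that would then live inside the wrapper instead of in Queue.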
I've so far been lurking on the tulip/async discussions since, although
I'm interested, I have no specific need for writing high-performance
network code.
However, I hit a use case today which seems to me to be ideal for an
async-style approach, and yet I don't think it's covered by the
current PEP. Specifically, I am looking at monitoring a
subprocess.Popen object. This is basically an IO loop, but monitoring
the 3 pipes to the subprocess (well, only stdout and stderr in my
case...). Something like add_reader/add_writer would be fine, except
for the fact that (a) they are documented as low-level, not for the
user, and (b) they don't work in all cases (e.g. in a select-based
loop on Windows).
I'd like PEP 3156 to include some support for waiting on IO from (one
or more) subprocesses like this in a cross-platform way. If there's
something in there to do this at the moment, that's great, but it
wasn't obvious to me when I looked...
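For reference, the asyncio module that grew out of PEP 3156 (Python 3.4+) did end up with subprocess support. A minimal sketch using today's high-level API (asyncio.run needs 3.7+), assuming line-oriented output; `drain` and `monitor` are illustrative names. On Windows this needs the proactor event loop, which is the default since Python 3.8:

```python
import asyncio
import sys


async def drain(stream, label, sink):
    """Read lines from one pipe until EOF, tagging each with its source."""
    while True:
        line = await stream.readline()
        if not line:  # b"" means EOF
            break
        sink.append((label, line.decode().rstrip("\r\n")))


async def monitor(*cmd):
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    events = []
    # Read stdout and stderr concurrently, so neither pipe can fill up
    # and block the child while we are waiting on the other one.
    await asyncio.gather(
        drain(proc.stdout, "stdout", events),
        drain(proc.stderr, "stderr", events),
    )
    return await proc.wait(), events


if __name__ == "__main__":
    child = "import sys; print('hello'); print('oops', file=sys.stderr)"
    code, events = asyncio.run(monitor(sys.executable, "-c", child))
    print(code, events)
```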
The following is a common pattern (used by, for example,
    save_cwd = os.getcwd()
    try:
        ...
    finally:
        os.chdir(save_cwd)
I suggest this deserves a context manager:
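A minimal sketch of such a context manager, assuming a getcwd()-based implementation (`saved_cwd` here is illustrative; unlike the os.fchdir-based patch, this variant cannot restore a directory that was renamed or removed in the meantime). Python 3.11 later added contextlib.chdir(path), which works along these lines:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def saved_cwd():
    """Illustrative getcwd()-based variant: restore the working directory on exit."""
    saved = os.getcwd()
    try:
        yield saved
    finally:
        # Runs even if the body raised, just like the manual try/finally.
        os.chdir(saved)


with tempfile.TemporaryDirectory() as tmp:
    with saved_cwd():
        os.chdir(tmp)
        # ... work relative to tmp ...
    # Back in the original directory here.
```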
Initial feedback on IRC suggests shutil as where this functionality
should live (other suggestions were made, such as pathlib). Hence,
the attached patch implements this as shutil.saved_cwd, based on os.fchdir.
The patch also adds os.chdir to os.supports_dir_fd and documents the
context manager abilities of builtins.open() in its reference.
diff -r 74b0461346f0 Doc/library/functions.rst
--- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000
@@ -828,6 +828,9 @@ are always available. They are listed h
Open *file* and return a corresponding :term:`file object`. If the file
cannot be opened, an :exc:`OSError` is raised.
+ This function can be used as a :term:`context manager` that closes the
+ file when it exits.
*file* is either a string or bytes object giving the pathname (absolute or
relative to the current working directory) of the file to be opened or
an integer file descriptor of the file to be wrapped. (If a file descriptor
diff -r 74b0461346f0 Doc/library/os.rst
--- a/Doc/library/os.rst Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/os.rst Sat Jan 19 09:39:27 2013 +0000
@@ -1315,6 +1315,9 @@ features:
This function can support :ref:`specifying a file descriptor <path_fd>`. The
descriptor must refer to an opened directory, not an open file.
+ See also :func:`shutil.saved_cwd` for a context manager that restores the
+ current working directory.
Availability: Unix, Windows.
.. versionadded:: 3.3
diff -r 74b0461346f0 Doc/library/shutil.rst
--- a/Doc/library/shutil.rst Fri Jan 18 17:53:18 2013 -0800
+++ b/Doc/library/shutil.rst Sat Jan 19 09:39:27 2013 +0000
@@ -36,6 +36,19 @@ copying and removal. For operations on i
Directory and files operations
+.. function:: saved_cwd()
+ Return a :term:`context manager` that restores the current working directory
+ when it exits. See :func:`os.chdir` for changing the current working
+ directory.
+ The context manager returns an open file descriptor for the saved directory.
+ Only available when :func:`os.chdir` supports file descriptor arguments.
+ .. versionadded:: 3.4
.. function:: copyfileobj(fsrc, fdst[, length])
Copy the contents of the file-like object *fsrc* to the file-like object *fdst*.
diff -r 74b0461346f0 Lib/os.py
--- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000
@@ -120,6 +120,7 @@ if _exists("_have_functions"):
_set = set()
+ _add("HAVE_FCHDIR", "chdir")
diff -r 74b0461346f0 Lib/shutil.py
--- a/Lib/shutil.py Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/shutil.py Sat Jan 19 09:39:27 2013 +0000
@@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c
"ignore_patterns", "chown", "which"]
# disk_usage is added later, if available on the platform
+ # saved_cwd is added later, if available on the platform
@@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p
if _access_check(name, mode):
+# Define the chdir context manager.
+if os.chdir in os.supports_dir_fd:
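+    class saved_cwd:
+        def __init__(self):
+            self.dh = None
+        def __enter__(self):
+            # Keep a descriptor for the current directory so it can be
+            # restored with fchdir() even if its path changes meanwhile.
+            self.dh = os.open(os.curdir,
+                              os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0))
+            return self.dh
+        def __exit__(self, exc_type, exc_value, traceback):
+            try:
+                os.fchdir(self.dh)
+            finally:
+                os.close(self.dh)
+            return False
+    __all__.append('saved_cwd')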
diff -r 74b0461346f0 Lib/test/test_shutil.py
--- a/Lib/test/test_shutil.py Fri Jan 18 17:53:18 2013 -0800
+++ b/Lib/test/test_shutil.py Sat Jan 19 09:39:27 2013 +0000
@@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase):
rv = shutil.copytree(src_dir, dst_dir)
+    def test_saved_cwd(self):
+        if hasattr(os, 'fchdir'):
+            temp_dir = self.mkdtemp()
+            orig_dir = os.getcwd()
+            with shutil.saved_cwd() as dir_fd:
+                os.chdir(temp_dir)
+                new_dir = os.getcwd()
+                self.assertIsInstance(dir_fd, int)
+            final_dir = os.getcwd()
+            self.assertEqual(orig_dir, final_dir)
+            self.assertEqual(temp_dir, new_dir)
+        else:
+            self.assertFalse(hasattr(shutil, 'saved_cwd'))
PEP 3156 currently lists *29* proposed methods for the event loop API.
These methods serve quite different purposes and I think a bit more
structure in the overall API could help clarify that.
First proposal: clearly split the abstract EventLoop API from concrete
DescriptorEventLoop and IOCPEventLoop subclasses.
The main benefit here is to help clarify that:
1. the additional methods defined on DescriptorEventLoop and
IOCPEventLoop are not available on all event loop implementations, so
any code using them is necessarily event loop specific
2. the goal of the transport abstraction is to mask the differences
between these low level platform specific APIs
3. other event loops are free to use a completely different API
between their low level transports and the event loop
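A rough sketch of what the first proposal could look like as class structure, assuming abstract base classes (the split itself is hypothetical; the method names are the ones PEP 3156 uses, trimmed to a few representatives):

```python
from abc import ABC, abstractmethod


class EventLoop(ABC):
    """Hypothetical portable core: only what every event loop must provide."""

    @abstractmethod
    def run_forever(self): ...

    @abstractmethod
    def stop(self): ...

    @abstractmethod
    def call_soon(self, callback, *args): ...

    @abstractmethod
    def call_later(self, delay, callback, *args): ...


class DescriptorEventLoop(EventLoop):
    """Hypothetical select/poll-style extension; code calling these
    methods is tied to descriptor-based loops."""

    @abstractmethod
    def add_reader(self, fd, callback, *args): ...

    @abstractmethod
    def remove_reader(self, fd): ...

    @abstractmethod
    def add_writer(self, fd, callback, *args): ...

    @abstractmethod
    def remove_writer(self, fd): ...


class IOCPEventLoop(EventLoop):
    """Hypothetical completion-port (Windows) extension; its proactor-style
    operations are omitted here because they are platform specific."""
```

Code that only touches EventLoop methods stays portable; calling add_reader() becomes a visible opt-in to descriptor-based loops.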
Second proposal: better separate the "event loop management", "event
monitoring" and "do things" methods
I don't have a clear idea of how to do this yet (beyond restructuring
the documentation of the event loop API in the PEP), but I can at
least describe the split I see (along with a few name changes that may
be worth considering).
Event loop management:
- run() # Perhaps "run_until_idle()"?
- run_forever() # Perhaps "run_until_stop()"?
- start_serving() # (The "stop serving" API is TBD in the PEP)
Do things (fire and forget):
Do things (and get the result with "yield from"):
- wrap_future() # Perhaps "wrap_executor_future"?
Low level transport creation:
- create_pipe() # Once it exists in the PEP
P.S. Off-topic for the thread, but I think the existence of run_once
vs run (or run_until_idle) validates the decision to stick with only
running one generation of ready callbacks per iteration. I forgot
about it when we were discussing that question.
Nick Coghlan | ncoghlan(a)gmail.com | Brisbane, Australia
any() and all() are very useful small functions, and I am wondering if
it could be interesting to have them work
with different operators, by using a callable.
e.g. something like:
    import operator

    def any(iterable, filter=operator.truth):
        for element in iterable:
            if filter(element):
                return True
        return False
For instance, I could then use any() to find out if there's a None in the
list:
    if any(iterable, filter=lambda x: x is None):
        raise SomeError("There's a None in that list")
Granted, it's easy to do it myself in a small util function - but since
any() and all() are in Python...
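For comparison, the existing builtins already accept a predicate indirectly: pass a generator expression or map() and any()/all() still short-circuit at the first deciding element. A minimal sketch (`contains_none` and `all_positive` are illustrative names):

```python
def contains_none(iterable):
    """True if any element is None; stops at the first match."""
    return any(element is None for element in iterable)


def all_positive(iterable):
    """True if every element is > 0; stops at the first failure."""
    return all(map(lambda x: x > 0, iterable))


data = [1, None, 3]
if contains_none(data):
    print("There's a None in that list")
```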
Tarek Ziadé · http://ziade.org · @tarek_ziade
Nick Coghlan <ncoghlan@...> writes:
> -1 from me
> I consider caring about the current directory to be an anti-pattern
I would agree, but in some places we unfortunately have to care about this,
because of stdlib history - for example, distutils. Wherever you have to do
"python setup.py ..." there is an implicit assumption that anything setup.py
looks at will be relative to wherever the setup.py is - it's seldom invoked
as "python /path/to/setup.py", and from what I've seen, very few projects
do the right thing in their setup.py and code called from it in terms of
getting an absolute path for the directory setup.py is in, and then using it
in subsequent operations.
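For reference, the "right thing" in a setup.py is to resolve paths against the script's own directory rather than the current one. A minimal sketch (`project_path` is a hypothetical helper and README.txt just an example file name):

```python
import os


def project_path(script_file, *parts):
    """Hypothetical helper: build a path relative to the directory that
    contains *script_file*, independent of the current working directory."""
    here = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(here, *parts)


# In setup.py one would call, e.g.:
#     with open(project_path(__file__, "README.txt")) as f:
#         long_description = f.read()
```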
I agree that we shouldn't encourage this kind of behaviour :-)
I'm looking through PEP 3156 and the Tulip code, and either something is
missing or I'm not looking in the right places.
I can't find any sort of callback / future return for asynchronous writes,
e.g. in transport.
Shouldn't there be a "data_sent" parallel to "data_received" somewhere? Or,
alternatively, "write" returning some sort of future that can be checked
later for status? For connections that aren't infinitely fast it's useful
to know when the data was actually sent/written, or alternatively if an
error has occurred. This is also important for when writing would actually
block because of full buffers. boost::asio has such a handler for