[ python-Bugs-780354 ] socket.makefile() isn't compatible with marshal/cPickle/etc
SourceForge.net
noreply at sourceforge.net
Sun Mar 11 19:45:53 CET 2007
Bugs item #780354, was opened at 2003-07-30 12:02
Message generated for change (Comment added) made by montanaro
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=780354&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David M. Grimes (dmgrime)
>Assigned to: Nobody/Anonymous (nobody)
Summary: socket.makefile() isn't compatible with marshal/cPickle/etc
Initial Comment:
Even on platforms where the underlying C libraries and associated
code in
socketmodule.c support it, socket object's .makefile() method no
longer
return "real" file objects. This breaks support for cPickle, marshal,
and
any other C modules which expect a file object as an argument.
The change initially appears in socket.py v1.36 - the changelog
entry
states:
"""
The socket module now always uses the _socketobject wrapper
class, even on
platforms which have dup(2). The makefile() method is built directly
on top
of the socket without duplicating the file descriptor, allowing timeouts
to
work properly. Includes a new test case (urllibnet) which requires
the
network resource.
Closes bug 707074.
"""
I'm not sure of the implication WRT timeouts, but the patch is very
small -
quite obviously removing the platform check which prevented the
redefinition
of the symbol socket in the exports of socket.py. It now is always
the
_socketobject class, and not the underlying _socket.socket type
(when it is
known the platform supports all functionality in _socket).
For now, I can workaround (since I don't need the timeout
semantics) by
using _socket.socket() directly (I'm on a Linux platform), but I
wondered if
this is something which can/should be addressed differently?
----------------------------------------------------------------------
>Comment By: Skip Montanaro (montanaro)
Date: 2007-03-11 13:45
Message:
Logged In: YES
user_id=44345
Originator: NO
Crap. Wrong socket.makefile bug report.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2007-03-11 13:45
Message:
Logged In: YES
user_id=44345
Originator: NO
I'm not going to get to this. It may well be obsolete now anyway.
Facundo, I'm assigning to you simply because you are diving into the
timeout code right now. Maybe you can make a more informed decision.
--Skip
----------------------------------------------------------------------
Comment By: David M. Grimes (dmgrime)
Date: 2003-08-04 09:34
Message:
Logged In: YES
user_id=699265
Thanks Skip. I see that the ripple effect here is enormous
(not surprising, given that socket is used by many standard
modules). As for #3 - I understand you can use
marshal.dumps() and then sock.sendall() to achieve the write
side of a socket-based marshalling transaction, but for a long-
lived connection, marshal.load() on the .makefile()'d file of the
read-side provided a very nice message-boundary mechanism,
i.e. - marshal was the "protocol" on the wire...
Obviously (I've already done this in my application) I can just
use a thin length header on the wire, but it was just very nice
that marshal accomplished this by itself. I can also (since my
platform scope is known, controlled, and limited) create a
socket.py local to my application which is simply (I don't use
getfqdn()):
from _socket import *
And leave everything else as-is. Again, I appreciate the
attention and effort - I think the ramifications of eliminating
the "native C" implementation which previously
backed .makefile() (where supported) are more than just the
impact on marshal - I see potential performance loss for
things which rely on it. In any event, generic object support
for marshal.load() and .dump() would be great!
Thanks again, Dave.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2003-08-04 08:53
Message:
Logged In: YES
user_id=44345
The original socket.makefile dup()s the socket to create a new
file descriptor. If you have a timeout set on the original socket
the new file descriptor loses this property. This was a big
problem for the various modules which sit atop socket (httplib,
etc) because they generally all just call s.makefile() and then
treat the connected socket as a file. These are precisely the
modules/classes which benefit most from socket timeouts.
Note also that the current Unix implementation is what Windows
(and any other platforms without a dup() system call) use to
emulate makefile().
In the long run, I see a few other possibilities. Note that I
haven't considered the first two in any detail. I'm in the midst
of working on the third.
1. Add timeout functionality to the various higher-level modules
such as httplib. Once they call s.makefile() they could propagate
their current timeout to the new file object. Of course, this
means file objects need to be able to timeout (not obvious that
this is generally possible).
2. Store timeout information in the socket, so that if it's dup()'d
it propagates the timeout to the new descriptor. Again, it's not
obvious that this is possible.
3. Extend more general object support to marshal. I am working
on this and hope to have something available for review later
this week. As Martin indicated, it's not unreasonable to expect
people wanting to marshal data to non-file objects to first pass
them through marshal.dumps(). You can't, for example,
marshal data to StringIO objects directly.
----------------------------------------------------------------------
Comment By: David M. Grimes (dmgrime)
Date: 2003-08-04 07:37
Message:
Logged In: YES
user_id=699265
I know the wrapper has been there for a long time - some
platforms don't support the "native" underlying FILE
implementation, but many (most) platforms did.
What was the problem that necessitated the change? I'd be
more than willing to spend some time looking into alternative
solutions which keep the optimized implementation where
possible. Can you provide me with some history of the
motivations for the current solution?
Thanks!
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 15:52
Message:
Logged In: YES
user_id=21627
I don't know of any other approach to solve the problem.
Also notice that the wrapper object has existed for a long
time already, on some systems.
----------------------------------------------------------------------
Comment By: David M. Grimes (dmgrime)
Date: 2003-08-01 14:31
Message:
Logged In: YES
user_id=699265
It is clear that something which used to work no longer
works, and I'm just wondering if the breakage is necessary.
I'm not sure what all was involved with respect to timeouts
and why the makefile() implementation had to change. I
guess I see 3 problems:
1) Previously functional code breaks. While marshal is
not "general purpose" in the sense it can't deal with
instances, etc. it is still very useful and for data which
doesn't contain many redundant references (where cPickle
reduces them) it is the most efficient way to serialize and
unserialize "native" python types.
2) The performance of everything which uses the results of
socket.makefile() now suffers the cost of a pure-python
implementation where it had perviously been optimized as a C
implementation on platforms which supported it (most).
3) Any other 3rd party C extensions which use the
PyFile_Check (and would previously have worked) break.
Is there another way to solve the timeout problem? Breaking
something which has worked since 1.5 somehow feels wrong
to me.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 14:09
Message:
Logged In: YES
user_id=21627
I don't think anymore that marshal has a problem. There is
nothing wrong with it breaking when operating on a socket;
marshal is not for general application use, anyway. Anybody
who wants to transmit marshalled data over a socket can
still dump them into a string first.
So I propose to fix this aspect by documenting this intended
limitation.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:52
Message:
Logged In: YES
user_id=44345
Agreed, marshal has problems. It's problems are a bit
deeper than the technique in cPickle can solve, however. It
seems to me that the WFILE struct is a bad hack. It should
contain a union with three elements, one is the regular FILE *
stuff, one is the current write-to-string stuff, and a third is
a "generic object which has a write method".
I'll see if I can whip something up and pass along to Martin
for review.
----------------------------------------------------------------------
Comment By: David M. Grimes (dmgrime)
Date: 2003-08-01 09:27
Message:
Logged In: YES
user_id=699265
Skip --
I stand corrected (for cPickle, anyway) - but I imagine there is a
performance degredation over the "old" way of doing things - if
I get a
chance, I'll test that.
The problem does still exist at least in marshal (.load and .dump only
accept a "real" file object). I did a grep for PyFile_Check in
the source tree
prior to posting the bug report, and was a little over zealous to include
cPickle in the subject. There are other places in the source ( and
possibly
in 3rd party extensions) which use PyFile_Check, and I haven't had a
change to analyze those yet.
In any case, marshal doesn't work - in your test, if you change the
cPickle
to marshal, you'll get:
Traceback (most recent call last):
File "x.py", line 10, in ?
marshal.dump("Hello, world", f)
TypeError: marshal.dump() 2nd arg must be file
Again, I'm not sure of the impact other places where PyFile_Check is
used, and I don't know the performance loss in cPickle (as compared to
the "real" file implementation in pre-2.3 socket.makefile())
Let me know if there is anything I can do to help debug/fix!
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:18
Message:
Logged In: YES
user_id=44345
I modified the Demo/sockets/unixclient.py script to the following:
# Echo client demo using Unix sockets
# Piet van Oostrum
from socket import *
FILE = 'blabla'
s = socket(AF_UNIX, SOCK_STREAM)
s.connect(FILE)
import cPickle
f = s.makefile("wb")
cPickle.dump("Hello, world", f)
f.close()
#s.send('Hello, world')
data = s.recv(1024)
s.close()
print 'Received', `data`
It seemed to work fine when connected to the corresponding
unixserver.py script.
----------------------------------------------------------------------
Comment By: Skip Montanaro (montanaro)
Date: 2003-08-01 09:11
Message:
Logged In: YES
user_id=44345
Looking at the code in cPickle.c it appears that as long as the
object being written to has a 'write' method it should do the
right thing. Can you provide a simple test case which fails?
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2003-08-01 02:14
Message:
Logged In: YES
user_id=21627
See the bug that this patch closes; I doubt there is an
alternative approach.
Instead, I think the C modules that expect file objects need
to be fixed.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=780354&group_id=5470
More information about the Python-bugs-list
mailing list