[ python-Bugs-756104 ] Calling socket.recv() with a large number breaks

SourceForge.net noreply at sourceforge.net
Sat Jan 15 20:07:01 CET 2005


Bugs item #756104, was opened at 2003-06-17 15:25
Message generated for change (Comment added) made by facundobatista
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=756104&group_id=5470

>Category: Documentation
>Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Alex R.M. Turner (plexq)
Assigned to: Nobody/Anonymous (nobody)
Summary: Calling socket.recv() with a large number breaks

Initial Comment:
I have a backup script that calls socket.recv(), passing
the amount of data that is left to get for a given file
as the argument.  For very large files (I have one here
that is 1.9 GB) it seems to break horribly after
receiving about 130 MB.  I have tried to track it down
precisely with basic debugging (putting print statements
in the Python code), but it just seems to freak out
around the 130 MB mark, and it just doesn't make sense.
If I change the argument passed to recv() to 32767, it
works just fine.  I have attached the loop that reads
data from the socket (in a working state; the bad code
is commented out).
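The attached loop is not reproduced in this archive. The following is a minimal sketch of the working variant, written for Python 3 rather than the 2.x of the report: each recv() call is bounded instead of being passed the whole remaining size. The function name copy_from_socket, the out_file parameter, and the 32768 default are hypothetical, not taken from the reporter's script.

```python
import io
import socket

CHUNK_SIZE = 32768  # hypothetical default; the reporter's working value was 32767


def copy_from_socket(sock, out_file, nbytes, chunk_size=CHUNK_SIZE):
    """Copy exactly nbytes from sock to out_file, asking recv() for at most
    chunk_size bytes per call instead of the whole remaining amount."""
    remaining = nbytes
    while remaining > 0:
        # Request the smaller of "what is still missing" and the chunk size.
        data = sock.recv(min(remaining, chunk_size))
        if not data:
            # b"" means the peer closed the connection before sending everything.
            raise ConnectionError(
                "connection closed with %d bytes still expected" % remaining)
        out_file.write(data)
        remaining -= len(data)


# Usage: a local socket pair stands in for the backup client and server.
a, b = socket.socketpair()
payload = bytes(range(256)) * 20  # 5120 bytes of test data
a.sendall(payload)
out = io.BytesIO()
copy_from_socket(b, out, len(payload), chunk_size=1024)
a.close()
b.close()
```

Writing each chunk straight to the output file also keeps memory use flat, which matters for a 1.9 GB file.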

----------------------------------------------------------------------

>Comment By: Facundo Batista (facundobatista)
Date: 2005-01-15 16:07

Message:
Logged In: YES 
user_id=752496

This is a documentation bug; something the user should be
"warned" about. 

This caught me once, and two different people asked about
this in #python, so maybe we should put something like the
following in the recv() docs.

"""
For best match with hardware and network realities, the
value of "buffer" should be a relatively small power of 2,
for example, 4096.
"""

If you think the wording is right, just assign the bug to
me, I'll take care of it.
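A minimal sketch of the reading pattern that the proposed wording encourages, assuming Python 3 and a peer that signals the end of the data by shutting down its sending side. recv(bufsize) returns at most bufsize bytes and may return fewer, so the caller loops; the name recv_until_eof is hypothetical.

```python
import socket

BUFSIZE = 4096  # a "relatively small power of 2", as in the proposed wording


def recv_until_eof(sock, bufsize=BUFSIZE):
    """Collect everything the peer sends until it shuts down its sending side."""
    chunks = []
    while True:
        data = sock.recv(bufsize)  # at most bufsize bytes, possibly fewer
        if not data:               # b"" signals an orderly shutdown by the peer
            break
        chunks.append(data)
    return b"".join(chunks)


# Usage: send 5,000 bytes over a local socket pair, then signal EOF.
sender, receiver = socket.socketpair()
sender.sendall(b"0123456789" * 500)
sender.shutdown(socket.SHUT_WR)
received = recv_until_eof(receiver)
sender.close()
receiver.close()
```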


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-09-13 11:11

Message:
Logged In: YES 
user_id=31435

As Jeremy implied at the start, someone needs to 
demonstrate that "the bug" is actually in Python, not in your 
platform's implementation of sockets.  If a C program displays 
the same behavior, your complaint is with your platform 
socket implementation.  Sockets are low-level gimmicks, 
which is why Jeremy expected a C program to fail the same 
way.

----------------------------------------------------------------------

Comment By: Dmitry Dvoinikov (targeted)
Date: 2004-09-13 08:17

Message:
Logged In: YES 
user_id=1120792

I've also been hit by this problem, not at a 130Meg read,
but at a mere 10Meg. Because of that I had to change my code
to read many small chunks rather than a single blob, and
that fixed it.

Still, I disagree with tim_one. If recv() is limited in the
amount of data it can read per call, the limit should be set
and documented; otherwise it is a bug and the call is
unreliable. Nobody is obliged to follow the described "best
practice" of reading small chunks; it actually worsens code
quality. What's worse, the limit makes other code break.
Example: I was using SimpleXMLRPCServer
(Lib/SimpleXMLRPCServer.py), and it contains the following
line at its heart:

data = self.rfile.read(int(self.headers["content-length"]))

Guess what was happening with requests with a content-length
larger than 10Meg (in my case).

I also thought the problem was max() instead of min(), but
as tim_one rightfully pointed out, this was the desired
behaviour. Ironically, though, replacing max() with min()
fixes the problem, as there are no more huge reads at the
lower level.
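The workaround described here, applied to the server side, can be sketched as follows, assuming Python 3 and a file-like rfile. Instead of one rfile.read() of the whole Content-Length, the body is read in bounded pieces. read_body, MAX_CHUNK and its 1 MB value are hypothetical; headers is any mapping that returns the content-length value.

```python
import io

MAX_CHUNK = 1024 * 1024  # hypothetical cap on the size of any single read()


def read_body(rfile, headers, max_chunk=MAX_CHUNK):
    """Read a Content-Length-delimited body from rfile in bounded pieces,
    instead of one read() of the whole declared length."""
    remaining = int(headers["content-length"])
    parts = []
    while remaining > 0:
        chunk = rfile.read(min(remaining, max_chunk))
        if not chunk:
            # The stream ended before the declared length arrived.
            raise EOFError("body truncated: %d bytes missing" % remaining)
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


# Usage with an in-memory stream standing in for the request's rfile.
body = b"<xml/>" * 2000  # 12,000 bytes
result = read_body(io.BytesIO(body), {"content-length": str(len(body))}, max_chunk=4096)
```

The whole body is still assembled in memory because an XML-RPC request has to be parsed as a unit. Only the size of each individual read is capped.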


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2004-07-26 21:41

Message:
Logged In: YES 
user_id=31435

potterru, I don't believe plexq was using _fileobject.read() -- 
he said he was using socket.recv(), and all the comments 
later were consistent with that.

The code you found does appear to *intend* max():  code 
following the snippet you quoted clearly expects that it may 
have read more than "left" bytes, and it would not be 
worrying about that had min() been intended.  I agree the 
code is pretty inscrutable regardless, but we'd have to ask 
Guido why he wrote it that way.

In any case, since this bug report is about socket.recv(), if 
you want to make a case that _fileobject.read() is buggy you 
should open a new report for that.

----------------------------------------------------------------------

Comment By: Igor E. Poteryaev (potterru)
Date: 2004-07-26 08:23

Message:
Logged In: YES 
user_id=48596

It looks like a bug in the _fileobject.read method in
module socket.py:

...
while True:
    left = size - buf_len
    recv_size = max(self._rbufsize, left)
    data = self._sock.recv(recv_size)

This code should read no more than *left* bytes or
*buffer size* bytes, whichever is smaller.

So it should be min() instead of max()!

Should I file a bug/patch for this?
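To make the disagreement concrete, this small worked comparison shows what each expression would pass to recv(). The 8192-byte buffer and the two "left" values are hypothetical numbers chosen for illustration. With max(), a large remainder is requested in one call, while a small remainder may cause more than "left" bytes to be requested, which is why the surrounding code expects over-reads.

```python
def requested_sizes(rbufsize, left):
    """Return (max-based, min-based) sizes that would be passed to recv()."""
    return max(rbufsize, left), min(rbufsize, left)


RBUFSIZE = 8192  # hypothetical buffer size, for illustration only
for left in (20 * 1024 * 1024, 100):
    as_written, proposed = requested_sizes(RBUFSIZE, left)
    print(f"left={left}: max() asks for {as_written}, min() asks for {proposed}")
# left=20971520: max() asks for 20971520, min() asks for 8192
# left=100: max() asks for 8192, min() asks for 100
```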

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-06-18 15:51

Message:
Logged In: YES 
user_id=31435

I assume you're running on Linux.  Python's implementation of 
recv asks the platform malloc for a buffer of the size 
requested, and raises an exception if malloc returns NULL.  
Unfortunately, malloc on Linux has a nasty habit of returning 
non-NULL even if there's no chance you can actually use the 
amount of memory requested.  There's really nothing Python 
can do about that.

So back to Jeremy's comment:  try the same thing in C.  If 
you get ridiculous behavior there too, it's a platform C/OS 
bug, and Python won't be able to hide it from you.
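Because recv() allocates a fresh buffer of the requested size, one way to keep allocations under the caller's control is socket.recv_into(), which writes into a buffer the caller already owns. This is a minimal sketch assuming Python 3. The name copy_with_fixed_buffer and the 65536 default are hypothetical, and this does not change how the platform malloc behaves.

```python
import io
import socket


def copy_with_fixed_buffer(sock, out_file, nbytes, bufsize=65536):
    """Copy exactly nbytes from sock to out_file through one preallocated buffer.

    recv_into() fills memory the caller already owns, so the receive buffer
    stays bufsize bytes however large nbytes is.
    """
    buf = bytearray(bufsize)  # allocated once, reused for every read
    view = memoryview(buf)
    remaining = nbytes
    while remaining > 0:
        n = sock.recv_into(view, min(remaining, bufsize))
        if n == 0:
            raise ConnectionError(
                "connection closed with %d bytes still expected" % remaining)
        out_file.write(view[:n])  # copies the n received bytes out of the buffer
        remaining -= n


# Usage: a local socket pair and an in-memory output file.
a, b = socket.socketpair()
payload = b"backup" * 800  # 4,800 bytes
a.sendall(payload)
out = io.BytesIO()
copy_with_fixed_buffer(b, out, len(payload), bufsize=1024)
a.close()
b.close()
```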

----------------------------------------------------------------------

Comment By: Alex R.M. Turner (plexq)
Date: 2003-06-18 15:22

Message:
Logged In: YES 
user_id=60034

That may be so; it might be worth putting a suggested
maximum in the docs.  However, I would say that, given that
an IPv6 packet could be as large as 2Gig, it's not
unreasonable to ask for a packet as large as 1Gig.  Whether
the problem is in glibc or Python I don't know, although
asking for a buffer 1.3Gig in size and passing it to recv()
would be odd behaviour in C on a current system, given that
most systems couldn't allocate that much memory to a
buffer ;).  I have written fairly extensive socket code in
C/C++ before, and I never used anything larger than 65536,
for the obvious reason that you can't receive anything
bigger than that in IPv4 (and most NICs can't handle
anything that big either).  I figured it would be
interesting to see what happened :).  I have a penchant for
being the only person in history to do quite a few things,
apparently!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-06-18 13:29

Message:
Logged In: YES 
user_id=31435

BTW, if you're new to socket programming, do read Python's 
socket programming HOWTO:

http://www.amk.ca/python/howto/sockets/

I expect you're the only person in history to try passing such 
a large value to recv() -- even if it worked, you'd almost 
certainly run out of memory trying to allocate buffer space for 
1.9GB.  sockets are a low-level facility, and it's common to 
pass a relatively small power of 2 (for best match with 
hardware and network realities).

----------------------------------------------------------------------

Comment By: Jeremy Hylton (jhylton)
Date: 2003-06-18 13:02

Message:
Logged In: YES 
user_id=31392

What happens when you write a C program that does the same
thing?  I expect you'll see similar problems.


----------------------------------------------------------------------


