size of socket.recv buffer

Mon Feb 25 23:05:27 EST 2002

"Locke" <brown.2053 at osu.edu> writes:

> The argument to recv is supposed to be the buffer, according to the
> documentation. But the doc doesn't say: buffer measured in what? Is it bytes?
> Octets? (by the way, what is the diff between a byte and an octet?)

An octet is formally an 8-bit value.  A byte was traditionally defined
as the unit of data used to represent a character (thus the term DBCS,
double-byte character set, for those sets that need two bytes to
represent a character), and was smaller than a machine word.  On most
systems nowadays the two are the same thing (8 bits), but historically
that wasn't always true.

Network protocols are often defined in terms of octets because the
term always means exactly 8 bits, so it can't be platform specific.

> See, I am writing a pop3 client type thing, after requesting a message from
> the server, my first s.recv tells me how long the message will be in octets.
> So then I call s.recv again passing the number of octets as the argument to
> s.recv. Shouldn't this download the entire message? What actually happens is
> that it cuts off right in the middle of the message.

This is standard behavior for a stream socket (like TCP, which is what
you are using to communicate with the POP server).

Between the two endpoints of a socket, and within the network in the
middle, data traversing the connection is subject to various latencies
and buffering.  So there need not be - and generally is not - any
direct relationship between how data is written at one end (e.g., the
size of the chunks or the number of I/O operations) and how it is
received at the other end.

In other words, if the POP server tells you there are 1000 octets
coming, and even if it issues a single write() call for all 1000
octets, you may end up receiving them all at once, in 10 chunks of
100, 5 chunks of 200, or 1000 chunks of 1.  And this is true even if
you tell recv() to accept up to 1000 octets in each call.  You have no
way of knowing, and it's crucial that you avoid creating any
dependencies on such behavior in your code.
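
If you want to see this for yourself, one option is to record the size
of each chunk that recv() hands back.  What follows is only a rough
sketch: the helper names chunk_sizes() and demo() are made up for
illustration, and a local socket.socketpair() stands in for the real
POP connection.

```python
import socket

def chunk_sizes(sock, total):
    """Read `total` octets from sock; return the size of each chunk received.

    Hypothetical helper, for illustration only.
    """
    sizes = []
    remaining = total
    while remaining:
        # recv() may return anywhere from 1 to `remaining` octets
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result means the peer closed the connection
            break
        sizes.append(len(chunk))
        remaining -= len(chunk)
    return sizes

def demo():
    # Assumption: a connected local socket pair plays server and client
    server, client = socket.socketpair()
    try:
        server.sendall(b"x" * 1000)   # one write of 1000 octets
        return chunk_sizes(client, 1000)
    finally:
        server.close()
        client.close()
```

On a local pair demo() will often come back with a single chunk, but
across a real network the same loop can just as easily see many small
ones.  The only thing worth relying on is that the sizes add up to the
total.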

> What's my problem? Is there an upper limit on what the buffer argument can
> be to recv? Or does pop3 just refuse to send the whole message with one call
> to recv? I don't think it is that, because what it sent me was like 600some
> characters, which isn't a power of two or any other computer-ish number.

How much you happen to receive is going to be dictated by a
combination of the socket buffering on the POP server, the latency
between the server and your client (and any data loss or retries that
occur along the way, which TCP takes care of transparently), and then
socket buffering in your protocol stack.  If you were to, for example,
delay a few seconds before issuing your recv() call you might happen
to let your stack receive all the data and give it to you in one
chunk, but it would be a bad design.  And if the total amount of data
exceeded the socket buffers and TCP window, then it would get held up
at the sending end until you started draining some out at your client.

As you note in a later post, asking recv() for data when none is
available on the socket will block (unless you mark the socket as
non-blocking).  So the way to get around that in this case is to
build a loop that keeps receiving the remaining data (out of the
total expected) until it is done, and then stops calling recv().
Something like:

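   # Assume msg_octets tells you how many octets you expect on socket s

   # Keep receiving until we have it all.  Append to list for efficiency
   data = []
   while msg_octets:
      chunk = s.recv(msg_octets)
      if not chunk:
         # An empty result means the server closed the connection early;
         # without this check the loop would spin forever
         raise EOFError('connection closed with %d octets missing'
                        % msg_octets)
      data.append(chunk)
      msg_octets = msg_octets - len(chunk)

   # Merge list contents into a single data buffer (recv returns bytes)
   data = b''.join(data)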

This will keep receiving until it runs out of the total message size,
decreasing the possible maximum size based on the data already
received.  Note that it destroys msg_octets and assumes that you
aren't limiting your maximum size up front (e.g., that you're going to
hold the entire message in memory), but it should give you the idea.
It ends up with the received contents buffer referenced by 'data'.
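
If holding the entire message in memory is a concern, the same loop
can cap each recv() and hand the chunks to a file as they arrive.
Here is a minimal sketch of that variation.  The name copy_exact() and
the 4096-octet default cap are my own assumptions, not anything POP3
or the socket module requires.

```python
import socket

def copy_exact(sock, out, msg_octets, max_chunk=4096):
    """Copy exactly msg_octets from sock into the file-like object out.

    Hypothetical helper; max_chunk is an arbitrary cap on each read.
    """
    remaining = msg_octets
    while remaining:
        # Never ask for more than the cap, nor more than is still expected
        chunk = sock.recv(min(remaining, max_chunk))
        if not chunk:
            # Peer closed the connection before sending everything
            raise EOFError("connection closed with %d octets missing"
                           % remaining)
        out.write(chunk)
        remaining -= len(chunk)
    return msg_octets
```

You would call it with something like copy_exact(s, open('msg.txt',
'wb'), msg_octets).  Unlike the loop above it leaves msg_octets alone
and never holds more than max_chunk octets at a time.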

In this case, you're also depending on TCP (as a stream protocol) to
guarantee that data written in at one end of the connection will
arrive intact, and in sequence, at the other.  So as long as the
server has properly told you how many octets it is sending, the recv()
calls will be safe even on a blocking mode socket.
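
If you'd rather not trust the server's count completely, one possible
safeguard is to put a timeout on the socket so that a stalled recv()
raises an exception instead of blocking forever.  This is only a
sketch under my own assumptions: recv_exact() is a made-up name and
the 30 second default is arbitrary.

```python
import socket

def recv_exact(sock, msg_octets, timeout=30.0):
    """Receive exactly msg_octets, giving up if any one recv() stalls.

    Hypothetical helper.  The timeout applies to each recv() call
    separately, not to the transfer as a whole, and it stays set on
    the socket after this function returns.
    """
    sock.settimeout(timeout)
    data = []
    remaining = msg_octets
    while remaining:
        chunk = sock.recv(remaining)   # raises socket.timeout on a stall
        if not chunk:
            # Peer closed the connection before sending everything
            raise EOFError("connection closed with %d octets missing"
                           % remaining)
        data.append(chunk)
        remaining -= len(chunk)
    return b"".join(data)
```

With that in place, a server that announces more octets than it sends
shows up as a socket.timeout (or an EOFError if it closes the
connection) rather than as a client that hangs.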

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/