Lightweight socket IO wrapper
James Harris
james.harris.1 at gmail.com
Tue Sep 22 15:45:24 EDT 2015
"Dennis Lee Bieber" <wlfraed at ix.netcom.com> wrote in message
news:mailman.12.1442794762.28679.python-list at python.org...
> On Sun, 20 Sep 2015 23:36:30 +0100, "James Harris"
> <james.harris.1 at gmail.com> declaimed the following:
>
>
>>
>>There are a few things and more crop up as time goes on. For example,
>>over TCP it would be helpful to have a function to receive a specific
>>number of bytes or one to read bytes until reaching a certain delimiter
>>such as newline or zero or space etc. Even better would be to be able to
>>use the iteration protocol so you could just code next() and get the
>>next such chunk of read in a for loop. When sending it would be good to
>>just say to send a bunch of bytes but know that you will get told how
>>many were sent (or didn't get sent) if it fails. Sock.sendall() doesn't
>>do that.
>
> Note that the "buffer size" option on a TCP socket.recv() gives you
> your "specific number of bytes" -- if available at that time.
"If" is a big word!
AIUI the buffer size is not guaranteed to relate to the number of bytes
returned except that you won't/shouldn't(!) get more than the buffer
size.
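To make that concrete, here is a minimal sketch of a "receive exactly N bytes" helper for a stream socket. The name recv_exactly and the choice to raise EOFError are my own assumptions for illustration, not anything the socket module provides; the only library call it relies on is socket.recv().

```python
import socket

def recv_exactly(sock, nbytes):
    """Hypothetical helper: read exactly nbytes from a stream socket.

    Raises EOFError if the peer closes the connection first.
    """
    chunks = []
    remaining = nbytes
    while remaining > 0:
        # recv() may return fewer bytes than requested, never more.
        chunk = sock.recv(remaining)
        if not chunk:
            # An empty result on a stream socket means the peer closed.
            raise EOFError("closed with %d bytes outstanding" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The loop is the whole point: a single recv(n) call is only an upper bound on what comes back, so the caller has to keep asking until the count is satisfied or the stream ends.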
> I wouldn't want to use .recv(1) though to implement your "reaching a
> certain delimiter"... Much better to read as much as available and
> search it for the delimiter.
Yes, that's what I do at the moment. I keep a block of bytes, add any
new stuff to it and scan it for delimiters.
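Roughly, the approach looks like the sketch below, here combined with the iteration protocol so that chunks can be pulled in a for loop. The class name DelimitedReader, its parameters and the decision to return a trailing unterminated chunk at end of stream are assumptions made for illustration; this is not my actual wrapper code.

```python
import socket

class DelimitedReader:
    """Hypothetical wrapper: iterate over delimiter-terminated chunks."""

    def __init__(self, sock, delimiter=b"\n", bufsize=4096):
        self.sock = sock
        self.delimiter = delimiter
        self.bufsize = bufsize
        self.pending = b""          # bytes received but not yet returned

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            pos = self.pending.find(self.delimiter)
            if pos >= 0:
                # Return the chunk with its delimiter left on the end.
                end = pos + len(self.delimiter)
                chunk, self.pending = self.pending[:end], self.pending[end:]
                return chunk
            data = self.sock.recv(self.bufsize)
            if not data:
                # Stream closed: hand back any unterminated tail once.
                if self.pending:
                    chunk, self.pending = self.pending, b""
                    return chunk
                raise StopIteration
            self.pending += data
```

Usage would be along the lines of "for line in DelimitedReader(sock): ...". Repeatedly concatenating onto pending is simple rather than fast; a bytearray would probably be kinder for large transfers.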
> I'll confess, adding a .readln() FOR TCP ONLY, might
> be a nice extension over BSD sockets (might need to allow option for
> whether line-ends are Internet standard <cr><lf> or some other marker,
> and whether they should be converted upon reading to the native format
> for the host).
Akira Li pointed out that there is just such an extension: makefile.
Scanning to <lf> is what I do just now as that includes <cr><lf> too and
I leave them on the string. IIRC file.readline works in the same way.
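For reference, a small sketch of the makefile route. The function name read_lines is made up for the example; socket.makefile() is the real call, and opening it in binary mode gives a file object whose lines end at <lf> with the terminator left in place.

```python
import socket

def read_lines(sock):
    """Hypothetical helper: collect lines until the peer closes the stream."""
    # makefile("rb") wraps the socket in a buffered binary file object.
    with sock.makefile("rb") as stream:
        # Iterating calls readline(); each line keeps its b"\n" (and any
        # preceding b"\r"), and a final unterminated line is returned as-is.
        return [line for line in stream]
```

Note that the file object does its own buffering, so mixing it with direct recv() calls on the same socket is likely to lose data into the file object's buffer.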
>>I thought UDP would deliver (or drop) a whole datagram but cannot find
>>anything in the Python documentation to guarantee that. In fact
>>documentation for the send() call says that apps are responsible for
>>checking that all data has been sent. They may mean that to apply to
>>stream protocols only but it doesn't state that. (Of course, UDP
>>datagrams are limited in size so the call may validly indicate
>>incomplete transmission even when the first part of a big message is
>>sent successfully.)
>>
> Looking in the wrong documentation <G>
>
> You probably should be looking at the UDP RFC. Or maybe just
>
> http://www.diffen.com/difference/TCP_vs_UDP
>
> """
> Packets are sent individually and are checked for integrity only if
> they arrive. Packets have definite boundaries which are honored upon
> receipt, meaning a read operation at the receiver socket will yield an
> entire message as it was originally sent.
> """
I would rather see it in the Python docs because we program to the
language standard and there can be - and often are, for good reason -
areas where Python does not work in the same way as underlying systems.
> Even if the IP layer has to fragment a UDP packet to meet limits of
> the transport media, it should put them back together on the other end
> before passing it up to the UDP layer. To my knowledge, UDP does not
> have a size limit on the message (well -- a 16-bit length field in the
> UDP header). But since it /is/ "got it all" or "dropped" with no
> inherent confirmation, one would have to embed their own protocol
> within it -- sequence numbers with ACK/NAK, for example. Problem: if
> using LARGE UDP packets, this protocol would mean having LARGE resends
> should packets be dropped or arrive out of sequence (and since the
> ACK/NAK could be dropped too, you may have to handle the case of a
> duplicated packet -- also large).
Yes, it was the 16-bit limitation that I was talking about.
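For the record, the arithmetic behind that limit works out as below. The constant names are mine; the header sizes are the standard 8-byte UDP header and the minimum 20-byte IPv4 header (no options), and a given OS may impose a lower ceiling of its own.

```python
# The UDP length field is 16 bits and counts header plus data.
MAX_UDP_LENGTH = 2**16 - 1            # 65535
UDP_HEADER_SIZE = 8
IPV4_MIN_HEADER_SIZE = 20             # assumes no IP options

# Largest payload the UDP header itself can describe.
MAX_UDP_PAYLOAD = MAX_UDP_LENGTH - UDP_HEADER_SIZE

# Over IPv4 the 16-bit total-length field also covers the IP header,
# which is the tighter bound in practice.
MAX_UDP_PAYLOAD_IPV4 = MAX_UDP_LENGTH - IPV4_MIN_HEADER_SIZE - UDP_HEADER_SIZE

print(MAX_UDP_PAYLOAD, MAX_UDP_PAYLOAD_IPV4)
```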
> TCP is a stream protocol -- the protocol will ensure that all data
> arrives, and that it arrives in order, but does not enforce any
> boundaries on the data; what started as a relatively large packet at
> one end may arrive as lots of small packets due to intermediate
> transport limits (one can visualize a worst case: each TCP packet is
> broken up to fit Hollerith cards; 20 bytes for header and 60 bytes of
> data -- then fed to a reader and sent on AS-IS). Boundaries are the
> end-user responsibility... line endings (look at SMTP, where an email
> message ends on a line containing just a ".") or embedded length
> counter (not the TCP packet length).
Yes.
>>Receiving no bytes is taken as indicating the end of the communication.
>>That's OK for TCP but not for UDP so there should be a way to
>>distinguish between the end of data and receiving an empty datagram.
>>
> I don't believe UDP supports a truly empty datagram (length of 0) --
> presuming a sending stack actually sends one, the receiving stack will
> probably drop it as there is no data to pass on to a client (there is
> a PR at work because we have a UDP driver that doesn't drop 0-length
> messages, but also can't deliver them -- so the circular buffer might
> fill with undeliverable headers)
As others have pointed out, UDP implementations do seem to work with
zero-byte datagrams properly. Again, I would rather see that in the
Python documentation which is what, effectively, forms a contract that
we should be able to rely on.
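A quick way to check this on a given machine is a loopback round trip like the sketch below. The function name, the two-second timeout and the 2048-byte receive size are arbitrary choices for illustration, and a passing result only shows what the local stack does, not what Python promises.

```python
import socket

def empty_datagram_roundtrip():
    """Hypothetical check: send a zero-byte UDP datagram over loopback."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))     # let the OS pick a free port
        receiver.settimeout(2.0)            # fail rather than hang if dropped
        sender.sendto(b"", receiver.getsockname())
        # recvfrom() returns (data, address); data is b"" for an empty
        # datagram, which is a message, not an end-of-stream indication.
        return receiver.recvfrom(2048)
    finally:
        sender.close()
        receiver.close()

data, addr = empty_datagram_roundtrip()
print(repr(data), addr[0])
```

With a datagram socket there is no end-of-data condition for recv() to report, so a wrapper that treats an empty result as "finished" needs to know which kind of socket it is reading from.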
James