[Tutor] reading an input stream

Cameron Simpson cs at zip.com.au
Thu Jan 7 17:07:58 EST 2016


On 08Jan2016 08:52, Cameron Simpson <cs at zip.com.au> wrote:
[...]
>Instead, gather the data progressively and emit XML chunks. You've got a TCP 
>stream - the TCPServer class will do an accept and hand you an _unbuffered_ 
>binary stream file from which you can just .read(), ignoring any arbitrary 
>"packet" sizes.  For example (totally untested) using a generator:
[...]

Just a few followup remarks:

This is all Python 3, where bytes and strings are cleanly separated. You've got 
a binary stream with binary delimiters, so we're reading binary data and 
returning the binary XML in between. We separately decode this into a string 
for handing to your XML parser. Just avoid Python 2 altogether; this can all be 
done in Python 2, but it is less clean and more confusing.
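
As a rough illustration of that bytes/str split, here is a minimal sketch. The
payload and the UTF-8 encoding are assumptions made for the example; use
whatever encoding your sender really emits.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the bytes found between two binary delimiters.
raw_chunk = b"<msg><text>hello</text></msg>"

# Decode explicitly. UTF-8 is an assumption about the sender's encoding.
xml_text = raw_chunk.decode("utf-8")

# The parser now receives a str, kept separate from the binary framing.
root = ET.fromstring(xml_text)
text_value = root.find("text").text
```

The stream-reading code only ever handles bytes; the decode happens once, at
the boundary where the chunk is handed to the parser.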

The socketserver module is... annoyingly vague about what the .rfile attribute 
gets you. It says "a file-like object". That should be a nice buffered binary 
file (an io.BufferedIOBase subclass such as io.BufferedReader) with a .read1() 
method, but conceivably it is not. I'm mentioning this because I've noticed 
that the code I lifted the TCPServer setup from seems to make a binary file 
from whole cloth by doing:

  fp = os.fdopen(os.dup(request.fileno()),"rb")

You'd hope that isn't necessary here, and that request.rfile is a nice buffered 
binary file already.
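
One way to check what you actually get is to build the file object the way
socketserver.StreamRequestHandler appears to in CPython, namely with
socket.makefile("rb", rbufsize) where rbufsize defaults to -1. The sketch
below uses socket.socketpair() as a stand-in for a real TCP connection, which
is an assumption made to keep the example self-contained.

```python
import io
import socket

# A connected socket pair stands in for the client/server TCP connection.
client_sock, server_sock = socket.socketpair()

# Mirror what StreamRequestHandler seems to do for self.rfile:
# a binary, buffered file wrapped around the connected socket.
rfile = server_sock.makefile("rb", -1)

is_buffered = isinstance(rfile, io.BufferedIOBase)
has_read1 = hasattr(rfile, "read1")

client_sock.sendall(b"partial data")

# read1() returns what has arrived so far (up to 8192 bytes) rather than
# blocking until the full 8192 bytes are available.
data = rfile.read1(8192)

rfile.close()
client_sock.close()
server_sock.close()
```

If is_buffered and has_read1 are both true in your handler too, the
os.fdopen(os.dup(...)) dance should not be needed.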

In xml_extractor, the "# locate start of XML chunk" loop could be improved by 
using .find, exactly as in the "# gather XML chunk" loop; I started with 
.read(1) instead of .read1(8192), which is why it does things byte by byte.
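
A sketch of how both loops might look with .find and .read1() follows. It is
not the code from the earlier message: the START and END delimiters, the
bufsize parameter and the buffering details are all hypothetical, chosen only
to make the example runnable.

```python
import io

START = b"\x02"  # hypothetical start-of-chunk delimiter
END = b"\x03"    # hypothetical end-of-chunk delimiter


def xml_extractor(fp, start=START, end=END, bufsize=8192):
    """Yield the bytes found between each start/end delimiter pair in fp."""
    buf = b""
    while True:
        # locate start of XML chunk
        while True:
            pos = buf.find(start)
            if pos >= 0:
                buf = buf[pos + len(start):]
                break
            # No delimiter yet: keep only a tail that could be the first
            # part of a delimiter split across two reads.
            keep = len(start) - 1
            buf = buf[-keep:] if keep else b""
            data = fp.read1(bufsize)
            if not data:
                return  # end of stream
            buf += data
        # gather XML chunk
        while True:
            pos = buf.find(end)
            if pos >= 0:
                yield buf[:pos]
                buf = buf[pos + len(end):]
                break
            data = fp.read1(bufsize)
            if not data:
                return  # end of stream inside a chunk: drop the partial data
            buf += data


# Usage with an in-memory stream standing in for the socket file.
stream = io.BytesIO(b"junk\x02<a/>\x03noise\x02<b>1</b>\x03tail")
chunks = list(xml_extractor(stream))
```

The "gather" loop rescans the whole buffer on every read, which is simple but
not optimal for very large chunks; remembering where the last search stopped
would avoid that.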

Cheers,
Cameron Simpson <cs at zip.com.au>

