[Tutor] reading an input stream

richard kappler richkappler at gmail.com
Thu Jan 7 11:23:35 EST 2016


Hi James,

I've actually come a ways since this message was posted, but still have
some related struggles.


> Raw sockets is almost never the right solution, while a basic socket to
> socket connection is easy enough to program, handling failure and
> concurrency can very quickly make the solution a lot more complex than it
> needs to be, so perhaps you could supply more information? (I realise I'm
> venturing outside the realm of learning python, but I'm a pedant for doing
> things right).
>
> You said you need to read XML in from a socket connection. You've not
> mentioned what's generating the data? Is that data sent over HTTP in which
> case is this part of a SOAP or REST API? Is the data being generated by
> something you've written or a 3rd party software package? Is REST an
> option? Is there a reason to serialise to XML? (If I was performing the
> serialisation I would go with JSON if being human readable was a
> requirement.)

The method of receiving data is neither optional nor under our control. The
XML is generated by a camera tunnel and contains data about the packages
that pass through the tunnel. It comes in over the network (TCP) to a
specific port (2008). This is all within an internal VPN, so security, for
the most part, is not an issue, though efficiency is. This one script can
have 30 or more connections to it. The XML messages have STX (\x02) and
ETX (\x03) bookending them and arrive as fast as 3-4 per second per
connection. At the moment we're in the initial dev stages. We have decided
on a general framework and are now trying to work out the code. Here's
what works thus far (sorta-kinda):

#!/usr/bin/env python

import socket
import lxml.etree as ET

# Compile the stylesheet once rather than on every message.
transform = ET.XSLT(ET.parse('stack13.xsl'))

def dataParse(data):
    # Transform one XML message and append the result to parser.out.
    print('parsing')
    dom = ET.XML(data)
    newdom = transform(dom)
    f1.write(str(newdom))

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_addr = ('', 2008)
sock.bind(sock_addr)
sock.listen(40)
print('listening')

f1 = open('parser.out', 'a')
print('opening parser.out')

try:
    while True:
        # wait for a connection
        connection, client_address = sock.accept()
        while True:
            # assumes each recv() returns exactly one whole message
            data = connection.recv(8192)
            if not data:
                # empty read: the client closed the connection
                break
            dataParse(data)
        connection.close()
finally:
    f1.close()

There is a tunnel simulator in our test environment that reads from a file,
each message on a separate line, and sends each line out through the socket
to the above. I say 'sorta-kinda' because this only works if there is a
time delay between each line/message being sent. If there is no time delay,
dataParse raises an

lxml.etree.XMLSyntaxError: Extra content at the end of the document, line 2, column 1

error. This makes sense, as the parser is currently just reading whatever is
in the buffer, and the messages being sent do NOT currently have the STX/ETX
on them, so two messages that arrive back to back are handed to the parser
as a single document. We are currently working on several fronts: changing
the test tunnelSim file to include STX/ETX; making the server
multi-threaded, one thread per incoming connection (working with numerous
examples found on the web); setting up a buffer within each thread into
which 'data' will go in one end (buffer += data?) and come out the other,
using a generator that reads from STX to ETX to take a full 'message' out
of the buffer, which will then be yielded to dataParse; and sending the
parsed data out via a Splunk event writer.
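
A minimal sketch of the buffer-plus-generator idea described above, assuming
each message arrives framed as STX + XML + ETX and that the STX/ETX bytes
never occur inside the XML itself. The names extract_messages and
read_messages are made up for illustration, and handle stands in for
dataParse:

```python
STX = b'\x02'
ETX = b'\x03'

def extract_messages(buf):
    """Yield each complete STX...ETX payload found in buf (a bytearray).

    Consumed bytes are removed from buf in place; a trailing partial
    message stays in buf so a later recv() can complete it.
    """
    while True:
        start = buf.find(STX)
        if start == -1:
            # no STX: nothing here can start a message, discard it
            del buf[:]
            return
        end = buf.find(ETX, start + 1)
        if end == -1:
            # message not finished yet: keep it, drop anything before STX
            del buf[:start]
            return
        message = bytes(buf[start + 1:end])
        del buf[:end + 1]
        yield message

def read_messages(connection, handle, bufsize=8192):
    """Read from one connection until it closes, calling handle(message)
    once per complete message, however recv() happens to split the data."""
    buf = bytearray()
    while True:
        data = connection.recv(bufsize)
        if not data:
            break  # empty read: the peer closed the connection
        buf += data
        for message in extract_messages(buf):
            handle(message)
```

With something like this, the inner loop of the script would shrink to a
single call such as read_messages(connection, dataParse), and it should no
longer matter whether two messages land in the same recv() or one message
is split across two.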

Or at least that's the idea. We're still far away from figuring it out, but
moving closer.
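
For the one-thread-per-connection part of the plan, one possible starting
point is the standard library's threading TCP server, which hands each
accepted connection to its own thread. This is only a sketch under
assumptions: it is written for Python 3 (the module is called SocketServer
on Python 2), TunnelHandler and TunnelServer are hypothetical names, and
on_message is a placeholder for whatever does the parsing. Since on_message
is called from many threads, anything shared that it touches, such as one
output file, would need a lock.

```python
import socketserver

STX = b'\x02'
ETX = b'\x03'

class TunnelHandler(socketserver.BaseRequestHandler):
    """handle() runs in its own thread, once per client connection."""

    def handle(self):
        buf = bytearray()  # per-connection buffer, not shared between threads
        while True:
            data = self.request.recv(8192)
            if not data:
                break  # client closed the connection
            buf += data
            # hand over every complete STX...ETX message now in the buffer
            while True:
                start = buf.find(STX)
                end = buf.find(ETX, start + 1) if start != -1 else -1
                if end == -1:
                    break  # nothing complete yet, wait for more data
                self.server.on_message(bytes(buf[start + 1:end]))
                del buf[:end + 1]

class TunnelServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # do not let worker threads block interpreter exit

    def __init__(self, address, on_message):
        socketserver.ThreadingTCPServer.__init__(self, address, TunnelHandler)
        self.on_message = on_message  # callback invoked from worker threads
```

Starting it would look something like
TunnelServer(('', 2008), dataParse).serve_forever().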

regards, Richard

