Intermittent Failure on Serial Port (Other thread code)
H J van Rooyen
mail at microcorp.co.za
Tue Jun 13 04:24:44 EDT 2006
I would like to publicly thank Serge Orloff for the effort he has put in so far
and his patience...
He is a Scholar and a Gentleman.
Serge Orloff wrote:
| H J van Rooyen wrote:
|
| > Note that the point of failure is not the same place in the python file, but
| > it is according to the traceback, again at a flush call...
|
| Yes, traceback is bogus. Maybe the error is raised during garbage
| collection, although the strace you've got doesn't show that. The main
| reason of the failure seems to be a workaround in python's function
| new_buffersize, it doesn't clear errno after lseek and then this errno
| pops up somewhere else. There are two places I can clearly see that
| don't clear errno: file_dealloc and get_line. Obviously this stuff
| needs to be fixed, so you'd better file a bug report.
Ouch! - I am new in this neck of the woods - what are the requirements for
something like this, and where should I send it so it's useful? - so far it's so
very vague in my mind that I am not sure that I can actually tell someone else
properly what's wrong - except for an "it does not work" bleat, which is not very
illuminating...
| I'm not sure how
| to work around this bug in the meantime, since it is still not clear
| where this error is coming from. Try to pin point it.
I will put in a lot of try/except stuff looking for this errno 29 and see what
comes up and where.
Not sure if this will catch it, but it may give a clue...
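A minimal sketch of that kind of instrumentation, written in Python 3 syntax rather than the Python 2 used in this thread. The helper name `call_logged` and its `label` argument are hypothetical; the idea is only to tag each suspect call so that an errno 29 (ESPIPE) shows up next to the place it surfaced:

```python
import errno
import sys

def call_logged(label, func, *args):
    """Call func(*args); report where an OSError surfaced, then re-raise it."""
    try:
        return func(*args)
    except OSError as e:  # IOError is an alias of OSError in Python 3
        name = errno.errorcode.get(e.errno, "unknown")
        sys.stderr.write("%s: errno %s (%s)\n" % (label, e.errno, name))
        raise
```

It would be used as, for example, `call_logged("port flush", port.flush)`, where `port` stands for whatever file object is being flushed. If the stale-errno theory is right, this only shows where the error surfaces, not where errno was originally set.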
| For example, if
| your code relies on garbage collection to call file.close, try to close
| all files in your program explicitly. It seems like a good idea anyway,
| since your program is long running, errors during close are not that
| significant. Instead of standard close I'd call something like this:
|
| import sys
|
| def soft_close(f):
|     try:
|         f.close()
|     except IOError, e:
|         print >>sys.stderr, "Hmm, close of file failed. Error was: %s" % e.errno
As you remark - the code is long running - it's supposed to work forever and
come back up again if the power has failed - so for now the serial port is never
explicitly closed - I open and close the other files as I use them, to try to
make sure the data is written to disk instead of just cached in memory. I will
put this sort of thing in everywhere now to try and isolate whatever it is that
is biting me, not only on the close statements.
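On the point about getting data onto disk: closing a file empties Python's own buffer, but the kernel may still hold the data in its cache for a while. A minimal sketch of a stricter write, in Python 3 syntax; the function name `write_durably` is hypothetical and not part of the program discussed here:

```python
import os

def write_durably(filename, lines):
    """Write lines to filename and ask the OS to push them to disk."""
    with open(filename, "w") as f:
        for line in lines:
            f.write(line + "\n")
        f.flush()             # empty Python's user-space buffer
        os.fsync(f.fileno())  # ask the kernel to write its cached data out
```

Whether the extra `os.fsync` call is worth its cost depends on how often the file is rewritten and how much a power failure is allowed to lose.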
|
| > The "close failed" is explicable - it seems to happen during closedown, with
| > the port already broken..,
|
| It is not clear who calls lseek right before close. lseek is called by
| new_buffersize that is called by file.read. But who calls file.read
| during closedown?
When I said closedown - I meant whatever the system does after the exception was
raised - I have not yet gotten as far as writing a clean close... - so far I am
concentrating on the polling protocol, to safely get the data from the readers
to the disk - port to file.... hence the name :-)
Now there is another thread running - it accesses files (disk, and a fifo to
trigger the disk write) but not the serial port - I have not laid any stress on
it because I thought it was irrelevant, but now I am not so sure - the code
follows below.
So, a question - is this error number a process-global thing, or is it local to a
thread or an object? - it could be this thread that calls read while the other
one is in the process of dying after the exception - it should not access the
port, though, although it repetitively reads a fifo... - come to think of it -
it could be this thread that first raises the ESPIPE for all I know (that is, if
it's global and not thread-specific)...
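For reference, ESPIPE is what lseek reports when it is asked to seek on a pipe or fifo, so any code path that seeks on the fifo is a candidate for setting it. POSIX defines the C-level errno as per-thread, which would suggest that a stale value resurfaces in the thread that left it behind, though that has not been confirmed for this program. A small sketch, in Python 3 syntax and assuming a Linux or other POSIX system, that provokes the error on an ordinary pipe; the function name is hypothetical:

```python
import errno
import os

def seek_error_on_pipe():
    """Return the errno raised by lseek on a pipe, or None if it succeeds."""
    r, w = os.pipe()
    try:
        os.lseek(r, 0, os.SEEK_CUR)  # pipes are not seekable
        return None
    except OSError as e:
        return e.errno
    finally:
        os.close(r)
        os.close(w)
```

Comparing the result with `errno.ESPIPE` confirms that the number seen in the traceback is the one a seek on a pipe produces on the machine in question.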
def maintain_onsite(fifoname, filename):
    """Here we keep track of who is in, and who out of the site"""
    j = thread.get_ident()
    print 'New Thread identity printed by new thread is:', j
    pfifo = open(fifoname,'r',1) # Reading, line buffered
    unblock(pfifo) # call some magic
    global on_site #use top level dictionary to avoid a lot of copying
    s = ""
    d = {}
    while True:
        try:
            s = pfifo.readline()
        except IOError:
            time.sleep(1)
            continue
        if s == '':
            continue
        if s != 'goon\n': # see if we got a go on signal
            continue
        d = on_site # refers to the on site dictionary (this does not copy it)
        pfile = open(filename,'w',1) # The file of people on site
        for x in d:
            pfile.write(x + ' ' + d[x] + '\n') # rewrite it - a bit brute force...
        pfile.close()
        s = '' # clean out the receive string again
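Since the loop above iterates over the shared dictionary while another thread may be changing it, one option is to take a real copy under a lock and write from the copy. A minimal sketch in Python 3 syntax; `on_site_lock` and `write_snapshot` are hypothetical names, and the original program has no such lock:

```python
import threading

on_site = {}                     # shared dictionary, as in the code above
on_site_lock = threading.Lock()  # hypothetical: not in the original program

def write_snapshot(filename):
    """Copy on_site under the lock, then write the copy without holding it."""
    with on_site_lock:
        snapshot = dict(on_site)  # a real (shallow) copy
    with open(filename, "w") as pfile:
        for name, value in snapshot.items():
            pfile.write(name + " " + value + "\n")
```

This only helps if the thread that updates `on_site` takes the same lock around its updates.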
Here is the unblock code:
# Some magic to make a file non blocking - from the internet
def unblock(f):
    """Given file 'f', sets its unblock flag to true."""
    fcntl.fcntl(f.fileno(), fcntl.F_SETFL, os.O_NONBLOCK)
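Note that F_SETFL replaces the file status flags as a whole, so passing only os.O_NONBLOCK clears any other status flag (such as O_APPEND) that was set. A variant that reads the current flags first and adds O_NONBLOCK to them, sketched in Python 3 syntax for a Unix system; the name `unblock_keep_flags` is hypothetical:

```python
import fcntl
import os

def unblock_keep_flags(f):
    """Add O_NONBLOCK to the file's status flags without clearing the others."""
    fd = f.fileno()
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)                 # current flags
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # add non-blocking
```

For a fifo opened read-only this probably behaves the same as the version above; the difference matters only when other status flags are in use.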
- Hendrik