Intermittent Failure on Serial Port (Trace Result)

H J van Rooyen mail at microcorp.co.za
Mon Jun 12 12:55:48 CEST 2006


8<-------------- (snip)

old fail:

| | > | > Traceback (most recent call last):
| | > | >   File "portofile.py", line 232, in ?
| | > | >     ret_val = main_routine(port, pollstruct, pfifo)
| | > | >   File "portofile.py", line 108, in main_routine
| | > | >     send_nak(port, timeout)      # so bad luck - comms error
| | > | >   File "/home/hvr/Polling/lib/readerpoll.py", line 125, in send_nak
| | > | >     port.flush()
| | > | > IOError: [Errno 29] Illegal seek
| | > | > close failed: [Errno 29] Illegal seek
| | > | >
| | > |
| | > |
| | > | > Where can I find out what the Errno 29 really means?
| | > | > Is this Python, the OS or maybe hardware?
| | > |
| | > | It is from kernel: grep -w 29 `locate errno`
| | > | /usr/include/asm-generic/errno-base.h: #define   ESPIPE          29
| | > |  /* Illegal seek */
| | > |
| | > | man lseek:
| | > |
| | > | ERRORS:
| | > |         ESPIPE fildes is associated with a pipe, socket, or FIFO.
| | > |
| | > | RESTRICTIONS:
| | > |         Linux  specific  restrictions:  using  lseek  on  a  tty device
| | > | returns ESPIPE.
| | >
| | >
| | > Thanks for the info - so the Kernel sometimes bombs me out - does anybody
| know
| | > why the python flush sometimes calls lseek?
| |
| | I thought it was your own flush method. If it is file.flush method that
| | makes the issue more complicated, since stdlib file.flush doesn't call
| | lseek method. I suggest you run your program using strace to log system
| | calls, without such log it's pretty hard to say what's going on. The
| | most interesting part is the end, but make sure you have enough space
| | for the whole log, it's going to be big.

8<---------------- (snip)

New Konsole output:

../logs/composite/rawlog
Pipe exists already
we get here - thread identity is: 1079298992
New Thread identity printed by new thread is: 1079298992
we get here too
5 0123456789012345 Sefie Sewenstein is in
5 02700000021348 Error record - catch her, catch him
3 9E00000049592001 Antonie de Kompaan is in
2 8A00000870BEDE01 Bertus Bierdrinker is in

At 2006/06/12 10:03:20  The following people were in:

0123456789012345 Sefie Sewenstein
9E00000049592001 Antonie de Kompaan
8A00000870BEDE01 Bertus Bierdrinker

Traceback (most recent call last):
  File "portofile.py", line 232, in ?
    ret_val = main_routine(port, pollstruct, pfifo)
  File "portofile.py", line 90, in main_routine
    s, ret_val  = poll_one(port, addy, dtype, timeout)  # poll one reader
  File "/home/hvr/Polling/lib/readerpoll.py", line 81, in poll_one
    port.flush()        # make sure it goes
IOError: [Errno 29] Illegal seek
close failed: [Errno 29] Illegal seek

---------------------end of Konsole output --------------------

We have the whole of that session captured by strace - some 45 Mb of it.

Note that the point of failure is not the same place in the python file, but it
is according to the traceback, again at a flush call...

The "close failed" is explicable - it seems to happen during closedown, with the
port already broken..,

Here is the the last bit of the trace - I cant figure out who or what calls the
lseek on file 3 to get the ESPIPE result...

There are some comments added to the trace and the code below like this
<<<<<<<<<[comment]

--------------------------strace output (tail
of) -------------------------------

gettimeofday({1150099428, 256932}, NULL) = 0
gettimeofday({1150099428, 257043}, NULL) = 0
gettimeofday({1150099428, 311401}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
write(3, "\6~7\1", 4)                   = 4     <<<<<<<<<<<<<<<< [This was last
write and it came out on port]

[I would have expected the flush here soon]

[but it does not seem to be a system call]
gettimeofday({1150099428, 409245}, NULL) = 0 <<<<<<<<[this looks like the start
of waiting for the echo]
gettimeofday({1150099428, 409350}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 409821}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 411038}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 411530}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 412005}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 412478}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 412950}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 413417}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
gettimeofday({1150099428, 413991}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 414557}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 415029}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 415507}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 415979}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 416451}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 416923}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
gettimeofday({1150099428, 417517}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, 0x4003a000, 4096)               = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 418024}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(3, "\6~7\1\4", 4096)               = 5 <<<<<<<<<< [This looks like the echo
from external hardware]
gettimeofday({1150099428, 568654}, NULL) = 0
gettimeofday({1150099428, 568830}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
gettimeofday({1150099428, 569357}, NULL) = 0
gettimeofday({1150099428, 624793}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
read(4, 0x80adbf8, 8192)                = -1 EAGAIN (Resource temporarily
unavailable)
gettimeofday({1150099428, 625385}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
gettimeofday({1150099428, 667735}, NULL) = 0
gettimeofday({1150099428, 667898}, NULL) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
_llseek(3, -3, 0xbfbc05c4, SEEK_CUR)    = -1 ESPIPE (Illegal seek)
<<<<<<<[This is where it breaks]
write(2, "Traceback (most recent call last"..., 35) = 35
open("portofile.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"portofile.py\", line 232,"..., 38) = 38
fstat64(8, {st_mode=S_IFREG|0777, st_size=7949, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x4003b000
read(8, "#!/usr/bin/python\n#\tModule to re"..., 4096) = 4096
read(8, "num):\t\t\t\t# see if we know him\n\t\t"..., 4096) = 3853
write(2, "    ", 4)                     = 4
write(2, "ret_val = main_routine(port, pol"..., 48) = 48
close(8)                                = 0
munmap(0x4003b000, 4096)                = 0
open("portofile.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"portofile.py\", line 90, "..., 48) = 48
fstat64(8, {st_mode=S_IFREG|0777, st_size=7949, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x4003b000
read(8, "#!/usr/bin/python\n#\tModule to re"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "s, ret_val\t= poll_one(port, addy"..., 68) = 68
close(8)                                = 0
munmap(0x4003b000, 4096)                = 0
open("/home/hvr/Polling/lib/readerpoll.py", O_RDONLY|O_LARGEFILE) = 8
write(2, "  File \"/home/hvr/Polling/lib/re"..., 67) = 67
fstat64(8, {st_mode=S_IFREG|0644, st_size=6691, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) =
0x4003b000
read(8, "#!/usr/bin/python\n#\tModule to ke"..., 4096) = 4096
write(2, "    ", 4)                     = 4
write(2, "port.flush()\t\t\t\t\t\t\t# make sure i"..., 39) = 39
close(8)                                = 0
munmap(0x4003b000, 4096)                = 0
write(2, "IOError", 7)                  = 7
write(2, ": ", 2)                       = 2
write(2, "[Errno 29] Illegal seek", 23) = 23
write(2, "\n", 1)                       = 1
rt_sigaction(SIGINT, {SIG_DFL}, {0x400e7680, [], 0}, 8) = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 1
close(4)                                = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
close(6)                                = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
close(5)                                = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
_llseek(3, -3, 0xbfbc0c44, SEEK_CUR)    = -1 ESPIPE (Illegal seek)
close(3)                                = 0
munmap(0x4003a000, 4096)                = 0
write(2, "close failed: [Errno 29] Illegal"..., 38) = 38
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
futex(0x8061488, FUTEX_WAKE, 1)         = 0
futex(0x80a8918, FUTEX_WAKE, 1)         = 0
close(7)                                = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
futex(0x804a198, FUTEX_WAKE, 1)         = 0
munmap(0x40039000, 4096)                = 0
exit_group(1)                           = ?

------------------------------------end of trace output -----------------------

for what its worth, the code that reportedly failed this time, looks like this:

8<------------(snip)

    port.write(poll)    # write the poll string out    <<<<<<[ This is the write
referred to above]
    port.flush()          # make sure it goes            <<<<<<[ This is the
reported point of failure]

    start_time = time.time()                                <<<<<<[ But we got
here ]
    s = ''
    gotchar = 3        # assume nothing will happen - no answer

    while (time.time() - start_time < timeout):        <<<<<<[ and we got here
too]
        s, ret_val =read_char(port, s)  #
        if ret_val == 2:
            continue    # nothing there
        if ret_val != 0:   # kbd int and other rubbish
            return s, ret_val
        if s == '':     # empty srring is nfg
            continue
        if s == eot:    # report on eot
            gotchar = 2
            return s, gotchar
        if s == poll:    # eat echoed poll string    <<<<<<[ And this looks true
above too - the echoed string]
            s = ''
            gotchar = 2    # nothing there yet
            start_time = time.time()
            continue
        start_time = time.time() # we had a char - restart time out
        gotchar = 0     # note the fact that there was something
    return s, gotchar    # time out return, with or without something in string

come to think of it - when isolated like that - the failures have consistently
been of the same form - a write followed immediately by a flush - could there be
timing issues here? remember that we are talking about a serial port at 9600 -
it takes about 1 Millisec to transmit a char - but then - if you dont follow the
write with a flush, then nothing happens unless there is a line feed char in the
string - that seems to force a real transmit...

But I am already reading, looking for the echo, before the first char has even
gone out...

But the failing flush (if that is it) does not show up in the strace until much
later....

Any help would be greatly appreciated - I am stuck!

- Hendrik





More information about the Python-list mailing list