[Tutor] intercepting and recording I/O function calls
Martin A. Brown
martin at linux-ip.net
Fri Sep 17 00:45:29 CEST 2010
[apologies in advance for an answer that is partially off topic]
Hi there JoJo,
: I could begin with tracing I/O calls in my App.. if its
: sufficient enough i may not need i/o calls for the OS.
What do you suspect? Filesystem I/O?
* open(), close(), opendir(), closedir() filesystem latency?
* read(), write() latency?
* low read() and write() throughput?
Network I/O?
* Are name lookups taking a long time?
* Do you have slow network throughput? (Consider tcpdump.)
Rather than writing code (at first glance), why not use a system
call profiler to check this out? It is very unlikely that Python
itself is the problem. Could it be the filesystem or network? Could
it be DNS? A system call profiler can help you find this.
Are you asking this because you plan on diagnosing I/O performance
issues in your application? Is this a one time thing in a
production environment that is sensitive to application latency?
If so, you might try tickling the application and attaching to the
process with a system call tracer. Under CentOS you should be able
to install 'strace'. If you can run the proggie on the command
line:
strace -o /tmp/trace-output-file.txt -f python yourscript.py args
Then, go learn how to read the /tmp/trace-output-file.txt.
Suggested options:
  -f        follow children
  -ttt      sane Unix-y timestamps (epoch seconds, with microseconds)
  -T        show the time spent in each system call
  -s 256    256-byte limit on string output (default is 32)
  -o file   store trace data in a file
  -p pid    attach to a running process by pid
  -c        only show a summary of cumulative time per system call
: > But this is extremely dependant on the Operating System - you will
: > basically have to intercept the system calls. So, which OS are
: > you using? And how familiar are you with its API?
:
: I am using centos, however i don't even have admin privileges.
: Which API are you referring to?
You shouldn't need admin privileges if you can run the program as
yourself. If you have setuid/setgid bits, then you will need
somebody with administrative privileges to help you.
OK, so let's say that you have already done all of the above: you
know it's not the system, and you really want to understand where
your application is susceptible to bad performance or I/O issues.
Now we're back to Python land.
* look at the profile module
http://docs.python.org/library/profile.html
* instrument your application by using the logging module
http://docs.python.org/library/logging.html
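For the first of these, here is a minimal sketch of profiling an
I/O-heavy section with cProfile and pstats (the do_io function is
a made-up stand-in for your own code):

```python
import cProfile
import io
import pstats
import tempfile

def do_io():
    # Hypothetical stand-in for your application's I/O-heavy code.
    with tempfile.TemporaryFile() as f:
        for _ in range(100):
            f.write(b'x' * 4096)
        f.seek(0)
        f.read()

profiler = cProfile.Profile()
profiler.enable()
do_io()                      # run only the section you care about
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
print(out.getvalue())        # top five calls by cumulative time
```

If the top entries are read/write wrappers rather than your own
functions, that points back at the I/O layer, not your Python code.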
You might ask how using the logging module benefits you. Well, if
your program generates logging data (say, to STDERR) and you do not
include timestamps on each log line, you can trivially add them
using your system's logging facilities:
{ python thingy.py >/dev/null ; } 2>&1 | logger -ist 'thingy.py' --
Or, if you like DJB tools:
{ python thingy.py >/dev/null ; } 2>&1 | multilog t ./directory/
Either solution (implicitly) leaves you with timing information.
: > Also, While you can probably do this in Python but its likely
: > to have a serious impact on the OS performance, it will slow
: > down the performamce quite noticeably. I'd normally recommend
: > using C for something like this.
Alan's admonition bears repeating. Trapping all application I/O is
probably fine for development, instrumentation, and diagnosis, but
you may wish to implement it in an easily removable manner,
especially if performance is paramount.
Good luck,
-Martin
--
Martin A. Brown
http://linux-ip.net/