[Tutor] parse emails as they come in

linuxian iandsd pylinuxian at gmail.com
Wed Apr 2 08:52:48 CEST 2008


OK - as I mentioned in my first email, I use procmail to put the body of every
incoming mail into a file (one file per incoming email, since I use
$date-$time in the filename).

Now, this file usually contains only one email, but it can also contain two or
more. This happens if, for example, there is a DNS problem somewhere, so mail
can't get through; once the DNS problem clears, mail rushes in and we can end
up with multiple messages in one file. The same thing happens if I run this
command:

for i in 1 2 3 4 5 6 ; do echo $i | mail -s 'test mail' john ; done

Well, my script then processes this file to extract the data.

The problem is: I can parse 18 lines (that is, one email per file) fine, but I
need suggestions on parsing the file when it contains two or more emails (that
is, 18 lines + 18 lines + ...).

I hope I have explained my problem better this time.

I will make the optimizations you suggested (inserting the data directly into
MySQL, and fixing the line-iteration loop as well).

Thanks a lot.


On Tue, Apr 1, 2008 at 9:17 PM, Steve Willoughby <steve at alchemy.com> wrote:

> On Tue, Apr 01, 2008 at 09:07:04PM +0000, linuxian iandsd wrote:
> > a=open('/home/john/data/file_input.tmp', 'r')
> > b=open('/home/john/data/file_output', 'w')
>
> This is collecting mail as it comes in?  If you have a mail
> rule in place to dump mail into this file_input.tmp file,
> you could run into trouble if multiple messages arrive close
> enough together that you get a race condition.
>
> I'd suggest just using something like procmail to invoke
> your Python script directly on the incoming message, so
> you don't have to dump it to a temporary input file.
> You'll be guaranteed to see one and only one mail per
> invocation of your script (although it may invoke
> several copies of your script at the same time, so plan
> for that, e.g., don't write to the same output filename
> every time--or don't write to a file at all, just have
> your script put the data into MySQL or whatever directly).
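Steve's one-message-per-invocation idea might be sketched like this (a hedged
sketch: the function name is made up, and the storage step is only a
placeholder comment; the demo feeds the parser from a string instead of stdin):

```python
import io
import sys
from email import message_from_file


def handle_message(stream):
    """Parse exactly one RFC 2822 message from the given stream.

    When procmail pipes each incoming message into the script,
    `handle_message(sys.stdin)` sees one and only one message per
    invocation -- no shared temp file, no race condition.
    """
    msg = message_from_file(stream)
    subject = msg.get("Subject", "")
    body = msg.get_payload()
    # ... store subject/body directly (e.g. into MySQL) here, rather
    # than writing to a shared output file ...
    return subject, body


# Demo with an in-memory message instead of sys.stdin:
demo = io.StringIO("Subject: test mail\n\nhello\n")
subject, body = handle_message(demo)
```

A matching procmailrc recipe would pipe the message with something like
`| /path/to/script.py` (path hypothetical).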
>
> > aa=a.readlines()
> > n=0
> > for L in aa:
>
> Generally speaking, it's better to let Python iterate
> through the lines of a file.  The above code sucks in
> the entire (possibly huge) file into memory and then
> iterates over that list.  Better:
>
> for L in a:
>
> or better yet:
>
> for lines in input_file:
>
> > # a little secret : this little script helps me load data from mail to a
> > mysql database by converting it into ; separated values :)
>
> I'd look at just gathering the raw data into Python variables and then
> connecting to MySQL directly and executing a SQL statement to import the
> data straight in.  You'll avoid a host of problems with properly quoting
> data (what if a ';' is in one of the data fields?), as well as making it
> unnecessary to carry out another post-processing step of gathering this
> script's output and stuffing it into MySQL.
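The parameterized-insert idea Steve describes might look like this. This is a
sketch using the stdlib sqlite3 module so it is self-contained; with MySQLdb
the cursor.execute call is analogous (its placeholder style is %s rather
than ?). The table and column names are made up for illustration:

```python
import sqlite3


def store_message(conn, sender, subject, body):
    """Insert one message using placeholders.

    The database driver handles all quoting, so a ';' (or a quote)
    inside any field is harmless -- no manual escaping needed.
    """
    conn.execute(
        "INSERT INTO mail (sender, subject, body) VALUES (?, ?, ?)",
        (sender, subject, body),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mail (sender TEXT, subject TEXT, body TEXT)")
store_message(conn, "john@example.com", "test; mail", "hello; world")
```

Note the ';' characters in the data survive intact, which is exactly the
quoting problem the semicolon-separated intermediate file would run into.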
>
> --
> Steve Willoughby    |  Using billion-dollar satellites
> steve at alchemy.com   |  to hunt for Tupperware.
>