[Tutor] parse emails as they come in
pylinuxian at gmail.com
Wed Apr 2 08:52:48 CEST 2008
OK - as I mentioned in my first email, I use procmail to put THE BODY of all
incoming mail into a file (that is, one file per incoming email, since I use
$date-$time in the name).
Now this file may contain only one email, but it can also contain 2 or more
(this happens if, for example, there is a DNS problem on the internet, so mail
can't get through, but once the internet recovers from the DNS problem mail
rushes in and we may have multiple messages per file. This is also true if I do
this:
for i in 1 2 3 4 5 6 ; do echo $i | mail -s 'test mail' john ; done
Well, this file is processed by my script to extract data.
The problem is: I can parse 18 lines (that is, one email per file) fine, but
I need suggestions on parsing the file when it contains two emails (that is,
18 lines + 18 lines ...).
I hope I have explained my problem well this time.
I will make the optimizations you suggested (inserting data directly into
MySQL, and the lines loop as well).
thanks a lot.
On Tue, Apr 1, 2008 at 9:17 PM, Steve Willoughby <steve at alchemy.com> wrote:
> On Tue, Apr 01, 2008 at 09:07:04PM +0000, linuxian iandsd wrote:
> > a=open('/home/john/data/file_input.tmp', 'r')
> > b=open('/home/john/data/file_output', 'w')
> This is collecting mail as it comes in? If you have a mail
> rule in place to dump mail into this file_input.tmp file,
> you could run into trouble if multiple messages arrive close
> enough together that you get a race condition.
> I'd suggest just using something like procmail to invoke
> your Python script directly on the incoming message, so
> you don't have to dump it to a temporary input file.
> You'll be guaranteed to see one and only one mail per
> invocation of your script (although it may invoke
> several copies of your script at the same time, so plan
> for that, e.g., don't write to the same output filename
> every time--or don't write to a file at all, just have
> your script put the data into MySQL or whatever directly).
> > aa=a.readlines()
> > n=0
> > for L in aa:
> Generally speaking, it's better to let Python iterate
> through the lines of a file. The above code slurps
> the entire (possibly huge) file into memory and then
> iterates over that list. Better:
> for L in a:
> or better yet:
> for line in input_file:
> > # a little secret : this little script helps me load data from mail to a
> > mysql database by converting it into ; separated values :)
> I'd look at just gathering the raw data into Python variables and then
> connecting to MySQL directly and executing a SQL statement to import the
> data straight in. You'll avoid a host of problems with properly quoting
> data (what if a ';' is in one of the data fields?), as well as making it
> unnecessary to carry out another post-processing step of gathering this
> script's output and stuffing it into MySQL.
> Steve Willoughby | Using billion-dollar satellites
> steve at alchemy.com | to hunt for Tupperware.
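Putting Steve's two suggestions together - have procmail pipe each message
straight to the script, and insert into MySQL with a parameterised query
instead of writing ';'-separated values - might look roughly like this (the
field names, table name, and connection details are assumptions for
illustration only):

```python
import sys
import email


def message_to_row(raw):
    """Parse one raw mail message and pull out the fields to store
    (the choice of fields here is only an example)."""
    msg = email.message_from_string(raw)
    body = msg.get_payload()  # assumes a non-multipart message
    return (msg.get("From", ""), msg.get("Subject", ""), body)


if __name__ == "__main__":
    # A procmail recipe such as
    #   :0
    #   | /path/to/this_script.py
    # delivers exactly one message on stdin per invocation.
    row = message_to_row(sys.stdin.read())

    # Parameterised INSERT: the driver does the quoting, so a ';'
    # (or quote) inside a field cannot break the data.
    import MySQLdb
    db = MySQLdb.connect(db="maildb")
    cur = db.cursor()
    cur.execute(
        "INSERT INTO mail (sender, subject, body) VALUES (%s, %s, %s)",
        row)
    db.commit()
```

Because procmail may run several copies of the script concurrently, letting
the database handle each row directly also sidesteps the shared-output-file
race Steve mentions.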