regexp pattern in _isrealfromline in class UnixMailbox

Marcel Lanz marcel.lanz at dkmp.org
Sun Apr 29 11:17:36 EDT 2001


Hi all,

I use the UnixMailbox class from the mailbox.py module in Python 2.0 to
parse mailboxes, but some messages never show up.

I found that the regexp used by the _isrealfromline method of UnixMailbox
doesn't match the 'From' lines in my files.

All 'From' lines in the file look like this one:

From lanzm at dkmp.org Thu Mar  1 10:55:15 2001 +0100

but the pattern in _isrealfromline doesn't match:

_fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                   r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"

The problem seems to be the time zone offset at the end of the line: the
pattern expects the line to end with a four-digit year, so the trailing
+0100 (and its + sign in particular) keeps it from matching.
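
A quick way to check this is to run the original pattern against the sample
line with and without the trailing offset. This is only a sketch: the address
user@example.org is a placeholder (the archive obscures the real one), the
line is written with a plain leading "From", and the variable names are made
up for illustration.

```python
import re

# Original pattern as quoted above from mailbox.py (Python 2.0).
OLD_PATTERN = (r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
               r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$")

# Placeholder address (assumption); the date part is from the sample line.
with_offset = "From user@example.org Thu Mar  1 10:55:15 2001 +0100"
without_offset = "From user@example.org Thu Mar  1 10:55:15 2001"

old_matches_with_offset = re.match(OLD_PATTERN, with_offset) is not None
old_matches_without_offset = re.match(OLD_PATTERN, without_offset) is not None

print(old_matches_with_offset)     # False
print(old_matches_without_offset)  # True
```

Only the line that ends in +0100 is rejected, which points at the offset
rather than at the rest of the line.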

After I changed the pattern to:

_fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                   r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+[+-]?\d\d\d\d\s*$"
                                                      ^^^^^
I can parse mailbox files for hours ...
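
For anyone who wants to try the change without touching mailbox.py, here is a
small sketch that runs the modified pattern on its own. The address is again a
placeholder and the variable names are hypothetical; only the pattern itself
comes from the lines above.

```python
import re

# Modified pattern from above: only "[+-]?" was added before the last
# four digits.
NEW_PATTERN = (r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+"
               r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+[+-]?\d\d\d\d\s*$")

# Placeholder address (assumption); the date part is from the sample line.
with_offset = "From user@example.org Thu Mar  1 10:55:15 2001 +0100"
without_offset = "From user@example.org Thu Mar  1 10:55:15 2001"

m_offset = re.match(NEW_PATTERN, with_offset)
m_plain = re.match(NEW_PATTERN, without_offset)

print(m_offset is not None)  # True
print(m_plain is not None)   # True

# Group 2 is the optional "(\s+[^\s]+)?" token before the final digits.
print(repr(m_offset.group(2)))  # ' 2001'
print(m_plain.group(2))         # None
```

On the sample line the optional token takes the year and the signed offset
ends up in the final four-digit slot, so the pattern accepts the line without
really telling the two fields apart. For finding message boundaries that
appears to be enough.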


Best regards
Marcel

-- 
Marcel Lanz
http://www.ds9.ch/lanz/
marcel.lanz at dkmp.org | marcel.lanz at computer.org
GnuPG: F975 C6F7 04C8 642B 6DF4  4DF4 2945 F02A 797E 7DAB



