[Tutor] extract meaningful data from garbage

Shashwat Anand anand.shashwat at gmail.com
Sun Jan 3 06:58:18 CET 2010


I need to extract some meaningful data from garbage.
Here are four examples. I need to get the date, company name and address from
these.
For the date I used a regex, but I'm unable to find any definite pattern for
the address and company name.
The format is more or less:
garbage
id - date
garbage
company name
garbage
company address
garbage

How should I parse the information if I'm not certain of any definite rules?
This is my first time dealing with real-life data.
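
To make the layout above concrete, here is a minimal sketch of one possible
approach: anchor on the "id - date" line with a regex, then take the first two
non-garbage lines after it as the company name and address. Everything specific
here is an assumption and not taken from the real samples: the date format
(dd/mm/yyyy), the numeric id, the function names, and above all the
looks_like_garbage heuristic, which simply treats lines that are mostly
non-letters as garbage. Real data would almost certainly need a different test.

```python
import re

# ASSUMPTION: the "id - date" line is digits, a dash, then dd/mm/yyyy.
# The real samples may use a different id or date format.
ID_DATE = re.compile(r"^\s*(\d+)\s*-\s*(\d{1,2}/\d{1,2}/\d{4})\s*$")


def looks_like_garbage(line):
    """Hypothetical heuristic: blank or mostly non-letter lines are garbage."""
    stripped = line.strip()
    if not stripped:
        return True
    letters = sum(ch.isalpha() for ch in stripped)
    return letters / len(stripped) < 0.5


def extract_record(text):
    """Return a dict with id, date, company and address (None if not found)."""
    record = {"id": None, "date": None, "company": None, "address": None}
    lines = text.splitlines()

    # Step 1: locate the "id - date" line; everything before it is ignored.
    for index, line in enumerate(lines):
        match = ID_DATE.match(line)
        if match:
            record["id"], record["date"] = match.group(1), match.group(2)
            remaining = lines[index + 1:]
            break
    else:
        return record  # no anchor line found, nothing else can be trusted

    # Step 2: drop garbage lines; the order "company, then address" comes
    # from the layout described in the post.
    kept = [l.strip() for l in remaining if not looks_like_garbage(l)]
    if len(kept) > 0:
        record["company"] = kept[0]
    if len(kept) > 1:
        record["address"] = kept[1]
    return record


# Made-up sample, not one of the four real examples.
sample = """#### ****
1042 - 03/01/2010
~~~ 0000 ~~~
Acme Widgets Ltd
== 99 ==
12 Example Road, Springfield
%%%%
"""

if __name__ == "__main__":
    print(extract_record(sample))
```

The point of the sketch is the structure rather than the heuristic: use the one
field that has a reliable pattern as an anchor, and rely on relative order for
the fields that do not. It only works if garbage lines can be told apart from
real ones, so the heuristic should be checked against every sample by hand.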
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20100103/7f164633/attachment.htm>

