a text processing problem: are regular expressions necessary?
Sandy Norton
sandskyfly at hotmail.com
Sun Mar 17 08:29:18 EST 2002
Hi,
I thought I'd share this problem which has just confronted me.
The problem
Automatically transforming URLs of articles on various news sites into
their printer-friendly counterparts is complicated by the fact that
different sites use different schemes for this (see the examples
below).
Now, given two examples for each site (a regular link to an article and
its printer-friendly counterpart), is there a way to automatically
generate transformation code that is specific to each site, but which
generalizes across all article URLs within that site?
Here are a few examples from several online publications:
http://news.bbc.co.uk/hi/english/world/africa/newsid_1871000/1871611.stm
http://news.bbc.co.uk/low/english/world/africa/newsid_1871000/1871611.stm

http://www.economist.com/agenda/displayStory.cfm?Story_ID=1043688
http://www.economist.com/agenda/PrinterFriendly.cfm?Story_ID=1043688

http://www.nationalreview.com/ponnuru/ponnuru031502.shtml
http://www.nationalreview.com/ponnuru/ponnuruprint031502.html

http://www.thenation.com/doc.mhtml?i=20020204&s=said
http://www.thenation.com/docPrint.mhtml?i=20020204&s=said
I'm kinda heading in the direction of attempting to generate regular
expressions for each site... But I'm a bit apprehensive about doing
this. Is there a more pythonic way to approach this problem?
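To make that direction concrete, here is the sort of naive sketch I have
in mind (names and code are just my own illustration, not a settled
approach): learn one substitution per site from a single example pair,
by stripping the longest common prefix and suffix and keeping whatever
differs in the middle, then apply that substitution to fresh URLs from
the same site.

```python
def make_transform(regular, printer):
    """Learn a (prefix, old, new) substitution from one example pair.

    Naive assumption: the two URLs differ in exactly one
    contiguous stretch of characters.
    """
    limit = min(len(regular), len(printer))
    # longest common prefix
    i = 0
    while i < limit and regular[i] == printer[i]:
        i += 1
    # longest common suffix, not overlapping the prefix
    j = 0
    while j < limit - i and regular[-1 - j] == printer[-1 - j]:
        j += 1
    return (regular[:i],
            regular[i:len(regular) - j],
            printer[i:len(printer) - j])


def to_printer_friendly(url, rule):
    """Apply a learned substitution to a fresh article URL."""
    prefix, old, new = rule
    if not url.startswith(prefix):
        raise ValueError("URL does not match this site's rule: %s" % url)
    # replace only the first occurrence after the shared prefix;
    # an empty `old` degenerates to inserting `new` at the prefix
    return prefix + url[len(prefix):].replace(old, new, 1)


rule = make_transform(
    "http://www.economist.com/agenda/displayStory.cfm?Story_ID=1043688",
    "http://www.economist.com/agenda/PrinterFriendly.cfm?Story_ID=1043688")
print(to_printer_friendly(
    "http://www.economist.com/agenda/displayStory.cfm?Story_ID=1234567",
    rule))
```

This handles the BBC, Economist and Nation examples, but it clearly
breaks on a site like nationalreview above, where the URL changes in two
separate places (the inserted "print" and .shtml becoming .html), so a
smarter diff or a per-site regex would still be needed there.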
Any advice would be appreciated.
regards,
Sandy