Testing for changes on a web page (was: how to find difference in number of characters)

Emmanuel Surleau emmanuel.surleau at gmail.com
Sat Oct 9 09:21:11 EDT 2010


> On Oct 9, 5:41 pm, Stefan Behnel <stefan... at behnel.de> wrote:
> > "Number of characters" sounds like a rather useless measure here.
> 
> What I meant by number of characters was the number of edits that happened
> between the two versions. Levenshtein distance may be one way to measure
> this, but I was wondering if difflib could do this.
> regards

As pointed out above, you also need to consider how the structure of the web 
page has changed. If you are only comparing plain text, the Levenshtein 
distance gives the minimum number of single-character edit operations 
(insertion, deletion or substitution) needed to transform string A into 
string B.
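For the plain-text case, here is a minimal sketch of the classic dynamic-programming (Wagner-Fischer) computation. The function name `levenshtein` is hypothetical and not part of any library. It assumes you have already extracted the text of both versions of the page into two strings:

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions or substitutions needed to turn a into b."""
    # The distance is symmetric, so put the shorter string in the inner
    # loop; only two rows of length len(b) + 1 are kept in memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    old_text = "kitten"
    new_text = "sitting"
    print(levenshtein(old_text, new_text))  # 3
```

On difflib: it has no Levenshtein function. difflib.SequenceMatcher(None, a, b).ratio() returns a similarity score between 0 and 1 rather than an edit count, and get_opcodes() lists the replace/delete/insert/equal blocks, which is often more useful for seeing *what* changed.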

Cheers,

Emm



More information about the Python-list mailing list