how to find difference in number of characters

Diez B. Roggisch deets at web.de
Sat Oct 9 20:39:11 CEST 2010


harryos <oswald.harry at gmail.com> writes:

> On Oct 9, 4:52 pm, Peter Otten <__pete... at web.de> wrote:
>
>>
>> You might get more/better answers if you tell us more about the context of
>> the problem and add some details that may be relevant.
>>
>> Peter
>
> I am trying to determine if a web page is updated by x number of
> characters..Mozilla firefox plugin 'update scanner' has a similar
> functionality ..A user can specify the x ..I think this would be done
> by reading from the same url at two different times and finding the
> change in body text..I was wondering if difflib could offer something
> in the way of determining the size of delta..

If you normalize the data, this might be worth trying.

Put each tag on a single line of its own, possibly re-ordering its
attributes so that they are in alphabetical order. Each text child is
also normalized, by replacing every run of whitespace with a single space.

Then run difflib over these, and count the number of differences.
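
A rough sketch of what I mean, not a finished tool. It assumes the stdlib html.parser is good enough for the pages in question, and the names Normalizer, normalize and count_differences are made up for illustration. It counts changed lines of the normalized form, which is only one possible way to measure the delta.

```python
import difflib
import re
from html.parser import HTMLParser


class Normalizer(HTMLParser):
    """Collect one normalized line per tag and per text child (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so their order never shows up as a change.
        parts = [tag]
        for name, value in sorted(attrs, key=lambda kv: kv[0]):
            parts.append('%s="%s"' % (name, value if value is not None else ""))
        self.lines.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.lines.append("</%s>" % tag)

    def handle_data(self, data):
        # Collapse every run of whitespace into a single space.
        text = re.sub(r"\s+", " ", data).strip()
        if text:
            self.lines.append(text)


def normalize(html):
    """Return the normalized page as a list of lines."""
    parser = Normalizer()
    parser.feed(html)
    parser.close()
    return parser.lines


def count_differences(old_html, new_html):
    """Count normalized lines that were removed from old or added in new."""
    old, new = normalize(old_html), normalize(new_html)
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    count = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            count += (i2 - i1) + (j2 - j1)
    return count
```

Two copies of a page that differ only in attribute order or whitespace should come out with zero differences here. If you want a size in characters rather than lines, summing the lengths of the changed lines in the same loop would be one way to get it.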


Diez


