ANN: WebVal 1.0 -- URL scanner, maintainer, and validator

Erik Max Francis max at
Mon Aug 19 02:41:30 CEST 2002


    webval is a system that will scan documents for fully-qualified
    HTTP URLs, keeping its database fresh with newly-seen URLs.  It
    can then be requested to validate the URLs, whereby it will
    attempt to access each URL via an HTTP request and record the
    response code; it maintains a list of the most recent codes that
    have been retrieved.  Response codes are classified as either
    "good" (URL is correct and a valid page is there) or "bad" (URL is
    invalid or outdated).  By default any code other than a 2xx code is
    considered bad, but this can be changed (*e.g.*, to ignore 3xx
    redirection codes).
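
    As a rough illustration of the classification described above,
    here is a minimal sketch in modern Python 3 (not the Python 2.2
    of this release).  It is not webval's actual code: the names
    is_good, record_code, ignore_redirects and keep are all
    hypothetical, as is the choice of keeping five codes per URL.

```python
def is_good(code, ignore_redirects=False):
    """Return True if an HTTP status code counts as "good".

    By default only 2xx codes are good; with ignore_redirects set,
    3xx redirection codes are tolerated as well.  (Hypothetical
    helper, not part of webval.)
    """
    if 200 <= code <= 299:
        return True
    if ignore_redirects and 300 <= code <= 399:
        return True
    return False


def record_code(history, code, keep=5):
    """Append a code to a URL's history, keeping only the last `keep`."""
    history.append(code)
    # Drop everything except the most recent `keep` entries, in place.
    del history[:-keep]
    return history
```

    With these definitions, is_good(301) is false while
    is_good(301, ignore_redirects=True) is true.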

    webval can then be used in report mode, where it scans documents
    for URLs as before but reports invalid URLs (that is, URLs in the
    database whose number of "bad" codes exceeds a certain
    threshold).  These are printed to stderr in a format that shows
    the file and line number where each URL was seen, so that they
    can be corrected.
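
    The report pass can be sketched roughly as follows, again in
    modern Python 3.  Everything here is an assumption made for
    illustration: the URL pattern, the in-memory dictionary standing
    in for the database, the default threshold, and the
    "file:line: url" output layout (a common GNU-style convention,
    not necessarily what webval prints).

```python
import re
import sys

# Hypothetical pattern for fully-qualified HTTP URLs in running text.
URL_RE = re.compile(r"http://[^\s<>\"')]+")


def bad_count(codes):
    """Count the codes that fall outside the 2xx range."""
    return sum(1 for code in codes if not 200 <= code <= 299)


def report(filename, lines, database, threshold=2, out=sys.stderr):
    """Print each URL whose bad-code count exceeds the threshold.

    `database` maps a URL to its list of recent response codes.
    Returns the (line number, URL) pairs that were reported.
    """
    reported = []
    for lineno, line in enumerate(lines, start=1):
        for match in URL_RE.findall(line):
            url = match.rstrip(".,;:")  # trim trailing punctuation
            if bad_count(database.get(url, [])) > threshold:
                print(f"{filename}:{lineno}: {url}", file=out)
                reported.append((lineno, url))
    return reported
```

    URLs that are absent from the database have no recorded codes
    and are therefore never reported in this sketch.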

    webval's reporting output is designed to be GNU make friendly; the
    database itself is a simple text file, containing one record per
    line, which can be easily grepped and manipulated manually.
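
    The announcement does not spell out the record layout, so the
    following Python 3 sketch assumes a purely hypothetical one: a
    URL followed by its recent response codes, separated by
    whitespace.  It only illustrates how a one-record-per-line text
    database can be read and written with a few lines of code.

```python
def parse_record(line):
    """Split one record into (url, codes).  Assumed layout: URL code code ..."""
    fields = line.split()
    return fields[0], [int(field) for field in fields[1:]]


def format_record(url, codes):
    """Render a record back into the same assumed one-line layout."""
    return " ".join([url] + [str(code) for code in codes])


def load_database(lines):
    """Build a url -> codes mapping from an iterable of record lines."""
    database = {}
    for line in lines:
        if line.strip():  # skip blank lines
            url, codes = parse_record(line)
            database[url] = codes
    return database
```

    Because each record sits on its own line, a plain grep for a
    code or a host name finds the affected URLs directly.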

Getting the software

    The current version of webval is 1.0.

    The latest version of the software is available in a tarball here:

    The official URL for this Web site is


    Python 2.2 or greater is required.  Threading is used by the
    validator, so Python must be configured with threads.
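
    To show why threads help here, this is a minimal sketch of a
    threaded validation pass in modern Python 3.  It is not webval's
    implementation: fetch_code, validate_all and the workers
    parameter are hypothetical names, and the standard urllib
    opener used here follows redirects on its own, so 3xx codes
    will rarely be seen with it.

```python
import threading
import urllib.error
import urllib.request


def fetch_code(url, timeout=10):
    """Return the HTTP status for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def validate_all(urls, fetch=fetch_code, workers=4):
    """Fetch every URL using a small pool of threads; return url -> code."""
    results = {}
    pending = list(urls)
    lock = threading.Lock()

    def worker():
        while True:
            with lock:
                if not pending:
                    return
                url = pending.pop()
            code = fetch(url)  # slow network call, done outside the lock
            with lock:
                results[url] = code

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

    The fetch argument lets the network call be swapped for a stub,
    which makes the pool logic testable without touching the
    network.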


    This code is released under the GPL.


 Erik Max Francis / max at /
 __ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/  \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
    Church /
 A lambda calculus explorer in Python.
