ANN: WebVal 1.0 -- URL scanner, maintainer, and validator
Erik Max Francis
max at alcyone.com
Sun Aug 18 20:41:30 EDT 2002
Summary
webval is a system that will scan documents for fully-qualified
HTTP URLs, keeping its database fresh with newly-seen URLs. It
can then be requested to validate the URLs, whereby it will
attempt to access each URL via an HTTP request and record the
response code; it maintains a list of the most recent codes that
have been retrieved. Response codes are classified as either "good"
(URL is correct and a valid page is there) or "bad" (URL is invalid
or outdated). By default any code other than a 2xx code is
considered bad, but this can be changed (*e.g.*, to ignore 3xx
redirection codes).
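
As a rough illustration of the scanning and classification steps described above, here is a minimal sketch in modern Python. It is not webval's actual code: the regular expression, the function names find_urls and is_good, and the ignore_redirects flag are all assumptions made for this example.

```python
import re

# Hypothetical pattern for fully-qualified HTTP URLs; webval's real
# matching rules are not described in this announcement.
URL_PATTERN = re.compile(r"http://[^\s<>\"')]+")


def find_urls(text):
    """Yield (line_number, url) for each HTTP URL found in text."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in URL_PATTERN.finditer(line):
            # Drop trailing sentence punctuation that is unlikely to be
            # part of the URL itself.
            yield line_number, match.group(0).rstrip(".,;:")


def is_good(code, ignore_redirects=False):
    """Classify an HTTP response code: 2xx is good, the rest is bad.

    ignore_redirects is a hypothetical switch standing in for the
    configurable behaviour mentioned above; when set, 3xx codes are
    not counted as bad.
    """
    if 200 <= code <= 299:
        return True
    if ignore_redirects and 300 <= code <= 399:
        return True
    return False
```

Under these assumptions, is_good(404) is false, and is_good(301) becomes true only when ignore_redirects is set.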
webval can then be used in report mode where it will scan
documents for URLs as before, but will report invalid URLs (that
is, URLs in the database which have a number of "bad" codes
exceeding a certain threshold). These are then printed to stderr
in a format that shows the file and line number where each URL
was seen, so that they can be corrected.
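
The report step might look roughly like the sketch below. The announcement does not give the output format, so the compiler-style "file:line: message" layout is an assumption, as are the names invalid_urls and report, the default threshold of 2, and the shape of the input data.

```python
import sys


def is_bad(code):
    """Default rule from the summary: anything other than 2xx is bad."""
    return not (200 <= code <= 299)


def invalid_urls(sightings, recent_codes, threshold=2):
    """Return (filename, line, url, bad_count) for each invalid URL.

    sightings is a list of (filename, line_number, url) tuples, and
    recent_codes maps each URL to its recent response codes; both are
    hypothetical structures chosen for this sketch. A URL is invalid
    when its count of bad codes exceeds the threshold.
    """
    results = []
    for filename, line_number, url in sightings:
        codes = recent_codes.get(url, [])
        bad_count = sum(1 for code in codes if is_bad(code))
        if bad_count > threshold:
            results.append((filename, line_number, url, bad_count))
    return results


def report(sightings, recent_codes, threshold=2, stream=None):
    """Print one line per invalid URL, to stderr unless told otherwise."""
    if stream is None:
        stream = sys.stderr
    for filename, line_number, url, bad_count in invalid_urls(
        sightings, recent_codes, threshold
    ):
        # Assumed layout, similar to compiler diagnostics.
        print(
            f"{filename}:{line_number}: invalid URL {url} "
            f"({bad_count} bad responses)",
            file=stream,
        )
```

A "file:line:" prefix is the convention used by compiler diagnostics, which is one plausible reading of reporting that works well under make.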
webval's reporting output is designed to be GNU make friendly; the
database itself is a simple text file, containing one record per
line, which can be easily grepped and manipulated manually.
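
To make the one-record-per-line idea concrete, here is a small sketch of reading and updating such a record. The actual webval record layout is not given in this announcement; the "URL followed by space-separated codes" format, the history length of 5, and all function names below are purely hypothetical.

```python
MAX_CODES = 5  # hypothetical limit on how many recent codes are kept


def parse_record(line):
    """Split a hypothetical 'URL code code ...' line into (url, codes)."""
    fields = line.split()
    return fields[0], [int(field) for field in fields[1:]]


def format_record(url, codes):
    """Render (url, codes) back into a single text line."""
    return " ".join([url] + [str(code) for code in codes])


def record_code(line, new_code, max_codes=MAX_CODES):
    """Append a new response code, keeping only the most recent ones."""
    url, codes = parse_record(line)
    codes = (codes + [new_code])[-max_codes:]
    return format_record(url, codes)
```

A whitespace-separated layout like this one is easy to work with using grep and similar tools, which is the property the description above emphasizes.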
Getting the software
The current version of webval is 1.0.
The latest version of the software is available in a tarball here:
http://www.alcyone.com/pyos/webval/webval-latest.tar.gz.
The official URL for this Web site is
http://www.alcyone.com/pyos/webval/.
Requirements
Python 2.2 or greater is required. Threading is used by the
validator, so Python must be configured with threads.
License
This code is released under the GPL.
...
--
Erik Max Francis / max at alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ There is nothing so subject to the inconstancy of fortune as war.
\__/ Miguel de Cervantes
Church / http://www.alcyone.com/pyos/church/
A lambda calculus explorer in Python.