[XML-SIG] xml / html parsing for web

Bastian Kleineidam calvin@cs.uni-sb.de
Tue, 12 Dec 2000 16:35:20 +0100 (CET)


Kent,

> contain a very smart regular expression to parse almost all links. What
> I found missing is support for JavaScript-driven or form-driven links:
> some sites have <option .... value="link1"...
> which linkchecker cannot follow.
Yes. In general you cannot tell whether the option "value" is a link or
just some data. The same is true of JavaScript, where a link can be
constructed out of many parts:
<script>
// "browser" is assumed to have been set elsewhere on the page
var mybase = "mydata/sub1";
var url;
if (browser == "IE") {
    url = mybase + "/ieblubb.html";
} else {
    url = mybase + "/netscapeblubb.html";
}
</script>
It is difficult to extract such dynamic URLs without actually executing
the script.
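
To illustrate the difficulty, here is a minimal Python sketch (not
linkchecker's actual code) that pulls the quoted string literals out of a
script like the one above. The names STRING_LITERAL, SCRIPT and
string_literals are made up for this example. It finds only the
fragments; the URL the browser would really request never appears in the
source as a single string.

```python
import re

# Hypothetical helper pattern: a double-quoted string literal,
# allowing backslash escapes inside it.
STRING_LITERAL = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')

# Sample script body, mirroring the example in this mail.
SCRIPT = '''
mybase = "mydata/sub1"
if (browser == "IE") {
    url = mybase + "/ieblubb.html"
} else {
    url = mybase + "/netscapeblubb.html"
}
'''

def string_literals(script):
    """Return every double-quoted literal found in the script text."""
    return STRING_LITERAL.findall(script)

literals = string_literals(SCRIPT)
print(literals)
# The complete URLs are not among the literals; only their parts are.
print("mydata/sub1/ieblubb.html" in literals)
```

A static scan like this cannot tell that "IE" is a browser name while
"/ieblubb.html" is a path suffix, nor which pieces get concatenated.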

> Moreover, I would like to extract the form data and link them with
> labels found on the page, associating the link with the hot text or
> image, which linkchecker cannot do.
Yes, it's the same problem.

In general, I think you cannot always extract dynamic URLs from forms or
JavaScript, because you never know whether they are really URLs or just
data.
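
The same ambiguity shows up with form options. The following Python
sketch uses the standard html.parser module to collect <option> values
and then applies a guess about which ones look like URLs. The class
OptionValueCollector, the function looks_like_url and the sample PAGE are
all assumptions made up for this example, and the heuristic is only that:
a heuristic.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class OptionValueCollector(HTMLParser):
    """Collect the "value" attribute of every <option> tag."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            for name, value in attrs:
                if name == "value" and value:
                    self.values.append(value)

def looks_like_url(value):
    """Heuristic guess only; it can be wrong in both directions."""
    if urlparse(value).scheme in ("http", "https", "ftp"):
        return True
    return value.startswith("/") or value.endswith((".html", ".htm"))

# Hypothetical page fragment for illustration.
PAGE = '''
<select name="goto">
  <option value="link1">First page</option>
  <option value="/docs/index.html">Docs</option>
  <option value="http://example.com/">Example</option>
  <option value="42">Forty-two</option>
</select>
'''

collector = OptionValueCollector()
collector.feed(PAGE)
candidates = [v for v in collector.values if looks_like_url(v)]
print(candidates)
```

Note that a value such as "link1" is rejected by this guess even though
the page's script might well turn it into a real link, which is exactly
the case a checker cannot decide from the markup alone.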

Bastian