I think that Launchpad ticket is what I need to understand the issue better! Great :) I will look into it over the weekend.
I did try setting other XML catalogs, and I managed to set up a catalog and local files for my use case so that nothing is downloaded from the Internet. So that's not my mission right now.
But in my first version I _thought_ I had changed everything so that nothing was downloaded, when in fact two files were still being downloaded from w3c.com. There was no noticeable delay, so everything seemed fine, until at one point a bunch of files were validated in succession. After around 20-60 successful validations, the rest would fail. W3C has some kind of filter/firewall: if you download the resources in rapid succession (e.g. roughly 1 per second, for 10-40 seconds), it will start rejecting requests. It only takes 5-20 seconds for the firewall to forgive you and let you download again. This meant that I got random, intermittent failures.
That's why I want to _know_ that I have disabled networking, so that an incorrectly set up catalog gives an error now, and not later.
The above happened with xmllint. With lxml I can load the schema once and use it for validating hundreds of XML files, so I can easily get around the W3C filter. But in any case, I would like to set up my lxml code such that any attempt to download resources results in an error now, and not when that resource is one day unavailable :)
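
In the meantime, something like the sketch below might serve as a pre-flight check. It does not disable networking in lxml or xmllint; it is a different technique that only lists the schemaLocation values in a schema that point at a network URL, so each one can be checked against the catalog by hand. It uses only the Python standard library. The names remote_schema_locations and EXAMPLE_XSD and the sample schema are made up for illustration, and it looks at a single schema document only (it does not follow imports or look at DTD references).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace prefix for XML Schema elements in ElementTree's {uri}tag notation.
XS = "{http://www.w3.org/2001/XMLSchema}"
# Schemes treated as "would need the network" (an assumption for this sketch).
REMOTE_SCHEMES = {"http", "https", "ftp"}


def remote_schema_locations(xsd_text):
    """Return schemaLocation values of xs:import/include/redefine
    elements that use a network scheme (hypothetical helper)."""
    root = ET.fromstring(xsd_text)
    found = []
    for tag in ("import", "include", "redefine"):
        for element in root.iter(XS + tag):
            location = element.get("schemaLocation")
            if location and urlparse(location).scheme.lower() in REMOTE_SCHEMES:
                found.append(location)
    return found


# Made-up schema with one remote and one local reference.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <xs:include schemaLocation="local-types.xsd"/>
</xs:schema>"""

print(remote_schema_locations(EXAMPLE_XSD))
# prints ['http://www.w3.org/2001/xml.xsd']
```

Every URL it reports is one that the catalog has to cover; anything missing from the catalog is a candidate for the kind of silent download described above.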
Thanks for helping out!