
Pedro Andres Aranda Gutierrez wrote on 10.01.2018 at 08:21:
I have a program written in Python 3 that uses lxml. It parses web pages and creates ebooks out of them. This is quite handy when transferring bigger manuals to my eReader. If I start from a local copy (i.e. all the files on my hard disk), it works flawlessly, after a lot of tries related to Unicode :-)
But when I try to follow the document structure from a live server (i.e. downloading over HTTP), it fails.
I'm in Python3 and using the following libs:
from http.client import HTTPConnection, HTTPSConnection
import urllib.request, urllib.error, urllib.parse
from urllib.parse import urlparse, urlsplit, urljoin
The main magic is performed with connection.getresponse() and response.read(), response.status and response.data
The error is always:
Error reading file '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
That's not a file path, so I'll assume that you didn't copy the error message exactly. My guess is that your server lacks XML catalogue files.

http://xmlsoft.org/catalog.html
https://stackoverflow.com/questions/7228583/python-lxml-catalog-lookup

BTW, lxml does not currently provide an API for the catalogue support in libxml2 (e.g. extending or directly querying it). Should be easy to implement, though. Pull request welcome.

Stefan
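
For illustration, here is a minimal sketch of the catalogue idea, assuming a local copy of the XHTML 1.0 Strict DTD is available. It writes a one-entry OASIS XML catalogue with the standard library and points libxml2 at it through the XML_CATALOG_FILES environment variable. The helper name write_catalog, the temporary directory, and the placeholder DTD file are all hypothetical and not part of lxml or of the original program. Setting the variable before lxml is imported is a cautious assumption about when libxml2 reads it.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the OASIS XML Catalogs format understood by libxml2.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def write_catalog(catalog_path, public_id, system_id, local_dtd):
    """Hypothetical helper: write a catalogue that maps one DTD
    (by public and system identifier) to a local file."""
    ET.register_namespace("", CATALOG_NS)
    root = ET.Element(f"{{{CATALOG_NS}}}catalog")
    dtd_uri = Path(local_dtd).resolve().as_uri()
    ET.SubElement(root, f"{{{CATALOG_NS}}}public",
                  {"publicId": public_id, "uri": dtd_uri})
    ET.SubElement(root, f"{{{CATALOG_NS}}}system",
                  {"systemId": system_id, "uri": dtd_uri})
    ET.ElementTree(root).write(str(catalog_path), encoding="utf-8",
                               xml_declaration=True)
    return Path(catalog_path)


# Assumption: a scratch directory stands in for wherever the DTD copy lives.
workdir = Path(tempfile.mkdtemp())
local_dtd = workdir / "xhtml1-strict.dtd"
# Placeholder only; a real setup would store the actual DTD here.
local_dtd.write_text("<!-- placeholder for the XHTML 1.0 Strict DTD -->")

catalog = write_catalog(
    workdir / "catalog.xml",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd",
    local_dtd,
)

# libxml2 treats XML_CATALOG_FILES as a space-separated list of catalogues.
# Set it before importing lxml so the catalogue is in place on first use.
os.environ["XML_CATALOG_FILES"] = str(catalog)

# After this, a parser that loads DTDs, e.g. etree.XMLParser(load_dtd=True),
# may be able to resolve the DOCTYPE locally instead of over the network.
```

Whether this resolves the error above depends on the exact message and on how the downloaded data is handed to lxml, so treat it as one thing to try rather than a confirmed fix.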