Hi folks

I have a program written in Python 3 that uses lxml. It parses web pages and creates ebooks out of them, which is quite handy for transferring bigger manuals to my eReader. If I start from a local copy (i.e. all the files are on my hard disk), it works flawlessly, after a lot of attempts to get the Unicode handling right :-)

But when I try to follow the document structure from a live server (i.e. downloading the pages via HTTP), it fails.

I'm using Python 3 and the following libraries:


from http.client import HTTPConnection, HTTPSConnection
import urllib.request, urllib.error, urllib.parse
from urllib.parse import urlparse, urlsplit, urljoin

The main work is done with connection.getresponse(), response.read(), response.status and response.data.
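A minimal sketch of that download step, assuming a plain GET with no redirect, cookie or error handling. fetch_page and split_target are hypothetical names used only for illustration. The body is kept as raw bytes so it can be handed to lxml without decoding it first.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def split_target(url):
    """Return (scheme, host[:port], path-with-query) for an http(s) URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def fetch_page(url, timeout=30):
    """Download one page and return (status, body), body as raw bytes."""
    scheme, host, path = split_target(url)
    # HTTPConnection/HTTPSConnection accept "host:port" in the host argument.
    conn_class = HTTPSConnection if scheme == "https" else HTTPConnection
    connection = conn_class(host, timeout=timeout)
    try:
        connection.request("GET", path)
        response = connection.getresponse()
        # read() returns bytes; pass them to lxml undecoded so its parser
        # can apply the document's own encoding declaration.
        return response.status, response.read()
    finally:
        connection.close()
```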

The error is always:

Error reading file '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Thanks for any help
--
Questions are not there to be answered,
questions are there to be asked
Georg Kreisler