New subject: What is the correct way to deal with the W3C DTD _not_ being served?

10 Jan 2018

Hi folks,

I have a program written in Python 3 that uses lxml. It parses Web pages and
creates ebooks out of them. This is quite handy when transferring bigger
manuals to my eReader. If I start from a local copy (i.e. all the files on
my hard disk), it works flawlessly - after a lot of attempts related to Unicode
:-)

But when I try to follow the document structure from a live server (i.e.
downloading over HTTP), it fails.

I'm on Python 3 and using the following libs:

from http.client import HTTPConnection, HTTPSConnection
import urllib.request, urllib.error, urllib.parse
from urllib.parse import urlparse, urlsplit, urljoin

The main magic is performed with connection.getresponse() and
response.read(), response.status and response.data.
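
As a rough sketch of the download step described above, assuming a plain GET
without redirects or authentication. The helper names split_target and fetch
are made up for illustration and are not taken from the original program:

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlsplit


def split_target(url):
    """Split a URL into (scheme, netloc, request path including query)."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the server root
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def fetch(url, timeout=10):
    """Issue a GET request and return (status, body as bytes)."""
    scheme, netloc, path = split_target(url)
    conn_class = HTTPSConnection if scheme == "https" else HTTPConnection
    conn = conn_class(netloc, timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        # read() returns bytes; no decoding happens here
        return response.status, response.read()
    finally:
        conn.close()
```

Note that fetch() hands back the page as bytes, so whatever parses it next
receives document content rather than a file name.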

The error is always:

Error reading file '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

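The message quotes the start of the document itself where a file name would
normally appear. One possible reading (a guess, since the parsing call is not
shown) is that the downloaded text is being passed to a function that expects
a file name or URL, such as etree.parse(). A minimal sketch of parsing
downloaded bytes as content instead, with DTD loading and network access
switched off explicitly; SAMPLE and parse_page are hypothetical names used
only for this illustration:

```python
from io import BytesIO

from lxml import etree

# Stand-in for the bytes returned by response.read()
SAMPLE = b"""<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Manual</title></head>
  <body><p>Hello</p></body>
</html>"""


def parse_page(data):
    """Parse downloaded bytes without fetching the DTD named in the DOCTYPE."""
    parser = etree.XMLParser(load_dtd=False, no_network=True)
    # Wrapping the bytes in BytesIO makes lxml treat them as content,
    # not as a path or URL to open.
    return etree.parse(BytesIO(data), parser)


tree = parse_page(SAMPLE)
root = tree.getroot()
```

Without the DTD, named entities such as &nbsp; are undefined for the XML
parser, so pages that use them may need etree.HTMLParser or lxml.html instead.
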
Thanks for any help
-- 
Questions are not there to be answered,
questions are there to be asked
Georg Kreisler

Pedro Andres Aranda Gutierrez