
Hi Marius,
I think you gave me the right answer. Thanks a ton!
Sent from my iPhone

On 12 Jan 2018, at 20:05, Marius Gedminas <marius@gedmin.as> wrote:
On Wed, Jan 10, 2018 at 08:21:47AM +0100, Pedro Andres Aranda Gutierrez wrote:
I have a program written in Python 3 that uses lxml. It parses web pages and creates ebooks out of them. This is quite handy when transferring bigger manuals to my eReader. If I start from a local copy (i.e. all the files on my hard disk), it works flawlessly, after a lot of tries related to Unicode :-)
But when I try to follow the document structure from a live server (i.e. downloading over HTTP), it fails.
I'm on Python 3 and using the following libs:
from http.client import HTTPConnection, HTTPSConnection
import urllib.request, urllib.error, urllib.parse
from urllib.parse import urlparse, urlsplit, urljoin
The main magic is performed with connection.getresponse() and response.read(), response.status and response.data
The error is always:
Error reading file '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
This looks like you're passing the XML contents to lxml.html.parse() instead of calling lxml.html.fromstring()?
Can you show us the actual code?
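For illustration only, here is a minimal sketch of the difference between the two calls. It assumes the downloaded page is held in memory as bytes, as response.read() would return it. SAMPLE_PAGE, parse_downloaded_page, parse_via_file_object and the example.com base URL are hypothetical names made up for this sketch and are not taken from the original program.

```python
import io

import lxml.etree
import lxml.html

# Hypothetical stand-in for the bytes that response.read() would return.
SAMPLE_PAGE = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    b"<html><head><title>Manual</title></head>"
    b'<body><p>Chapter 1</p><a href="ch2.html">next</a></body></html>'
)


def parse_downloaded_page(data, base_url=None):
    """Parse markup that is already in memory (not a file name or URL)."""
    return lxml.html.fromstring(data, base_url=base_url)


def parse_via_file_object(data):
    """Alternative: wrap the bytes so parse() receives a file-like object."""
    return lxml.html.parse(io.BytesIO(data)).getroot()


root = parse_downloaded_page(SAMPLE_PAGE, base_url="http://example.com/manual/")
title = root.findtext(".//title")
links = [a.get("href") for a in root.iter("a")]

# parse() interprets a plain string as a file name or URL, so handing it the
# markup itself is the kind of call that leads to "Error reading file '<!DOCTYPE ...".
try:
    lxml.html.parse(SAMPLE_PAGE.decode("ascii"))
    parse_failed = False
except (OSError, lxml.etree.LxmlError):
    parse_failed = True
```

In short: lxml.html.fromstring() takes the document content, while lxml.html.parse() takes a file name, URL, or file-like object. Whether this matches the failing program can only be confirmed from the actual code.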
Marius Gedminas
--
Committee, n.: A group of men who individually can do nothing but as a group decide that nothing can be done.
    -- Fred Allen
_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml