[lxml-dev] Failure to open http urls on Windows

Hi, I'm seeing an IOError when resolving http urls on Windows (it works fine on Mac OS X):
from lxml import etree
etree.parse('http://codespeak.net/', parser=etree.HTMLParser())

Traceback (most recent call last):
  File "<console>", line 1, in ?
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 563, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64493)
IOError: Error reading file 'http://codespeak.net/': failed to load external entity "http://codespeak.net/"

I've encountered the same problem. As far as I can tell, the problem lies with the libxml2 version that is used. The bug is apparently fixed in libxml2 2.7.7 (from the release notes: "598785 Fix nanohttp on Windows"), but there are no Windows binaries available for that version yet. It was apparently introduced in libxml2 2.7.4 (with "59501 avoid select and use poll for nanohttp"?).

The compiled Windows versions of lxml seem to be affected from lxml 2.3 and up. I've tried building later versions of lxml with an older libxml2 (using the instructions from the lxml website), and this seems to work, so I'm pretty confident that I'm right about the cause of the problem. I wouldn't know how to compile libxml2 itself, though.

Hope this helps, or at least clarifies the issue.

greetings,
Steven

2010/5/25 Laurence Rowe <l@lrowe.co.uk>:
> Hi,
> I'm seeing an IOError when resolving http urls on Windows (it works fine on Mac OS X):
> [...]
lxml-dev mailing list lxml-dev@codespeak.net http://codespeak.net/mailman/listinfo/lxml-dev
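
To check whether a given installation falls into the version range described above, something like the following might help. It is only a sketch: the 2.7.4 and 2.7.7 boundaries are taken from Steven's reading of the release notes, not from independent verification, and the helper name nanohttp_suspect is made up for illustration. lxml exposes the libxml2 version it runs with as etree.LIBXML_VERSION and the one it was built against as etree.LIBXML_COMPILED_VERSION.

```python
# Boundaries as reported in this thread (assumption, not independently verified):
# the nanohttp problem reportedly appeared in libxml2 2.7.4 and was fixed in 2.7.7.
FIRST_BROKEN = (2, 7, 4)
FIRST_FIXED = (2, 7, 7)


def nanohttp_suspect(libxml_version):
    """Return True if the version tuple lies in the reportedly affected range."""
    return FIRST_BROKEN <= tuple(libxml_version) < FIRST_FIXED


try:
    from lxml import etree
except ImportError:
    # lxml is not installed; the helper above can still be used on its own.
    etree = None

if etree is not None:
    print("libxml2 in use:          ", etree.LIBXML_VERSION)
    print("libxml2 compiled against:", etree.LIBXML_COMPILED_VERSION)
    print("in reported range:       ", nanohttp_suspect(etree.LIBXML_VERSION))
```

Note that the reports in this thread concern Windows only, so a version inside the range does not by itself mean http parsing will fail on other platforms.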

Thanks Steven. Which compiler do you use for building lxml on Windows? (I didn't have any luck with MinGW.)

Laurence

On 25 May 2010 13:27, Steven Vereecken <steven.vereecken@gmail.com> wrote:
> I've encountered the same problem. As far as I can tell, the problem lies with the libxml2 version that is used.
> [...]

This is only peripherally related, but XInclude hasn't worked for me (as in, it dumps core) with lxml since upgrading to libxml2 2.7.4. If the Windows binaries get rebuilt, there might be two good reasons to link against 2.7.3.

I haven't filed a bug report yet: it's not a critical part of my system, so I haven't been able to free up time to come up with a good test case.

On Tue, 2010-05-25 at 14:27 +0200, Steven Vereecken wrote:
> I've encountered the same problem. As far as I can tell, the problem lies with the libxml2 version that is used.
> [...]

--
John Krukoff <jkrukoff@ltgc.com>
Land Title Guarantee Company

John Krukoff, 26.05.2010 01:01:
> This is only peripherally related, but XInclude hasn't worked for me (as in dumps core) with lxml since upgrading to libxml2 2.7.4. If the windows binaries get rebuilt, might be two good reasons to link against 2.7.3.

The next binaries will hopefully use libxml2 2.7.7. No idea if this is fixed there, though.

Stefan
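
Until rebuilt binaries are available, one possible way around the http failure (not something proposed in this thread) is to avoid libxml2's built-in HTTP client altogether: download the page with Python and hand lxml the bytes. The sketch below assumes Python 3 and its urllib.request module; the function names fetch_bytes and parse_html_url are made up for illustration.

```python
from io import BytesIO
from urllib.request import urlopen


def fetch_bytes(url, timeout=30):
    """Download the resource with Python's own HTTP client instead of nanohttp."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def parse_html_url(url):
    """Fetch the URL in Python, then let lxml parse the downloaded bytes."""
    # Imported here so that fetch_bytes can be used without lxml installed.
    from lxml import etree

    data = fetch_bytes(url)
    # base_url keeps the original address attached to the parsed document.
    return etree.parse(BytesIO(data), parser=etree.HTMLParser(), base_url=url)
```

With this, the call from the original report would become parse_html_url('http://codespeak.net/'). Since libxml2 never opens the connection itself, the nanohttp code path should not be involved at all.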

On Wed, 2010-05-26 at 08:27 +0200, Stefan Behnel wrote:
> John Krukoff, 26.05.2010 01:01:
> > This is only peripherally related, but XInclude hasn't worked for me (as in dumps core) with lxml since upgrading to libxml2 2.7.4. [...]
> The next binaries will hopefully use libxml2 2.7.7. No idea if this is fixed there, though.
>
> Stefan

I've been able to test against 2.7.4, 2.7.6 and 2.7.7, and XInclude is broken on all of them for me. The same code worked on 2.7.3 (with the same version of lxml across the 2.7.3 to 2.7.4 upgrade), so I'm pretty sure that's the magic version.

I suspect the changes made for this bug fix, since it's affecting the right part of the code, and the timing is right:
https://bugzilla.gnome.org/show_bug.cgi?id=584220

But like I said, mostly I need to come up with a simple test case so I can report it to the libxml2 folks.

--
John Krukoff <jkrukoff@ltgc.com>
Land Title Guarantee Company
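
A starting point for the kind of simple test case mentioned above might look like the following. It is not John's code, and whether it reproduces the crash on the affected libxml2 versions is unknown; the file names and the helper names write_xinclude_fixture and run_xinclude are made up for illustration. It writes two small files, one including the other, and then asks lxml to resolve the include.

```python
import os


def write_xinclude_fixture(directory):
    """Write a main document plus the file it includes; return the main path."""
    included_path = os.path.join(directory, "included.xml")
    main_path = os.path.join(directory, "main.xml")
    with open(included_path, "w", encoding="utf-8") as f:
        f.write("<part>included text</part>")
    with open(main_path, "w", encoding="utf-8") as f:
        f.write(
            '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
            '<xi:include href="included.xml"/>'
            "</doc>"
        )
    return main_path


def run_xinclude(main_path):
    """Parse the main document and resolve its XInclude references in place."""
    # Imported here so the fixture can be written without lxml installed.
    from lxml import etree

    tree = etree.parse(main_path)
    tree.xinclude()
    return etree.tostring(tree.getroot())
```

Calling run_xinclude(write_xinclude_fixture(some_directory)) under each libxml2 version and comparing the results (or noting which version crashes) would be one way to narrow the problem down before reporting it.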