Unicode decoding is not supported on this platform with python3.3 debug
I'm trying to debug a Python problem, and I built lxml against the pydebug build of Python 3.3, but the XMLParser.feed() method isn't working...
Corvidae:Python-3.3.0 markgrandi$ ./python.exe
Python 3.3.0 (default, Dec 9 2012, 14:01:13)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> for line in x:
...     print(x)
[126070 refs]
>>> for line in x:
...     parser.feed(x.readline())
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "parser.pxi", line 1105, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:87183)
lxml.etree.ParserError: Unicode parsing is not supported on this platform
[126026 refs]
Any idea what is causing this and how I can fix it?
~mark
Mark Grandi, 09.12.2012 23:17:
Any idea what is causing this and how I can fix it?
How is wchar_t defined on your platform? That's what it's currently trying to parse (which is more of a missing feature - this could be more efficient in Py3.3 but currently isn't). Also, make sure you linked against libiconv when building.
Here's how lxml figures out how to parse Unicode strings:
https://github.com/lxml/lxml/blob/7eca2bb4b704058c0430ded3d1c05ed418ac7223/s...
Stefan
Well, the thing is, it seems to be a bug in either lxml or libxml2 with Python 3, as lxml works fine on both the release and debug builds of Python 3.2, but on Python 3.3 neither works.
I looked at the source for parser.pxi, and it basically just comes down to libxml2 not being able to find a suitable encoding?
if enchandler is not NULL:
    global _UNICODE_ENCODING
    tree.xmlCharEncCloseFunc(enchandler)
    _UNICODE_ENCODING = enc
and
    py_buffer_len = python.PyBytes_GET_SIZE(data)
elif python.PyUnicode_Check(data):
    if _UNICODE_ENCODING is NULL:
        raise ParserError, \
            u"Unicode parsing is not supported on this platform"
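A possible workaround, sketched here under assumptions rather than taken from the thread: the `elif` in the second excerpt suggests that bytes input goes through a separate branch before the Unicode check, so feeding the parser bytes (for example from a file opened in binary mode) should never reach the `_UNICODE_ENCODING` test. The helper name `parse_bytes_in_chunks`, the `chunk_size` parameter, and the sample document are made up for illustration, and the stdlib fallback import is only there so the sketch runs where lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Stdlib fallback so the sketch still runs without lxml installed.
    import xml.etree.ElementTree as etree


def parse_bytes_in_chunks(data, chunk_size=16):
    """Feed raw bytes to a feed parser, chunk by chunk (hypothetical helper)."""
    parser = etree.XMLParser()
    for start in range(0, len(data), chunk_size):
        # Each chunk is a bytes object, so the parser decodes the input itself
        # and no Python str is ever handed to feed().
        parser.feed(data[start:start + chunk_size])
    return parser.close()  # returns the root element


# Hypothetical sample document, encoded to bytes before parsing.
sample = "<root><item>caf\u00e9</item></root>".encode("utf-8")
root = parse_bytes_in_chunks(sample)
print(root.tag, root[0].text)
```

With a real file, the equivalent would be to open it with mode "rb" and pass what read() returns to feed(), instead of iterating over a text-mode file.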
_________________________________________________________________ Mailing list for the lxml Python XML toolkit - http://lxml.de/ lxml@lxml.de https://mailman-mail5.webfaction.com/listinfo/lxml
Hi,
please don't top-post.
Mark Grandi, 14.12.2012 07:29:
On Thu, Dec 13, 2012 at 11:00 PM, Stefan Behnel wrote:
How is wchar_t defined on your platform? That's what it's currently trying to parse (which is more of a missing feature - this could be more efficient in Py3.3 but currently isn't). Also, make sure you linked against libiconv when building.
Well, the thing is, it seems to be a bug in either lxml or libxml2 with Python 3, as lxml works fine on both the release and debug builds of Python 3.2, but on Python 3.3 neither works.
That's because the way Unicode works has changed in Py3.3. So, again: how is wchar_t defined on your system? Is it two bytes or four bytes long? And are you using a two-bytes Unicode build of Py3.2 or a four-bytes one? I would guess that both are different on your system.
I looked at the source for parser.pxi, and it basically just comes down to libxml2 not being able to find a suitable encoding?
if enchandler is not NULL:
    global _UNICODE_ENCODING
    tree.xmlCharEncCloseFunc(enchandler)
    _UNICODE_ENCODING = enc
and
    py_buffer_len = python.PyBytes_GET_SIZE(data)
elif python.PyUnicode_Check(data):
    if _UNICODE_ENCODING is NULL:
        raise ParserError, \
            u"Unicode parsing is not supported on this platform"
Correct. Stefan
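The wchar_t question above can also be answered from inside the interpreter. The following is a rough sketch using ctypes, not lxml's actual detection code: it reports the width of wchar_t for the running Python and guesses which codec matches the in-memory layout of a wide character. The function name `wchar_layout` and the layout table are invented for this illustration.

```python
import ctypes
import sys


def wchar_layout():
    """Return (sizeof(wchar_t), name of the codec matching its memory layout)."""
    width = ctypes.sizeof(ctypes.c_wchar)
    # A wchar_t buffer holding "<" plus a terminating NUL.
    buf = ctypes.create_unicode_buffer("<")
    # Raw bytes of the first wide character only.
    raw = ctypes.string_at(ctypes.addressof(buf), width)
    layouts = {
        b"<\x00": "utf-16-le",
        b"\x00<": "utf-16-be",
        b"<\x00\x00\x00": "utf-32-le",
        b"\x00\x00\x00<": "utf-32-be",
    }
    # None means the layout is not one of the four expected ones.
    return width, layouts.get(raw)


width, codec = wchar_layout()
print("sizeof(wchar_t):", width)
print("matching codec:", codec)
print("sys.maxunicode:", sys.maxunicode)
```

Running this under each interpreter in question gives the two numbers being asked for: the wchar_t width and the Unicode range of that build.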
That's because the way Unicode works has changed in Py3.3. So, again: how is wchar_t defined on your system? Is it two bytes or four bytes long? And are you using a two-bytes Unicode build of Py3.2 or a four-bytes one? I would guess that both are different on your system.
I was using the http://www.python.org builds both times, so if anything changed, then whatever build settings python.org is using to build their Mac OS X binaries changed.
On my Mac, cpp -dM then ctrl+d says that "#define __WCHAR_MAX__ 2147483647", so wchar_t is 4 bytes.
I also printed sys.maxunicode on both the python.org build of Python 3.3 and my own build of Python 3.2.3 (default settings):
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.maxunicode
1114111
Python 3.2.3 (default, Aug 28 2012, 06:42:49)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
[56239 refs]
>>> sys.maxunicode
65535
So it seems something is going wrong when Python is using 4 bytes for Unicode?
~mark
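One likely explanation for the two sys.maxunicode values, offered as background rather than as something stated in the thread: as far as I know, Python 3.3 dropped the narrow/wide build distinction (PEP 393), so sys.maxunicode is always 1114111 there and each string is stored with 1, 2 or 4 bytes per character depending on its content, while 65535 on 3.2 indicates a narrow (two-byte) build. The sketch below estimates the per-character storage on the running interpreter; it relies on sys.getsizeof, which reflects a CPython implementation detail, and `bytes_per_char` is a name made up for this example.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing two string lengths."""
    # The fixed object header cancels out in the difference.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


ascii_width = bytes_per_char("a")            # ASCII character
bmp_width = bytes_per_char("\u20ac")         # character inside the BMP
astral_width = bytes_per_char("\U0001F600")  # character outside the BMP

print(ascii_width, bmp_width, astral_width)
```

On CPython 3.3 or later this should report three different widths, which is why a fixed wchar_t-sized buffer can no longer be assumed for every string.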
On Mon, Dec 17, 2012 at 6:27 PM, Mark Grandi wrote:
So it seems something is going wrong when Python is using 4 bytes for Unicode?
Hello, any update on this? ~Mark