Mailman 3 Unicode decoding is not supported on this platform with python3.3 debug - lxml - The Python XML Toolkit

Unicode decoding is not supported on this platform with python3.3 debug

Mark Grandi

9 Dec 2012

4:17 p.m.

I'm trying to debug a Python problem, and I built lxml using the pydebug-compiled version of Python 3.3, but the XMLParser.feed() method isn't working...

Corvidae:Python-3.3.0 markgrandi$ ./python.exe
Python 3.3.0 (default, Dec 9 2012, 14:01:13)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.

...

...
...
for line in x:
...     print(x)

[126070 refs]

...

...
...
for line in x:
...     parser.feed(x.readline())
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "parser.pxi", line 1105, in lxml.etree._FeedParser.feed (src/lxml/lxml.etree.c:87183)
lxml.etree.ParserError: Unicode parsing is not supported on this platform
[126026 refs]

Any idea what is causing this and how I can fix it?

~mark


Stefan Behnel

14 Dec

midnight


Mark Grandi, 09.12.2012 23:17:

...

Any idea what is causing this and how I can fix it?

How is wchar_t defined on your platform? That's what it's currently trying to parse (which is more of a missing feature - this could be more efficient in Py3.3 but currently isn't). Also, make sure you linked against libiconv when building.

Here's how lxml figures out how to parse Unicode strings:

https://github.com/lxml/lxml/blob/7eca2bb4b704058c0430ded3d1c05ed418ac7223/s...

Stefan
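Stefan's question matters because the in-memory layout of wide characters decides which encoding name has to be requested from libxml2. The sketch below is only a rough Python analogue of that idea, not lxml's code (which sits behind the truncated link above): it copies a known string into a ctypes wchar_t buffer, inspects the raw bytes of the first character, and maps the pattern to a Python codec name. The function `detect_wchar_encoding` and its pattern table are hypothetical.

```python
import ctypes

# Hypothetical pattern table: raw bytes of "<" (U+003C) stored as one wchar_t,
# mapped to the Python codec that decodes a buffer with that layout.
_FIRST_CHAR_PATTERNS = {
    b"<\x00\x00\x00": "utf-32-le",  # 4-byte wchar_t, little-endian
    b"\x00\x00\x00<": "utf-32-be",  # 4-byte wchar_t, big-endian
    b"<\x00": "utf-16-le",          # 2-byte wchar_t, little-endian
    b"\x00<": "utf-16-be",          # 2-byte wchar_t, big-endian
}


def detect_wchar_encoding(sample="<test/>"):
    """Return (codec_name, raw_bytes) for a wchar_t copy of `sample`, or None.

    `sample` must start with "<" so that its first character can be matched.
    """
    width = ctypes.sizeof(ctypes.c_wchar)        # bytes per wchar_t
    buf = ctypes.create_unicode_buffer(sample)   # NUL-terminated wchar_t array
    raw = ctypes.string_at(ctypes.addressof(buf), width * len(sample))
    codec = _FIRST_CHAR_PATTERNS.get(raw[:width])
    if codec is None:
        # Unknown layout: the analogue of "Unicode parsing is not supported".
        return None
    return codec, raw


result = detect_wchar_encoding()
if result is None:
    print("no usable encoding for wchar_t strings")
else:
    codec, raw = result
    print(codec, raw.decode(codec))
```

On a typical x86 machine with a 4-byte wchar_t this should report utf-32-le; a platform with a 2-byte wchar_t would land on one of the UTF-16 entries instead.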

Mark Grandi

12:29 a.m.


Well, the thing is, it seems to be a bug in either lxml or libxml2 with Python 3: lxml works fine on both the release and debug builds of Python 3.2, but on Python 3.3, neither works.

I looked at the source for parser.pxi, and it basically comes down to libxml2 not finding a suitable encoding?

    if enchandler is not NULL:
        global _UNICODE_ENCODING
        tree.xmlCharEncCloseFunc(enchandler)
        _UNICODE_ENCODING = enc

and

        py_buffer_len = python.PyBytes_GET_SIZE(data)
    elif python.PyUnicode_Check(data):
        if _UNICODE_ENCODING is NULL:
            raise ParserError, \
                u"Unicode parsing is not supported on this platform"

On Thu, Dec 13, 2012 at 11:00 PM, Stefan Behnel wrote:

...

_________________________________________________________________
Mailing list for the lxml Python XML toolkit - http://lxml.de/
lxml@lxml.de
https://mailman-mail5.webfaction.com/listinfo/lxml

Stefan Behnel

12:40 a.m.


Hi, please don't top-post.

Mark Grandi, 14.12.2012 07:29:

...

On Thu, Dec 13, 2012 at 11:00 PM, Stefan Behnel wrote:

...

How is wchar_t defined on your platform? That's what it's currently trying to parse (which is more of a missing feature - this could be more efficient in Py3.3 but currently isn't). Also, make sure you linked against libiconv when building.

Well, the thing is, it seems to be a bug in either lxml or libxml2 with Python 3: lxml works fine on both the release and debug builds of Python 3.2, but on Python 3.3, neither works.

That's because the way Unicode works has changed in Py3.3. So, again: how is wchar_t defined on your system? Is it two bytes or four bytes long? And are you using a two-byte Unicode build of Py3.2 or a four-byte one? I would guess that the two differ on your system.
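Both of these questions can be answered from inside the interpreter. This minimal sketch, assuming only the standard library, reads sizeof(wchar_t) through ctypes and reports sys.maxunicode. Before Python 3.3, 65535 indicated a narrow (two-byte) build and 1114111 a wide (four-byte) build; from 3.3 on (PEP 393) the value is always 1114111, so it no longer reveals a storage width. The helper names `unicode_build_info` and `describe_build` are made up for illustration.

```python
import ctypes
import sys


def unicode_build_info():
    """Return (sizeof(wchar_t) in bytes, sys.maxunicode) for this interpreter."""
    wchar_bytes = ctypes.sizeof(ctypes.c_wchar)  # width of the platform's wchar_t
    return wchar_bytes, sys.maxunicode


def describe_build(maxunicode, python_version):
    """Classify the str storage model in the terms used in this thread."""
    if python_version >= (3, 3):
        return "PEP 393 flexible strings (maxunicode is always 1114111)"
    return "wide (4-byte) build" if maxunicode == 0x10FFFF else "narrow (2-byte) build"


wchar_bytes, maxunicode = unicode_build_info()
print("wchar_t: %d bytes" % wchar_bytes)
print("sys.maxunicode: %d -> %s"
      % (maxunicode, describe_build(maxunicode, sys.version_info[:2])))
```

A mismatch between a 4-byte wchar_t and a narrow (2-byte) Py3.2 build is exactly the combination being asked about.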

...

...
Here's how lxml figures out how to parse Unicode strings:

https://github.com/lxml/lxml/blob/7eca2bb4b704058c0430ded3d1c05ed418ac7223/s...

I looked at the source for parser.pxi, and it basically comes down to libxml2 not finding a suitable encoding?

    if enchandler is not NULL:
        global _UNICODE_ENCODING
        tree.xmlCharEncCloseFunc(enchandler)
        _UNICODE_ENCODING = enc

and

        py_buffer_len = python.PyBytes_GET_SIZE(data)
    elif python.PyUnicode_Check(data):
        if _UNICODE_ENCODING is NULL:
            raise ParserError, \
                u"Unicode parsing is not supported on this platform"

Correct.

Stefan
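In the parser.pxi excerpt above, the "not supported" error is raised only on the PyUnicode_Check branch; bytes input goes through PyBytes_GET_SIZE instead. One possible workaround, which is not proposed in the thread, is therefore to open the file in binary mode and feed raw bytes, leaving decoding to the parser. This sketch assumes lxml's XMLParser feed()/close() interface and falls back to xml.etree.ElementTree, which offers the same two methods, so it also runs where lxml is not installed. `parse_bytes_incrementally` is a hypothetical helper.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:  # fallback so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree


def parse_bytes_incrementally(stream, chunk_size=4096):
    """Feed raw bytes (never str) to a feed parser and return the root element.

    Because only bytes are fed, the parser decodes the document itself and
    the str-only branch that raised the ParserError above is not taken.
    """
    parser = etree.XMLParser()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        parser.feed(chunk)
    return parser.close()  # returns the root element once the input is complete


# Usage: pass a binary file object such as open("data.xml", "rb");
# an in-memory BytesIO stands in for it here.
data = io.BytesIO(b'<?xml version="1.0" encoding="utf-8"?>'
                  b'<root><item>\xc3\xa9</item></root>')
root = parse_bytes_incrementally(data, chunk_size=8)
print(root.tag, root[0].text)
```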

Mark Grandi

17 Dec

7:27 p.m.


...

That's because the way Unicode works has changed in Py3.3. So, again: how is wchar_t defined on your system? Is it two bytes or four bytes long? And are you using a two-byte Unicode build of Py3.2 or a four-byte one? I would guess that the two differ on your system.

I was using the http://www.python.org builds both times, so if anything changed, it was the build settings python.org uses for its Mac OS X binaries.

On my Mac, running cpp -dM and then pressing Ctrl+D prints "#define __WCHAR_MAX__ 2147483647", so wchar_t is 4 bytes.

I also printed sys.maxunicode on both the python.org build of Python 3.3 and my own build of Python 3.2.3 (default settings):

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

...

...
...
import sys
sys.maxunicode
1114111

Python 3.2.3 (default, Aug 28 2012, 06:42:49)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

...

...
...
import sys
[56239 refs]
sys.maxunicode
65535

So it seems something goes wrong when Python uses 4 bytes for Unicode?

~mark
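The step from __WCHAR_MAX__ to "4 bytes" is plain arithmetic: a signed n-byte integer tops out at 2**(8n - 1) - 1, and 2147483647 is 2**31 - 1, so n = 4. A small sketch of that calculation follows, assuming the macro value is a plain decimal literal (hex values or integer suffixes are rejected rather than guessed at). The function name is hypothetical. Taking the bit length of max + 1 and dividing by 8 gives the same width for an unsigned type, such as a 16-bit wchar_t with maximum 65535.

```python
import re


def wchar_bytes_from_macro(define_line):
    """Infer sizeof(wchar_t) from a line like '#define __WCHAR_MAX__ 2147483647'.

    Assumes a plain decimal value. A signed n-byte type has maximum
    2**(8*n - 1) - 1 and an unsigned one 2**(8*n) - 1; in both cases
    (max + 1).bit_length() // 8 == n.
    """
    match = re.fullmatch(r"#define __WCHAR_MAX__ (\d+)", define_line.strip())
    if match is None:
        raise ValueError("not a decimal __WCHAR_MAX__ definition: %r" % define_line)
    max_value = int(match.group(1))
    return (max_value + 1).bit_length() // 8


print(wchar_bytes_from_macro("#define __WCHAR_MAX__ 2147483647"))  # 4 (value from the thread)
print(wchar_bytes_from_macro("#define __WCHAR_MAX__ 65535"))       # 2 (16-bit unsigned wchar_t)
```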

Mark Grandi

28 Dec

3:09 p.m.


On Mon, Dec 17, 2012 at 6:27 PM, Mark Grandi wrote:

...

Hello, any update on this?

~Mark
