[issue3590] sax.parser considers XML as text rather than bytes
report at bugs.python.org
Mon Aug 18 18:00:57 CEST 2008
Antoine Pitrou <pitrou at free.fr> added the comment:
> Just to be clear, I am at present totally confused about io streams :-)
Python 3.0 distinguishes more clearly between unicode strings (called "str"
in 3.0) and byte strings (called "bytes" in 3.0). The most important
point is that there is no longer any implicit conversion between the
two: you must explicitly use .encode() or .decode().
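Here is a minimal sketch of that explicit conversion. It was written against current Python 3 releases and has not been checked against 3.0 itself. The value "café" and the choice of UTF-8 are only illustrative.

```python
text = "café"                     # str: a sequence of Unicode code points
data = text.encode("utf-8")       # bytes: the UTF-8 encoding of those code points

assert data == b"caf\xc3\xa9"     # "é" becomes two bytes in UTF-8
assert data.decode("utf-8") == text

# Mixing the two types is an error; Python 3 never converts implicitly.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```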
Files opened in binary ("rb") mode return byte strings, but files
opened in text ("r") mode return unicode strings. This means you can't
give a text file to a 3.0 library function expecting a binary file, or vice versa.
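The sketch below shows the two modes side by side under the same assumption that current Python 3 behaves this way. It reads one temporary file twice: binary mode yields bytes, and text mode yields str decoded with the stated encoding. The file contents and the explicit encoding="utf-8" are illustrative choices only.

```python
import os
import tempfile

# Write a small file whose contents are UTF-8-encoded bytes.
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".xml") as f:
    f.write("<p>café</p>".encode("utf-8"))
    path = f.name

try:
    with open(path, "rb") as f:
        raw = f.read()            # binary mode: returns bytes
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()           # text mode: returns str, decoded as UTF-8
    print(type(raw).__name__, type(text).__name__)
finally:
    os.remove(path)
```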
What is more worrying is that XML, until decoded, should be considered a
byte stream, so sax.parser should accept binary files rather than text
files. I took a look at test_sax and indeed it considers XML as text
rather than bytes :-(
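Here is a sketch of the byte-oriented behaviour argued for above. It reflects how later Python 3 releases work, not the 3.0 code under discussion. xml.sax.parse is given a binary file object, and the expat-based parser decodes it according to the XML declaration. The handler class TextCollector is a hypothetical name used only for this example.

```python
import io
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser hands over already-decoded str, whatever the input encoding was.
        self.chunks.append(content)


# The document is stored as Latin-1 bytes, and its XML declaration says so.
raw = '<?xml version="1.0" encoding="iso-8859-1"?><p>caf\u00e9</p>'.encode("iso-8859-1")

handler = TextCollector()
xml.sax.parse(io.BytesIO(raw), handler)   # a binary stream, not a text stream
print("".join(handler.chunks))
```

Leaving decoding to the parser lets the document's own encoding declaration decide how the bytes are interpreted. A text-mode file would already have been decoded, possibly with the wrong codec.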
Bumping this as critical because it needs a decision very soon (ideally
priority: -> critical
title: sax.parser hangs on byte streams -> sax.parser considers XML as text rather than bytes
Python tracker <report at bugs.python.org>