[issue11489] json.dumps not parsable by json.loads (on Linux only)
Brian
report at bugs.python.org
Mon Mar 14 21:21:07 CET 2011
Brian <brian at merrells.org> added the comment:
On Mon, Mar 14, 2011 at 4:09 PM, Raymond Hettinger
<report at bugs.python.org> wrote:
>
> Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:
>
> > We seem to be in the worst of both worlds right now
> > as I've generated and stored a lot of json that can
> > not be read back in
>
> This is unfortunate. The dumps() should have never worked in the first
> place.
>
> I don't think that loads() should be changed to accommodate the dumps()
> error though. JSON is UTF-8 by definition and it is a useful feature that
> invalid UTF-8 won't load.
>
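I may be wrong, but it appeared that json actually encoded the data as the
string "\uda00" (i.e. 6 bytes), which is slightly different from the UTF-8
encoding of the JSON itself. Not sure if this is relevant, but it seems less
severe than actually invalid UTF-8 in the bytes.
>
> To fix the data you've already created (one that other compliant JSON
> readers wouldn't be able to parse), I think you need to reprocess those
> files to make them valid:
>
>  bs.decode('utf-8', errors='ignore').encode('utf-8')
>
Unfortunately I don't believe this does anything on Python 2.x, as only
Python 3.x encode/decode flags this as invalid.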
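
To illustrate the point about the 6-byte escape, here is a minimal sketch,
assuming Python 3 and the default ensure_ascii=True. The names lone, text,
raw and cleaned are only illustrative. It shows that the surrogate reaches
the output as plain ASCII, so the suggested
bs.decode('utf-8', errors='ignore').encode('utf-8') round trip has nothing
to strip:

```python
import json

# A lone (unpaired) surrogate, the kind of value discussed in this issue.
lone = "\uda00"

# With the default ensure_ascii=True, dumps() writes the surrogate as a
# 6-character backslash escape inside the quotes, not as raw bytes.
text = json.dumps(lone)
raw = text.encode("utf-8")  # pure ASCII, hence valid UTF-8 at the byte level

# The proposed cleanup: drop undecodable bytes, then re-encode.
cleaned = raw.decode("utf-8", errors="ignore").encode("utf-8")

print(text)            # the JSON text containing the escape
print(cleaned == raw)  # whether the cleanup changed anything
```

Because the problem lives in the escape sequence rather than in the bytes,
any repair would probably have to look at the decoded JSON text itself; how
loads() treats such an escape depends on the Python version and build.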
> ----------
> nosy: +rhettinger
> priority: normal -> high
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <http://bugs.python.org/issue11489>
> _______________________________________
>
----------
nosy: +merrellb
Added file: http://bugs.python.org/file21135/unnamed
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11489>
_______________________________________