[issue11489] json.dumps not parsable by json.loads (on Linux only)

Mon Mar 14 21:21:07 CET 2011

Brian <brian at merrells.org> added the comment:

On Mon, Mar 14, 2011 at 4:09 PM, Raymond Hettinger
<report at bugs.python.org> wrote:

>
> Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:
>
> > We seem to be in the worst of both worlds right now,
> > as I've generated and stored a lot of JSON that cannot
> > be read back in.
>
> This is unfortunate.  dumps() should never have worked in the first
> place.
>
> I don't think that loads() should be changed to accommodate the dumps()
> error though.  JSON is UTF-8 by definition and it is a useful feature that
> invalid UTF-8 won't load.
>

I may be wrong, but it appeared that json actually encoded the data as the
string "\uda00", i.e. a 6-byte ASCII escape, which is different from a
problem in the UTF-8 encoding of the JSON text itself.  I'm not sure if this
is relevant, but it seems less severe than actually invalid UTF-8 in the bytes.
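To illustrate the distinction drawn above, here is a minimal sketch assuming a current Python 3 (not the 2011-era builds this issue concerns): with the default `ensure_ascii=True`, `json.dumps` writes a lone surrogate as a six-character ASCII escape, so its output is valid UTF-8 even though the raw code point itself cannot be UTF-8 encoded.

```python
import json

# A lone high surrogate, e.g. left over from splitting a surrogate pair.
s = "\uda00"

# With the default ensure_ascii=True, json.dumps emits the code point as a
# 6-character ASCII escape instead of raw bytes.
dumped = json.dumps(s)
print(dumped)        # prints "\uda00" (8 characters including the quotes)

# The escaped text is pure ASCII, so it encodes to UTF-8 without complaint...
data = dumped.encode("utf-8")

# ...whereas Python 3's UTF-8 codec rejects the raw surrogate itself.
reason = None
try:
    s.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)        # surrogates not allowed
```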

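> To fix the data you've already created (which other compliant JSON
> readers wouldn't be able to parse), I think you need to reprocess those
> files to make them valid:
>
>     bs.decode('utf-8', errors='ignore').encode('utf-8')
>

Unfortunately, I don't believe this does anything on Python 2.x, since only
Python 3.x's encode/decode flags this as invalid.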
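A sketch of why the suggested round trip leaves escape-form data untouched, plus one possible cleanup, assuming a current Python 3 whose `json.loads` accepts lone-surrogate escapes (current releases do). `strip_lone_surrogates` is a hypothetical helper, not part of the `json` module, and dropping unpaired surrogates (rather than replacing them) is an illustrative choice.

```python
import json

# Stored data where the lone surrogate was written as an ASCII \uXXXX escape,
# next to a valid surrogate pair for comparison.
stored = b'{"name": "abc\\uda00", "ok": "\\ud83d\\ude00"}'

# The decode/ignore/encode round trip only drops bytes that are invalid UTF-8;
# the escape is plain ASCII, so nothing changes.
roundtrip = stored.decode("utf-8", errors="ignore").encode("utf-8")
print(roundtrip == stored)                      # True

# Raw surrogate bytes (what Python 2's u'\uda00'.encode('utf-8') produces)
# are invalid UTF-8 to Python 3, and errors='ignore' silently drops them.
raw = b'"abc\xed\xa8\x80"'
print(raw.decode("utf-8", errors="ignore"))     # "abc"

def strip_lone_surrogates(obj):
    """Hypothetical helper: recursively remove unpaired surrogates."""
    if isinstance(obj, str):
        # json.loads already combines valid pairs into one code point,
        # so anything left in U+D800..U+DFFF is unpaired.
        return "".join(ch for ch in obj if not "\ud800" <= ch <= "\udfff")
    if isinstance(obj, list):
        return [strip_lone_surrogates(x) for x in obj]
    if isinstance(obj, dict):
        return {strip_lone_surrogates(k): strip_lone_surrogates(v)
                for k, v in obj.items()}
    return obj

cleaned = strip_lone_surrogates(json.loads(stored.decode("utf-8")))
fixed = json.dumps(cleaned).encode("utf-8")     # now safe for strict readers
```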

> ----------
> nosy: +rhettinger
> priority: normal -> high
>
> _______________________________________
> Python tracker <report at bugs.python.org>
> <http://bugs.python.org/issue11489>
> _______________________________________
>

----------
nosy: +merrellb
Added file: http://bugs.python.org/file21135/unnamed
