[issue11489] json.dumps not parsable by json.loads (on Linux only)

Raymond Hettinger report at bugs.python.org
Mon Mar 14 21:09:35 CET 2011


Raymond Hettinger <rhettinger at users.sourceforge.net> added the comment:

> We seem to be in the worst of both worlds right now 
> as I've generated and stored a lot of json that can 
> not be read back in

This is unfortunate.  dumps() should never have worked in the first place.

I don't think that loads() should be changed to accommodate the dumps() error though.  JSON is UTF-8 by definition and it is a useful feature that invalid UTF-8 won't load.

To fix the data you've already created (data that other compliant JSON readers wouldn't be able to parse), I think you need to reprocess those files to make them valid:

   bs.decode('utf-8', errors='ignore').encode('utf-8')
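
As a minimal sketch of that reprocessing step (assuming Python 3, and assuming the stored data is available as a bytes object; the name scrub_utf8 is hypothetical and not part of the json module):

```python
def scrub_utf8(bs):
    """Drop any byte sequences that are not valid UTF-8.

    Hypothetical helper wrapping the one-liner above.  Note that
    errors='ignore' silently discards the offending bytes, so the
    affected strings lose characters rather than being repaired.
    """
    return bs.decode('utf-8', errors='ignore').encode('utf-8')


# b'\xff' can never appear in valid UTF-8, so it is dropped.
cleaned = scrub_utf8(b'{"key": "abc\xffdef"}')
print(cleaned)
```

Because the invalid bytes are removed rather than replaced, it is probably worth keeping a backup of the original files before rewriting them.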

Then we need to fix dumps so that it doesn't silently create invalid JSON.

> This on the other hand should probably be 
> fixed by either rejecting lone surrogates 
> in json.dumps or accepting them in json.loads or both.

Rejection is the right way to go.  It is almost never
helpful to create invalid JSON files that other readers
can't and shouldn't read.
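
Until dumps() itself rejects them, one way to get that behaviour from calling code might look like the sketch below.  It assumes Python 3; dumps_strict is a hypothetical wrapper, not an existing json API.  It relies on the strict UTF-8 encoder refusing lone surrogates:

```python
import json


def dumps_strict(obj):
    """Serialize obj to JSON, refusing lone surrogates.

    Hypothetical wrapper: with ensure_ascii=False the surrogate code
    points are left unescaped in the output string, so a strict
    UTF-8 encode of that string fails if any lone surrogate is present.
    """
    text = json.dumps(obj, ensure_ascii=False)
    try:
        text.encode('utf-8')
    except UnicodeEncodeError as exc:
        raise ValueError('lone surrogate in JSON output') from exc
    return text


print(dumps_strict({'name': 'ok'}))

try:
    dumps_strict({'name': '\ud800'})  # lone high surrogate
except ValueError as exc:
    print('rejected:', exc)
```

The trade-off is that the output is no longer pure ASCII, so it has to be written out as UTF-8.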

----------
nosy: +rhettinger
priority: normal -> high

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue11489>
_______________________________________
