[Nevow] json serializer and strings
Hi.
Why does the json serializer not support plain strings?
I'm having problems because I want to serialize keyword arguments, and the dictionary keys are str objects, not unicode.
Are there any problems with this?

    elif isinstance(obj, str):
        w('"')
        w(stringEncode(obj.decode("us-ascii")))
        w('"')

I can use simplejson, but I don't want to add too many dependencies.
Thanks
Manlio Perillo
On Sat, 09 Sep 2006 20:08:16 +0200, Manlio Perillo <manlio_perillo@libero.it> wrote:
> Hi.
> Why does the json serializer not support plain strings?
> I'm having problems because I want to serialize keyword arguments, and the dictionary keys are str objects, not unicode.
> Are there any problems with this?
>
>     elif isinstance(obj, str):
>         w('"')
>         w(stringEncode(obj.decode("us-ascii")))
>         w('"')
Yes. What if it is not an ASCII string? If you know that your strings are ASCII strings, decode them before you give them to Athena.
> I can use simplejson, but I don't want to add too many dependencies.
> Thanks
> Manlio Perillo
Jean-Paul
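Jean-Paul's advice to decode known-ASCII strings before serializing can be sketched as follows. This is only an illustration in Python 3 terms, where the 8-bit str discussed in this thread corresponds to bytes. The helper names decode_ascii and decode_mapping are hypothetical, and the standard library json module stands in for nevow.json.

```python
import json


def decode_ascii(value, encoding="ascii"):
    """Return text for a bytes value; leave every other value untouched."""
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        return value.decode(encoding)
    return value


def decode_mapping(mapping, encoding="ascii"):
    """Decode the bytes keys and bytes values of a flat dictionary."""
    return {
        decode_ascii(key, encoding): decode_ascii(val, encoding)
        for key, val in mapping.items()
    }


# Assumed input: a dictionary whose keys and some values are byte strings.
kwargs = {b"name": b"athena", b"count": 3}

# Decode first, then hand only text to the serializer.
text_kwargs = decode_mapping(kwargs)
serialized = json.dumps(text_kwargs, sort_keys=True)
print(serialized)
```

The caller states the encoding explicitly, so the serializer never has to guess; non-ASCII bytes fail loudly at the decode step instead of producing wrong text later.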
Jean-Paul Calderone wrote:
> Yes. What if it is not an ASCII string?
Raise an error? Is this really a problem? Unfortunately(?), plain strings are widely used in the CPython implementation (keyword arguments store their keys as str, not unicode).
> If you know that your strings are ASCII strings, decode them before you give them to Athena.
I'm not using json for Athena, just to serialize a dictionary (keyword arguments) in JSON format.
Regards
Manlio Perillo
On Sat, 09 Sep 2006 22:40:53 +0200, Manlio Perillo <manlio_perillo@libero.it> wrote:
> Raise an error? Is this really a problem?
Yes.
> Unfortunately(?), plain strings are widely used in the CPython implementation (keyword arguments store their keys as str, not unicode).
>> If you know that your strings are ASCII strings, decode them before you give them to Athena.
> I'm not using json for Athena, just to serialize a dictionary (keyword arguments) in JSON format.
You should probably use another json library, then. nevow.json is primarily a support library for Athena.
Jean-Paul
On 9/9/06, Manlio Perillo <manlio_perillo@libero.it> wrote:
>>> Are there any problems with this?
>>>
>>>     elif isinstance(obj, str):
>>>         w('"')
>>>         w(stringEncode(obj.decode("us-ascii")))
>>>         w('"')
>> Yes. What if it is not an ASCII string?
> Raise an error? Is this really a problem?
Yes. Yes it is. JavaScript strings are Unicode. Therefore the implementation must be able to convert the encoded string (byte representation) into Unicode when it arrives.
In order to convert the parameter to Unicode, the API has to know what encoding the original string was; or it must have it in Unicode form already. If the API accepts 8-bit str objects, then it must guess at the encoding to produce a unicode object. It will guess wrong very often, which leads to bugs. Therefore, it does not accept 8-bit str objects.
You must provide unicode objects to the API so that it does not have to guess.
The errors you get are essentially the API telling you "I refuse to guess." It forces the programmer to tell the API what encoding the original string had; the way you answer it is by decoding it yourself, with the right encoding argument, into a unicode object.
http://gedcom-parse.sourceforge.net/doc/encoding.html
C
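The point about guessing can be made concrete with a small sketch. It is written in Python 3 terms, where the 8-bit str of this thread corresponds to bytes; the sample word and the two candidate encodings are assumptions chosen only for illustration. The same bytes decode cleanly under two different encodings but yield different text, so an API that picks one silently can be wrong without raising any error.

```python
# The same byte sequence, as it might arrive from some unknown source.
raw = "caf\u00e9".encode("utf-8")   # the word "cafe" with an accented e

# Two plausible guesses; both succeed, but they disagree about the text.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

# A strict ASCII decode refuses to guess and raises instead.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(repr(as_utf8), repr(as_latin1), ascii_ok)
```

Only the caller knows which decoding is the intended one, which is the argument for requiring unicode input at the API boundary.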
Cory Dodt wrote:
> On 9/9/06, Manlio Perillo <manlio_perillo@libero.it> wrote:
>>>> Are there any problems with this?
>>>>
>>>>     elif isinstance(obj, str):
>>>>         w('"')
>>>>         w(stringEncode(obj.decode("us-ascii")))
>>>>         w('"')
>>> Yes. What if it is not an ASCII string?
>> Raise an error? Is this really a problem?
> Yes. Yes it is. JavaScript strings are Unicode. Therefore the implementation must be able to convert the encoded string (byte representation) into Unicode when it arrives.
>
> In order to convert the parameter to Unicode, the API has to know what encoding the original string was; or it must have it in Unicode form already. If the API accepts 8-bit str objects, then it must guess at the encoding to produce a unicode object. It will guess wrong very often, which leads to bugs. Therefore, it does not accept 8-bit str objects.
I don't agree, but that's not a problem; I can just decode the keyword-argument dictionary myself (or use another library).
However, you say: "it does not accept 8-bit str objects".
Well, all I can say is that by doing obj.decode("us-ascii") we are accepting 7-bit str objects!
If you like:

    try:
        obj.decode("us-ascii")
    except UnicodeDecodeError:
        raise ValueError("8-bit strings not supported")

This implementation is just a little more friendly, since str objects are the default in CPython:

    def foo(**kwargs):
        print type(kwargs.keys()[0])

    foo(a=1)  # prints <type 'str'>
Regards Manlio Perillo
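Manlio's proposal, accepting byte strings only when they are pure 7-bit ASCII and rejecting everything else, might look like the sketch below in Python 3 terms (bytes in place of the 8-bit str of this thread). The names to_text and serialize_string are hypothetical, and json.dumps from the standard library stands in for the string-quoting step; this is not nevow.json's actual code.

```python
import json


def to_text(obj):
    """Return text for str or ASCII-only bytes; reject anything else."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        try:
            # Strict decode: any byte above 0x7F raises UnicodeDecodeError.
            return obj.decode("us-ascii")
        except UnicodeDecodeError:
            raise ValueError("8-bit strings not supported") from None
    raise TypeError("expected str or bytes, got %s" % type(obj).__name__)


def serialize_string(obj):
    """Serialize one string value as a JSON string literal."""
    return json.dumps(to_text(obj))


print(serialize_string(b"abc"))
print(serialize_string("caf\u00e9"))
```

This keeps the "refuse to guess" behaviour for non-ASCII input while still accepting plain ASCII keys; whether that convenience belongs in the serializer or in the caller is exactly what the thread disagrees on.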
participants (3)
- Cory Dodt
- Jean-Paul Calderone
- Manlio Perillo