On 9/9/06, Manlio Perillo <manlio_perillo@libero.it> wrote:

>> There are some problems with this:
>>
>>    elif isinstance(obj, str):
>>        w('"')
>>        w(stringEncode(obj.decode("us-ascii")))
>>        w('"')
>> ?
>
> Yes.  What if it is not an ASCII string?

Raise an error?
Is this really a problem?


Yes.  Yes it is.  JavaScript strings are Unicode.  Therefore the implementation must be able to convert the encoded string (its byte representation) into Unicode when it arrives.

In order to convert the parameter to Unicode, the API has to know what encoding the original string was in; or it must have it in Unicode form already.  If the API accepts 8-bit str objects, then it must guess at the encoding to produce a unicode object.  It will guess wrong very often, which leads to bugs.  Therefore, it does not accept 8-bit str objects.
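
To make the "guess wrong" case concrete, here is a small sketch.  It is written for a current Python, where 8-bit strings are `bytes` and Unicode strings are `str`; the sample word and the variable names are only illustrative, not anything from the code under discussion.

```python
# The same bytes, decoded under three different assumptions about the encoding.
raw = "caff\u00e8".encode("utf-8")   # b'caff\xc3\xa8', as it might arrive off the wire

right = raw.decode("utf-8")          # the real encoding: one character for the e-grave
wrong = raw.decode("latin-1")        # a plausible guess: succeeds, but yields two junk characters

# An ASCII-only guess does not silently corrupt the text, it fails outright.
try:
    raw.decode("us-ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

The latin-1 guess is the dangerous one: nothing fails, and the mangled text only shows up later, far from where the guess was made.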

You must provide Unicode objects to the API so that it does not have to guess.  The errors you get are essentially the API telling you "I refuse to guess."  It forces the programmer to tell the API what encoding the original string had; the way you answer it is by decoding it yourself with the right encoding argument into a Unicode object.
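
A rough sketch of what "I refuse to guess" looks like from both sides.  The function `to_js_string` is hypothetical, a stand-in and not the actual serializer from this thread, and `json.dumps` is used here only as a convenient way to produce a quoted, escaped literal.  Again this assumes a current Python, where `bytes` plays the role of the 8-bit str.

```python
import json

def to_js_string(obj):
    """Hypothetical serializer: accepts Unicode text only, never raw bytes."""
    if isinstance(obj, bytes):
        # Refuse to guess: the caller knows the encoding, this function does not.
        raise TypeError("got bytes; decode them with the correct encoding first")
    if not isinstance(obj, str):
        raise TypeError("expected text, got %s" % type(obj).__name__)
    # Produces a double-quoted literal with non-ASCII characters escaped.
    return json.dumps(obj)

# Caller side: the bytes came from somewhere the caller knows to be latin-1,
# so the caller says so explicitly when decoding.
raw = b"caff\xe8"
text = raw.decode("latin-1")
literal = to_js_string(text)
```

Passing `raw` directly raises `TypeError`; that error is the API asking the caller which encoding the bytes were in.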

http://gedcom-parse.sourceforge.net/doc/encoding.html


C