[I18n-sig] Re: gettext in the standard library

04 Sep 2000 20:44:12 -0400

[Martin v. Loewis]

> > Python takes care of what needs care, anyway.
> No, it doesn't. It will in some cases, but won't in others.

> > It should be fairly transparent to the programmer, and our API
> > should be just as transparent.  Shouldn't it?
> It should, but I feel it isn't.

OK.  My favorable prejudice toward Unicode support in Python was a bit
exaggerated, then.

> >>> header = '\xFF\x01'
> >>> body   = u'warning'
> >>> message = header + body
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeError: ASCII decoding error: ordinal not in range(128)

> Is that proper?

Sounds proper to me.

> Is it what the user expected?  If not, how should the user modify her
> code so it does what she wanted?

I do not know what the user wanted, so I cannot say how to modify the code.
If she wants to play with bits and bytes, rather than strings, she would
have to make explicit the conversions she wants.  Python cannot guess them.
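
To make "explicit conversions" concrete, here is a minimal sketch of the two
obvious readings of that example.  It assumes a present-day Python 3, where
`unicode' is spelled `str' and the mixed concatenation fails with a TypeError
rather than the UnicodeError quoted above.  The Latin-1 choice is only an
assumption for illustration; nothing says the header is text at all.

```python
header = b'\xff\x01'      # raw bytes, not text
body = 'warning'          # text (the u'warning' of the quoted example)

# Mixing the two is refused; Python does not guess a conversion.
try:
    message = header + body
except TypeError:
    message = None

# Reading 1: stay in bytes by encoding the text explicitly.
as_bytes = header + body.encode('ascii')

# Reading 2: go to text by decoding the bytes with a stated charset.
# Latin-1 is an assumption made for this sketch only.
as_text = header.decode('latin-1') + body
```

Either reading works, but only the programmer knows which one was meant.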

> I suggest you play around with the Unicode type somewhat before
> recommending that API functions should blindly return it...

Oh, I should surely read and try a lot more before saying anything.
I was invited into this discussion only recently.  With today as a deadline,
I did not have enough time to be as careful as I usually like to be.
So, I merely tried contributing my best given the circumstances, with my
limited experience and knowledge.  I think it was better that I risk a few
suggestions and opinions, than stay silent and regret having said nothing.
I hope I have been a bit useful, despite all the noise I made :-).

> So I'd rather not return a Unicode string representing an error message
> from gettext: the user expecting an error message may be surprised about
> the totally unrelated UnicodeError.

I would have hoped that one could merely replace STRING by _(STRING), and
get a working program.  If I read you correctly, you say that it has a
better chance of working _if_ we avoid the Unicode string route, and mimic
what we dumbly do in C.

Instead of:

    _ = locale.translator(DOMAIN)

could we have:

    _, _u = locale.translator(DOMAIN)

and use _(TEXT) for the flat byte string out of the PO file, or _u(TEXT) for
that string converted to a Unicode string from the PO `msgstr' encoding?
Or maybe:

    _, _e = locale.translator(DOMAIN)

with the above _u(TEXT) being rather written unicode(_(TEXT), _e) ?
Or maybe even:

    _, _e, _u = locale.translator(DOMAIN)

But I'm not sure I like any of these things.  Maybe nicer would be that
`_` is the class instance itself, with a __call__ method for implementing
_(TEXT).  One could then use _.charset or such to get the `msgstr'
encoding, and the convenience:

    _.unicode(TEXT)

would be equivalent to:

    unicode(_(TEXT), _.charset)

Better ideas?  I am still under the shock! :-)

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard