[Tutor] joining Indic and English Text

Peter Otten __peter__ at web.de
Mon Jun 26 03:35:22 EDT 2017


Evuraan wrote:

> Greetings!
> 
> I've a case where I need to put lines with both Indic and English Text
> to a file (and optionally to stdout):

With...

> What am I doing wrong? My locale and LANG (en_US.UTF-8) etc. seem to be
> set up.

> When I attempt this in python3:

...that...

> ml_text = u"മലയാളം"
> en_text = "Malayalam"
> print("{} = {}".format(ml_text, en_text))

...should work.

> I sometimes (not always, that's the strange part for now..) get errors
> like: UnicodeEncodeError: 'ascii' codec can't encode character '\u0d2b' in
> position 42: ordinal not in range(128)

I'm unable to guess how that happened. When you encounter that error again, 
can you post the exact code that triggered it and how exactly you invoked 
it? That will increase our chance to find the source of the problem.
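Along with the code, the settings that decide how print() encodes text are worth including in such a report. A minimal sketch of a helper that collects them; the function name describe_text_environment is made up for this example, and the list of settings is my choice, not an exhaustive one:

```python
import locale
import os
import sys


def describe_text_environment():
    """Collect the settings that influence how print() encodes text."""
    return {
        "python": sys.version.split()[0],
        # The encoding print() actually uses for stdout.
        "stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Terminal vs. pipe/file can make a difference between runs.
        "stdout.isatty": sys.stdout.isatty(),
        # The locale-derived default used for open() without encoding=.
        "preferred encoding": locale.getpreferredencoding(False),
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
    }


for key, value in describe_text_environment().items():
    print("{}: {}".format(key, value))
```

Running it both in the setting where the error shows up and where it doesn't should make any difference visible.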

Did you perhaps set the PYTHONIOENCODING environment variable?

$ PYTHONIOENCODING=ascii python3.5 -c 'print("äöü")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

But that's close to breaking things intentionally...
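The same experiment can be run from Python itself. A small sketch, assuming only the standard library: it starts a fresh interpreter with PYTHONIOENCODING=ascii, first to show which encoding the child's stdout ends up with, then to reproduce the failure above. The helper name run_with_ioencoding is hypothetical; the non-ASCII text is passed as \u escapes so the command line itself stays ASCII.

```python
import os
import subprocess
import sys


def run_with_ioencoding(encoding, code):
    """Run `code` in a new interpreter with PYTHONIOENCODING set to `encoding`."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


# The child reports the encoding its stdout was set up with.
enc = run_with_ioencoding("ascii", "import sys; print(sys.stdout.encoding)")
print(enc.stdout.decode().strip())

# Printing non-ASCII text then fails with UnicodeEncodeError, as in the traceback.
bad = run_with_ioencoding("ascii", 'print("\\u00e4\\u00f6\\u00fc")')
print(bad.returncode != 0, b"UnicodeEncodeError" in bad.stderr)
```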

> Searches on that error seem to suggest an .encode('utf-8'),
> 
> print("{} = {}".format(ml_text.encode("utf-8"), en_text))
> 
> I am afraid that would munge up my output line as:
> 
> b'\xe0\xb4\xae\xe0\xb4\xb2\xe0\xb4\xaf\xe0\xb4\xbe\xe0\xb4\xb3\xe0\xb4\x82' = Malayalam
> 
> instead of the desired:
> 
> മലയാളം = Malayalam

You have to encode the whole text, not just the non-ASCII part of it, and 
write the resulting bytes to a file opened in binary mode.
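To make the difference concrete, here is a sketch of both variants. Formatting a bytes object into a str embeds its repr, which is where the b'\xe0...' comes from; encoding the complete line and writing it in binary mode avoids that. The temporary directory and the file name out.txt are placeholders for this example only.

```python
import os
import tempfile

ml_text = "മലയാളം"
en_text = "Malayalam"

# Encoding only part of the text: format() calls str() on the bytes object,
# so the result contains its repr, b'\xe0\xb4...'.
munged = "{} = {}".format(ml_text.encode("utf-8"), en_text)
print(munged)

# Encoding the whole line, then writing the bytes to a binary-mode file.
line = "{} = {}\n".format(ml_text, en_text)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # placeholder file name
    with open(path, "wb") as f:
        f.write(line.encode("utf-8"))
    with open(path, "rb") as f:
        data = f.read()

# The file holds exactly the UTF-8 bytes of the complete line.
print(data == line.encode("utf-8"))
```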

However, I'd refrain from encoding manually when Python's default handling 
of text should work out of the box.
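For the file in your original question that means: open it in text mode and let Python do the encoding. A minimal sketch, passing encoding="utf-8" explicitly so the result doesn't depend on the locale default; the file name is again just a placeholder.

```python
import os
import tempfile

ml_text = "മലയാളം"
en_text = "Malayalam"
line = "{} = {}".format(ml_text, en_text)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # placeholder file name
    # Text mode: Python encodes on write and decodes on read,
    # so no manual .encode()/.decode() calls are needed.
    with open(path, "w", encoding="utf-8") as f:
        print(line, file=f)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == line + "\n")  # True
```

Printing the same line to stdout works the same way, except that print() uses sys.stdout.encoding, which is where the locale and PYTHONIOENCODING come into play.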


