[issue14176] Fix unicode literals

Sat Mar 3 12:03:40 CET 2012

Jean-Michel Fauth <wxjmfauth at gmail.com> added the comment:

2012/3/3 Terry J. Reedy <report at bugs.python.org>

>
> Terry J. Reedy <tjreedy at udel.edu> added the comment:
>
> > That would mean in Python 3, '...' works and u'...' will not work.
>
> You misunderstand the PEP: in 3.3, '...' and u'...' will be *exactly* the
> same. The only change is that the interpreter will ignore the u prefix
> instead of raising SyntaxError. It will be as if 'u' were not there. The
> only purpose is to let 2.x code run in 3.x without requiring the user to
> erase the 'u'.
>
> I can see how you could misunderstand and think that the 'u' prefix must
> have some meaning. But it does not. The addition is a bit controversial but
> Guido approved it with the expectation that it will encourage more
> conversion of 2.x libraries to run on 3.3. In any case, the tracker is not
> the place for further discussion of the value of the PEP.
>
> > Once again, an *illustration* with IDLE / Py2.
> ...
> > Of course, this is actually not a problem with Py 3.
> ...
> > It still remains that this is a serious problem on Py 2.
>
> We are painfully aware that 2.x has problems with unicode. You do not need
> to tell us. I believe that most of the problems that could be sensibly
> fixed in 2.x have been fixed. 3.0 fixed more problems by changing the
> language. 3.3 fixes still more problems by changing the internal
> implementation of unicode, along with the C api, and the meaning of the
> language on some systems. People who want to avoid all the problems that
> have been fixed should use 3.3 either from the repository or when it is
> released.
>
> > So, if this (u'...') works in Py 3.3, the problem can
> > be considered as "solved".
>
> I am glad you agree and I will close the issue.
>
>

Preliminary remark. I'm sending this via gmail, so it
may happen that the glyphs you see are malformed or
transformed by Google. Rest assured I'm typing the
"right" glyphs.

No, no and no. This is not a tkinter issue. This
"strange" behaviour, I cannot find a better word,
happens with many libraries, whether Python core libs
or external libs.
To tell you the truth, and despite my experience,
I have never succeeded in narrowing down the problem exactly.
In Python 2, with some pieces of code / software,
it sometimes "works" and sometimes it
simply does not. The libs used here are just the
first ones that came to my mind.

-----

wxPython 2.8-ansi build.

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\py\shell.py", line 1242, in writeOut
    self.write(text)
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\py\shell.py", line 1000, in write
    self.AddText(text)
  File "c:\python27\lib\site-packages\wx-2.8-msw-ansi\wx\stc.py", line 1425, in AddText
    return _stc.StyledTextCtrl_AddText(*args, **kwargs)
  File "c:\python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>

abcéœ€
>>>

----

PySide, passing "unicode" to a text widget.

Passing u'abcéœ€' works.
Passing unicode('abcéœ€', 'cp1252') works.
Passing 'abcé€œ' does not! 'œ€' are missing.

---

My interactive wx interpreter using wxPython. Strings
as frame title.

True

ok

Traceback (most recent call last):
  File "<psi last command>", line 1, in <module>
  File "c:\Python27\lib\site-packages\wx-2.8-msw-ansi\wx\_windows.py", line 505, in __init__
    _windows_.Frame_swiginit(self,_windows_.new_Frame(*args, **kwargs))
  File "c:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-6: character maps to <undefined>

True

ok

---

And so on with many libs.

You may argue that these libs are guilty.

I may argue that Python is somehow guilty, because it
lets users write non-working code.
And in practically all cases, the main problem is due
to the usage of unicode literals.

Just to show you that I'm quite comfortable with all this
coding stuff, here are the results from my interactive interpreter.
It uses a special hack, unfortunately non-portable, that works
only with Windows and cp1252.

abcé??
>>> unicode('abcéœ€', sys.stdout.encoding)
abcéœ€
>>> print u'abcéœ€'
abcé??
>>> print unicode('abcéœ€', sys.stdout.encoding)
abcéœ€

As I am aware of this "feature", all my code is
working perfectly. I pay attention to when
u'...' or unicode(...) is needed.
Unfortunately, this is not the general case in a lot of
code I see that is supposed to deal with text.

To draw a conclusion.

You are wise enough to understand that, when I'm
saying "Python just does not work", I'm unfortunately
not so far away from reality.

I really, really hope all this mess (sorry
for the word) will not reappear in Py 3.3.

Let's wait.

'abcéœ€'
>>> print('abcéœ€')
abcéœ€
>>>
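
For reference, a small check of what the PEP change quoted at the top means in practice. This is a sketch assuming Python 3.3 or later (on 3.0 to 3.2 the u prefix is a SyntaxError); the variable names are only illustrative.

```python
import sys

# Escapes are used so the example does not depend on the source-file encoding.
plain = 'abc\u00e9\u0153\u20ac'
prefixed = u'abc\u00e9\u0153\u20ac'   # the u prefix is accepted and ignored

same_value = plain == prefixed
same_type = type(plain) is type(prefixed) is str
print(sys.version_info[:2], same_value, same_type)
```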

Regards,
Jean-Michel Fauth

PS The u() trick does not help.

----------
nosy: +jmfauth

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14176>
_______________________________________