[IPython-dev] Pasting fix, unicode woes

Tue Sep 7 20:16:51 EDT 2010

I appreciate the desire for Unicode support (although I firmly believe
that source code itself should *always* be in ASCII). Unfortunately,
you're correct that there may be a fair amount of effort involved in
supporting Unicode robustly.

In short, I may not have time to get this done in the next week and a
half. We should discuss this further off-line, though.

Evan

On Mon, Sep 6, 2010 at 11:06 PM, Fernando Perez <fperez.net at gmail.com> wrote:
> Hey Evan,
>
> I just fixed the paste-trailing-newline annoyance:
>
> http://github.com/ipython/ipython/commit/92971904bc9fd2b988d8c16e9502edc39a70ff25
>
> I think that approach is good, because it gives the user a chance to
> edit the code before actually executing, but otherwise just needs a
> simple return to execute.
>
> I do have one question though: why disallow unicode paste?  People are
> quite likely to have non-ascii in their examples, and it seems odd to
> block them from pasting it in.  Consider for example that I can't
> paste this:
>
> name = "Fernando Pérez"
>
> I consider the fact that I can't type my own name into ipython a bug :)
>
> I think the solution is to set the GUI encoding by default to UTF-8,
> with an option for the user to change that according to their
> preferences later.  I had a quick go at it, but it was getting too
> complicated so I didn't commit anything anywhere.  Here's the diff in
> case you find it useful as a starting point (I just reverted locally):
>
> ####
> (newkernel)amirbar[qt]> git diff
> diff --git a/IPython/frontend/qt/console/console_widget.py b/IPython/frontend/qt/console/console_widget.py
> index d78cd63..f6ae9fd 100644
> --- a/IPython/frontend/qt/console/console_widget.py
> +++ b/IPython/frontend/qt/console/console_widget.py
> @@ -10,7 +10,7 @@ from PyQt4 import QtCore, QtGui
>  # Local imports
>  from IPython.config.configurable import Configurable
>  from IPython.frontend.qt.util import MetaQObjectHasTraits
> -from IPython.utils.traitlets import Bool, Enum, Int
> +from IPython.utils.traitlets import Bool, Enum, Int, Str
>  from ansi_code_processor import QtAnsiCodeProcessor
>  from completion_widget import CompletionWidget
>
> @@ -37,6 +37,9 @@ class ConsoleWidget(Configurable, QtGui.QWidget):
>     # non-positive number disables text truncation (not recommended).
>     buffer_size = Int(500, config=True)
>
> +    # The default encoding used by the GUI.
> +    encoding = Str('utf-8')
> +
>     # Whether to use a list widget or plain text output for tab completion.
>     gui_completion = Bool(False, config=True)
>
> @@ -233,7 +236,7 @@ class ConsoleWidget(Configurable, QtGui.QWidget):
>             text = QtGui.QApplication.clipboard().text()
>             if not text.isEmpty():
>                 try:
> -                    str(text)
> +                    text.encode(self.encoding)
>                     return True
>                 except UnicodeEncodeError:
>                     pass
> @@ -421,7 +424,8 @@ class ConsoleWidget(Configurable, QtGui.QWidget):
>             try:
>                 # Remove any trailing newline, which confuses the GUI and
>                 # forces the user to backspace.
> -                text = str(QtGui.QApplication.clipboard().text(mode)).rstrip()
> +                raw = QtGui.QApplication.clipboard().text(mode).rstrip()
> +                text = raw.encode(self.encoding)
>             except UnicodeEncodeError:
>                 pass
>             else:
> @@ -1034,7 +1038,7 @@ class ConsoleWidget(Configurable, QtGui.QWidget):
>         cursor.movePosition(QtGui.QTextCursor.StartOfBlock)
>         cursor.movePosition(QtGui.QTextCursor.EndOfBlock,
>                             QtGui.QTextCursor.KeepAnchor)
> -        return str(cursor.selection().toPlainText())
> +        return unicode(cursor.selection().toPlainText()).encode(self.encoding)
>
>     def _get_cursor(self):
>         """ Convenience method that returns a cursor for the current position.
> ####
>
> By the way, this isn't an odd corner case: in other countries, people
> are likely to have files and directories with unicode in them *all the
> time*, so this problem will hit us immediately once the code is out,
> I'm afraid.
>
> I saw multiple calls of the form str(some.Qt.Code()) that were
> throwing exceptions and decided to stop before I got myself too deep
> into Qt code I don't know well.  But the right approach is probably to
> encapsulate all those into a single common call that manages the
> encoding.
>
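A rough sketch of what such a single common call might look like, written
for Python 3 so it runs without Qt. The names encode_text and
DEFAULT_ENCODING are hypothetical and are not part of IPython or PyQt; the
default simply mirrors the 'utf-8' value of the encoding trait in the diff
above. With PyQt4 on Python 2, the conversion step would presumably be
unicode(text) rather than str(text).

```python
# Hypothetical helper: one place that turns GUI text into encoded bytes.
DEFAULT_ENCODING = "utf-8"  # assumption: mirrors the `encoding` trait in the diff


def encode_text(text, encoding=DEFAULT_ENCODING):
    """Return `text` encoded as bytes, or None if it cannot be encoded.

    `text` may be any object whose str() form is the text to encode.
    """
    try:
        return str(text).encode(encoding)
    except UnicodeEncodeError:
        # The caller decides what to do (e.g. refuse the paste).
        return None


if __name__ == "__main__":
    print(encode_text("Fernando P\u00e9rez"))           # UTF-8 bytes
    print(encode_text("Fernando P\u00e9rez", "ascii"))  # None: not ASCII-safe
```

Routing every str(some.Qt.Code()) call through one function like this
would keep the encoding policy, and the handling of failures, in a single
spot instead of scattered try/except blocks.
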
> The tricky part, I suspect, will be to do the cursor positioning logic
> with unicode in play: you need to correctly compute the lengths in
> terms of characters on the unicode string (more precisely, the number
> of glyphs that the code points map to), not bytes on the raw one.
>
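To illustrate the difference between those counts, here is a small Python 3
sketch using only the standard library. The function visible_length is a
hypothetical name, and skipping combining marks is only an approximation of
"number of glyphs": it ignores wide East Asian characters, emoji sequences,
and whatever the font actually renders.

```python
import unicodedata


def visible_length(text):
    """Approximate the glyph count by skipping combining marks."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


name = "Fernando P\u00e9rez"                    # precomposed e-acute (NFC)
decomposed = unicodedata.normalize("NFD", name)  # 'e' + combining accent

if __name__ == "__main__":
    print(len(name))                    # code points in the composed form
    print(len(name.encode("utf-8")))    # bytes in the UTF-8 encoding
    print(len(decomposed))              # code points in the decomposed form
    print(visible_length(decomposed))   # approximate glyph count
```

The byte length, the code point count, and the approximate glyph count can
all differ for the same visible string, which is why cursor arithmetic done
on the encoded bytes would drift as soon as non-ASCII text appears.
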
> Welcome to the wonderful world of unicode!
>
> Cheers,
>
> f
>
> ps - and on py3k it's *only* unicode everywhere, so we might as well
> get this code right from the get go.  Now that we have people starting
> to help towards py3, the last thing we should do is write a ton of new
> code that is unicode-unsafe for a py3 transition.  We're not writing
> py3 code yet, but we should write *with an eye towards py3*.