[IPython-dev] IPython and unicode (planning ahead for Py3k)

Robert Kern robert.kern at gmail.com
Wed Nov 11 22:06:39 EST 2009


Brian Granger wrote:
> 
>     These should be enormously rare, I think. By and large, we are mostly
>     concerned with representing ~/, right? I think it is reasonable to only
>     support (encoded) Unicode file paths and not support *completely*
>     arbitrary file paths. I doubt we will get a single bug report.
> 
> 
> Yes, I think they are rare.  I need to learn more about unicode.  When you
> say we should support encoded unicode paths, does that mean the current
> get_home_dir (which calls decode) would have to change?

What I meant was that we should support the subset of UNIX paths (which are 
bytes on the file system) that are validly encoded forms of Unicode strings 
under some encoding and not just arbitrary bytes.
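A minimal Python 3 sketch of that distinction, assuming UTF-8 as the file system encoding for the examples (real code would pass sys.getfilesystemencoding()). The helper name is_unicode_path is hypothetical, not an IPython function. It accepts a byte path only if it decodes strictly under the given encoding:

```python
def is_unicode_path(raw_path: bytes, encoding: str = "utf-8") -> bool:
    """Hypothetical helper: True if raw_path is a valid encoding of some
    Unicode string under `encoding`, False if it is just arbitrary bytes."""
    try:
        raw_path.decode(encoding)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_unicode_path(b"/home/rkern"))                   # plain ASCII path
print(is_unicode_path(b"/home/caf\xc3\xa9", "utf-8"))    # UTF-8 bytes for "café"
print(is_unicode_path(b"/home/caf\xe9", "utf-8"))        # lone Latin-1 byte: not valid UTF-8
```

The last path is a perfectly legal UNIX file name, but it is not valid under the assumed encoding; that is the kind of path the proposal above would decline to support.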

> It does things like:
> 
> return homedir.decode(sys.getfilesystemencoding())
> 
> Should we just return unicode(homedir) instead? 

unicode(some_bytes) will only work if some_bytes is pure ASCII, because Python 2
falls back to the default (ASCII) codec. You need to explicitly .decode() in
order to decode with the correct encoding. I think that our functions should
consume and return unicode strings. If they internally call APIs that only
consume or return bytes, then we would encode/decode as above.
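A rough Python 3 sketch of the "unicode at the edges" rule, under stated assumptions: get_home_dir_text and path_exists are hypothetical names, not IPython's actual get_home_dir, and the ASCII decode stands in for what Python 2's unicode(some_bytes) did implicitly. os.path.expanduser and os.path.exists both accept bytes, so they play the role of byte-oriented APIs here:

```python
import os
import sys

raw = "café".encode("utf-8")  # bytes, as they might come from the file system

# Stand-in for Python 2's unicode(raw), which used the ASCII codec implicitly.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("implicit ASCII decode fails:", exc.reason)

# An explicit decode with the correct encoding succeeds.
print(raw.decode("utf-8"))


def get_home_dir_text() -> str:
    """Hypothetical helper: call a bytes API, decode the result to text."""
    homedir = os.path.expanduser(b"~")  # bytes in, bytes out
    return homedir.decode(sys.getfilesystemencoding())


def path_exists(path: str) -> bool:
    """Hypothetical helper: accept text, encode it before a bytes API call."""
    return os.path.exists(path.encode(sys.getfilesystemencoding()))


print(get_home_dir_text())
print(path_exists(os.curdir))
```

Callers only ever see str; the encode/decode calls are confined to the two points where the code touches byte-oriented APIs.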

> I guess I can no longer avoid learning about unicode...

Bad developer! :-)

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco
