[Python-Dev] py3k: accept unicode for 'c' and byte for 'C' in getarg?

Victor Stinner victor.stinner at haypocalc.com
Tue Mar 17 18:03:32 CET 2009


On Tuesday 17 March 2009 17:27:39, you wrote:
> > The "C" format (get a character) has the opposite problem: it accepts
> > both byte and unicode, whereas byte should be rejected. Example:
> > mmap.write_byte('é') should be a TypeError.
>
> Yeah, mmap should be defined exclusively in terms of bytes.

That is already the fix chosen (use only bytes) for the mmap issue:
   http://bugs.python.org/issue5391
(the problem is bigger than mmap.write_byte; other methods have to be changed
as well)
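
As a rough illustration of "only use bytes", here is a small sketch using an
anonymous mapping. It uses write() rather than write_byte(), because the
argument type accepted by write_byte() depends on the Python version; the
variable names are only for illustration.

```python
import mmap

# Anonymous 16-byte mapping; -1 means "not backed by a file".
m = mmap.mmap(-1, 16)

m.write(b"\xe9")        # a byte string is accepted
try:
    m.write("é")        # a unicode string is not a sequence of bytes
    rejected = False
except TypeError:
    rejected = True

m.seek(0)
first = m.read(1)       # read back the byte written above
m.close()
```

The unicode string has to be encoded explicitly (for example with
"é".encode("latin-1")) before it can be written.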

> > Usage of "c" format:
> >  msvcrt.putch(char)
> >  msvcrt.ungetch(char)
>
> ISTM that putch() and ungetch() are text operations so should use 'C'.

The low-level functions use the C type "char":
  _putch(char)=>void
  _ungetch(char)=>char

For text, we have unicode versions of these functions:
  msvcrt.ungetwch(unicode string of 1 character)
  msvcrt.putwch(unicode string of 1 character)

So "c" looks like the right format for putch() and ungetch().
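
To make the intended split concrete, here is a pure Python sketch of the
strict behaviour discussed in this thread. parse_c() and parse_C() are
hypothetical helpers written for this mail, not the real getargs code: "c"
takes exactly one byte and rejects unicode, "C" takes exactly one unicode
character and rejects bytes.

```python
def parse_c(arg):
    """Hypothetical stand-in for the "c" format: a byte string of length 1."""
    if isinstance(arg, bytes) and len(arg) == 1:
        return arg[0]                 # integer value of the single byte
    raise TypeError('"c" expects a byte string of length 1')


def parse_C(arg):
    """Hypothetical stand-in for the "C" format: a unicode string of length 1."""
    if isinstance(arg, str) and len(arg) == 1:
        return ord(arg)               # code point of the single character
    raise TypeError('"C" expects a unicode string of length 1')


def accepts(parser, arg):
    """Return True if the parser accepts arg, False if it raises TypeError."""
    try:
        parser(arg)
        return True
    except TypeError:
        return False
```

With these rules, putch(b"a") would go through parse_c(), while putwch("é")
would go through parse_C(); passing "é" to the byte version is rejected
instead of being silently converted.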

See also http://bugs.python.org/issue5410

-- 
Victor Stinner aka haypo
http://www.haypocalc.com/blog/

