[Python-Dev] py3k: accept unicode for 'c' and byte for 'C' in getarg?

Tue Mar 17 18:31:22 CET 2009

On Tue, Mar 17, 2009 at 10:03 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> On Tuesday 17 March 2009 17:27:39, you wrote:
>> > The "C" format (get a character) has the opposite problem: it accepts
>> > both byte and unicode, whereas byte should be rejected. Example:
>> > mmap.write_byte('é') should be a TypeError.
>>
>> Yeah, mmap should be defined exclusively in terms of bytes.
>
> That is already the fix (use only bytes) chosen for the mmap issue:
>   http://bugs.python.org/issue5391
> (the problem is bigger than mmap.write_byte; other methods have to be changed too)
>
>> > Usage of "c" format:
>> >  msvcrt.putch(char)
>> >  msvcrt.ungetch(char)
>>
>> ISTM that putch() and ungetch() are text operations so should use 'C'.
>
> The low level functions use the C type "char":
>  _putch(char)=>void
>  _ungetch(char)=>char

Where did you find these signatures? I looked them up on microsoft.com
and found definitions for _putch() taking an int and _getch()
returning an int.

http://msdn.microsoft.com/en-us/library/azb6c04e.aspx
http://msdn.microsoft.com/en-us/library/078sfkak.aspx

AFAIK the versions without leading underscores are the same.

Also, just because it only takes an ASCII character doesn't mean it's
meant for binary I/O. putch() and getch() are clearly meant for text
I/O.
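
To make the distinction concrete, the strict split under discussion could be
sketched in Python roughly as follows. This is only an illustration, assuming
"c" means exactly one byte and "C" means exactly one Unicode character;
parse_c and parse_C are hypothetical names, not functions from getargs.c.

```python
def parse_c(arg):
    """Hypothetical stand-in for a strict 'c' format: accept one byte only."""
    if not isinstance(arg, (bytes, bytearray)) or len(arg) != 1:
        raise TypeError("'c' format requires a bytes object of length 1")
    return arg[0]  # integer value of the single byte (0..255)


def parse_C(arg):
    """Hypothetical stand-in for a strict 'C' format: accept one character only."""
    if not isinstance(arg, str) or len(arg) != 1:
        raise TypeError("'C' format requires a str of length 1")
    return ord(arg)  # Unicode code point of the single character
```

Under this sketch, whichever format putch() ends up using decides whether
msvcrt.putch('a') or msvcrt.putch(b'a') is the accepted call; the other one
would raise TypeError.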

> For text, we have unicode versions of these functions:
>  msvcrt.ungetwch(unicode string of 1 character)
>  msvcrt.putwch(unicode string of 1 character)
>
> So "c" looks to be the right format for putch() and ungetch().
>
> See also http://bugs.python.org/issue5410

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)