[Python-Dev] py3k: accept unicode for 'c' and byte for 'C' in getarg?
Guido van Rossum
guido at python.org
Tue Mar 17 17:27:39 CET 2009
On Tue, Mar 17, 2009 at 5:52 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> I realised with issue #3446 that getarg('c') (get a byte) accepts not only
> a byte string of 1 byte, but also a unicode string of 1 character (if the
> character code is in [0; 255]). I don't think that it's a good idea to accept
> unicode here. Example: b"x".center(5, "\xe9") should be a TypeError.
Agreed.
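
For illustration, here is a small check of the behaviour asked for above. It is only a sketch: it assumes a current Python 3 interpreter, where bytes.center() refuses a str fill character, and the helper name center_rejects_text_fill is made up for this example.

```python
def center_rejects_text_fill():
    """Return True if bytes.center() refuses a str fill character."""
    try:
        b"x".center(5, "\xe9")   # text fill for a bytes method
    except TypeError:
        return True
    return False


# A one-byte bytes fill is the accepted form.
padded = b"x".center(5, b"-")
print(padded)                      # b'--x--'
print(center_rejects_text_fill())  # True on current Python 3
```

The same call with a one-character str fill is exactly the case the 'c' format should reject.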
> The "C" format (get a character) has the opposite problem: it accepts both
> byte and unicode, whereas byte should be rejected. Example:
> mmap.write_byte('é') should be a TypeError.
Yeah, mmap should be defined exclusively in terms of bytes.
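
A quick way to see what "exclusively in terms of bytes" looks like from Python, as a sketch only: it assumes a current Python 3 release, where write_byte() takes an integer byte value, and uses an anonymous 16-byte map. The helper name write_byte_rejects_text is hypothetical.

```python
import mmap


def write_byte_rejects_text():
    """Return True if mmap.write_byte() refuses a one-character str."""
    # -1 requests an anonymous mapping, so no file is needed.
    with mmap.mmap(-1, 16) as m:
        m.write_byte(0x41)        # a byte value is accepted
        try:
            m.write_byte("é")     # text is not a byte
        except TypeError:
            return True
        return False


print(write_byte_rejects_text())
```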
> The problem was already discussed in the email thread "What type of object
> mmap.read_byte should return on py3k?" started by Hirokazu Yamamoto, related
> to issue #5391.
>
> Short history:
> - r55109: Guido changes 'c' format to accept unicode (struni branch).
> getarg('c') => char accepts byte and character
> - r56044: walter.doerwald changes the 'c' format to return an int (a
> unicode character) for datetime.datetime.isoformat().
> getarg('c') => int accepts byte and character
> - r56140: reverts r56044 and creates the 'C' format
> getarg('c') => char accepts byte and character
> getarg('C') => int accepts byte and character
>
> So we have:
> - getarg('c') -> one byte (integer in [0; 255])
> - getarg('C') -> one character (code in [0; INTMAX])
> Note: why not use Py_UNICODE instead of int?
>
> Usage of "C" format:
> datetime.datetime.isoformat(sep)
> array.array(type, data): type
>
> Usage of "c" format:
> msvcrt.putch(char)
> msvcrt.ungetch(char)
ISTM that putch() and ungetch() are text operations so should use 'C'.
> <mmap object>.write_byte(char)
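
To make the proposed split concrete, here is a pure-Python sketch of the strict semantics discussed in this thread. It is not the getargs.c implementation: parse_c and parse_C are hypothetical stand-ins, and returning an int from both is a simplification for the example.

```python
def parse_c(arg):
    """Strict 'c' (sketch): exactly one byte, returned as an int in [0, 255]."""
    if isinstance(arg, bytes) and len(arg) == 1:
        return arg[0]
    raise TypeError("must be a byte string of length 1, not %s"
                    % type(arg).__name__)


def parse_C(arg):
    """Strict 'C' (sketch): exactly one character, returned as its code point."""
    if isinstance(arg, str) and len(arg) == 1:
        return ord(arg)
    raise TypeError("must be a unicode character, not %s"
                    % type(arg).__name__)


def raises_type_error(func, arg):
    """Helper for the demo: True if func(arg) raises TypeError."""
    try:
        func(arg)
    except TypeError:
        return True
    return False


print(parse_c(b"x"))                        # 120
print(parse_C("\xe9"))                      # 233
print(raises_type_error(parse_c, "\xe9"))   # True: text rejected by 'c'
print(raises_type_error(parse_C, b"x"))     # True: bytes rejected by 'C'
```

Under this split, a text-oriented call such as putch() would go through the 'C' path, while mmap's write_byte() would stay on the byte side.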
--
--Guido van Rossum (home page: http://www.python.org/~guido/)