Mailman 3 What type of object mmap.read_byte should return on py3k? - Python-Dev

What type of object mmap.read_byte should return on py3k?

Hirokazu Yamamoto

28 Feb 2009

11:19 a.m.

Hello. I noticed that mmap.read_byte returns a 1-length unicode string on py3k. This seemed strange to me, so I created an issue on the bug tracker (http://bugs.python.org/issue5391), and Martin suggested it was suitable for discussion on python-dev. I'll quote the messages from the bug tracker here. I wrote:

> On Python3000, mmap.read_byte returns str, not bytes, and mmap.write_byte accepts str. Is this intended behavior?
>
>     import mmap
>     m = mmap.mmap(-1, 10)
>     type(m.read_byte())  # str on Python 3.0, per this report
>     m.write_byte("a")    # str is accepted on Python 3.0
>     m.write_byte(b"a")

> Another possibility: read_byte() could return an int representing the byte, and write_byte() could accept an int representing the byte (just as b"abc"[0] returns an int, not a 1-length bytes object).

Martin wrote:

> Indeed, I think it should use the "b" code instead of the "c" code. Please discuss this on python-dev, though.

> It might not be OK to backport this to 3.0, since it may break existing code.

> Furthermore, all other uses of the "c" code might need to be reconsidered.
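The analogy behind the int-based suggestion (b"abc"[0] returning an int) can be checked with built-in bytes operations alone: indexing yields an int, a length-1 slice yields a bytes object, and bytes([n]) and seq[0] convert between the two. This is a minimal illustration, not code from the bug report.

```python
data = b"abc"

first = data[0]            # indexing a bytes object yields an int (the byte value)
first_slice = data[0:1]    # slicing yields a bytes object, here of length 1

print(type(first), first)              # <class 'int'> 97
print(type(first_slice), first_slice)  # <class 'bytes'> b'a'

# Converting between the two representations of "one byte":
assert bytes([first]) == first_slice   # int -> 1-length bytes
assert first_slice[0] == first         # 1-length bytes -> int
```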


Nick Coghlan

28 Feb

12:01 p.m.


It certainly seems like mmap should be playing in an all-bytes world (where only already-encoded strings are allowed).

On the specific question of whether it would be better for read_byte()/write_byte() to use 1-length bytes objects or integers, I have no strong opinion (the former is closer to the 2.x class API, the latter more consistent with the operation of the 3.x bytes class).

However, as Martin says, it wouldn't be reasonable to backport the fixes in this to 3.0: the associated API changes would almost certainly break otherwise working code.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
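As a rough illustration of the "all-bytes world" described above, the sketch below assumes a modern CPython 3 interpreter (not the 3.0 release under discussion). There, mmap.write() only accepts bytes-like objects, so text has to be encoded explicitly before it is written. The map size and the ASCII encoding are arbitrary choices for the example.

```python
import mmap

m = mmap.mmap(-1, 16)   # anonymous 16-byte map (-1: no backing file)

text = "abc"
try:
    m.write(text)              # str is rejected: mmap only deals in bytes
    str_rejected = False
except TypeError:
    str_rejected = True

m.write(text.encode("ascii"))  # encode explicitly, then write the bytes
head = m[:3]
print(str_rejected, head)      # True b'abc'
m.close()
```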

Hirokazu Yamamoto

2:06 p.m.

> It certainly seems like mmap should be playing in an all-bytes world (where only already encoded strings are allowed).

Agreed.

> On the specific question of whether it would be better for read_byte()/write_byte() to use 1-length bytes objects or integers, I have no strong opinion (the former is closer to the 2.x class API, the latter more consistent with the operation of the 3.x bytes class).

Personally, I was surprised when I saw that b"0123"[1] != b"1". But I don't have a strong opinion either.

> However, as Martin says, it wouldn't be reasonable to backport the fixes in this to 3.0: the associated API changes would almost certainly break otherwise working code.

Agreed.

I grepped the py3k source tree for "c" and found another Py_BuildValue("c", ...) call in the curses module. But that function returns unicode in its else clause, so this is probably a correct usage.

    Modules\mmapmodule.c(207): return Py_BuildValue("c", value);
    Modules\_cursesmodule.c(893): return Py_BuildValue("c", rtn);
    Modules\_dbmmodule.c(380): else if ( strcmp(flags, "c") == 0 )
    Modules\_ctypes\cfield.c(112): if (idict->getfunc == getentry("c")->getfunc) {
    Modules\_ctypes\stgdict.c(459): if (dict->getfunc != getentry("c")->getfunc
    Modules\_ctypes\_ctypes.c(1372): if (itemdict->getfunc == getentry("c")->getfunc) {
    Modules\_ctypes\_ctypes.c(1536): if (dict && (dict->setfunc == getentry("c")->setfunc)) {
    Modules\_ctypes\_ctypes.c(1545): if (dict && (dict->setfunc == getentry("c")->setfunc)) {
    Modules\_ctypes\_ctypes.c(4197): if (itemdict->getfunc == getentry("c")->getfunc) {
    Modules\_ctypes\_ctypes.c(4890): if (itemdict->getfunc == getentry("c")->getfunc) {
    PC\os2emx\getpathp.c(128): strcat(filename, Py_OptimizeFlag ? "o" : "c");
    Python\import.c(1756): strcpy(buf+i, Py_OptimizeFlag ? "o" : "c");

Victor Stinner

2:47 p.m.


On Saturday 28 February 2009 15:06:38, Hirokazu Yamamoto wrote:

> I grepped the py3k source tree for "c" and found another Py_BuildValue("c", ...) call in the curses module. But that function returns unicode in its else clause, so this is probably a correct usage.

I used a different regex to catch "...c..." format strings in Py_BuildValue and PyArg_Parse... calls, because a call may have other arguments or specify the function name with "...:name": http://bugs.python.org/issue5391

It looks like msvcrt.putch(char) and msvcrt.ungetch(char) use the wrong types.

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
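A plain grep for "c" both over-matches (the ctypes and import.c hits) and misses format strings such as "c:putch". The sketch below shows one way a narrower scan could look; the pattern and the `uses_c_code` and `scan` helpers are hypothetical and are not the regex Victor actually used. It inspects one line at a time, so a call whose format string sits on a following line is missed.

```python
import re
from pathlib import Path

# Hypothetical pattern (not the one used in the thread): a Py_BuildValue or
# PyArg_Parse* call, followed on the same line by its format string literal.
CALL_RE = re.compile(r'\b(?:Py_BuildValue|PyArg_Parse\w*)\s*\([^"]*"([^"]*)"')


def uses_c_code(line: str) -> bool:
    """Return True if a format string on this line contains the "c" code."""
    for match in CALL_RE.finditer(line):
        fmt = match.group(1)
        # PyArg_Parse* formats may end with ":name" or ";message"; ignore that part.
        codes = re.split(r"[:;]", fmt, maxsplit=1)[0]
        if "c" in codes:
            return True
    return False


def scan(root: str):
    """Yield (path, line number, stripped line) for each match under root."""
    for path in sorted(Path(root).rglob("*.c")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if uses_c_code(line):
                yield path, lineno, line.strip()
```

For example, iterating over `scan(...)` on a source checkout and printing `f"{path}({lineno}): {line}"` gives output in the same style as the grep listing earlier in the thread.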

Victor Stinner

2:21 p.m.


About m.read_byte(), we have two choices:

    (a) Py_BuildValue("b", value) => 0
    (b) Py_BuildValue("y#", &value, 1) => b"\x00"

About m.write_byte(x), we also have two choices:

    (a) PyArg_ParseTuple(args, "b:write_byte", &value): write_byte(0)
    (b) PyArg_ParseTuple(args, "y#:write_byte", &value, &length) and check for length=1: write_byte(b"\x00")

The (b) choices are close to the Python 2.x API. But we can already use m.read(1) -> b"\x00" and m.write(b"\x00") to work with a byte string of 1 byte. So it would be better to break the API and use integers, the (a) choices, which also require documentation changes:

    mmap.read_byte()
        Returns a string of length 1 containing the character at the current file position, and advances the file position by 1.

    mmap.write_byte(byte)
        Write the single-character string byte into memory at the current position of the file pointer; the file position is advanced by 1. If the mmap was created with ACCESS_READ, then writing to it will throw a TypeError exception.

--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/
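The argument that 1-byte bytes access is already covered by read(1) and write() can be sketched side by side with the integer API. This assumes a modern CPython 3, where read_byte()/write_byte() work with integers as in choice (a); it illustrates the behavior and is not the patch discussed in this thread.

```python
import mmap

m = mmap.mmap(-1, 4)        # anonymous map, initially zero-filled

# Integer-based single-byte API (choice (a)).
m.write_byte(0x41)          # writes the byte 65 ('A') at position 0
m.seek(0)
as_int = m.read_byte()      # an int

# One byte as a bytes object, via the general read()/write() methods.
m.seek(1)
m.write(b"\x00")
m.seek(1)
as_bytes = m.read(1)        # a bytes object of length 1

print(as_int, as_bytes)     # 65 b'\x00'
m.close()
```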

Hirokazu Yamamoto

3:23 p.m.



I'm +1 for (a), because mmap.__getitem__ already returns an integer, not a 1-length bytes object. And as I wrote in http://bugs.python.org/msg82912, it seems that more bytes cleanup is needed in the mmap documentation/implementation. I hope someone else will look into the other modules. ;-)
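The consistency argument can be seen by comparing indexing, slicing, and read_byte() on the same map. This sketch assumes a modern CPython 3 interpreter, where read_byte() returns an int; under the 3.0 behavior reported in this thread it returned a 1-length str instead.

```python
import mmap

m = mmap.mmap(-1, 4)
m.write(b"xyz")            # position is now 3

item = m[1]                # indexing: an int, like bytes indexing
piece = m[1:2]             # slicing: a 1-length bytes object

m.seek(1)
byte = m.read_byte()       # on modern Python 3, also an int

print(item, piece, byte)   # 121 b'y' 121
assert item == byte == piece[0]
m.close()
```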

Hirokazu Yamamoto

1 Mar

6:45 p.m.


I uploaded a patch implementing choice (a): http://bugs.python.org/file13215/py3k_mmap_and_bytes.patch If (b) is preferred, I'll rewrite the patch.

Josiah Carlson

5 Mar

8 a.m.


On Sun, Mar 1, 2009 at 10:45 AM, Hirokazu Yamamoto wrote:

> I uploaded the patch with choice (a) http://bugs.python.org/file13215/py3k_mmap_and_bytes.patch If (b) is suitable, I'll rewrite the patch.

Has anyone been using mmap in Python 3k enough to know which is more intuitive? When I was using mmap in Python 2.4, I never used the read/write methods; I stuck with slicing, which was very convenient with 2.4's non-unicode strings. I don't really have an intuition for 3.x bytes.

- Josiah
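The slicing style mentioned here carries over to 3.x bytes: slice reads return bytes, slice assignment takes bytes of exactly the slice's length, and single-item assignment takes an int. A minimal sketch, assuming a modern CPython 3 and an arbitrary 8-byte anonymous map:

```python
import mmap

m = mmap.mmap(-1, 8)            # anonymous 8-byte map

m[0:5] = b"hello"               # slice assignment writes bytes
m[0:1] = b"J"                   # a 1-length slice takes a 1-length bytes object
m[1] = ord("E")                 # item assignment takes an int on Python 3
final = m[:5]                   # slice reads return bytes
print(final)                    # b'JEllo'

try:
    m[0:2] = b"abc"             # the replacement must match the slice length
    wrong_size_rejected = False
except IndexError:              # CPython reports a size mismatch as IndexError
    wrong_size_rejected = True
m.close()
```

Because an mmap has a fixed size, slice assignment cannot grow or shrink it, unlike slice assignment on a bytearray.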
