fileobj.read(float): warning or error?

Hi,

Since Python 2.4 (maybe 2.2 or older), fileobj.read(4.2) displays a warning and behaves as fileobj.read(4).

It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?

Well, the real problem is os.urandom(4.2), which goes into an infinite loop:

    while len(bytes) < n:
        bytes += read(_urandomfd, n - len(bytes))

because read(0.2) behaves as read(0) :-/

Victor
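To illustrate the loop described above, here is a minimal sketch in modern Python 3. It is not the actual os.urandom source. The helper name read_exactly and the use of operator.index() to reject non-integers are assumptions made for illustration. The point is that validating n up front, and stopping when a read returns nothing, both prevent the endless loop.

```python
import io
import operator


def read_exactly(stream, n):
    """Read exactly n bytes from stream (hypothetical helper, not os.urandom)."""
    # operator.index() accepts true integers only; a float such as 4.2
    # raises TypeError here instead of being silently truncated.
    n = operator.index(n)
    if n < 0:
        raise ValueError("n must be non-negative")

    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # An empty result from a non-zero read means no more data;
            # bail out instead of spinning forever.
            raise EOFError("stream ended with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    print(read_exactly(io.BytesIO(b"abcdef"), 4))
```

Because the loop counts down an integer, it terminates as soon as the request is satisfied and never issues a read(0).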

On Mon, Jul 21, 2008 at 2:17 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
This warning is actually given by the argument parser when "i" gets a Python non-integer.
> It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
This doesn't come into effect until 3.0.
-- Cheers, Benjamin Peterson "There's no place like 127.0.0.1."

On Monday 21 July 2008 21:23:21, you wrote:
Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in the PyArg_Parse*() functions. -- I hate "transparent" casts, like the ones C and PHP do everywhere. The worst "transparent" cast in Python (2.x) is between the str (bytes) and unicode (characters) types. Victor Stinner aka haypo

I can't quite accept that as a bug in the library. If you give invalid parameters, Python should not crash, but it may start to behave in a nonsensical way. Of course, it would be possible to move the conversion warning one layer up, into os.urandom; if the argument is float, raise a warning, and then truncate. Regards, Martin

On 21Jul2008 21:17, Victor Stinner <victor.stinner@haypocalc.com> wrote:
| Well, the real problem is os.urandom(4.2) which goes to an unlimited loop:
|     while len(bytes) < n:
|         bytes += read(_urandomfd, n - len(bytes))
| because read(0.2) works as read(0) :-/

Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

On Tue, 22 Jul 2008, Cameron Simpson wrote:
I think read(0) should be a no-op, just like it is in libc. This lets you write 'read(bytes)' without worrying about checking bytes, and also lets you silently stop reading when you have no more space, like in the following:

    buf = f.read(max(bytes_left, page_size))
    while buf:
        process(buf) # updates bytes_left
        buf = f.read(max(bytes_left, page_size))

-- Cheers, Leif

On 21Jul2008 23:35, Leif Walsh <leif.walsh@gmail.com> wrote:
| On Tue, 22 Jul 2008, Cameron Simpson wrote:
| > Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an
| > exception if asked for < 1 bytes? Or is there a legitimate use for
| > read(0) with which I was not previously aware?
|
| I think read(0) should be a no-op, just like it is in libc. This lets
| you write 'read(bytes)' without worrying about checking bytes, and
| also lets you silently stop reading when you have no more space, like
| in the following:
|
|     buf = f.read(max(bytes_left, page_size))
|     while buf:
|         process(buf) # updates bytes_left
|         buf = f.read(max(bytes_left, page_size))

[ Don't you mean "min()"? Unimportant. ]

I see the convenience here, but doubt I'd ever do that myself. I'd write the above like this:

    while bytes_left > 0:
        buf = f.read(max(bytes_left, page_size))
        if buf == 0:
            break
        process(buf) # updates bytes_left

I'm kind of picky about doing things exactly as often as required and no more. Especially things that call another facility.

read(0) itself must internally have a check for size == 0 anyway, so it's not like the overall system is less complex. If we're unlucky it could trickle all the way down to an OS system call to read(2) (UNIX, substitute as suitable elsewhere) and for a no-op that would be overkill by far. The only way the read() implementation would avoid that is by doing the test on size anyway. But since read() is opaque, IMO it is better to avoid it at the upper level if we know it will produce nothing.

Which leaves me unconvinced of the utility of this mode.

Cheers,
-- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
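As posted, the second loop still uses max() (which the message itself flags) and compares buf with 0, which is never true for a string or bytes object. A runnable variant of the same idea might look like the sketch below. The function name process_limited and the explicit bytes_left bookkeeping are assumptions for illustration; in the thread, process() is said to update bytes_left itself.

```python
import io


def process_limited(f, bytes_left, page_size, process):
    """Read at most bytes_left bytes from f, page_size bytes at a time."""
    while bytes_left > 0:
        # min() caps each request at what is still wanted, so read(0)
        # is never issued.
        buf = f.read(min(bytes_left, page_size))
        if not buf:
            # Empty result from a non-zero read: no more data.
            break
        process(buf)
        bytes_left -= len(buf)
    return bytes_left


if __name__ == "__main__":
    seen = []
    process_limited(io.BytesIO(b"x" * 10), 7, 4, seen.append)
    print(seen)
```

The return value is the number of bytes that were wanted but not available, which lets the caller tell a satisfied request from a short stream.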

On Tue, Jul 22, 2008 at 3:46 PM, Cameron Simpson <cs@zip.com.au> wrote:
> [ Don't you mean "min()"? Unimportant. ]
Haha, that's what I get for not actually _running_ the code example.
I do the same, but I know lots of people that prefer the example I sent earlier. Also, if we ever adopt the "while ... as ..." syntax (here's not hoping) currently being discussed in another thread, having read(0) return None or an empty buffer will cause that idiom to short circuit as well.
If we are going to make read(0) a no-op, we should definitely do it before it hits the underlying implementation, for portability's sake.
On Tue, Jul 22, 2008 at 4:43 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
> Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in the PyArg_Parse*() functions.
Ack! We're not writing Perl here, guys. Can we please not start having multiple subsets of the language that are separately valid? -- Cheers, Leif

On Tue, 22 Jul 2008, Cameron Simpson wrote: [...]
http://docs.python.org/lib/bltin-file-objects.html

    read([size])
    ... If the size argument is negative or omitted, read all data until EOF is reached. ...

John

On 22Jul2008 20:56, John J Lee <jjl@pobox.com> wrote:
Hmm, yeah, but 0 is not negative and not omitted so this does not apply. Personally I'm not very fond of that spec; I'm good with the omitted size provoking a "read everything" mode, but I'd prefer a non-numeric value like None to a negative one (e.g. the conventional "def read(size=None)") if an explicit size should do so. That way bad arithmetic in the caller would have a chance of triggering an exception from read instead of a silent (and to my taste, nasty) "slurp the file" mode. -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/

On Mon, Jul 21, 2008 at 10:37 PM, Cameron Simpson <cs@zip.com.au> wrote:
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all). Of course, read(), read(None), read(-1) and read(<any negative int>) should all read all data until EOF. On the main topic here, read(<float>) and read(<anything that supports __int__ but not __index__>) should definitely raise an exception in 3.0. In 2.6 it should show a warning as it does in 2.5. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
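The semantics listed above can be checked with a short sketch, assuming Python 3 and io.BytesIO as a stand-in for a file object. The class IntOnly and the helper accepts_as_size are hypothetical names introduced only for this illustration; operator.index() is used here as a way to show the __index__ versus __int__ distinction, not as a description of how read() is implemented.

```python
import io
import operator


class IntOnly:
    """Hypothetical type that supports __int__ but not __index__."""

    def __int__(self):
        return 4


def accepts_as_size(value):
    """Return True if value can be used losslessly as an integer size."""
    try:
        operator.index(value)
    except TypeError:
        return False
    return True


data = b"0123456789"

# read(0) returns an empty result without consuming anything.
empty = io.BytesIO(data).read(0)

# read(), read(None) and read(-1) all read until EOF.
everything = [
    io.BytesIO(data).read(),
    io.BytesIO(data).read(None),
    io.BytesIO(data).read(-1),
]

if __name__ == "__main__":
    print(empty, everything)
    print(accepts_as_size(4), accepts_as_size(4.2), accepts_as_size(IntOnly()))
```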

Guido van Rossum wrote:
How do you differentiate between that empty string (when doing "read(0)") and EOF (which is signaled by an empty string)? -- Jesus Cea Avion jcea@jcea.es - http://www.jcea.es/

On Tue, Sep 2, 2008 at 10:05 AM, Jesus Cea <jcea@jcea.es> wrote:
You don't. If you want to know whether you hit EOF you should try reading a non-zero number of bytes. (Also note that getting fewer bytes than you asked for is not enough to conclude that you have hit EOF.) -- --Guido van Rossum (home page: http://www.python.org/~guido/)
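Both points can be seen in a small sketch. TrickleReader is a hypothetical stream, invented here, that hands back at most 2 bytes per call to mimic a pipe or socket returning short reads; drain is likewise an illustrative helper. Only an empty result from a non-zero read is treated as EOF.

```python
import io


class TrickleReader:
    """Hypothetical stream that returns at most 2 bytes per read() call."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        if size is None or size < 0 or size > 2:
            size = 2
        return self._buf.read(size)


def drain(stream, chunk_size):
    """Collect chunks until a read returns an empty result."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    # Every chunk is shorter than the 4 bytes requested, yet none marks EOF.
    print(drain(TrickleReader(b"abcde"), 4))
    # With chunk_size 0 the loop stops at once: read(0) looks just like EOF.
    print(drain(TrickleReader(b"abcde"), 0))
```

The second call shows why code that treats an empty result as EOF has to make sure it never asks for 0 bytes.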

On Tue, 2 Sep 2008, Jesus Cea wrote:
Why would you expect a difference between reading 0 bytes at EOF and reading 0 bytes anywhere else? If you read(4) when at offset 996 in a 1000-byte file I doubt you expect any special notification that you are now at EOF. The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist

Isaac Morland wrote:
My message was an answer to Guido's, which said that some programs calculate the read length by subtracting buffer lengths, so they can end up trying to read 0 bytes. Guido argues that returning an empty string is the way to go. My point is: we simplify the program by considering "0" a valid length, but we complicate it because now the code can't assume "" == EOF if it actually asked for 0 bytes.
I always consider ""==EOF. I thought that was correct for non-blocking sockets. Am I wrong? -- Jesus Cea Avion jcea@jcea.es - http://www.jcea.es/

On Tue, Sep 2, 2008 at 11:31 AM, Jesus Cea <jcea@jcea.es> wrote:
Note that it has been like this for a very long time.
You can continue to assume this if you never pass 0 to read(). -- --Guido van Rossum (home page: http://www.python.org/~guido/)

Jesus Cea wrote:
What are you suggesting read(0) *should* do, then? If it returns None or some other special value, or raises an exception, then you need a special case to handle that. So you've just substituted one special case for another.
No, that's not right -- a read of more than 0 bytes will always block until at least 1 byte is available, or something happens that counts as an EOF condition. However, with some devices it's possible for what counts as EOF to happen more than once, e.g. ttys. -- Greg

On Wed, 3 Sep 2008, Greg Ewing wrote:
Sorry, you're absolutely right. I was thinking only of the fact that read() at EOF is not an error, rather than the blocking behaviour. It sounds like Python read() really is very similar to Unix read() in behaviour. Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist

Jesus Cea wrote:
> How do you differentiate between that empty string (when doing "read(0)") and EOF (which is signaled by an empty string)?
If you need to be able to make that distinction, then you have to be careful not to try to read 0 bytes. Personally I've never come across a situation where allowing read(0) to occur would have simplified the code. In the usual keep-reading-until-we've-got-the-required-number-of-bytes scenario, you're checking for 0 bytes left to read in order to tell when to stop. -- Greg

participants (11)
- "Martin v. Löwis"
- Alex Martelli
- Benjamin Peterson
- Cameron Simpson
- Greg Ewing
- Guido van Rossum
- Isaac Morland
- Jesus Cea
- John J Lee
- Leif Walsh
- Victor Stinner