fileobj.read(float): warning or error?
Hi,
Since Python 2.4 (maybe 2.2 or older), fileobj.read(4.2) displays a warning and works as fileobj.read(4).
    i=open('/etc/issue')
    i.read(4.2)
    __main__:1: DeprecationWarning: integer argument expected, got float
It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
Well, the real problem is os.urandom(4.2), which goes into an infinite loop:
    while len(bytes) < n:
        bytes += read(_urandomfd, n - len(bytes))
because read(0.2) works as read(0) :-/
Victor
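A minimal sketch of why that loop stalls, assuming a read() that truncates a float size toward zero the way the 2.x argument parser did. The names fake_read, urandom_like and max_iterations are hypothetical and exist only for this illustration; this is not the real os.urandom code, and the iteration cap is added so the demonstration terminates.

```python
import io


def fake_read(stream, size):
    # Assumption: mimic the 2.x behaviour, where a float size was truncated.
    return stream.read(int(size))


def urandom_like(n, max_iterations=10):
    """Replay the loop from the message, giving up after max_iterations reads."""
    stream = io.BytesIO(b"\x00" * 64)  # stand-in for /dev/urandom
    data = b""
    iterations = 0
    while len(data) < n:
        if iterations >= max_iterations:
            return data, False  # the original loop would spin forever here
        data += fake_read(stream, n - len(data))
        iterations += 1
    return data, True


print(urandom_like(4))    # completes after a single read
print(urandom_like(4.2))  # stalls at 4 bytes: read(0.2) acts as read(0)
```

With n = 4.2 the first read returns 4 bytes, after which every request is for roughly 0.2 bytes, truncated to 0, so len(data) never reaches n.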
On Mon, Jul 21, 2008 at 2:17 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
Hi,
Since Python 2.4 (maybe 2.2 or older), fileobj.read(4.2) displays a warning and works as fileobj.read(4).
    i=open('/etc/issue')
    i.read(4.2)
    __main__:1: DeprecationWarning: integer argument expected, got float
This warning is actually given by the argument parser when "i" gets a Python non-integer.
It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
This doesn't come into effect until 3.0.
Well, the real problem is os.urandom(4.2), which goes into an infinite loop:
    while len(bytes) < n:
        bytes += read(_urandomfd, n - len(bytes))
because read(0.2) works as read(0) :-/
Victor
-- Cheers, Benjamin Peterson "There's no place like 127.0.0.1."
On Monday 21 July 2008 21:23:21, you wrote:
It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
This doesn't come into effect until 3.0.
Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in PyArg_Parse*() functions.
--
I hate "transparent" casts, like C and PHP do everywhere. The worst "transparent" cast in Python (2.x) is between the str (bytes) and unicode (characters) types.
Victor Stinner aka haypo
On Tue, Jul 22, 2008 at 6:43 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
On Monday 21 July 2008 21:23:21, you wrote:
It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
This doesn't come into effect until 3.0.
Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in PyArg_Parse*() functions.
You could use -Wall to make the warning an error.
-- Cheers, Benjamin Peterson "There's no place like 127.0.0.1."
On Tue, Jul 22, 2008 at 6:47 PM, Benjamin Peterson <musiccomposition@gmail.com> wrote:
On Tue, Jul 22, 2008 at 6:43 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
On Monday 21 July 2008 21:23:21, you wrote:
It should raise an error instead of a warning; it makes no sense to read a partial byte :-) But would that break some applications?
This doesn't come into effect until 3.0.
Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in PyArg_Parse*() functions.
You could use -Wall to make the warning an error.
I meant -Werr.
-- Cheers, Benjamin Peterson "There's no place like 127.0.0.1."
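As a sketch of what promoting the warning to an error looks like, the same kind of filter that the -W command-line option installs can be set in-process through the warnings module. The function legacy_read is a hypothetical stand-in that only reuses the warning text from this thread; it is not the real file.read().

```python
import warnings


def legacy_read(size):
    # Hypothetical stand-in for a 2.x-style API that only warns on floats.
    if isinstance(size, float):
        warnings.warn("integer argument expected, got float",
                      DeprecationWarning, stacklevel=2)
    return int(size)


def strict_call(size):
    """Call legacy_read with DeprecationWarning promoted to an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return legacy_read(size)


print(strict_call(4))
try:
    strict_call(4.2)
except DeprecationWarning as exc:
    print("rejected:", exc)
```

The catch_warnings() context keeps the stricter filter local to the call, so the rest of the program keeps its normal warning behaviour.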
Well, the real problem is os.urandom(4.2), which goes into an infinite loop:
    while len(bytes) < n:
        bytes += read(_urandomfd, n - len(bytes))
because read(0.2) works as read(0) :-/
I can't quite accept that as a bug in the library. If you give invalid parameters, Python should not crash, but it may start to behave in a nonsensical way.
Of course, it would be possible to move the conversion warning one layer up, into os.urandom; if the argument is float, raise a warning, and then truncate.
Regards, Martin
I thought that's what we had __index__ for -- reject arguments that don't SMOOTHLY turn into integers when an integer is actually required!
Alex
On Mon, Jul 21, 2008 at 10:01 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:
Well, the real problem is os.urandom(4.2), which goes into an infinite loop:
    while len(bytes) < n:
        bytes += read(_urandomfd, n - len(bytes))
because read(0.2) works as read(0) :-/
I can't quite accept that as a bug in the library. If you give invalid parameters, Python should not crash, but it may start to behave in a nonsensical way.
Of course, it would be possible to move the conversion warning one layer up, into os.urandom; if the argument is float, raise a warning, and then truncate.
Regards, Martin
Alex Martelli wrote:
I thought that's what we had __index__ for -- reject arguments that don't SMOOTHLY turn into integers when an integer is actually required!
Sure. However, using that would create an incompatibility; that's why you only get a warning when it falls back to not using __index__.
Regards, Martin
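For illustration, a short sketch of the __index__ protocol mentioned above, using operator.index() from the standard library. The ByteCount class and the checked_size helper are hypothetical names invented for this example.

```python
import operator


class ByteCount:
    """Hypothetical integer-like type that opts in through __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value


def checked_size(size):
    # operator.index() accepts ints and __index__ providers, but not floats.
    return operator.index(size)


print(checked_size(4))
print(checked_size(ByteCount(4)))
try:
    checked_size(4.2)
except TypeError as exc:
    print("rejected:", exc)
```

A float defines __int__ but not __index__, so it is rejected instead of being silently truncated.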
On 21Jul2008 21:17, Victor Stinner <victor.stinner@haypocalc.com> wrote:
| Well, the real problem is os.urandom(4.2) which goes to an unlimited loop:
|     while len(bytes) < n:
|         bytes += read(_urandomfd, n - len(bytes))
| because read(0.2) works as read(0) :-/
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
--
Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
On Tue, 22 Jul 2008, Cameron Simpson wrote:
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
I think read(0) should be a no-op, just like it is in libc. This lets you write 'read(bytes)' without worrying about checking bytes, and also lets you silently stop reading when you have no more space, like in the following:
    buf = f.read(max(bytes_left, page_size))
    while buf:
        process(buf)  # updates bytes_left
        buf = f.read(max(bytes_left, page_size))
--
Cheers, Leif
On 21Jul2008 23:35, Leif Walsh <leif.walsh@gmail.com> wrote:
| On Tue, 22 Jul 2008, Cameron Simpson wrote:
| > Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an
| > exception if asked for < 1 bytes? Or is there a legitimate use for
| > read(0) of which I was not previously aware?
|
| I think read(0) should be a no-op, just like it is in libc. This lets
| you write 'read(bytes)' without worrying about checking bytes, and
| also lets you silently stop reading when you have no more space, like
| in the following:
|
|     buf = f.read(max(bytes_left, page_size))
|     while buf:
|         process(buf)  # updates bytes_left
|         buf = f.read(max(bytes_left, page_size))
[ Don't you mean "min()"? Unimportant. ]
I see the convenience here, but doubt I'd ever do that myself. I'd write the above like this:
    while bytes_left > 0:
        buf = f.read(max(bytes_left, page_size))
        if buf == 0:
            break
        process(buf)  # updates bytes_left
I'm kind of picky about doing things exactly as often as required and no more. Especially things that call another facility.
read(0) itself must internally have a check for size == 0 anyway, so it's not like the overall system is less complex. If we're unlucky it could trickle all the way down to an OS system call to read(2) (UNIX, substitute as suitable elsewhere) and for a no-op that would be overkill by far. The only way the read() implementation would avoid that is by doing the test on size anyway. But since read() is opaque, IMO it is better to avoid it at the upper level if we know it will produce nothing.
Which leaves me unconvinced of the utility of this mode.
Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
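For reference, a runnable sketch of the "check before reading" loop described above. It assumes min() was the intended bound (as the bracketed aside suggests) and tests for an empty result with "not buf", since read() returns a string or bytes object and never the integer 0. The function name read_limited and the tiny page_size default are hypothetical, chosen only to keep the example short.

```python
import io


def read_limited(f, limit, page_size=4):
    """Read at most `limit` bytes from f, `page_size` bytes at a time."""
    chunks = []
    bytes_left = limit
    while bytes_left > 0:  # read(0) is never issued
        buf = f.read(min(bytes_left, page_size))
        if not buf:  # empty result: EOF reached before the limit
            break
        chunks.append(buf)
        bytes_left -= len(buf)
    return b"".join(chunks)


print(read_limited(io.BytesIO(b"abcdefghij"), 6))  # stops at the limit
print(read_limited(io.BytesIO(b"abc"), 6))         # stops at EOF
```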
On Tue, Jul 22, 2008 at 3:46 PM, Cameron Simpson <cs@zip.com.au> wrote:
[ Don't you mean "min()"? Unimportant. ]
Haha, that's what I get for not actually _running_ the code example.
I see the convenience here, but doubt I'd ever do that myself. I'd write the above like this:
    while bytes_left > 0:
        buf = f.read(max(bytes_left, page_size))
        if buf == 0:
            break
        process(buf)  # updates bytes_left
I'm kind of picky about doing things exactly as often as required and no more. Especially things that call another facility.
I do the same, but I know lots of people that prefer the example I sent earlier. Also, if we ever adopt the "while ... as ..." syntax (here's not hoping) currently being discussed in another thread, having read(0) return None or an empty buffer will cause that idiom to short circuit as well.
read(0) itself must internally have a check for size == 0 anyway, so it's not like the overall system is less complex. If we're unlucky it could trickle all the way down to an OS system call to read(2) (UNIX, substitute as suitable elsewhere) and for a no-op that would be overkill by far. The only way the read() implementation would avoid that is by doing the test on size anyway. But since read() is opaque IMO it is better to avoid it at the upper level if we know it will produce nothing.
If we are going to make read(0) a no-op, we should definitely do it before it hits the underlying implementation, for portability's sake.
On Tue, Jul 22, 2008 at 4:43 PM, Victor Stinner <victor.stinner@haypocalc.com> wrote:
Would it be possible to create a "strict mode" option which disallows dangerous casts? Especially in PyArg_Parse*() functions.
Ack! We're not writing Perl here, guys. Can we please not start having multiple subsets of the language that are separately valid?
--
Cheers, Leif
On Tue, 22 Jul 2008, Cameron Simpson wrote: [...]
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
http://docs.python.org/lib/bltin-file-objects.html
read([size])
... If the size argument is negative or omitted, read all data until EOF is reached. ...
John
On 22Jul2008 20:56, John J Lee <jjl@pobox.com> wrote:
On Tue, 22 Jul 2008, Cameron Simpson wrote: [...]
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
http://docs.python.org/lib/bltin-file-objects.html
read([size])
... If the size argument is negative or omitted, read all data until EOF is reached. ...
Hmm, yeah, but 0 is not negative and not omitted so this does not apply.
Personally I'm not very fond of that spec; I'm good with the omitted size provoking a "read everything" mode, but I'd rather have a non-numeric value like None than a negative one (e.g. the conventional "def read(size=None)") if an explicit size should do so. That way bad arithmetic in the caller could have a chance of triggering an exception from read instead of a silent (and to my taste, nasty) "slurp the file" mode.
--
Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/
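A minimal sketch of the stricter convention suggested above, written as a wrapper around an ordinary file object. The name strict_read is hypothetical; this describes the proposal in the message, not how file.read() actually behaves.

```python
import io


def strict_read(f, size=None):
    """Read everything only when size is None; reject negative sizes."""
    if size is None:
        return f.read()
    if size < 0:
        # Assumption for this sketch: a negative size signals bad arithmetic.
        raise ValueError("size must be non-negative, got %r" % (size,))
    return f.read(size)


print(strict_read(io.BytesIO(b"hello world")))
print(strict_read(io.BytesIO(b"hello world"), 5))
try:
    strict_read(io.BytesIO(b"hello world"), 3 - 7)
except ValueError as exc:
    print("rejected:", exc)
```

With this wrapper a miscalculated length such as 3 - 7 fails loudly instead of reading the whole file.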
On Wed, 23 Jul 2008, Cameron Simpson wrote:
On 22Jul2008 20:56, John J Lee <jjl@pobox.com> wrote:
On Tue, 22 Jul 2008, Cameron Simpson wrote: [...]
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
http://docs.python.org/lib/bltin-file-objects.html
read([size])
... If the size argument is negative or omitted, read all data until EOF is reached. ...
Hmm, yeah, but 0 is not negative and not omitted so this does not apply.
Well, -1 *is* < 1 (and is in the domain of the function), but yes -- sorry, read too quickly, took your "< 1" too literally. John
On Mon, Jul 21, 2008 at 10:37 PM, Cameron Simpson <cs@zip.com.au> wrote:
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
Of course, read(), read(None), read(-1) and read(<any negative int>) should all read all data until EOF.
On the main topic here, read(<float>) and read(<anything that supports __int__ but not __index__>) should definitely raise an exception in 3.0. In 2.6 it should show a warning as it does in 2.5.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
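The rules summarized above can be checked with a short sketch, assuming Python 3 and using io.BytesIO as a stand-in for a real file. DATA and read_with are hypothetical names introduced for the example.

```python
import io

DATA = b"hello world"


def read_with(*args):
    """Call read(*args) on a fresh in-memory stream holding DATA."""
    return io.BytesIO(DATA).read(*args)


print(read_with(0))      # empty bytes object, not an error
print(read_with())       # omitted size: read to EOF
print(read_with(None))   # None: read to EOF
print(read_with(-1))     # negative size: read to EOF
try:
    read_with(4.2)
except TypeError as exc:
    print("rejected:", exc)
```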
Guido van Rossum wrote:
On Mon, Jul 21, 2008 at 10:37 PM, Cameron Simpson <cs@zip.com.au> wrote:
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
--
Jesus Cea Avion jcea@jcea.es - http://www.jcea.es/
On Tue, Sep 2, 2008 at 10:05 AM, Jesus Cea <jcea@jcea.es> wrote:
Guido van Rossum wrote:
On Mon, Jul 21, 2008 at 10:37 PM, Cameron Simpson <cs@zip.com.au> wrote:
Leaving aside the 0.2 => 0 conversion, shouldn't read() raise an exception if asked for < 1 bytes? Or is there a legitimate use for read(0) of which I was not previously aware?
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
You don't. If you want to know whether you hit EOF you should try reading a non-zero number of bytes. (Also note that getting fewer bytes than you asked for is not enough to conclude that you have hit EOF.)
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
On Tue, 2 Sep 2008, Jesus Cea wrote:
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
Why would you expect a difference between reading 0 bytes at EOF and reading 0 bytes anywhere else? If you read(4) when at offset 996 in a 1000-byte file I doubt you expect any special notification that you are now at EOF.
The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call.
Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist
Isaac Morland wrote:
On Tue, 2 Sep 2008, Jesus Cea wrote:
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
Why would you expect a difference between reading 0 bytes at EOF and reading 0 bytes anywhere else? If you read(4) when at offset 996 in a 1000-byte file I doubt you expect any special notification that you are now at EOF.
My message was a reply to Guido's, which said that some programs calculate the read length by subtracting buffer lengths, so they can end up trying to read 0 bytes. Guido argues that returning an empty string is the way to go.
My point is: we simplify the program by considering "0" a valid length, but we complicate it because the code can no longer assume "" == EOF if it actually asked for 0 bytes.
The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call.
I always consider "" == EOF. I thought that was correct for non-blocking sockets. Am I wrong?
--
Jesus Cea Avion jcea@jcea.es - http://www.jcea.es/
On Tue, Sep 2, 2008 at 11:31 AM, Jesus Cea <jcea@jcea.es> wrote:
Isaac Morland wrote:
On Tue, 2 Sep 2008, Jesus Cea wrote:
Indeed. read(0) is quite often generated as an edge case when one is computing buffer sizes, and returning an empty string is most definitely the right thing to do here (otherwise some application code becomes more complex by having to avoid calling read(0) at all).
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
Why would you expect a difference between reading 0 bytes at EOF and reading 0 bytes anywhere else? If you read(4) when at offset 996 in a 1000-byte file I doubt you expect any special notification that you are now at EOF.
My message was a reply to Guido's, which said that some programs calculate the read length by subtracting buffer lengths, so they can end up trying to read 0 bytes. Guido argues that returning an empty string is the way to go.
My point is: we simplify the program by considering "0" a valid length, but we complicate it because the code can no longer assume "" == EOF if it actually asked for 0 bytes.
Note that it has been like this for a very long time.
The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call.
I always consider "" == EOF. I thought that was correct for non-blocking sockets. Am I wrong?
You can continue to assume this if you never pass 0 to read().
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Jesus Cea wrote:
My point is: we simplify the program by considering "0" a valid length, but we complicate it because the code can no longer assume "" == EOF if it actually asked for 0 bytes.
What are you suggesting read(0) *should* do, then? If it returns None or some other special value, or raises an exception, then you need a special case to handle that. So you've just substituted one special case for another.
Isaac Morland wrote:
The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call.
No, that's not right -- a read of more than 0 bytes will always block until at least 1 byte is available, or something happens that counts as an EOF condition.
However, with some devices it's possible for what counts as EOF to happen more than once, e.g. ttys.
--
Greg
On Wed, 3 Sep 2008, Greg Ewing wrote:
The Unix read() system call doesn't treat EOF as special other than it won't return bytes from "beyond" EOF and therefore even when reading a regular file could return fewer (including 0) bytes than asked for in the call.
No, that's not right -- a read of more than 0 bytes will always block until at least 1 byte is available, or something happens that counts as an EOF condition.
However, with some devices it's possible for what counts as EOF to happen more than once, e.g. ttys.
Sorry, you're absolutely right. I was thinking only of the fact that read() at EOF is not an error, rather than the blocking behaviour. It sounds like Python read() really is very similar to Unix read() in behaviour.
Isaac Morland CSCF Web Guru DC 2554C, x36650 WWW Software Specialist
Jesus Cea wrote:
How do you differentiate between that empty string (from "read(0)") and EOF (which is also signaled by an empty string)?
If you need to be able to make that distinction, then you have to be careful not to try to read 0 bytes.
Personally I've never come across a situation where allowing read(0) to occur would have simplified the code. In the usual keep-reading-until-we've-got-the-required-number-of-bytes scenario, you're checking for 0 bytes left to read in order to tell when to stop.
--
Greg
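The keep-reading-until-we've-got-enough scenario can be sketched as follows, assuming a blocking stream. The helper name read_exactly is hypothetical. Because the loop stops as soon as nothing is left to read, read(0) is never issued, and an empty result always means EOF.

```python
import io


def read_exactly(f, n):
    """Return exactly n bytes from f, or raise EOFError if f ends first."""
    chunks = []
    remaining = n
    while remaining > 0:  # stops before a read(0) could happen
        chunk = f.read(remaining)
        if not chunk:  # empty reply to a non-zero request: EOF
            raise EOFError("stream ended with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)  # a short read is not EOF; keep going
    return b"".join(chunks)


print(read_exactly(io.BytesIO(b"abcdefgh"), 5))
try:
    read_exactly(io.BytesIO(b"abc"), 5)
except EOFError as exc:
    print("EOF:", exc)
```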
participants (11): "Martin v. Löwis", Alex Martelli, Benjamin Peterson, Cameron Simpson, Greg Ewing, Guido van Rossum, Isaac Morland, Jesus Cea, John J Lee, Leif Walsh, Victor Stinner