Re: [Patches] [Patch #102955] bltinmodule.c warning fix

On Tue, Dec 19, 2000 at 07:02:05PM -0800, noreply@sourceforge.net wrote:
Date: 2000-Dec-19 19:02 By: tim_one
Is it OK to refer to 8-bit strings under that name? How about "expected an 8-bit string or Unicode string" when the object passed to ord() isn't of the right type? Similarly, when the value is of the right type but has length>1, the message is "ord() expected a character, length-%d string found". Should that be "length-%d (string / unicode) found"? And should the type names be changed to '8-bit string'/'Unicode string', maybe? --amk

[Andrew Kuchling]
Actually, upon reflection I think it was a mistake to add all these "or Unicode" clauses to the error msgs to begin with. Python used to have only one string type, we're saying that's also a hope for the future, and in the meantime I know I'd have no trouble understanding "string" as including both 8-bit strings and Unicode strings. So we should say "8-bit string" or "Unicode string" when *only* one of those is allowable. So "ord() expected string ..." instead of (even a repaired version of) "ord() expected string or Unicode character ..." but-i'm-not-even-motivated-enough-to-finish-this-sig-

Tim Peters wrote:
I think this has to do with understanding that there are two string types in Python 2.0 -- a novice won't notice this until she sees the error message. My understanding is similar to yours, "string" should mean "any string object" and in cases where the difference between 8-bit string and Unicode matters, these should be referred to as "8-bit string" and "Unicode string". Still, I think it is a good idea to make people aware of the possibility of passing Unicode objects to these functions, so perhaps the idea of adding both possibilities to error messages is not such a bad idea for 2.1. The next phases would be converting all messages back to "string" and then convert all strings to Unicode ;-)
--
Marc-Andre Lemburg
______________________________________________________________________
Company: http://www.egenix.com/
Consulting: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/

[Tim]
[MAL]
Except that this error msg has nothing to do with how many string types there are: they didn't pass *any* flavor of string when they get this msg. At the time they pass (say) a float to ord(), that there are currently two flavors of string is more information than they need to know.
In that happy case of universal harmony, the msg above should say just "string" and leave it at that.
> Still, I think it is a good idea to make people aware of the possibility of passing Unicode objects to these functions,
Me too.
> so perhaps the idea of adding both possibilities to error messages is not such a bad idea for 2.1.
But not that. The user is trying to track down their problem. Advertising an irrelevant (to their problem) distinction at that time of crisis is simply spam.
    TypeError: ord() requires an 8-bit string or a Unicode string. On the other hand, you'd be surprised to discover all the things you can pass to chr(): it's not just ints. Long ints are also accepted, by design, and due to an obscure bug in the Python internals, you can also pass floats, which get truncated to ints.
> The next phases would be converting all messages back to "string" and then convert all strings to Unicode ;-)
Then we'll save a lot of work by skipping the need for the first half of that -- unless you're volunteering to do all of it <wink>.

On Thu, Dec 21, 2000 at 02:44:19AM -0500, Tim Peters wrote:
> So we should say "8-bit string" or "Unicode string" when *only* one of those is allowable. So
OK... how about this patch?

Index: bltinmodule.c
===================================================================
RCS file: /cvsroot/python/python/dist/src/Python/bltinmodule.c,v
retrieving revision 2.185
diff -u -r2.185 bltinmodule.c
--- bltinmodule.c 2000/12/20 15:07:34 2.185
+++ bltinmodule.c 2000/12/21 18:36:54
@@ -1524,13 +1524,14 @@
         }
     } else {
         PyErr_Format(PyExc_TypeError,
-                 "ord() expected string or Unicode character, " \
+                 "ord() expected string of length 1, but " \
                  "%.200s found", obj->ob_type->tp_name);
         return NULL;
     }
 
     PyErr_Format(PyExc_TypeError,
-             "ord() expected a character, length-%d string found",
+             "ord() expected a character, "
+             "but string of length %d found",
              size);
     return NULL;
 }
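
For readers who want to see the two proposed messages in action without building the interpreter, here is a rough pure-Python sketch. It is not the C code in bltinmodule.c: the name ord_sketch is hypothetical, and treating str and bytes as the two "flavors of string" is an assumption made so the example runs on a current Python. Only the message texts are taken from the patch.

```python
def ord_sketch(obj):
    """Hypothetical stand-in for ord() that reuses the patch's message texts."""
    # Assumption: "string" means either flavor; str and bytes stand in for
    # the Unicode and 8-bit string types discussed in the thread.
    if not isinstance(obj, (str, bytes)):
        # Wrong type entirely: name the type found, say nothing about flavors.
        raise TypeError(
            "ord() expected string of length 1, but %.200s found"
            % type(obj).__name__
        )
    if len(obj) != 1:
        # Right type, wrong length: report the length that was found.
        raise TypeError(
            "ord() expected a character, but string of length %d found"
            % len(obj)
        )
    # Indexing bytes already yields an int; for str defer to the builtin.
    return obj[0] if isinstance(obj, bytes) else ord(obj)


if __name__ == "__main__":
    for bad in (1.5, "ab"):
        try:
            ord_sketch(bad)
        except TypeError as exc:
            print(exc)
```

Passing a float hits the first branch and passing a two-character string hits the second, so neither message has to mention how many string types exist.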

[Tim]
> So we should say "8-bit string" or "Unicode string" when *only* one of those is allowable.
[Andrew]
> OK... how about this patch?
+1 from me. And maybe if you offer to send a royalty to Marc-Andre each time it's printed, he'll back down from wanting to use the error msgs as a billboard <wink>.

participants (3): Andrew Kuchling, M.-A. Lemburg, Tim Peters