Re: [Python-Dev] [Python-checkins] r87505 - in python/branches/py3k: Doc/c-api/unicode.rst Include/unicodeobject.h

On Mon, 27 Dec 2010 02:49:29 +0100, victor.stinner <python-checkins@python.org> wrote:
> Author: victor.stinner
> Date: Mon Dec 27 02:49:29 2010
> New Revision: 87505
>
> Log:
> Issue #9738: document encodings of unicode functions
>
> Modified:
>    python/branches/py3k/Doc/c-api/unicode.rst
>    python/branches/py3k/Include/unicodeobject.h
>
> Modified: python/branches/py3k/Doc/c-api/unicode.rst
> ==============================================================================
> --- python/branches/py3k/Doc/c-api/unicode.rst (original)
> +++ python/branches/py3k/Doc/c-api/unicode.rst Mon Dec 27 02:49:29 2010
> @@ -1063,7 +1063,8 @@
>  .. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)
>     Compare a unicode object, *uni*, with *string* and return -1, 0, 1 for less
> -   than, equal, and greater than, respectively.
> +   than, equal, and greater than, respectively. *string* is an ASCII-encoded
> +   string (it is interpreted as ISO-8859-1).
Does it mean anything to say that an ASCII string is interpreted as ISO-8859-1? If it is ASCII-encoded it shouldn't have any bytes with the 8th bit set, leaving no room for interpretation. So presumably you mean it is (treated as) an ISO-8859-1 encoded string, despite the function name?

--
R. David Murray www.bitdance.com
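A small C sketch of the point raised in the question: ISO-8859-1 (Latin-1) maps every byte value 0x00 to 0xFF to the Unicode code point with the same number, and ASCII is exactly the 0x00 to 0x7F subset, so the two interpretations only diverge for bytes with the 8th bit set. `latin1_decode` and `is_ascii` are hypothetical helpers written for illustration; they are not part of the CPython C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the NUL-terminated byte string `s` as
   ISO-8859-1.  Each byte value becomes the code point with the same
   number, so decoding cannot fail.  Writes at most `cap` code points to
   `out` and returns how many were written. */
size_t latin1_decode(const char *s, uint32_t *out, size_t cap)
{
    /* Read through unsigned char so bytes >= 0x80 stay positive. */
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    assert(s != NULL && (out != NULL || cap == 0));
    while (n < cap && p[n] != '\0') {
        out[n] = p[n];
        n++;
    }
    return n;
}

/* Hypothetical helper: return 1 if no byte of `s` has the 8th bit set.
   For such strings the ASCII and ISO-8859-1 readings are identical. */
int is_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    for (; *p != '\0'; p++) {
        if (*p >= 0x80)
            return 0;
    }
    return 1;
}
```

Reading the bytes through `unsigned char` matters in C: a plain `char` may be signed, in which case the byte 0xE9 would read as a negative number rather than as U+00E9.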

On Monday 27 December 2010 at 23:13 -0500, R. David Murray wrote:
> Does it mean anything to say that an ASCII string is interpreted as ISO-8859-1? If it is ASCII-encoded it shouldn't have any bytes with the 8th bit set, leaving no room for interpretation. So presumably you mean it is (treated as) an ISO-8859-1 encoded string, despite the function name?
Oh. Someone noticed :-) I would like to say that it is better to pass only ASCII-encoded strings, but the function supports ISO-8859-1. Would it be clearer to say that the function expects an ISO-8859-1 encoded string? But I don't want to patch the function.

Victor

On Tue, 28 Dec 2010 10:28:51 +0100, Victor Stinner <victor.stinner@haypocalc.com> wrote:
> On Monday 27 December 2010 at 23:13 -0500, R. David Murray wrote:
>> Does it mean anything to say that an ASCII string is interpreted as ISO-8859-1? If it is ASCII-encoded it shouldn't have any bytes with the 8th bit set, leaving no room for interpretation. So presumably you mean it is (treated as) an ISO-8859-1 encoded string, despite the function name?
> Oh. Someone noticed :-) I would like to say that it is better to pass only ASCII-encoded strings, but the function supports ISO-8859-1.
> Would it be clearer to say that the function expects an ISO-8859-1 encoded string?
> But I don't want to patch the function.
I think your first paragraph is what you should put in the docs: "it is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters". A bit harder to compress that into an in-line comment in the code...

--
R. David Murray www.bitdance.com
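The documentation wording agreed on here can be sketched in C, assuming the Unicode side is available as a plain array of code points rather than a `PyObject *`. `compare_with_latin1` is a hypothetical function that mirrors the documented behaviour (each byte of *string* read as ISO-8859-1, result -1, 0 or 1); it is not CPython's implementation of `PyUnicode_CompareWithASCIIString`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the documented semantics: compare the `len`
   code points in `uni` with the NUL-terminated byte string `str`,
   reading each byte as ISO-8859-1.  Returns -1, 0 or 1 for less than,
   equal and greater than.  Not CPython's actual implementation. */
int compare_with_latin1(const uint32_t *uni, size_t len, const char *str)
{
    /* unsigned char: a byte's value is then exactly its Latin-1 code point */
    const unsigned char *s = (const unsigned char *)str;
    size_t i;

    assert(uni != NULL || len == 0);
    assert(str != NULL);
    for (i = 0; i < len && s[i] != '\0'; i++) {
        if (uni[i] != s[i])
            return uni[i] < s[i] ? -1 : 1;
    }
    if (i < len)
        return 1;   /* `uni` has code points left over: it is longer */
    if (s[i] != '\0')
        return -1;  /* `str` has bytes left over: it is longer */
    return 0;
}
```

For an ASCII-only *string* the result is the same whether its bytes are read as ASCII or as ISO-8859-1, which is why passing only ASCII is the safe recommendation.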

On Tuesday 28 December 2010 at 12:14 -0500, R. David Murray wrote:
> I think your first paragraph is what you should put in the docs: "it is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters".
Nice, done in r87560.

Victor
participants (2)
- R. David Murray
- Victor Stinner