buffer('abc') == 'abc' is False ?!
I was wondering whether this is an oversight or intended. Buffer objects can certainly be compared to strings on a byte-by-byte basis, so the compare result looks like a (long standing) bug to me.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
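The byte-by-byte comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it targets a later Python in which `buffer()` no longer exists and `memoryview` plays a similar role, and the helper name `same_bytes` is hypothetical, not part of any library.

```python
def same_bytes(a, b):
    """Compare two objects that expose the buffer interface, byte by byte.

    Hypothetical helper for illustration only.
    """
    # tobytes() copies the underlying bytes out of each object,
    # so the comparison ignores the wrapper type entirely.
    return memoryview(a).tobytes() == memoryview(b).tobytes()


wrapped_matches = same_bytes(bytearray(b"abc"), b"abc")
wrapped_differs = same_bytes(memoryview(b"abc"), b"abd")
print(wrapped_matches, wrapped_differs)
```

The point is only that the data behind a buffer-like object is directly comparable to string data; whether `==` should do this implicitly is the question the thread goes on to discuss.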
> I was wondering whether this is an oversight or intended. Buffer objects can certainly be compared to strings on a byte-by-byte basis, so the compare result looks like a (long standing) bug to me.

I'd consider it a feature, designed to convey the subliminal message "the buffer type should be deprecated". :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
>> I was wondering whether this is an oversight or intended. Buffer objects can certainly be compared to strings on a byte-by-byte basis, so the compare result looks like a (long standing) bug to me.
>
> I'd consider it a feature, designed to convey the subliminal message "the buffer type should be deprecated". :-)

Fine, but what alternative is there which meets the following requirements:

* signals "this data is binary data"
* compares just fine to strings
* gets accepted by all APIs which use the buffer interface to access the data
* has a C API which can be used in extensions
* is available in Python 2.1.x and up

(other than rolling my own mxBinary type...)

What I'd basically need is a type which simply wraps up any string data, plays nice with 8-bit strings and signals the binary nature of the content.

-- Marc-Andre Lemburg
"M.-A. Lemburg"
> * signals "this data is binary data"
> * compares just fine to strings
> * gets accepted by all APIs which use the buffer interface to access the data
> * has a C API which can be used in extensions
> * is available in Python 2.1.x and up

I believe the string type meets all these requirements.

Regards,
Martin
Martin v. Loewis wrote:
> "M.-A. Lemburg" writes:
>> * signals "this data is binary data"
>> * compares just fine to strings
>> * gets accepted by all APIs which use the buffer interface to access the data
>> * has a C API which can be used in extensions
>> * is available in Python 2.1.x and up
>
> I believe the string type meets all these requirements.

Except one which was implicit: how to tell binary data from text data. This information can sometimes be deduced from the string content provided you know what text data means to you, but this doesn't always work, since sometimes binary data happens to look like text data (ie. use only character ordinals as data bytes).

-- Marc-Andre Lemburg
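To make the last point concrete, here is a small sketch of how content sniffing can go wrong. The heuristic `looks_like_text` is a hypothetical example written for this illustration, not something proposed in the thread.

```python
import struct


def looks_like_text(data):
    """Hypothetical heuristic: call data 'text' if every byte is printable ASCII."""
    return all(32 <= byte < 127 for byte in bytearray(data))


# A packed 32-bit big-endian integer is binary data by any definition,
# yet its four bytes happen to be the character codes for 'A', 'B', 'C', 'D'.
packed = struct.pack(">I", 0x41424344)

misclassified = looks_like_text(packed)
print(packed, misclassified)
```

Since the content alone cannot settle the question, the information has to travel with the value's type instead.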
>>> * signals "this data is binary data"
>>> * compares just fine to strings
>>> * gets accepted by all APIs which use the buffer interface to access the data
>>> * has a C API which can be used in extensions
>>> * is available in Python 2.1.x and up
>>
>> I believe the string type meets all these requirements.
>
> Except one which was implicit: how to tell binary data from text data. This information can sometimes be deduced from the string content provided you know what text data means to you, but this doesn't always work, since sometimes binary data happens to look like text data (ie. use only character ordinals as data bytes).

I don't understand why you need to signal "this is binary data" while at the same time you want to be able to compare to strings.

Also, since buffer objects *can't* be compared to strings right now, and you require compatibility with 2.1, there is no solution that satisfies your requirements, so I conclude you're just being "difficult". :-)

--Guido van Rossum
Guido van Rossum wrote:
>>>> * signals "this data is binary data"
>>>> * compares just fine to strings
>>>> * gets accepted by all APIs which use the buffer interface to access the data
>>>> * has a C API which can be used in extensions
>>>> * is available in Python 2.1.x and up
>>>
>>> I believe the string type meets all these requirements.
>>
>> Except one which was implicit: how to tell binary data from text data. This information can sometimes be deduced from the string content provided you know what text data means to you, but this doesn't always work, since sometimes binary data happens to look like text data (ie. use only character ordinals as data bytes).
>
> I don't understand why you need to signal "this is binary data" while at the same time you want to be able to compare to strings.

Because I use buffer objects to wrap string data to say "this is binary data" to a database. When fetching the same data back from the database I return a string and I found the quirk mentioned in the subject while writing a unit test for this.

It's not a showstopper. The above just was a hint not to deprecate the buffer object until we've come up with a decent replacement that's easy to adapt in existing code.

-- Marc-Andre Lemburg
> The above just was a hint not to deprecate the buffer object until we've come up with a decent replacement that's easy to adapt in existing code.

So, apart from the Python 2.1 requirement, subclassing str does the trick, right?

--Guido van Rossum
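A minimal sketch of the subclassing idea, under some assumptions: it uses `bytes` as the name of the 8-bit string type (an alias of `str` in Python 2.6 and later, a distinct type in Python 3), and the names `Binary` and `is_binary` are hypothetical, chosen only for this illustration.

```python
class Binary(bytes):
    """Hypothetical marker type: behaves like an 8-bit string, but its
    type tells a database layer to treat the value as binary data."""


def is_binary(value):
    # Hypothetical check a database adapter might use to pick the column type.
    return isinstance(value, Binary)


blob = Binary(b"abc")

equal_to_plain = (blob == b"abc")         # compares like the string it wraps
marked = is_binary(blob)                  # the wrapper type carries the signal
plain_marked = is_binary(b"abc")          # a plain string carries no signal
via_buffer = memoryview(blob).tobytes()   # still readable through the buffer interface

print(equal_to_plain, marked, plain_marked, via_buffer == b"abc")
```

This covers the comparison, signalling, and buffer-interface requirements from the list; it does nothing for the C API or Python 2.1 requirements.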
Guido van Rossum wrote:
>> The above just was a hint not to deprecate the buffer object until we've come up with a decent replacement that's easy to adapt in existing code.
>
> So, apart from the Python 2.1 requirement, subclassing str does the trick, right?

Right.

Would be nice if there were a standard builtin, e.g. binary(), for this and maybe some support code to go with it in C (e.g. the type object would be nice to have at C level).

-- Marc-Andre Lemburg
>> So, apart from the Python 2.1 requirement, subclassing str does the trick, right?
>
> Right.
>
> Would be nice if there were a standard builtin, e.g. binary(), for this and maybe some support code to go with it in C (e.g. the type object would be nice to have at C level).

I disagree. There are a thousand different applications, and yours seems rather unusual to me.

--Guido van Rossum
Guido van Rossum wrote:
>>> So, apart from the Python 2.1 requirement, subclassing str does the trick, right?
>>
>> Right.
>>
>> Would be nice if there were a standard builtin, e.g. binary(), for this and maybe some support code to go with it in C (e.g. the type object would be nice to have at C level).
>
> I disagree. There are a thousand different applications, and yours seems rather unusual to me.

It's not at all unusual if you interface to databases. These offer you three choices: character data, Unicode data and binary data and each of these is handled slightly differently.

We currently don't have any notion of separating character data from binary except the difference between Unicode and strings. Using Unicode for character data only and reserving strings for binary data would be nice, except that practice shows that this doesn't always work because not all tools in the chain are ready for Unicode just yet (including Python's stdlib itself).

Nevermind, I'll roll my own,

-- Marc-Andre Lemburg
>>> Would be nice if there were a standard builtin, e.g. binary(), for this and maybe some support code to go with it in C (e.g. the type object would be nice to have at C level).
>>
>> I disagree. There are a thousand different applications, and yours seems rather unusual to me.
>
> It's not at all unusual if you interface to databases. These offer you three choices: character data, Unicode data and binary data and each of these is handled slightly differently.

I figure that most apps will be happy to return 8-bit strings for binary data; that's what 8-bit strings were explicitly designed to support.

> We currently don't have any notion of separating character data from binary except the difference between Unicode and strings. Using Unicode for character data only and reserving strings for binary data would be nice, except that practice shows that this doesn't always work because not all tools in the chain are ready for Unicode just yet (including Python's stdlib itself).

No, we'll eventually need a separate data type, but I doubt it needs to be as compatible with the current 8-bit string type as your requirements state.

> Nevermind, I'll roll my own,

You're welcome.

--Guido van Rossum
--- "M.-A. Lemburg" wrote:
> We currently don't have any notion of separating character data from binary except the difference between Unicode and strings. Using Unicode for character data only and reserving strings for binary data would be nice, except that practice shows that this doesn't always work because not all tools in the chain are ready for Unicode just yet (including Python's stdlib itself).
>
> Nevermind, I'll roll my own,

I don't know if you've seen PEP 296, but I still hope to finish it and have it accepted in time for the Python 2.3 release. That doesn't meet your requirement of working with Python 2.1 however.
Guido van Rossum writes:
>> Would be nice if there were a standard builtin, e.g. binary(), for this and maybe some support code to go with it in C (e.g. the type object would be nice to have at C level).
>
> I disagree. There are a thousand different applications, and yours seems rather unusual to me.

I do think there should be a string type for binary data, and that the standard string type should become Unicode one day. There was past discussion about this, at which proponents suggest that there should be even binary literals.

Of course, all I/O would use the binary, unless an encoding was specified when creating the stream.

Regards,
Martin
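One way to picture the I/O split described here is the sketch below. It relies on the `io` module from later Python versions, which is an assumption that goes beyond what the thread itself proposes; the variable names are illustrative only.

```python
import io

payload = b"caf\xc3\xa9"  # UTF-8 encoded bytes for the word "café"

# Without an encoding, the stream hands back raw binary data.
binary_stream = io.BytesIO(payload)
raw = binary_stream.read()

# With an encoding given when the stream is created, the same bytes
# come back decoded as text.
text_stream = io.TextIOWrapper(io.BytesIO(payload), encoding="utf-8")
text = text_stream.read()

print(type(raw).__name__, type(text).__name__)
```

The binary read returns all six bytes untouched, while the text read returns four characters, so the choice between binary and text is made once, at stream creation.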
> I do think there should be a string type for binary data, and that the standard string type should become Unicode one day. There was past discussion about this, at which proponents suggest that there should be even binary literals.
>
> Of course, all I/O would use the binary, unless an encoding was specified when creating the stream.

Maybe it's time for a PEP outlining and detailing this view of the future?

--Guido van Rossum
participants (4)
- Guido van Rossum
- M.-A. Lemburg
- martin@v.loewis.de
- Scott Gilbert