[DB-SIG] Should Binary accept unicode string?

M.-A. Lemburg mal at egenix.com
Sat Jan 16 14:30:56 EST 2016

On 16.01.2016 16:27, Mike Bayer wrote:
> On 01/15/2016 05:14 PM, Vernon D. Cole wrote:
>> Mike:
>>   Thank you for your long explanation. I got lost somewhere in the
>> middle there, though.
>> If you are suggesting that better documentation be added to PEP-249,
>> then perhaps you could include a suggestion as to a (brief) note which
>> could be appended.
> Well, I wasn't even going that far.  I'm trying to get a handle on what
> pep-249's position is as far as portability of datatypes.   It has
> always struck me as very weak.
>> If you are suggesting that the PEP be expanded to provide a service
>> not now generally available, then perhaps you ought to start the
>> long-talked-about-but-never-tried task of writing a DBAPI level 3 PEP.
> Well if a DBAPI driver would like to accept a Python unicode object to a
> Binary() and produce bytes, some conversion is needed, and there are
> many possible conversions that could take place - there is every
> possible encoding, and typically at least four potential candidates
> among those available.
> It's my position that the Binary() type should *not* offer to
> automatically choose such a conversion and should only accept Python
> types (or 3rd party extension types, sure) that are explicitly 1-1
> mappable to a stream of bytes without a "conversion decision" being
> made.  The type of conversion should not be guessed among a choice of
> several / hundreds within the Binary type.
> So definitely, not proposing any new service other than "disallow
> ambiguous input".

I still don't understand why you want to prevent Binary()
from performing automatic conversions on its input types.

Unicode is just one example of where you can implement such
conversion, e.g. a database module may want to automatically
convert Unicode to UTF-8. For database backends which don't
provide Unicode support, this is usually also done
for string parameter types.

mxODBC, for example, allows setting a per connection .encoding
attribute to define which encoding to use in such cases.
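
To illustrate the idea, here is a minimal sketch in Python 3 syntax of
a Binary()-style helper that consults a per-connection encoding. The
names FakeConnection and to_binary are hypothetical and only serve the
example; this is not mxODBC's actual implementation.

```python
class FakeConnection:
    """Hypothetical stand-in for a DB-API connection object."""

    def __init__(self, encoding="utf-8"):
        # Encoding used when text has to be turned into bytes.
        self.encoding = encoding


def to_binary(connection, value):
    """Return bytes for value, encoding text with connection.encoding."""
    if isinstance(value, str):
        # The conversion decision is explicit and configurable per connection.
        return value.encode(connection.encoding)
    # memoryview() only accepts bytes-like objects, so integers and other
    # non-buffer types raise TypeError instead of being silently converted.
    return bytes(memoryview(value))
```

With such a helper, the same text yields different bytes depending on
the connection setting, which is exactly the decision the module author
(or user) has to make.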

But again, Unicode is just one example. Binary() may also
apply automatic conversions for other types, such as images,
numeric arrays, etc.

The DB-API standard cannot define which types to autoconvert
and which not. This is a conscious decision left to the database
module authors.

They have to make similar choices for all other parameter
types as well, e.g. whether to convert datetime values to
strings or ticks, or to reject them.
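
A rough sketch of what such a choice might look like for datetime
values follows. The function name adapt_datetime and the mode names
are assumptions made up for this example; real modules make this
decision in their own way.

```python
import datetime
import time


def adapt_datetime(value, mode):
    """Convert a datetime according to a module-level policy (hypothetical)."""
    if mode == "string":
        # ISO-style text, e.g. for backends that only take strings.
        return value.isoformat(sep=" ")
    if mode == "ticks":
        # Seconds since the epoch, interpreting value as local time.
        return time.mktime(value.timetuple())
    # Any other policy: refuse the value rather than guess.
    raise TypeError("datetime parameters are not supported")
```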

In many cases, the database backends don't provide parameter
type information, so the database module has to decide what
to do. In other cases, the database module may get type information
from the database and then has to decide what to do with the
input parameters passed to it from Python.

Back to Binary(): What we could do is recommend using e.g.
buffer() as the default implementation for Python 2 and bytes()
for Python 3.
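
As a sketch, such a default could be as small as the following. This
only illustrates the recommendation above and is not text from the
DB-API specification.

```python
import sys

if sys.version_info[0] >= 3:
    # Python 3: bytes() is the suggested default constructor.
    Binary = bytes
else:
    # Python 2: buffer() is the suggested default constructor.
    # The name only exists on Python 2 and is never evaluated on Python 3.
    Binary = buffer  # noqa: F821
```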

The fact that buffer() does accept Unicode objects in Python 2
is due to the way the buffer interface works in Python 2 (in 2000
we thought it would be a good idea to allow access to
the UCS-2 data; later on, when we added UCS-4 support, we
could not easily remove this feature anymore).

For Python 3, bytes() won't accept Unicode without an explicit
encoding, because Unicode objects were changed in Python 3 to no
longer expose the binary buffer interface.
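
The Python 3 behaviour can be checked with a few lines. The helper
name accepts_without_encoding is made up for this example.

```python
def accepts_without_encoding(value):
    """Return True if bytes() takes value without an encoding argument."""
    try:
        bytes(value)
    except TypeError:
        # Raised for str: "string argument without an encoding".
        return False
    return True


# Text is rejected unless the caller names the encoding explicitly.
text_ok = accepts_without_encoding("abc")
# Objects that already are bytes-like pass through unchanged.
bytes_ok = accepts_without_encoding(b"abc")
# An explicit encoding removes the ambiguity.
explicit = bytes("abc", "utf-8")
```

So with bytes() as the default, ambiguous Unicode input is disallowed
unless the caller states the encoding.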

Would that make you happy ? :-)

Marc-Andre Lemburg

