[Python-Dev] Pre-PEP: The "bytes" object

Fri Feb 17 13:03:29 CET 2006

Guido van Rossum wrote:
> On 2/15/06, Neil Schemenauer <nas at arctrix.com> wrote:
>> This could be a replacement for PEP 332.  At least I hope it can
>> serve to summarize the previous discussion and help focus on the
>> currently undecided issues.
>>
>> I'm too tired to dig up the rules for assigning it a PEP number.
>> Also, there are probably silly typos, etc.   Sorry.
> 
> I may check it in for you, although right now it would be good if we
> had some more feedback.
> 
> I noticed one behavior in your pseudo-code constructor that seems
> questionable: while in the Q&A section you explain why the encoding is
> ignored when the argument is a str instance, in fact you require an
> encoding (and one that's not "ascii") if the str instance contains any
> non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
> "blah") would succeed. I think that's a bit strange -- if you ignore
> the encoding, you should always ignore it. So IMO bytes("\xff") and
> bytes("\xff", "ascii") should both return the same as bytes([255]).
> Also, there's a code path where the initializer is a unicode instance
> and its encode() method is called with None as the argument. I think
> both could be fixed by setting the encoding to
> sys.getdefaultencoding() if it is None and the argument is a unicode
> instance:
> 
>     def bytes(initialiser=[], encoding=None):
>         if isinstance(initialiser, basestring):
>             if isinstance(initialiser, unicode):
>                 if encoding is None:
>                     encoding = sys.getdefaultencoding()
>                 initialiser = initialiser.encode(encoding)
>             initialiser = [ord(c) for c in initialiser]
>         elif encoding is not None:
>             raise TypeError("explicit encoding invalid for non-string "
>                             "initialiser")
>         create bytes object and fill with integers from initialiser
>         return bytes object
> 
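To experiment with the constructor rules above on a current interpreter, they can be approximated in Python 3. This is a minimal sketch that rests on an assumed mapping: Python 2's unicode becomes Python 3's str, Python 2's 8-bit str becomes Python 3's bytes, and the built-in bytearray stands in for the proposed mutable bytes object. The function name make_bytes is hypothetical.

```python
import sys


def make_bytes(initialiser=(), encoding=None):
    """Hypothetical sketch of the proposed bytes() constructor rules.

    Assumed Python 3 mapping: str plays the role of Python 2's unicode,
    bytes/bytearray play the role of Python 2's 8-bit str, and the
    returned bytearray stands in for the proposed mutable bytes object.
    """
    if isinstance(initialiser, str):
        # Text: encode it, falling back to the interpreter's default encoding.
        if encoding is None:
            encoding = sys.getdefaultencoding()
        initialiser = initialiser.encode(encoding)
    elif isinstance(initialiser, (bytes, bytearray)):
        # 8-bit string: the encoding is always ignored, so b"\xff" maps to
        # 255 whether or not an encoding (even "ascii") is passed.
        pass
    elif encoding is not None:
        raise TypeError("explicit encoding invalid for non-string initialiser")
    # bytearray copies a bytes-like object or an iterable of ints;
    # ints outside range(256) raise ValueError.
    return bytearray(initialiser)
```

With this sketch, make_bytes(b"\xff") and make_bytes(b"\xff", "ascii") both equal bytearray([255]), while make_bytes([1, 2], "ascii") raises TypeError. One difference from the 2006 setting: Python 3's sys.getdefaultencoding() returns "utf-8" rather than "ascii", so make_bytes("\xe9") yields the two UTF-8 bytes 0xC3 0xA9 instead of failing.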
> BTW, for folks who want to experiment, it's quite simple to create a
> working bytes implementation by inheriting from array.array. Here's a
> quick draft (which only takes str instance arguments):
> 
>     from array import array
>     class bytes(array):
>         def __new__(cls, data=None):
>             b = array.__new__(cls, "B")
>             if data is not None:
>                 b.fromstring(data)
>             return b
>         def __str__(self):
>             return self.tostring()
>         def __repr__(self):
>             return "bytes(%s)" % repr(list(self))
>         def __add__(self, other):
>             if isinstance(other, array):
>                 return bytes(super(bytes, self).__add__(other))
>             return NotImplemented
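The draft above is Python 2 code: array's fromstring() and tostring() were later renamed frombytes() and tobytes() (the old names were removed in Python 3.9), and __str__ must return text in Python 3. A rough port for experimenting on a current interpreter follows; it is a sketch, and the class name Bytes is hypothetical, chosen so the built-in bytes type is not shadowed.

```python
from array import array


class Bytes(array):
    """Sketch: Python 3 port of the array-based bytes draft (hypothetical name)."""

    def __new__(cls, data=None):
        b = array.__new__(cls, "B")  # "B" = unsigned char, values 0..255
        if data is not None:
            b.frombytes(data)  # Python 3 spelling of fromstring()
        return b

    def __bytes__(self):
        # bytes(obj) uses __bytes__ in Python 3; __str__ must return text.
        return self.tobytes()

    def __repr__(self):
        return "Bytes(%r)" % list(self)

    def __add__(self, other):
        # Concatenate with another array; array.__add__ raises TypeError
        # if the other array has a different typecode.
        if isinstance(other, array):
            return Bytes(array.__add__(self, other))
        return NotImplemented
```

Like the original draft, this only accepts byte-like data: Bytes("abc") raises TypeError because frombytes() requires a bytes-like object.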

Another hint:

If you want to play around with the migration to all-Unicode
strings in Py3k, start Python with the -U switch (which turns
string literals into Unicode literals) and monkey-patch the
builtin str to be an alias for unicode.
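A minimal sketch of that monkey-patch follows. As an assumption, it is written so the same module also imports on Python 3, where str is already Unicode and there is nothing to patch. The helper name alias_str_to_unicode is hypothetical; one natural place to call it is a sitecustomize module, which the site module imports at startup.

```python
# Python 2 calls the builtins module __builtin__; Python 3 calls it builtins.
try:
    import __builtin__ as builtins  # Python 2
except ImportError:
    import builtins  # Python 3


def alias_str_to_unicode():
    """Rebind the builtin name `str` to `unicode` when a separate unicode
    type exists (Python 2). Returns True if the patch was applied."""
    unicode_type = getattr(builtins, "unicode", None)
    if unicode_type is None:
        return False  # Python 3: str is already Unicode text
    builtins.str = unicode_type
    return True
```

Under Python 2 started with -U, calling this early makes str(...) and isinstance(x, str) refer to unicode for the rest of the process; under Python 3 it is a no-op.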

Ideally, the bytes type should work both under these simulated
Py3k conditions and under the Py2.x defaults.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 17 2006)