[Python-Dev] Pre-PEP: The "bytes" object
M.-A. Lemburg
mal at egenix.com
Fri Feb 17 13:03:29 CET 2006
Guido van Rossum wrote:
> On 2/15/06, Neil Schemenauer <nas at arctrix.com> wrote:
>> This could be a replacement for PEP 332. At least I hope it can
>> serve to summarize the previous discussion and help focus on the
>> currently undecided issues.
>>
>> I'm too tired to dig up the rules for assigning it a PEP number.
>> Also, there are probably silly typos, etc. Sorry.
>
> I may check it in for you, although right now it would be good if we
> had some more feedback.
>
> I noticed one behavior in your pseudo-code constructor that seems
> questionable: while in the Q&A section you explain why the encoding is
> ignored when the argument is a str instance, in fact you require an
> encoding (and one that's not "ascii") if the str instance contains any
> non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
> "blah") would succeed. I think that's a bit strange -- if you ignore
> the encoding, you should always ignore it. So IMO bytes("\xff") and
> bytes("\xff", "ascii") should both return the same as bytes([255]).
> Also, there's a code path where the initializer is a unicode instance
> and its encode() method is called with None as the argument. I think
> both could be fixed by setting the encoding to
> sys.getdefaultencoding() if it is None and the argument is a unicode
> instance:
>
>     def bytes(initialiser=[], encoding=None):
>         if isinstance(initialiser, basestring):
>             if isinstance(initialiser, unicode):
>                 if encoding is None:
>                     encoding = sys.getdefaultencoding()
>                 initialiser = initialiser.encode(encoding)
>             initialiser = [ord(c) for c in initialiser]
>         elif encoding is not None:
>             raise TypeError("explicit encoding invalid for non-string "
>                             "initialiser")
>         create bytes object and fill with integers from initialiser
>         return bytes object
>
> BTW, for folks who want to experiment, it's quite simple to create a
> working bytes implementation by inheriting from array.array. Here's a
> quick draft (which only takes str instance arguments):
>
>     from array import array
>     class bytes(array):
>         def __new__(cls, data=None):
>             b = array.__new__(cls, "B")
>             if data is not None:
>                 b.fromstring(data)
>             return b
>         def __str__(self):
>             return self.tostring()
>         def __repr__(self):
>             return "bytes(%s)" % repr(list(self))
>         def __add__(self, other):
>             if isinstance(other, array):
>                 return bytes(super(bytes, self).__add__(other))
>             return NotImplemented
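
For readers who want to try the quoted array-based draft on a current
interpreter, here is a minimal sketch of a Python 3 port. It assumes that
frombytes() and tobytes() stand in for the old fromstring() and tostring()
methods. The class name ByteArraySketch is hypothetical, and __str__ is left
out because Python 3 requires it to return text rather than raw bytes.

```python
from array import array


class ByteArraySketch(array):
    """Hypothetical Python 3 port of the quoted array-based draft."""

    def __new__(cls, data=None):
        # "B" is the typecode for unsigned bytes (values 0..255).
        b = super().__new__(cls, "B")
        if data is not None:
            # frombytes() is the Python 3 spelling of fromstring().
            b.frombytes(data)
        return b

    def __repr__(self):
        return "bytes(%r)" % list(self)

    def __add__(self, other):
        if isinstance(other, array):
            # array concatenation yields a plain array, so re-wrap the result.
            joined = super().__add__(other)
            return type(self)(joined.tobytes())
        return NotImplemented
```

As in the draft, only a byte-string argument is accepted; text would have to
be encoded by the caller first.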

Another hint:

If you want to play around with the migration to all Unicode in Py3k,
start Python with the -U switch and monkey-patch the builtin str to be
an alias for unicode.

Ideally, the bytes type should work under both the Py3k conditions
and the Py2.x default ones.
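
To illustrate what "Py3k conditions" mean for the constructor rules quoted
above, here is a rough sketch written for an interpreter where str is already
Unicode. The helper name make_bytes is hypothetical, and it returns the
built-in bytes type purely for convenience. It is not the implementation
proposed in the pre-PEP, only the same decision logic under that assumption.

```python
import sys


def make_bytes(initialiser=(), encoding=None):
    """Hypothetical helper mirroring the quoted constructor rules."""
    if isinstance(initialiser, str):
        # Text always needs an encoding; fall back to the default one.
        if encoding is None:
            encoding = sys.getdefaultencoding()
        return initialiser.encode(encoding)
    if encoding is not None:
        raise TypeError("explicit encoding invalid for non-string initialiser")
    # Anything else is treated as an iterable of integers in range(256).
    return bytes(list(initialiser))
```

With str as Unicode, the "ignore the encoding for str" branch disappears:
every text initialiser goes through encode(), and only non-text initialisers
reject an explicit encoding.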
--
Marc-Andre Lemburg
eGenix.com