[Tutor] use of __new__
Steven D'Aprano
steve at pearwood.info
Fri Mar 12 01:53:16 CET 2010
On Fri, 12 Mar 2010 06:03:35 am spir wrote:
> Hello,
>
> I need a custom unicode subtype (with additional methods). This will
> not be directly used by the user, instead it is just for internal
> purpose. I would like the type to be able to cope with either a byte
> str or a unicode str as argument. In the first case, it needs to be
> first decoded. I cannot do it in __init__ because unicode will first
> try to decode it as ascii, which fails in the general case.
Are you aware that you can pass an explicit encoding to unicode?
>>> print unicode('cdef', 'utf-16')
摣晥
>>> help(unicode)
Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(string [, encoding[, errors]]) -> object
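For anyone following along on Python 3, where str takes over the role of unicode and bytes that of the byte str, roughly the same thing can be sketched like this. The byte order is spelled out as utf-16-le (my assumption, so the result does not depend on the platform), and the variable names are only for illustration.

```python
# Python 3 sketch: str(bytes, encoding) decodes much like
# unicode(string, encoding) does in Python 2.
raw = b'cdef'

# The same four bytes decode to very different text depending
# on the encoding that is passed explicitly.
as_ascii = str(raw, 'ascii')
as_utf16 = str(raw, 'utf-16-le')   # explicit byte order, no BOM needed

# The optional errors argument controls what happens to bytes
# that cannot be decoded; 'replace' substitutes U+FFFD.
lenient = str(b'caf\xe9', 'ascii', 'replace')

# ascii() keeps the output printable on any terminal.
print(ascii(as_ascii))
print(ascii(as_utf16))
print(ascii(lenient))
```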
> So, I
> must have my own __new__. The issue is the object (self) is then a
> unicode one instead of my own type.
>
> class Unicode(unicode):
>     Unicode.FORMAT = "utf8"
>     def __new__(self, text, format=None):
>         # text can be str or unicode
>         format = Unicode.FORMAT if format is None else format
>         if isinstance(text,str):
>             text = text.decode(format)
>         return text
>     .......
>
> x = Unicode("abc") # --> unicode, not Unicode
That's because you return a unicode object :) Python doesn't magically
convert the result of __new__ into your class; in fact, Python
specifically allows __new__ to return something else. That's fairly
unusual, but it does come in handy.
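A small sketch of that rule, using made-up class names (Plain and Proper are not from the original post). Whatever __new__ returns is what the caller gets, so the instance has to be created through the base class with cls as the first argument.

```python
class Plain(str):
    """__new__ hands back an ordinary str, as in the quoted code."""
    def __new__(cls, text):
        return str(text)          # not an instance of Plain


class Proper(str):
    """__new__ asks the base class to build an instance of cls."""
    def __new__(cls, text):
        return super(Proper, cls).__new__(cls, text)


a = Plain("abc")
b = Proper("abc")

# Same value, different types.
print(type(a).__name__)
print(type(b).__name__)
```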
"format" is not a good name to use. The accepted term is "encoding". You
should also try to match the function signature of the built-in unicode
object, which includes the no-argument form, unicode() -> u''.
Writing Unicode.FORMAT in the definition of Unicode can't work:
>>> class Unicode(unicode):
...     Unicode.FORMAT = 'abc'
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in Unicode
NameError: name 'Unicode' is not defined
So it looks like you've posted something slightly different from what
you are actually running.
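The reason for the NameError is that the class name is only bound once the whole class body has finished running. A minimal sketch of what does work (Demo, Broken and describe are hypothetical names): assign the attribute with a bare name inside the body, and refer to it through the class only from code that runs later.

```python
class Demo(str):
    # Inside the class body, use a bare name; "Demo" does not exist yet.
    FORMAT = "utf8"

    def describe(self):
        # By the time a method is called, the class name is bound.
        return "%s (%s)" % (self, Demo.FORMAT)


# Referring to the class inside its own body raises NameError.
try:
    class Broken(str):
        Broken.FORMAT = "utf8"
except NameError as exc:
    error = str(exc)

print(Demo.FORMAT)
print(Demo("x").describe())
print(error)
```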
I have tried to match the behaviour of the built-in unicode as closely
as I am able. See here:
http://docs.python.org/library/functions.html#unicode
class Unicode(unicode):
    """Unicode(string [, encoding[, errors]]) -> object

    Special Unicode class that has all sorts of wonderful
    methods missing from the built-in unicode class.
    """
    _ENCODING = "utf8"
    _ERRORS = "strict"
    def __new__(cls, string='', encoding=None, errors=None):
        # If either encoding or errors is specified, then always
        # attempt decoding of the first argument.
        if (encoding, errors) != (None, None):
            if encoding is None: encoding = cls._ENCODING
            if errors is None: errors = cls._ERRORS
            obj = super(Unicode, cls).__new__(
                Unicode, string, encoding, errors)
        else:  # Never attempt decoding.
            obj = super(Unicode, cls).__new__(Unicode, string)
        assert isinstance(obj, Unicode)
        return obj
>>> Unicode()
u''
>>> Unicode('abc')
u'abc'
>>> Unicode('cdef', 'utf-16')
u'\u6463\u6665'
>>> Unicode(u'abcd')
u'abcd'
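The session above shows the values but not the type of the results. As a rough check, here is a Python 3 port of the same idea, with the type inspected explicitly. The assumptions are mine: the name Text is made up, str replaces unicode, bytes replaces the byte str, and cls is passed to the base class so that further subclasses would also work.

```python
class Text(str):
    """Text(object [, encoding[, errors]]) -> Text (Python 3 sketch)."""
    _ENCODING = "utf8"
    _ERRORS = "strict"

    def __new__(cls, string='', encoding=None, errors=None):
        # If either encoding or errors is given, always decode.
        if (encoding, errors) != (None, None):
            if encoding is None:
                encoding = cls._ENCODING
            if errors is None:
                errors = cls._ERRORS
            return super().__new__(cls, string, encoding, errors)
        # Otherwise never attempt decoding.
        return super().__new__(cls, string)


empty = Text()
plain = Text('abc')
decoded = Text(b'cdef', 'utf-16-le')
defaulted = Text(b'abc', errors='strict')   # falls back to _ENCODING

for value in (empty, plain, decoded, defaulted):
    print(type(value).__name__, ascii(value))
```

One difference to keep in mind: in Python 3 the no-encoding branch does not decode bytes at all, so Text(b'abc') would hold the text "b'abc'" rather than "abc".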
--
Steven D'Aprano