[Tutor] my newbie program

Magnus Lycka magnus@thinkware.se
Wed Dec 11 07:41:23 2002


At 22:38 2002-12-10 -0600, david wrote:
>i was messing with the code below and i
>took the underscores out because i didn't know what
>they were for and i didn't like them.

That sounds a bit like those Unix sysadmin horror stories:
"I never used those files in /dev/ and I didn't know what
they were for, so I erased them..." ;)

But experimenting, testing and changing things is a good way
to learn. This change DOES make a difference.

Like it or not, underscores are significant in Python, and they are
used for particular reasons.

I.

Leading *and* trailing double underscores are used by "magic" names
that have a special meaning for the Python interpreter. Never
use them for other things. Example:

 >>> class X:
...     def __init__(self, **kwargs):
...             self.__dict__ = kwargs
...     def __str__(self):
...             return "\n".join(["%s = %s" % (key, getattr(self, key))
...                                     for key in dir(self)])
...
 >>> x = X(a='Hello', b=2, c=3)
 >>> x.d = 99
 >>> print x
__doc__ = None
__init__ = <bound method X.__init__ of <__main__.X instance at 0x022F1818>>
__module__ = __main__
__str__ = <bound method X.__str__ of <__main__.X instance at 0x022F1818>>
a = Hello
b = 2
c = 3
d = 99

If you remove the underscores in __init__, __dict__ or __str__,
things won't work the same at all.

__init__ is the magic name for the method that is invoked when we
instantiate a class. __dict__ is the dictionary that keeps track of
the attributes in an instance object. __str__ is the method that is
called when we use the str() function or print.
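
As a rough sketch of what "won't work the same" means, here are two
otherwise identical classes, one with the underscores and one without.
The class names are made up for the illustration, and the code assumes
a Python 3 interpreter rather than the 2.2 used in the session above.

```python
class WithUnderscores:
    def __init__(self, **kwargs):
        # Run automatically by WithUnderscores(...)
        self.__dict__.update(kwargs)

    def __str__(self):
        # Run automatically by str() and print()
        return ", ".join("%s = %s" % item
                         for item in sorted(self.__dict__.items()))


class WithoutUnderscores:
    def init(self, **kwargs):
        # An ordinary method: nothing calls it for us
        self.__dict__.update(kwargs)

    def str(self):
        # Also ordinary: str() and print() never look for it
        return ", ".join("%s = %s" % item
                         for item in sorted(self.__dict__.items()))


x = WithUnderscores(a='Hello', b=2)
with_text = str(x)                  # goes through __str__

try:
    WithoutUnderscores(a='Hello', b=2)
    constructor_error = None
except TypeError as exc:            # there is no __init__ accepting a and b
    constructor_error = exc

y = WithoutUnderscores()
y.init(a='Hello', b=2)              # has to be called by hand
without_text = str(y)               # default "<... object at 0x...>" text

print(with_text)
print(constructor_error)
print(without_text)
print(y.str())
```

The renamed methods still exist and still work when called explicitly,
but the interpreter no longer finds them on its own.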

II.

Leading (but not trailing) double underscores are used for private
attributes in classes. These are variables that should only be changed
from within the class. The idea is that someone who uses a class should
depend only on its public interface, not on its implementation,
so that class implementations can be changed if needed, without breaking
the programs that use them. This "data hiding" is not absolute in Python.
The name is "mangled" and you can access it if you really want, but then
you know that you are breaking the rules and have to take the consequences.
This ability to break the rules is helpful in debugging etc, but should
not be used in production code.

 >>> class Person:
...     def __init__(self, fname, lname):
...             self.__fname = fname
...             self.__lname = lname
...     def __str__(self):
...             return "My name is %s %s and you can't change that" % (
...                         self.__fname, self.__lname)
...
 >>> p = Person('Brian', 'Cohen')
 >>> print p
My name is Brian Cohen and you can't change that
 >>> p.__lname
Traceback (most recent call last):
   File "<interactive input>", line 1, in ?
AttributeError: Person instance has no attribute '__lname'
 >>> p.__fname = 'Ron'
 >>> print p
My name is Brian Cohen and you can't change that
 >>> dir(p)
['_Person__fname', '_Person__lname', '__doc__', '__fname', '__init__', 
'__module__', '__str__']
 >>> print p.__fname
Ron
 >>> p._Person__lname = 'Parrot'
 >>> print p
My name is Brian Parrot and you can't change that

Well, I could obviously change that, and those kinds of manipulations
are useful in testing and debugging code. Of course, you _can_ abuse it,
but there is no way of defending programs from stupid programmers
anyway.

(Note that "p.__fname = 'Ron'" did work, though: it created a new
attribute named '__fname' on the instance. That attribute can't be
reached from within the class without fishing it out of self.__dict__,
because inside the class self.__fname always refers to the key
'_Person__fname' in self.__dict__. The "outside __fname" and
"inside __fname" are different variables.)
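
The two variables can be seen side by side in the instance dictionary.
This is only a sketch, assuming Python 3 and assuming the code runs at
module level (not inside another class, where the outside name would be
mangled too). The method name inside_name is invented for the example.

```python
class Person:
    def __init__(self, fname):
        self.__fname = fname        # stored under the key '_Person__fname'

    def inside_name(self):
        return self.__fname         # compiled as self._Person__fname


p = Person('Brian')
p.__fname = 'Ron'                   # outside the class: no mangling, new key '__fname'

keys = sorted(vars(p))              # both keys live in the same __dict__
inside = p.inside_name()            # the class still sees the original value
outside = vars(p)['__fname']        # the "outside" attribute is separate

print(keys)
print(inside, outside)
```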

III.

"from <module> import *" won't import names starting with
underscore.

 >>> print file('test.py').read()
public = 1
_protected = 2
__private = 3
__magic__ = 4
ordinary_ = 5

 >>> dir()
['__builtins__', '__doc__', '__name__']
 >>> from test import *
 >>> dir()
['__builtins__', '__doc__', '__name__', 'ordinary_', 'public']
 >>> # Only "public" and "ordinary_" were imported from test.
 >>> import test
 >>> dir(test)
['__builtins__', '__doc__', '__file__', '__magic__', '__name__',
  '__private', '_protected', 'ordinary_', 'public']

This means that you will never get your variables that start with _
overwritten by names that you import with "from <module> import *".
I still think "from <module> import *" is a bad idea most of the
time anyway...

Note that these leading underscores can be used to stop any kind
of object from being imported with "from <module> import *". It might
be classes, functions or other modules etc.
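
The same experiment can be repeated without a test.py on disk. The
sketch below assumes Python 3, builds a throwaway module in memory and
runs the star import into an empty namespace. The module name
'underscore_demo' is made up, and the module deliberately defines no
__all__.

```python
import sys
import types

source = """
public = 1
_protected = 2
__private = 3
__magic__ = 4
ordinary_ = 5
"""

# Create the module object and register it so that "import" can find it
module = types.ModuleType('underscore_demo')
exec(source, module.__dict__)
sys.modules['underscore_demo'] = module

# Run the star import in a fresh namespace and see what arrived
namespace = {}
exec('from underscore_demo import *', namespace)
imported = sorted(name for name in namespace if name != '__builtins__')

print(imported)                     # only the names without a leading underscore
print(module._protected)            # still reachable through the module itself
```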

IV.

Many Python programmers (me included) follow the convention of marking
protected attributes in classes with one leading underscore. This is
not enforced by the interpreter (no name mangling), but I know that I
should normally never use "x._protected", only "self._protected".

The difference between private and protected attributes in classes
(as defined in C++) is that private attributes can only be accessed
in the class where they were defined. Protected attributes can also
be used in subclasses.

The concept is meaningless unless you understand inheritance. And if
you do, you might be of the opinion that derived classes should use
the public interface...

 >>> class Person:
...     def __init__(self, fname, lname):
...             self._fname = fname
...             self._lname = lname
...
 >>> class Customer(Person):
...     def __init__(self, fname, lname, custNo):
...             Person.__init__(self, fname, lname)
...             self.__custNo = custNo
...     def remind(self, date):
...             return "Dear %s %s. As of %s you should have..." % (
...                         self._fname, self._lname, date)
...
 >>> c = Customer('Bill', 'Gates', 123)
 >>> print c.remind('2002-12-05')
Dear Bill Gates. As of 2002-12-05 you should have...
 >>> c._fname = "William" # Legal Python but not following Magnus' coding standards
 >>> print c.remind('2002-12-05')
Dear William Gates. As of 2002-12-05 you should have...
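
To contrast the two kinds of attribute in a subclass, here is a small
sketch, assuming Python 3. The method names protected_ok and
private_fails are invented for the illustration. The single-underscore
attribute is reachable from the subclass, while the double-underscore
one is mangled with the name of the class where the code is written.

```python
class Person:
    def __init__(self, fname, lname):
        self._fname = fname         # "protected": a convention only
        self.__lname = lname        # private: stored as _Person__lname


class Customer(Person):
    def protected_ok(self):
        return self._fname          # plain attribute lookup, works

    def private_fails(self):
        return self.__lname         # compiled as self._Customer__lname


c = Customer('Bill', 'Gates')
first = c.protected_ok()

try:
    c.private_fails()
    failed = False
except AttributeError:              # there is no _Customer__lname
    failed = True

print(first, failed)
```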

V.

A trailing underscore is sometimes used when we want a variable name
that we can't use as it stands, either because it's a reserved word or
because it's used by a standard type or function that we don't want to hide.

 >>> from, to = '2002-12-10', '2002-12-24'
Traceback (  File "<interactive input>", line 1
     from, to = '2002-12-10', '2002-12-24'
         ^
SyntaxError: invalid syntax
 >>> from_, to = '2002-12-10', '2002-12-24'

 >>> list_ = [1,2,3,4] # No problem
 >>> list('qwe')
['q', 'w', 'e']
 >>> list = [1,2,3,4] # This is not a good idea...
 >>> list('qwe')      # ...since this won't work.
Traceback (most recent call last):
   File "<interactive input>", line 1, in ?
TypeError: 'list' object is not callable
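
Whether a name needs the trailing underscore can be checked with the
standard keyword and builtins modules. This is just a sketch, assuming
Python 3 (where the module is called builtins), and safe_name is a
hypothetical helper, not something from the standard library.

```python
import builtins
import keyword


def safe_name(name):
    """Append '_' if name is a reserved word or would hide a builtin."""
    if keyword.iskeyword(name) or hasattr(builtins, name):
        return name + '_'
    return name


names = [safe_name(n) for n in ('from', 'to', 'list', 'total')]
print(names)
```

Here 'from' is caught as a reserved word and 'list' as a builtin, while
'to' and 'total' are left alone.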



-- 
Magnus Lycka, Thinkware AB
Alvans vag 99, SE-907 50 UMEA, SWEDEN
phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
http://www.thinkware.se/  mailto:magnus@thinkware.se