[Tutor] question about descriptors
Peter Otten
__peter__ at web.de
Sat Nov 7 09:03:44 EST 2015
Albert-Jan Roskam wrote:
> Hi,
> First, before I forget, emails from hotmail/yahoo etc appear to end up in
> the spam folder these days, so apologies in advance if I do not appear to
> follow up to your replies. Ok, now to my question. I want to create a
> class with read-only attribute access to the columns of a .csv file. E.g.
> when a file has a column named 'a', that column should be returned as list
> by using instance.a. At first I thought I could do this with the builtin
> 'property' class, but I am not sure how. I now tried to use descriptors
> (__get__ and __set__), which are also used by 'property' (see also:
> https://docs.python.org/2/howto/descriptor.html).
>
> In the "if __name__ == '__main__'" section, [a] is supposed to be a
> shorthand for (i.e. equivalent to) [b]. But it's not. I suspect it has to do
> with the way attributes are looked up. So once an attribute has been found
> in self.__dict__ aka "the usual place", the search stops, and __get__ is
> never called. But I may be wrong. I find the __getattribute__,
> __getattr__ and __get__ distinction quite confusing. What is the best
> approach to do this? Ideally, the column values should only be retrieved
> when they are actually requested (the .csv could be big). Thanks in
> advance!
>
>
>
> import csv
> from cStringIO import StringIO
>
>
> class AttrAccess(object):
>
>     def __init__(self, fileObj):
>         self.__reader = csv.reader(fileObj, delimiter=";")
>         self.__header = self.__reader.next()
>         #[setattr(self, name, self.__get_column(name)) for name in
>         #[self.header]
>         self.a = range(10)
>
>     @property
>     def header(self):
>         return self.__header
>
>     def __get_column(self, name):
>         return [record[self.header.index(name)] for record in
>                 self.__reader] # generator expression might be better here.
>
>     def __get__(self, obj, objtype=type):
>         print "__get__ called"
>         return self.__get_column(obj)
>         #return getattr(self, obj)
>
>     def __set__(self, obj, val):
>         raise AttributeError("Can't set attribute")
>
> if __name__ == "__main__":
>     f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
>     instance = AttrAccess(f)
>     print instance.a  # [a] does not call __get__. Looks, and finds, in
>                       # self.__dict__?
>     print instance.__get__("a")  # [b] this is supposed to be equivalent
>                                  # to [a]
>     instance.a = 42  # should throw AttributeError!

I think the basic misunderstandings concern two points:
(1) the __get__() method has to be implemented by the descriptor class, and
(2) the descriptor instances have to be attributes of the class whose
instances are supposed to invoke __get__(). E.g.:

class C(object):
    x = descriptor()

c = C()
c.x  # invokes C.__dict__["x"].__get__(c, C) under the hood

As a consequence you need one class per set of attributes; instantiating the
same AttrAccess for csv files with differing layouts won't work.
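
To make the two points above concrete, here is a minimal sketch (written in Python 3 syntax, unlike the Python 2 code in this thread). The names Shout, OnClass and OnInstance are made up for illustration. It shows that a descriptor stored in the instance __dict__ is returned as a plain object, while the same descriptor stored on the class has its __get__() called, even when the instance __dict__ holds an entry of the same name.

```python
class Shout(object):
    """Hypothetical data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        if obj is None:       # accessed on the class itself
            return self
        return "from __get__"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class OnClass(object):
    x = Shout()               # descriptor stored on the class


class OnInstance(object):
    def __init__(self):
        self.x = Shout()      # descriptor stored in the instance __dict__


a = OnClass()
b = OnInstance()

on_class = a.x                # __get__ is called
on_instance = b.x             # the Shout object itself, __get__ is not called

# A data descriptor on the class wins even over the instance __dict__.
a.__dict__["x"] = "shadow"
still_descriptor = a.x

print(on_class, type(on_instance).__name__, still_descriptor)
```

This is why the original AttrAccess never sees its own __get__(): the method is defined on the class that owns the attribute, not on an object stored as a class attribute.
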
Here's how to do it all by yourself:

import csv
from cStringIO import StringIO

class ReadColumn(object):
    # Descriptor that reads one field of the owning object's _row.
    def __init__(self, index):
        self._index = index
    def __get__(self, obj, type=None):
        return obj._row[self._index]
    def __set__(self, obj, value):
        raise AttributeError("oops")

def first_row(instream):
    reader = csv.reader(instream, delimiter=";")

    # A fresh class per file, so each layout gets its own attributes.
    class Row(object):
        def __init__(self, row):
            self._row = row

    for i, header in enumerate(next(reader)):
        setattr(Row, header, ReadColumn(i))
    return Row(next(reader))

f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
row = first_row(f)
print row.a
row.a = 42  # raises AttributeError("oops")

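The original question asked for whole columns rather than single rows, read only on demand. The following is one possible sketch of that variant, not a tested recipe; it uses Python 3 syntax, and the names Column, Columns and open_columns are hypothetical. It assumes the file fits in memory, because the first column access reads all remaining rows.

```python
import csv
import io


class Column(object):
    """Hypothetical data descriptor that returns one column as a list."""

    def __init__(self, index):
        self._index = index

    def __get__(self, obj, objtype=None):
        if obj is None:                      # accessed on the class
            return self
        return [record[self._index] for record in obj._records()]

    def __set__(self, obj, value):
        raise AttributeError("can't set attribute")


def open_columns(instream):
    reader = csv.reader(instream, delimiter=";")
    header = next(reader)

    # One class per file, as in the row-based example.
    class Columns(object):
        def __init__(self):
            self._rows = None

        def _records(self):
            # Nothing beyond the header is read until a column is requested.
            if self._rows is None:
                self._rows = list(reader)
            return self._rows

    for i, name in enumerate(header):
        setattr(Columns, name, Column(i))
    return Columns()


cols = open_columns(io.StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n"))
print(cols.a)
print(cols.b)
```

Assigning to cols.a raises AttributeError because Column defines __set__().
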
Instead of a custom descriptor you can of course use the built-in property:

    for i, header in enumerate(next(reader)):
        setattr(Row, header, property(lambda self, i=i: self._row[i]))

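The i=i default in the lambda matters. A small sketch of why, in Python 3 syntax, with a made-up helper make_row_class: without the default, every lambda looks up i only when it is called, by which time the loop has finished and i holds the last index.

```python
def make_row_class(headers, bind_early):
    """Hypothetical helper: build a Row class with one property per header."""

    class Row(object):
        def __init__(self, row):
            self._row = row

    for i, header in enumerate(headers):
        if bind_early:
            # The default argument captures the current value of i.
            getter = lambda self, i=i: self._row[i]
        else:
            # i is looked up at call time, after the loop has ended.
            getter = lambda self: self._row[i]
        setattr(Row, header, property(getter))
    return Row


Good = make_row_class(["a", "b", "c"], bind_early=True)
Bad = make_row_class(["a", "b", "c"], bind_early=False)

good = Good(["1", "2", "3"])
bad = Bad(["1", "2", "3"])

print(good.a, good.b, good.c)   # 1 2 3
print(bad.a, bad.b, bad.c)      # 3 3 3
```

A property created without a setter is read-only, so assigning to good.a raises AttributeError just like the custom descriptor does.
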
In many cases you don't care about the specifics of the row class and can
simply use collections.namedtuple:

import collections
import csv
import itertools
from cStringIO import StringIO

def rows(instream):
    reader = csv.reader(instream, delimiter=";")
    Row = collections.namedtuple("Row", next(reader))
    return itertools.imap(Row._make, reader)

f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
row = next(rows(f))
print row.a
row.a = 42  # raises AttributeError
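
For readers on Python 3, roughly the same idea can be sketched as follows. This is an adaptation, not part of the original reply: itertools.imap and cStringIO no longer exist there, so the built-in map (which is lazy in Python 3) and io.StringIO stand in for them.

```python
import collections
import csv
import io


def rows(instream):
    reader = csv.reader(instream, delimiter=";")
    # The header row supplies the field names of the namedtuple class.
    Row = collections.namedtuple("Row", next(reader))
    # map() is lazy in Python 3, so rows are converted one at a time.
    return map(Row._make, reader)


f = io.StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
all_rows = rows(f)
row = next(all_rows)
print(row.a)

try:
    row.a = 42
except AttributeError as exc:
    print("read-only:", exc)
```

Note that namedtuple only accepts header names that are valid Python identifiers; for other headers, passing rename=True to namedtuple replaces invalid names with positional ones.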