[Tutor] creating variable names from string slices
Jeff Shannon
jeff@ccvcorp.com
Wed Apr 2 13:56:11 2003
Bruce Dykes wrote:
>>First, we get the imported data into a list of lines:
>>
>>
>
>ayup...I'm using:
>
>records=string.split(telnet.readuntil(prompt),'\n')
>
>but I'll also need to read from a file....
>
Which is easy enough to do with readlines() / xreadlines(); the
important point here being that it doesn't really matter where your data
is coming from if you can get it into a consistently-formatted list of
lines.
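A minimal sketch of that idea, written for current Python (where xreadlines() no longer exists and iterating over the file object does the same job). The function names and the sample text are made up for illustration; the point is only that both sources end up as the same list of lines.

```python
import io

def lines_from_text(text):
    """Split a block of text (e.g. what a telnet session returned) into lines."""
    # splitlines() copes with '\n' and '\r\n' alike and drops the line endings
    return text.splitlines()

def lines_from_file(fileobj):
    """Read lines from any open file-like object, dropping the line endings."""
    return [line.rstrip('\r\n') for line in fileobj]

# Hypothetical sample data, padded into fixed-width columns
sample = "Betty     Blond     Student\nVeronica  Brunette  Student\n"

from_telnet = lines_from_text(sample)
from_file = lines_from_file(io.StringIO(sample))  # stands in for a real file
```

Either list can then be handed to the same line-processing function.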
>>Now, we write a function that deals with a single line. Since we know
>>the structure of the line, we can take advantage of that fact.
>>[...]
>>
>As I said, I need to slice data from the string, since it's columnar, but we
>can get around that simply enough:
>
> >>> def processline(line):
> ...     fields = ['haircolor', 'role']
> ...     name = line.split()[0]
> ...     line = [line[5:10],line[15:20]]
> ...     fielddata = zip(fields, line)
> ...     return name, dict(fielddata)
> ...
> >>>
>
>yes?
>
Okay, having seen later that some 'cells' in the table may be empty,
then yes, you'll need to use slicing. You can actually make it a little
easier to maintain, though, by making that list of fieldnames into a
dictionary of fieldnames and start/end points.
>>> fields = {
...     'name': (0,10),
...     'haircolor': (15,25),
...     'role': (30,40)
...     }
>>> def processline(line):
...     item = {}
...     for key, val in fields.items():
...         item[key] = line[val[0]:val[1]]
...     return item
...
>>>
Note that I've just made 'name' a field like any other, instead of
special-casing it as before. You *could* still special-case it so that
it doesn't go in the item dictionary, or you could leave it in the
dictionary but also return it separately by changing the function's last
line to 'return item["name"], item'.
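Here is a rough sketch of that second variant, returning the name separately while leaving it in the dictionary. The strip() call is my own addition (it is not in the function above) to remove column padding, and the sample line and column positions are assumptions for illustration only.

```python
# Column positions are assumptions for this sketch; adjust to the real layout.
fields = {
    'name': (0, 10),
    'haircolor': (15, 25),
    'role': (30, 40),
}

def processline(line):
    item = {}
    for key, (start, end) in fields.items():
        # strip() removes the padding, so an empty cell becomes ''
        item[key] = line[start:end].strip()
    # 'name' stays in the dictionary but is also returned on its own
    return item['name'], item

# Hypothetical fixed-width line, built with ljust() so the columns line up
line = 'Betty'.ljust(15) + 'Blond'.ljust(15) + 'Student'
name, info = processline(line)
```

With this version the calling loop can keep unpacking two values, exactly as it did when 'name' was special-cased.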
>>Now all we have to do is run each line through this function, and store
>>the results.
>>
>> >>> data = {}
>> >>> for line in lines:
>> ...     name, info = processline(line)
>> ...     data[name] = info
>> ...
>>
>>
>
>But we can combine this with above for:
>
> >>> def processline(line):
> ...     fields = ['haircolor', 'role']
> ...     name = line.split()[0]
> ...     line = [line[5:10],line[15:20]]
> ...     fielddata = zip(fields, line)
> ...     record = {}
> ...     record[name] = dict(fielddata)
> ...     return record
> ...
> >>>
>
>And that should return our dictionary:
>{'Betty': {'haircolor': 'Blond', 'role': 'Stude'}},
>
>yes?
>
Well, we *could*, but then we'd have a separate single-entry dictionary
for each line, and we'd need another list or dictionary to store all of
those dictionaries... I don't see the advantage to that.
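To make the difference concrete, here is a small sketch comparing the two layouts. The parsed records are hypothetical stand-ins for whatever processline() returns.

```python
# Two hypothetical parsed records, as (name, info) pairs
parsed = [
    ('Betty', {'haircolor': 'Blond', 'role': 'Student'}),
    ('Veronica', {'haircolor': 'Brunette', 'role': 'Student'}),
]

# One dictionary keyed by name: lookup is direct
data = {}
for name, info in parsed:
    data[name] = info
betty = data['Betty']

# One single-entry dictionary per line: lookup means searching a list
records = [{name: info} for name, info in parsed]
betty_again = None
for record in records:
    if 'Betty' in record:
        betty_again = record['Betty']
        break
```

Both approaches find the same record, but the second needs an extra container and a search loop to do it.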
>>Depending on how you're using this data, it might also be practical to
>>define a class, and make each line into an instance of that class, with
>>attributes named 'haircolor' and 'role'.
>>
>>
>
>Well, let's not get too crazy here! <g>
>[...]
>So the question is, when do you decide to make a class, and when do you
>stick with Python's native dictionary tools?
>
I'd start thinking about making a class when I can see obvious methods
to attach to it -- if it's simply a matter of storing data, a dictionary
is suitable, but once I want to add nontrivial manipulations of that
data, then it becomes easier to use a class than to use a dictionary
plus a set of functions. (Indeed, a class *is* a dictionary plus a set
of functions, but conveniently packaged up together in a helpful way.)
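The parenthetical can be seen directly in a short sketch. The Record class and its describe() method are invented here purely for illustration.

```python
class Record:
    def __init__(self, haircolor, role):
        self.haircolor = haircolor
        self.role = role

    def describe(self):
        return '%s %s' % (self.haircolor, self.role)

betty = Record('Blond', 'Student')

# The instance's data lives in an ordinary dictionary...
attributes = vars(betty)

# ...and a method call is just the class's function applied to the instance
via_class = Record.describe(betty)
via_instance = betty.describe()
```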
From your problem description, you'll probably want a function that
compares the current item with a previous item, returning some
information about any changes (possibly just a boolean "yes there's a
change", possibly detailed information on *what* changed and how).
You'll want another function (or possibly several) that takes
appropriate action based on the results of the first function.
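As a sketch of what those two functions might look like when working on plain dictionaries: the names find_changes() and report_changes() are hypothetical, and the "action" here is only building a message, since the real action depends on your application.

```python
def find_changes(previous, current):
    """Return {field: (old, new)} for every field whose value differs."""
    changes = {}
    for key in current:
        old = previous.get(key)
        new = current[key]
        if old != new:
            changes[key] = (old, new)
    return changes

def report_changes(name, changes):
    """Hypothetical 'action' step: build one message per changed field."""
    return ['%s: %s changed from %r to %r' % (name, key, old, new)
            for key, (old, new) in sorted(changes.items())]

before = {'haircolor': 'Blond', 'role': 'Student'}
after = {'haircolor': 'Red', 'role': 'Student'}

changes = find_changes(before, after)
has_changed = bool(changes)      # the simple "yes there's a change" answer
messages = report_changes('Betty', changes)
```

An empty dictionary means nothing changed, so the same return value serves as both the boolean answer and the detailed one.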
If a function depends on detailed knowledge of the structure of complex
data that's passed into it, then it's a good candidate to convert to a
class method. Your comparison function will definitely require such
knowledge, and your action function(s) may or may not require it. The
more this information is needed, the stronger the case for packaging the
lot of it up into a class, so that all this information is localized in
one place. Note that, if you're using a class, then the information
about how to parse a line of data is *also* localized in the class,
instead of in an isolated function like processline(), and your set of
fields (and start/stop points) can be made a class attribute as well.
By the way, you can still use the process I showed above for slicing a
line to create attributes in a class, by using the setattr() function:
class Girl:
    fields = {
        'name': (0,10),
        'haircolor': (15,25),
        'role': (30,40)
        }
    def __init__(self, line):
        for key, val in self.fields.items():
            setattr(self, key, line[val[0]:val[1]])
This will also give you the flexibility of creating subclasses of Girl
that use a different dictionary of fields, in case you have different
data sources that format your columns slightly differently. But since
they're all Girl instances, you can use the same comparison and action
methods.
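A minimal sketch of that arrangement follows. The NarrowGirl subclass, its column positions, the changes_from() method and the strip() call are all assumptions added for illustration; only the Girl class and its fields come from the example above.

```python
class Girl:
    fields = {
        'name': (0, 10),
        'haircolor': (15, 25),
        'role': (30, 40),
    }

    def __init__(self, line):
        for key, (start, end) in self.fields.items():
            # strip() is an addition here, to drop the column padding
            setattr(self, key, line[start:end].strip())

    def changes_from(self, other):
        """Hypothetical comparison method: {field: (old, new)} for differences."""
        result = {}
        for key in self.fields:
            old, new = getattr(other, key), getattr(self, key)
            if old != new:
                result[key] = (old, new)
        return result

class NarrowGirl(Girl):
    # Hypothetical second data source with tighter columns
    fields = {
        'name': (0, 10),
        'haircolor': (10, 20),
        'role': (20, 30),
    }

wide = Girl('Betty'.ljust(15) + 'Blond'.ljust(15) + 'Student')
narrow = NarrowGirl('Betty'.ljust(10) + 'Red'.ljust(10) + 'Student')
diff = narrow.changes_from(wide)
```

Because the subclass only overrides the fields attribute, the inherited __init__ and changes_from() work unchanged on both layouts.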
>And the answer is: when we need the additional abstraction of handling
>objects, and flexible access to attributes is necessary?
>
>
Pretty much, yes. But once you have a basic grasp of the concept of
OOP, then the cost of creating a class is pretty low, so the point at
which you start to benefit by having a class comes pretty easily.
Jeff Shannon
Technician/Programmer
Credit International