[Tutor] creating variable names from string slices

Wed Apr 2 13:56:11 2003

Bruce Dykes wrote:

>>First, we get the imported data into a list of lines:
>>    
>>
>
>ayup...I'm using:
>
>records=string.split(telnet.readuntil(prompt),'\n')
>
>but I'll also need to do read from a file....
>

Which is easy enough to do with readlines() / xreadlines(); the 
important point here being that it doesn't really matter where your data 
is coming from if you can get it into a consistently-formatted list of 
lines.
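
As a rough sketch of that idea, both sources can be funneled through one small helper, so the rest of the code only ever sees a list of lines.  The names lines_from_text and lines_from_file are made up for illustration, and the sample text just stands in for whatever your telnet session returns.

```python
def lines_from_text(text):
    """Split a block of text (e.g. telnet output) into non-blank lines."""
    return [line for line in text.splitlines() if line.strip()]

def lines_from_file(path):
    """Read a file and return its non-blank lines, without newlines."""
    with open(path) as handle:
        return lines_from_text(handle.read())

# Hypothetical sample standing in for the telnet output.
sample = "Betty     Blond     Student\n\nVeronica  Black     Student\n"
lines = lines_from_text(sample)
```

Either way, everything downstream works on the same kind of list.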

>>Now, we write a function that deals with a single line.  Since we know
>>the structure of the line, we can take advantage of that fact.
>>[...]
>>
>As I said, I need to slice data from the string, since it's columnar, but we
>can get around that simply enough:
>
>  >>> def processline(line):
> ...     fields = ['haircolor', 'role']
> ...     name = line.split()[0]
> ...     line = [line[5:10],line[15:20]]
> ...     fielddata = zip(fields, line)
> ...     return name, dict(fielddata)
> ...
>  >>>
>
>yes?
>

Okay, having seen later that some 'cells' in the table may be empty, 
then yes, you'll need to use slicing.  You can actually make it a little 
easier to maintain, though, by making that list of fieldnames into a 
dictionary of fieldnames and start/end points.

 >>> fields = {
...     'name':      (0,10),
...     'haircolor': (15,25),
...     'role':      (30,40)
...     }
 >>> def processline(line):
...     item = {}
...     for key, val in fields.items():
...         item[key] = line[val[0]:val[1]]
...     return item
...
 >>>

Note that I've just made 'name' a field like any other, instead of 
special-casing it as before.  You *could* still special-case it so that 
it doesn't go in the item dictionary, or you could leave it in the 
dictionary but also return it separately by changing the function's last 
line to 'return item["name"], item'.
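
Here is a minimal sketch of that last variation, together with a loop that stores each result under its name.  The sample rows are invented and padded to match the column positions above, and the strip() call is my own addition (an assumption that you don't want the column padding kept in the values).

```python
fields = {
    'name':      (0, 10),
    'haircolor': (15, 25),
    'role':      (30, 40),
}

def processline(line):
    item = {}
    for key, (start, end) in fields.items():
        # strip() drops the padding that fills out each fixed-width column
        item[key] = line[start:end].strip()
    return item['name'], item

# Made-up sample rows, padded to line up with the field positions.
lines = [
    'Betty'.ljust(15) + 'Blond'.ljust(15) + 'Student',
    'Veronica'.ljust(15) + ''.ljust(15) + 'Student',  # empty haircolor cell
]

data = {}
for line in lines:
    name, info = processline(line)
    data[name] = info
```

The second row shows why slicing matters here: with an empty cell, line.split() would shift 'Student' into the wrong field, while the slice simply yields an empty string.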

>>Now all we have to do is run each line through this function, and store
>>the results.
>>
>> >>> data = {}
>> >>> for line in lines:
>>...     name, info = processline(line)
>>...     data[name] = info
>>...
>>    
>>
>
>But we can combine this with above for:
>
>  >>> def processline(line):
> ...     fields = ['haircolor', 'role']
> ...     name = line.split()[0]
> ...     line = [line[5:10],line[15:20]]
> ...     fielddata = zip(fields, line)
> ...     record = {}
> ...     record[name] = fielddata
> ...     return record
> ...
>  >>>
>
>And that should return our dictionary:
>{'Betty': {'haircolor': 'Blond', 'role': 'Stude'}},
>
>yes?
>

Well, we *could*, but then we'd have a separate single-entry dictionary 
for each line, and we'd need another list or dictionary to store all of 
those dictionaries...  I don't see the advantage to that.  

>>Depending on how you're using this data, it might also be practical to
>>define a class, and make each line into an instance of that class, with
>>attributes named 'haircolor' and 'role'.
>>    
>>
>
>Well, let's not get too crazy here! <g>
>[...]
>So the question is, when do you decide to make a class, and when do you
>stick with Python's native dictionary tools?
>

I'd start thinking about making a class when I can see obvious methods 
to attach to it -- if it's simply a matter of storing data, a dictionary 
is suitable, but once I want to add nontrivial manipulations of that 
data, then it becomes easier to use a class than to use a dictionary 
plus a set of functions.  (Indeed, a class *is* a dictionary plus a set 
of functions, but conveniently packaged up together in a helpful way.)
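
To illustrate that parenthetical, here is a small sketch using a made-up Record class.  The built-in vars() returns the dictionary in which an ordinary instance keeps its attributes, and the method is just a function that receives that instance.

```python
class Record:
    def __init__(self, haircolor, role):
        self.haircolor = haircolor
        self.role = role

    def describe(self):
        # A function packaged with the data it works on
        return '%s %s' % (self.haircolor, self.role)

betty = Record('Blond', 'Student')
stored = vars(betty)       # the instance's attribute dictionary
text = betty.describe()
```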

From your problem description, you'll probably want a function that 
compares the current item with a previous item, returning some 
information about any changes (possibly just a boolean "yes there's a 
change", possibly detailed information on *what* changed and how).  
You'll want another function (or possibly several) that takes 
appropriate action based on the results of the first function.  
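
As a sketch only, assuming each item is one of the plain dictionaries built earlier, the pair of functions might look something like this.  The names find_changes and report_changes are hypothetical, and "taking action" is reduced here to building a message, since I don't know what your real action would be.

```python
def find_changes(previous, current):
    """Return {field: (old, new)} for every field whose value differs."""
    changes = {}
    for key in current:
        old = previous.get(key)
        new = current[key]
        if old != new:
            changes[key] = (old, new)
    return changes

def report_changes(name, changes):
    """Build one line of text per changed field."""
    return ['%s: %s changed from %r to %r' % (name, field, old, new)
            for field, (old, new) in sorted(changes.items())]

before = {'haircolor': 'Blond', 'role': 'Student'}
after = {'haircolor': 'Red', 'role': 'Student'}

changes = find_changes(before, after)
messages = report_changes('Betty', changes)
```

An empty result from find_changes doubles as the boolean "nothing changed" answer, so one function covers both the simple and the detailed case.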

If a function depends on detailed knowledge of the structure of complex 
data that's passed into it, then it's a good candidate to convert to a 
method of a class.  Your comparison function will definitely require such 
knowledge, and your action function(s) may or may not require it.  The 
more this information is needed, the stronger the case for packaging the 
lot of it up into a class, so that all this information is localized in 
one place.  Note that, if you're using a class, then the information 
about how to parse a line of data is *also* localized in the class, 
instead of in an isolated function like processline(), and your set of 
fields (and start/stop points) can be made a class attribute as well.

By the way, you can still use the process I showed above for slicing a 
line to create attributes in a class, by using the setattr() function:

class Girl:
    fields = {
        'name':      (0,10),
        'haircolor': (15,25),
        'role':      (30,40)
        }
    def __init__(self, line):
        for key, val in self.fields.items():
            setattr(self, key, line[val[0]:val[1]])

This will also give you the flexibility of creating subclasses of Girl 
that use a different dictionary of fields, in case you have different 
data sources that format your columns slightly differently.  But since 
they're all Girl instances, you can use the same comparison and action 
methods.
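
A rough sketch of how that could look follows.  The subclass name NarrowGirl, its column positions, the changes_from method, and the strip() call are all assumptions of mine for illustration, not anything dictated by your data.

```python
class Girl:
    fields = {
        'name':      (0, 10),
        'haircolor': (15, 25),
        'role':      (30, 40),
    }

    def __init__(self, line):
        for key, (start, end) in self.fields.items():
            setattr(self, key, line[start:end].strip())

    def changes_from(self, other):
        """Return {field: (old, new)} for fields that differ from other."""
        changes = {}
        for key in self.fields:
            old, new = getattr(other, key), getattr(self, key)
            if old != new:
                changes[key] = (old, new)
        return changes

class NarrowGirl(Girl):
    # Hypothetical second data source with tighter columns.
    fields = {
        'name':      (0, 10),
        'haircolor': (10, 20),
        'role':      (20, 30),
    }

old = Girl('Betty'.ljust(15) + 'Blond'.ljust(15) + 'Student')
new = NarrowGirl('Betty'.ljust(10) + 'Red'.ljust(10) + 'Student')
changes = new.changes_from(old)
```

Only the fields dictionary differs in the subclass; the parsing in __init__ and the comparison method are inherited unchanged.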

>And the answer is, when we need the additional abstraction of handling
>objects and flexible access to attributes is necessary?
>  
>

Pretty much, yes.  But once you have a basic grasp of the concept of 
OOP, the cost of creating a class is low, so you reach the point where 
a class starts to pay off quite quickly.

Jeff Shannon
Technician/Programmer
Credit International