Need better string methods

Robert Brewer fumanchu at amor.org
Sat Mar 6 15:22:14 EST 2004


David MacQuigg wrote:
> I'm considering Python as a replacement for the highly specialized
> scripting languages used in the electronics design industry.
>8
> The resistance will come from people who throw at us little bits and
> pieces of code that can be done more easily in their chosen CPL.
> String processing, for example, is one area where we may face some
> difficulty.  Here is a typical line of garbage from a statefile
> revision control system (simplified to eliminate some items that pose
> no new challenges):
> 
> line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"
> 
> The problem is to break this into its component parts, and eliminate
> spaces and other gradoo.  The cleaned-up list should look like:
> 
> ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']
> 
> # Current best Python:
> clean = [' '.join(t.split()).strip('.') for t in line.split('|')]
> 
> This is too much to expect of a non-programmer, even one who
> understands the methods.  The usability problems are 1) the three
> variations in syntax ( methods, a list comprehension, and what *looks
> like* a join function prefixed by some odd punctuation), and 2) The
> order in which each step is entered at the keyboard.  ( I can show
> this in step-by-step detail if anyone doesn't understand what I mean.)
> 3) Proper placement of parens can be confusing.
> 
> What we need is a syntax that flows in the same order you have to
> think about the problem, stopping at each step to visualize an
> intermediate result, then typing the next operation, not mousing back
> to insert a function or the start of a comprehension, and not screwing
> up the parentheses. ( My initial version had the closing paren of
> the join method *after* the following strip, which lucky-for-me popped
> an attribute error ... not-so-lucky could work OK on this example, but
> mess up in subtle ways on future data. )
> 
> # Subclassing a list:
> clean = [MyList(t.split()).join().strip('.') for t in line.split('|')]
> 
> The MyList.join method works as expected.  I haven't figured out yet
> how to add a map method to MyList, but already I can guess this is not
> leading to a clean syntax.  Having to insert 'MyList' everywhere is as
> bad as the original syntax.  Maybe someone can help me with the
> Python.  I would love it if there was a simple solution not requiring
> changes to Python.
> 
> # Possible future Python:
> # clean = line.split('|').map().split().join().strip('.')
> 
> The map method takes a list in the "front door" and feeds items from
> the list one-at-a-time to the method waiting at its "back door".  The
> join method expects a list of strings at its front door and delivers a
> single string at its back door.  If something other than a space is
> needed to join the strings, that can be provided via the (side-door)
> of the join method.

The answer depends quite a bit on the deployment environment. If you
have a limited (knowable) set of commands which you wish to chain in
this manner, you can simply write each one and package them up into a
"listtools" package (or "mycompanytools" if they end up being more
diverse in the problems they solve). Usage would then be, for example:

>>> import listtools
>>> line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"
>>> listtools.List(line).split("|").squeeze().strip(".")
['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']

...the idea being that, once you have "line" wrapped in a listtools.List
object, that object can have whatever methods you see fit. In this case,
I'd envision most of the methods above returning another listtools.List
(this is the actual module I used to produce the above):


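class List(list):
    """A list of strings whose methods apply to every item and chain."""

    def __init__(self, value=()):
        # Wrap a bare value (e.g. a single string) in a one-item list.
        if isinstance(value, (tuple, list)):
            list.__init__(self, value)
        else:
            list.__init__(self, [value])

    def split(self, separator):
        # Split every atom and flatten the pieces into one List.
        product = List()
        for atom in self:
            product.extend(atom.split(separator))
        return product

    def squeeze(self):
        # Trim each atom and collapse internal runs of whitespace.
        product = List()
        for atom in self:
            atom = ' '.join(atom.split())
            product.append(atom)
        return product

    def strip(self, chars=None):
        # Strip the given characters (default: whitespace) from each atom.
        product = List()
        for atom in self:
            product.append(atom.strip(chars))
        return product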
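
If the set of commands is not knowable in advance, one possible variation
is a generic "map" method, along the lines of what you asked about. The
following is only a sketch and not part of the module above: the "map"
method and its signature are my own assumptions, and the class is repeated
in cut-down form so the example runs on its own. It applies any callable to
each item and returns another List, so the chain still reads left to right:

```python
class List(list):
    """Cut-down List with a hypothetical generic map method."""

    def __init__(self, value=()):
        # Wrap a bare value (e.g. a single string) in a one-item list.
        if isinstance(value, (tuple, list)):
            list.__init__(self, value)
        else:
            list.__init__(self, [value])

    def map(self, func, *args):
        # Hypothetical helper: call func(atom, *args) for every atom
        # and collect the results in a new List so calls can be chained.
        product = List()
        for atom in self:
            product.append(func(atom, *args))
        return product


line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

# Split on "|", then for each field: split on whitespace, rejoin with
# single spaces, and strip the leading dots.
clean = List(line.split("|")).map(str.split).map(" ".join).map(str.strip, ".")
print(clean)
```

The trade-off is that the user now has to know about str.split and
" ".join, which is exactly the punctuation you were hoping to hide, so
named methods like squeeze() are probably still friendlier for
non-programmers.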


Hope that helps!


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org



