Need better string methods

David MacQuigg dmq at gain.com
Sat Mar 6 20:01:16 CET 2004


I'm considering Python as a replacement for the highly specialized
scripting languages used in the electronics design industry.  Design
engineers are typically not programmers, and they avoid working with
these complex proprietary languages (CPLs), preferring instead to use GUI
tools that are poorly implemented and very limited in the problems
they can solve.

I am convinced that Python can do anything that can be done by these
CPLs, but I know it will be an uphill battle getting design engineers
to learn yet another scripting language. The pitch will be: 1) What you
need to solve most of your design problems can be learned in two days;
then you can decide if you want to learn the full language. 2) Learn
this one and you will have a language applicable not just to
controlling one company's EDA tools, but to almost any scripting or
computational problem you may encounter.  3) Python may well be the
ultimate computer language for non-programmer technical professionals;
you won't have to learn another in the future.

The resistance will come from people who throw at us little bits and
pieces of code that can be done more easily in their chosen CPL.
String processing, for example, is one area where we may face some
difficulty.  Here is a typical line of garbage from a statefile
revision control system (simplified to eliminate some items that pose
no new challenges):

line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

The problem is to break this into its component parts, and eliminate
spaces and other gradoo.  The cleaned-up list should look like:

['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']

# Ruby:
# clean = line.chomp.strip('.').squeeze.split(/\s*\|\s*/)

This is pretty straightforward once you know what each of the methods
does.

# Current best Python:
clean = [' '.join(t.split()).strip('.') for t in line.split('|')]

This is too much to expect of a non-programmer, even one who
understands the methods.  The usability problems are 1) the three
variations in syntax (methods, a list comprehension, and what *looks
like* a join function prefixed by some odd punctuation), 2) the
order in which each step is entered at the keyboard (I can show
this in step-by-step detail if anyone doesn't understand what I mean),
and 3) the placement of parens, which can be confusing.
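
To make the ordering problem concrete, here is the same one-liner unpacked
into named intermediate results. This is only an illustrative sketch; the
names fields, words, and joined are made up for this example and do not
appear in the original code.

```python
line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

# Step 1: break the line at each '|'.
fields = line.split('|')
# ['..../bgref/stats.stf', ' SPICE ', ' 3.2.7  ', ' John    Anderson  \n']

# Step 2: split each field on whitespace (this also drops the newline).
words = [t.split() for t in fields]
# [['..../bgref/stats.stf'], ['SPICE'], ['3.2.7'], ['John', 'Anderson']]

# Step 3: rejoin the words of each field with a single space.
joined = [' '.join(w) for w in words]
# ['..../bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']

# Step 4: strip leading and trailing dots from each field.
clean = [s.strip('.') for s in joined]
# ['/bgref/stats.stf', 'SPICE', '3.2.7', 'John Anderson']
```

The steps are thought of in the order split, split, join, strip, but in the
one-liner the join has to be typed before the split it operates on, and the
comprehension wrapper has to be started before any of them.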

What we need is a syntax that flows in the same order you have to
think about the problem: stopping at each step to visualize an
intermediate result, then typing the next operation, not mousing back
to insert a function or the start of a comprehension, and not screwing
up the parentheses. (My initial version had the closing paren of
the join method *after* the following strip, which luckily for me raised
an AttributeError.  A less lucky mistake could work OK on this example
but mess up in subtle ways on future data.)

# Subclassing a list:
clean = [MyList(t.split()).join().strip('.') for t in line.split('|')]

The MyList.join method works as expected.  I haven't figured out yet
how to add a map method to MyList, but already I can guess this is not
leading to a clean syntax.  Having to insert 'MyList' everywhere is as
bad as the original syntax.  Maybe someone can help me with the
Python.  I would love it if there were a simple solution not requiring
changes to Python.
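
One possible way to give MyList a map method is sketched below. The class
body is an assumption (the original MyList definition is not shown), and
map here takes the function to apply as an argument, which is one design
choice among several.

```python
class MyList(list):
    """Hypothetical list subclass; not the MyList from the post."""

    def join(self, sep=' '):
        # Join the items (assumed to be strings) into a single string.
        return sep.join(self)

    def map(self, func):
        # Apply func to every item and return the results as a new MyList,
        # so that further .map() or .join() calls can be chained.
        return MyList(func(item) for item in self)


line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

clean = (MyList(line.split('|'))
         .map(str.split)                 # each field -> list of words
         .map(' '.join)                  # each word list -> one string
         .map(lambda s: s.strip('.')))   # drop leading/trailing dots
```

With this version 'MyList' is typed only once and the steps read left to
right, but the lambda needed to pass an argument to strip is probably still
more than a non-programmer should have to deal with.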

# Possible future Python:
# clean = line.split('|').map().split().join().strip('.')

The map method takes a list in the "front door" and feeds items from
the list one-at-a-time to the method waiting at its "back door".  The
join method expects a list of strings at its front door and delivers a
single string at its back door.  If something other than a space is
needed to join the strings, that can be provided via the "side door"
(the argument list) of the join method.
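
The front-door/back-door behavior can be roughly approximated in today's
Python with a small wrapper object. The sketch below is only an
approximation of the proposed syntax: the class name Each and the result
method are invented for this example, the wrapper has to be applied by
hand instead of calling .map() on a plain list, and .result() is needed at
the end to get an ordinary list back.

```python
class Each:
    """Hypothetical wrapper: forwards each method call to every item."""

    def __init__(self, items):
        self._items = list(items)

    def __getattr__(self, name):
        # Only reached for names not defined on Each itself (e.g. 'split').
        if name.startswith('_'):
            raise AttributeError(name)

        def call(*args, **kwargs):
            # Call the named method on every item; wrap the results again
            # so the chain can continue.
            return Each(getattr(item, name)(*args, **kwargs)
                        for item in self._items)
        return call

    def join(self, sep=' '):
        # Each item is expected to be a list of strings; sep is the
        # "side door" argument.
        return Each(sep.join(item) for item in self._items)

    def result(self):
        # Hand back the plain list of results.
        return self._items


line = "..../bgref/stats.stf| SPICE | 3.2.7  | John    Anderson  \n"

clean = Each(line.split('|')).split().join().strip('.').result()
```

The chain now reads in the order the problem is thought about, at the cost
of one wrapper at the start and one unwrapping call at the end.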

-- Dave



