[Tutor] Re: Testing for punctuation in a string

Michael Janssen Janssen at rz.uni-frankfurt.de
Mon Oct 13 11:31:02 EDT 2003


On Mon, 13 Oct 2003, Greg Brunet wrote:

> I ended up using regex, thought that might be a bit heavyweight for what
> I needed:
>
> p=re.compile('^[a-zA-Z]\w*$')
> def fldNameValid(fldName):
>     return p.match(fldName) != None

Hello Greg Brunet,


If your goal is to "test for punctuation in a string", you
should do that in the most direct way:

[untested]
import re, string
_punctuation = string.punctuation.replace("_","").replace(",","")
# re.escape: "]", "\", "^" and "-" are special inside "[...]"
aPattern = "[%s]" % re.escape(_punctuation)
mt = re.search(aPattern, aString)
if mt:
    # do whatever you want, including:
    raise Exception(
      "'%s' contains a '%s' at position %s"
      % (aString, mt.group(), mt.start()))
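
A self-contained version of the search above might look like the following
sketch. The function name find_punctuation is made up for illustration, and
the call to re.escape is an addition: string.punctuation contains "]", "\",
"^" and "-", which would otherwise be read as character-class syntax.

```python
import re
import string

# Forbidden characters: all ASCII punctuation except "_" and ",".
_punctuation = string.punctuation.replace("_", "").replace(",", "")

# re.escape keeps "]", "\", "^" and "-" from being read as class syntax.
_punct_re = re.compile("[%s]" % re.escape(_punctuation))


def find_punctuation(name):
    """Return (char, position) of the first forbidden character, or None."""
    mt = _punct_re.search(name)
    if mt is None:
        return None
    return mt.group(), mt.start()


print(find_punctuation("cust_name"))   # underscore is allowed
print(find_punctuation("cust-name"))   # the "-" is reported
```

Returning the match details instead of raising leaves it to the caller to
decide whether a hit is an error.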

Regular expressions are the tool for testing for the presence or absence
of patterns. The other way (iterating through the string and checking
whether each element is in a sequence of punctuation chars:

for s in aString:
    if s in _punctuation:
        raise Exception("'%s' contains a '%s'" % (aString, s))

) is heavyweight in terms of performance: Python has to perform up to
len(aString) "s in _punctuation" lookups in interpreted code, against one
single regexp operation in the former example (OTOH, only measuring the
actual runtime of both solutions truly reveals which one performs
better).
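
Measuring is cheap with the standard timeit module. The following is only a
sketch: the helper names has_punct_regex, has_punct_loop and compare are
invented here, and the sample string is an arbitrary choice. No timings are
quoted because they depend on the machine and the Python version.

```python
import re
import string
import timeit

_punctuation = string.punctuation.replace("_", "").replace(",", "")
_punct_re = re.compile("[%s]" % re.escape(_punctuation))


def has_punct_regex(text):
    """One regexp search over the whole string."""
    return _punct_re.search(text) is not None


def has_punct_loop(text):
    """One 'in' lookup per character, stopping at the first hit."""
    for ch in text:
        if ch in _punctuation:
            return True
    return False


def compare(text, number=1000):
    """Return (regex_seconds, loop_seconds) for `number` calls each."""
    t_regex = timeit.timeit(lambda: has_punct_regex(text), number=number)
    t_loop = timeit.timeit(lambda: has_punct_loop(text), number=number)
    return t_regex, t_loop


# A long string without forbidden characters, so both scan to the end.
sample = "field_name_" * 200
print("regex: %.4fs  loop: %.4fs" % compare(sample))
```

For short field names the difference is unlikely to matter either way.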


If I may take a look at your latest approach (testing legality by
defining a pattern the Field Name must match):

> p=re.compile('^[a-zA-Z]\w*$')
> def fldNameValid(fldName):
>     return p.match(fldName) != None

Approaching the problem from the opposite direction to your own stated
goal (supposing I can take the thread's subject as such a thing ;-) can
be a clever choice if it turns out to be better than the direct way.
Nevertheless it might introduce some unforeseen results: your pattern
now rejects whitespace and commas, and Field Names starting with a digit
or an underscore are rejected too. All this might be exactly what you
need, but it no longer does the "Testing for punctuation in a string"
job.
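
To see where the two checks disagree, one can run both over a few sample
names. This is only an illustration: the sample names are made up, and
re.escape is added so that the punctuation characters are safe inside the
character class.

```python
import re
import string

# Greg's whitelist pattern: a letter, then word characters only.
name_re = re.compile(r'^[a-zA-Z]\w*$')

# The blacklist from the thread's subject: punctuation minus "_" and ",".
_punctuation = string.punctuation.replace("_", "").replace(",", "")
punct_re = re.compile("[%s]" % re.escape(_punctuation))

samples = ["cust_name", "cust name", "a,b", "_id", "2nd", "cust-name"]
for name in samples:
    matches_pattern = name_re.match(name) is not None
    has_punct = punct_re.search(name) is not None
    print("%-12r pattern ok: %-5s punctuation found: %s"
          % (name, matches_pattern, has_punct))
```

Only "cust_name" passes the pattern, while only "cust-name" contains a
character from the punctuation set; the four names in between are rejected
by the pattern for reasons that have nothing to do with punctuation.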

Besides this (and given that the additional restrictions for valid Field
Names are what you need), your solution will do the job, and you won't
feel much of a performance impact as long as you don't want to parse a
command line of several kB ;-)


Michael



