Help me use re better

Darrell news at dorb.com
Tue Apr 24 21:54:31 EDT 2001


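A variation on Alex's script that runs faster and fixes a bug: Alex's
nospaces() also strips the space after the last initial, which gives
'Prof ABLooney'. This version joins the whole list into one string so
re.sub runs only once, and the callback puts the surrounding spaces back.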
Guess I don't get enough coding time during the day :)

--Darrell

import re, time

directors = ['Prof A B Looney','Dr C D E Ftang','Ms H I J K Biscuit Barrel']
directors *= 9999

def nospaces(matchobj):
    return " %s "%matchobj.group(0).replace(' ','')

s= '\000'.join(directors)
t1=time.time()
# raw string so the \s escapes reach the re module untouched
b2=re.sub(r"\s([A-Z]\s)+",nospaces,s).split("\000")
print "Len2:", len(b2), b2[:3]
print time.time()-t1

########## output
Len2: 29997 ['Prof AB Looney', 'Dr CDE Ftang', 'Ms HIJK Biscuit Barrel']
3.625
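
For anyone running this on a current interpreter, here is a rough Python 3
sketch of the same join-once idea, next to a plain per-name loop for
comparison. The names squeeze_joined and squeeze_each are made up for this
example, and it assumes the separator never occurs in the data. No timings
are claimed here; measure on your own input.

```python
import re

directors = ['Prof A B Looney', 'Dr C D E Ftang', 'Ms H I J K Biscuit Barrel']

# Whitespace, then one or more "capital letter + whitespace" groups.
INITIALS = re.compile(r"\s([A-Z]\s)+")

def nospaces(matchobj):
    # Remove every space in the match, then restore one on each side.
    return " %s " % matchobj.group(0).replace(" ", "")

def squeeze_joined(names, sep="\0"):
    # Hypothetical helper: one regex pass over a single joined string.
    # Assumes sep never appears in a name; "\0" is not matched by \s.
    return INITIALS.sub(nospaces, sep.join(names)).split(sep)

def squeeze_each(names):
    # Hypothetical helper: the straightforward one-call-per-name version.
    return [INITIALS.sub(nospaces, name) for name in names]

print(squeeze_joined(directors))
print(squeeze_each(directors) == squeeze_joined(directors))
```

Both helpers should give the same list; the joined form just trades many
small sub() calls for one large one.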

"Alex Martelli" <aleaxit at yahoo.com> wrote
> "P Browning" <glpb at eis.bris.ac.uk> wrote in message
> news:GCAAFM.7Jv at bath.ac.uk...
> > I've read AMK's RE HowTo but I think I'm missing something
> > obvious when it comes to substitutions. Any offers for
> > a more elegant solution to the program below gratefully
>
> Elegance is in the eye of the beholder, but...:
>
> > import string,re
> >
> > directors = ['Prof A B Looney','Dr C D E Ftang','Ms H I J K Biscuit Barrel']
> > # I want no spaces between the initials
> > # Prof AB Looney
> > # Dr CDE Ftang
> > # Ms HIJK Biscuit Barrel
> >
> > match_initials = re.compile(r'([A-Z] )+')
>
> This may not match *QUITE* what you want -- if the honorific or
> any part of the name but the last ever ends with a capital, this
> may unwontedly match it.  It may be best to stick a word-boundary
> marker before that initial:
>
> match_initials = re.compile(r'(\b[A-Z] )+')
>
>
> Anyway, now add:
>
> def nospaces(matchobj):
>     return matchobj.group(0).replace(' ','')
>
>
> > print
> > for director in directors:
> >     s = match_initials.search(director)
> >     inits = s.group(0)
> >     new_inits = string.replace(inits,' ','')
> >     new_director = string.replace(director,inits,new_inits + ' ')
> >     print director,new_director
>
> This then becomes:
>
> for director in directors:
>     new_director = match_initials.sub(nospaces, director)
>
> plus of course whatever "print"s you want.
>
>
> Alex
>
>
>
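
A small Python 3 sketch of the two points in the quoted message: what the
word-boundary marker changes, and why the callback needs to hand one space
back. The sample name 'DR A B Looney' and the helper nospaces_keep_gap are
invented for illustration; appending the space mirrors the "new_inits + ' '"
step in the original loop.

```python
import re

loose = re.compile(r'([A-Z] )+')       # pattern from the original question
anchored = re.compile(r'(\b[A-Z] )+')  # with the word-boundary marker

sample = 'DR A B Looney'  # hypothetical honorific ending in a capital
loose_match = loose.search(sample).group(0)        # also grabs the "R "
anchored_match = anchored.search(sample).group(0)  # initials only

def nospaces_keep_gap(matchobj):
    # Drop the spaces between initials, then restore the one before the surname.
    return matchobj.group(0).replace(' ', '') + ' '

glued = anchored.sub(lambda m: m.group(0).replace(' ', ''), 'Prof A B Looney')
fixed = anchored.sub(nospaces_keep_gap, 'Prof A B Looney')

print(repr(loose_match), repr(anchored_match))
print(glued)
print(fixed)
```

Without the trailing space the initials run straight into the surname,
which is the bug mentioned at the top of this post.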




