[Tutor] Testing for punctuation in a string
Kirk Bailey
idiot1 at netzero.net
Sun Oct 12 23:15:46 EDT 2003
May I please jump in the thread?
I just wrote 2 wikis, and had to sort things out, separating sheep from goats
just this way. And I had to do it a char at a time, and used recursion in doing
it. You might not want to use recursion, but maybe you will...
My task was to return a 1 or a 0 for an if statement to swallow and genuflect
at. A function looked at a word and nibbled through it char by char, using other
functions, which were recursive, and decided if a word was a wiki word. (A
WikiWord is 2+ words RunTogetherLikeThis, each capitalized, with no punctuation
within them. So 'WikiWord' is a wikiword, but 'Wiki-Word' is not.) I used a lot
of string slicing, IF statements, IN, and the string module's constants
(string.uppercase, string.punctuation, etc).
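For the question this thread started with, the same string-module constants give a short test. What follows is only a minimal sketch in present-day Python 3, not the Python 2.2 of the script further down, and the helper name has_punctuation is made up for illustration.

```python
import string

def has_punctuation(text):
    """Return True if any character of text is in string.punctuation."""
    for ch in text:
        if ch in string.punctuation:
            return True
    return False

# Field names taken from the question quoted at the end of this message.
fieldnames = ['fld1', 'fld2', 'fld-bad']
bad = [name for name in fieldnames if has_punctuation(name)]
print(bad)
```

Note that string.punctuation includes the underscore, so a name like 'fld_1' would be flagged too; test against your own list of characters if that is not what you want.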
Each function is rather simple in itself, and the total effect is a fairly
powerful word chomper. And NO gang, this time I did NOT do any counting, which
kept it from being a state engine in the first one I wrote.
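To show the shape of the idea without the CGI plumbing, here is a rough sketch of the recursive nibbling, condensed into present-day Python 3. It is a retelling under assumptions, not the functions from the script itself, and the names is_wiki_word and _has_second_hump are made up for this illustration.

```python
import string

def _has_second_hump(rest):
    """Recursively skip lowercase letters, then demand Capital + lowercase."""
    if not rest:
        return False                       # ran out of word, no second capital
    if rest[0] in string.ascii_lowercase:
        return _has_second_hump(rest[1:])  # nibble one char and recurse
    # First non-lowercase char: it must start a new capitalized word.
    return (rest[0] in string.ascii_uppercase
            and len(rest) > 1
            and rest[1] in string.ascii_lowercase)

def is_wiki_word(word):
    """True for CamelCaps words such as 'WikiWord'."""
    return (len(word) > 3
            and word[0] in string.ascii_uppercase
            and word[1] in string.ascii_lowercase
            and _has_second_hump(word[2:]))

for candidate in ['WikiWord', 'Wiki-Word', 'ThisISNOT', 'Word']:
    print(candidate, is_wiki_word(candidate))
```

Of the four candidates only 'WikiWord' passes. No counters are kept anywhere; each call just looks at the front of what is left and hands the rest on.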
Here is the script I wrote using these functions. Note it is optimized for
operation in a Windows 9x environment.
BEWARE WORD WRAP! I edited this to minimize it as much as possible, and inserted
a '\' where a code line wraps around, but please be careful, make your screen as
wide as possible, maybe use a narrow font?
------------------------------------------------------------------------------
#!C:\Python22\pythonw.exe
#
tablebgcolor="FFE0C0"
import os, os.path, sys, string, re, cgitb; cgitb.enable()
print "Content-Type: text/html\n"
#
try:
    pagename=os.environ['QUERY_STRING']     # try to get page name requested
except Exception, e:
    pagename="FrontPage"                    # but default to FrontPage
#
path=os.getcwd()
path=path+'\\cgi-bin\\text'
os.chdir(path) # make the pages dir current
if os.path.exists(pagename):                # if the page asked for exists,
    pass                                    # do nothing here;
else:                                       # BUT if it does NOT-
    f1=open(pagename,'w')                   # CREATE it!
    f1.write('Please contribute to this webpage.\n')
    f1.close()
#
# header for webpage
print '<html><head><title>MiniWiki - '+ pagename +'</title>'
print '<style type="text/css">'
print 'body { margin-left: 5%; margin-right: 5%; }'
print 'A:link, A:visited, A:active { text-decoration:none; }'
print 'A:hover { background-color: #D0D0FF; }'
print '</style></head>'
print '<body bgcolor="FFFFFF" text="000000" links="0000FF">'
print '<table width=100% bgcolor="'+tablebgcolor+'" border="0"'
print 'cellpadding="10"><tr><td>'
print '<B><font size="4" color="FF0000">MiniWiki</font></B>'
print '<center><font size="6"><a href="./MWbacksearch.py?' + pagename+'">'
print string.join(re.split('([A-Z][a-z]*)',pagename)[1:-1]) + '</a></font><br>'
print '(Click for backsearch)<p></center></td></tr></table>'
f1=open(pagename,'r')
page=f1.readlines()
f1.close()
#
def isin( searchthis, forthis ):    # return a 1 or 0 to control IF statements
    value = 1 + string.find(searchthis, forthis)
    if value > 0:
        return 1
    else:
        return 0
#
#
# Now we try to hash out whether or not a word is a wikiword...
# a WikiWord is in CamelCaps. it has a 'hump' in it.
# ThisIs a WikiWord,
# but ThisISNOT,
# ANDNEITHERISTHIS,
# NorisTHIS. Fun?
#
# these functions build wikiwords.
def buildwikilink(word):            # turns a word into a hyperlink
    word = '<a href="./MW.py?' + word + '">' + word + '</a>'
    # it's a hyperlink anchor, normal html.
    return word
#
def potentialword(word):            #
    newword=''
    prefix=''
    suffix=''
    index=0
    while 1:                        # loop processes the word 1 char at time
        if word[index] in string.ascii_letters:
            prefix, newword, suffix = mainbody(prefix, word[index:])
            return prefix, newword, suffix
        else:
            prefix=prefix+word[index]
            index=index+1
#
def mainbody(prefix, word):
    newword=''
    suffix=''
    index=0
    for char in word:
        if char in string.ascii_letters:
            newword=newword+char
        else:
            suffix=suffix+char      #
        index=index+1
    return prefix, newword, suffix
#
def makewikiword(word):             # combines processed link
    prefix=''
    suffix=''
    prefix, word, suffix=potentialword(word)
    return prefix + buildwikilink(word) + suffix
#
#
# these 2 functions determine if a word is a wikiword.
def iswikiword(word):                               # tests to see if is wikiword.
    if word:
        if word[0] in string.ascii_uppercase:       # ALL start with capital.
            if (len(word)>1):                       # guard for 'A' word
                if word[1] in string.ascii_lowercase:   #
                    if (len(word)>3):               # is there any more to process?
                        if processword(word[2:]):   # is there another?
                            return 1                # YES! it's a wikiword!
    return 0                                        # it's not a wikiword.
#
def processword(word):
    if word:                                        # it is possible to
        # exhaust a word and not find another capital letter
        if word[0] in string.ascii_lowercase:       # wikiwords CAN have
            # several lowercase letters before the next capital, after all...
            value=processword(word[1:])             # so keep
            # invoking this function until exhaustion, or a capital is found
            return value
        else:                                       # MIGHT be a capital!
            if word[0] in string.ascii_uppercase:
                if len(word)>1:
                    if word[1] in string.ascii_lowercase:
                        return 1
                    else:
                        return 0
                else:
                    return 0
            else:
                return 0
    else:
        return 0
#
#
#
# this group processes the raw page to convert it to html code- but not
# wikiwords or links.
index=0
for line in page:                               # process the line for substrings
    line=string.rstrip(line)                    # kill EOL stuff
    line=string.replace(line,'<','&lt;')        # kills html tag opener
    line=string.replace(line,'>','&gt;')        # kills html tag closer
    line=string.replace(line,'----','<hr>')     # create standard <hr>
    line=string.replace(line,'@!','<center>')   # open centering
    line=string.replace(line,'!@','</center>')  # close centering
    line=string.replace(line,"```","<b>")       # open BOLD
    line=string.replace(line,"'''","</b>")      # close BOLD
    line=string.replace(line,"``","<i>")        # open ITALIC
    line=string.replace(line,"''","</i>")       # Close Italic
    line=string.replace(line,"{{{","<pre>")     # open PREFORMATTED TEXT
    line=string.replace(line,"}}}","</pre>\n")  # close PREFORMATTED
    line=string.replace(line,'<br>','')         # remove any BR's.
    line=string.replace(line,'[=','<table border="0" cellpadding="5" \
width=100% bgcolor="'+tablebgcolor+'"><tr><td><B><big>') # create header bar
    line=string.replace(line,'=]','</big></b></td></tr></table>\n') # end
    # header bar
    line=string.replace(line,'{{{','<pre>')     # start <pre> zone
    line=string.replace(line,'}}}','</pre>')    # END a <pre> zone with </pre>
    line=string.replace(line,'#! ','<img src="../images/') # start img tag
    line=string.replace(line,' !#','"><br>\n')  # end an image tag
    line=string.replace(line,'-!','<small>')    # insert small tag
    line=string.replace(line,'!-','</small>')   # insert end small tag
    line=string.replace(line,'+!','<big>')      # insert <big> tag
    line=string.replace(line,'!+','</big>')     # insert END of BIG state
    if line == "":                              # replace a null line with <P>
        line='<p>\n'
    page[index]=line                            # save resulting line
    index=index+1                               # and increase pointer.
#
# wordscanner routine
linecounter=0 # reset the linepointer we will use it again...
for line in page:                           # see?
    wordcounter=0                           # start the word pointer over at 0 again
    wordlist = string.split(line,' ')       # split the line into a list of words.
    for word in wordlist:
        if ((isin(word,'http://')) or (isin(word,'mailto:'))):
            #if link:
            if isin(word,'"'):
                # DO NOT process a "http- it's embedded code!
                pass
                # DO NOT process; leave the word alone!
            else:
                # otherwise, make a hyperlink for them to click.
                wordlist[wordcounter]='<a href="' + word + '">' + word + '</a>'
        else:
            if iswikiword(word):
                wordlist[wordcounter]=makewikiword(word)
            else:
                pass
        wordcounter=wordcounter+1
    line=string.join(wordlist,' ')
    page[linecounter]=line
    linecounter=linecounter+1
#
# print out the final highly modified page.
for line in page:
    print line
#
# Page footer follows
print '<p><table border="0" width=100% cellpadding="10" bgcolor="'+tablebgcolor+'">'
print '<tr>'
print '<td width=30% ><a href="./MWed1.py?'+pagename+'" >Edit this page</a></td>'
print '<td align="center" width=40% ><form method="GET" ' \
      'action="./MWbacksearch.py?">'
print '<input type="text" size="20" maxlength="24" name=""> <input ' \
      'type=submit value="WORDSEARCH">'
print '</form></td>'
print '<td align="right" width=30% ><a href="./MWlistall.py">LIST ALL ' \
      'PAGES</a></td></tr>'
print '<tr><td><a href="http://www.tinylist.org/">MiniWiki V:1.3.0<br>&copy;2003 ' \
      'Kirk D Bailey</a></td>'
print '<td> </td>'
print '<td align="right"><a ' \
      'href="./MW.py?FrontPage">FrontPage</a></td></tr></table>'
print '</body></html>'
------------------------------------------------------------------------------
BEWARE WORD WRAP!
That is the complete wiki browser engine. If you want the entire suite of
several scripts I am using, I will email it to you off list- email me and ask
for it.
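
The prefix / word / suffix split that potentialword and mainbody perform in the script above can also be sketched with a regular expression. This is a swapped-in technique, not what the script does: mainbody gathers every letter into the word even when punctuation sits between them, while this sketch stops the word at the first non-letter. It is written for present-day Python 3, and the names split_word and make_wiki_link are made up for illustration.

```python
import re

# leading non-letters, then a run of letters, then whatever is left
_SPLIT = re.compile(r'^([^A-Za-z]*)([A-Za-z]+)(.*)$')

def split_word(word):
    """Return (prefix, core, suffix); core is '' if word has no letters."""
    m = _SPLIT.match(word)
    if m is None:
        return word, '', ''
    return m.group(1), m.group(2), m.group(3)

def make_wiki_link(word):
    """Wrap only the letters in a link, keeping surrounding punctuation outside."""
    prefix, core, suffix = split_word(word)
    if not core:
        return word
    return '%s<a href="./MW.py?%s">%s</a>%s' % (prefix, core, core, suffix)

print(make_wiki_link('(WikiWord),'))
```

The point of the split is that a WikiWord followed by a comma or wrapped in parentheses still links to the right page, with the punctuation left outside the anchor.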
Anna Ravenscroft wrote:
> On Sunday 12 October 2003 06:52 pm, Alan Gauld wrote:
>
>>>If s is my list of fieldnames (s='fld1,fld2,fld-bad'), I'm not sure
>
>
> Make sure your list of fieldnames has quotes around each item, not just the
> beginning and end of the list...
>
>
>>what to do next. I would probably split the string & test it a character
>>at a time, but I am hoping for something better. Thanks,
>
>
> Ouch. Sounds long and unpleasant.
>
> I like the nifty new sets module (available in 2.3). You can use a list
> comprehension (as Alan suggested) but if you have more than one type of "bad"
> punctuation in a particular fieldname, you'll get duplicates. If you use the
> new sets module, you can eliminate the duplicates.
>
>
>>>>import sets
>>>>fieldnames=['fld1','fld2','fld-bad', 'fld-ba_d']
>>>>bad = ['-', '_']
>>>>badfld = [f for f in fieldnames for b in bad if b in f]
>>>>print badfld # will print any duplicates
>
> ['fld-bad', 'fld-ba_d', 'fld-ba_d']
>
>>>>setofbad = sets.Set(badfld) # removes the duplicates
>>>>print setofbad
>
> Set(['fld-ba_d', 'fld-bad'])
>
>
> Hope this gives you some ideas... Have fun.
>
> Anna
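
A footnote for readers on newer Pythons: the sets module quoted above was later superseded by the built-in set type, and the module is gone in Python 3. A minimal sketch of the same duplicate-free result follows, using any() so that each field name is tested only once; the variable name bad_fields is illustrative.

```python
fieldnames = ['fld1', 'fld2', 'fld-bad', 'fld-ba_d']
bad = ['-', '_']

# A set comprehension keeps each offending field name once,
# no matter how many bad characters it contains.
bad_fields = {f for f in fieldnames if any(b in f for b in bad)}
print(sorted(bad_fields))
```

Because any() stops at the first bad character found, the duplicates never arise in the first place; the set just makes that guarantee explicit.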
--
--
end
Cheers!
Kirk D Bailey
+ think +
http://www.howlermonkey.net +-----+ http://www.tinylist.org
http://www.listville.net | BOX | http://www.sacredelectron.org
Thou art free"-ERIS +-----+ 'Got a light?'-Prometheus
+ kniht +
Fnord.