[Tutor] Testing for punctuation in a string

Sun Oct 12 23:15:46 EDT 2003

May I please jump in the thread?

I just wrote 2 wikis, and had to sort things out, separating sheep from goats 
just this way. And I had to do it a char at a time, and used recursion in doing 
it. You might not want to use recursion, but maybe you will...
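
For the original question (spotting punctuation in a field name), the 
char-at-a-time recursive idea might look roughly like this. It is only a 
sketch: has_punctuation is a made-up name, the field names are the ones from 
the earlier message in this thread, and it is written for a current Python 
rather than 2.2. Note that string.punctuation counts '_' as punctuation too.

```python
import string

def has_punctuation(word):
    """Return 1 if any character of word is punctuation, else 0."""
    if not word:                        # ran out of characters: nothing found
        return 0
    if word[0] in string.punctuation:   # the first char is punctuation
        return 1
    return has_punctuation(word[1:])    # nibble off one char and recurse

fieldnames = ['fld1', 'fld2', 'fld-bad']            # sample field names
bad = [f for f in fieldnames if has_punctuation(f)]
print(bad)                                          # ['fld-bad']
```

The recursion stops either at the first punctuation char or when the word is 
used up, so no counter or index is needed.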

My task was to return a 1 or a 0 for an if statement to swallow and genuflect 
at. A function looked at a word and nibbled through it char by char, using other 
functions, which were recursive, and decided if a word was a wiki word. (A 
WikiWord is 2+ words RunTogetherLikeThis, each capitalized, with no punctuation 
within them. So 'WikiWord' is a wikiword, but 'Wiki-Word' is not.) I used a lot 
of string slicing, IF statements, IN, and the string module's constants 
(string.uppercase, string.punctuation, etc).
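
As a cross-check on that definition, a strict reading of the rule can be 
written as one regular expression. This is NOT how the script does it (the 
script walks the word recursively, and only checks as far as the second 
capitalized hump). The pattern and the name is_strict_wikiword are made up 
for illustration, and the sketch assumes a current Python.

```python
import re

# two or more runs of "one capital followed by one or more lowercase letters",
# and nothing else in the word
WIKIWORD = re.compile(r'^(?:[A-Z][a-z]+){2,}$')

def is_strict_wikiword(word):
    """Return 1 if word is made only of 2+ capitalized words run together."""
    if WIKIWORD.match(word):
        return 1
    return 0

samples = ['WikiWord', 'Wiki-Word', 'ThisIs', 'ThisISNOT',
           'ANDNEITHERISTHIS', 'NorisTHIS']
for w in samples:
    print(w, is_strict_wikiword(w))
```

Only 'WikiWord' and 'ThisIs' come back as 1. Because the pattern is anchored 
at both ends, a word with trailing punctuation such as 'WikiWord.' is 
rejected, so this strict test is not a drop-in replacement for a checker that 
is meant to let such words through.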

Each function is rather simple in itself, and the total effect is a fairly 
powerful word chomper. And NO gang, this time I did NOT do any counting, which 
keeps it from being a state engine like the first one I wrote.

Here is the script I wrote using these functions. Note it is optimized for 
operation in a Windows 9x environment.

BEWARE WORD WRAP! I edited this to minimize wrapping as much as possible, and
inserted a '\' where a code line wraps around, but please be careful: make your
screen as wide as possible, maybe use a narrow font?
------------------------------------------------------------------------------
#!C:\Python22\pythonw.exe
#
tablebgcolor="FFE0C0"
import os, os.path, sys, string, re, cgitb; cgitb.enable()
print "Content-Type: text/html\n"
#
try:
	pagename=os.environ['QUERY_STRING']	# try to get page name requested
except Exception, e:
	pagename="FrontPage"			# but default to FrontPage
#
path=os.getcwd()
path=path+'\\cgi-bin\\text'
os.chdir(path)		# make the pages dir current
if os.path.exists(pagename):			# if the page asked for exists,
	pass					# do nothing here;
else:						# BUT if it does NOT-
	f1=open(pagename,'w')			# CREATE it!
	f1.write('Please contribute to this webpage.\n')
	f1.close()
#
# header for webpage
print '<html><head><title>MiniWiki - '+ pagename +'</title>'
print '<style type="text/css">'
print 'body { margin-left: 5%; margin-right: 5%; }'
print 'A:link, A:visited,  A:active { text-decoration:none; }'
print 'A:hover { background-color: #D0D0FF; }'
print '</style></head>'
print '<body bgcolor="FFFFFF" text="000000" links="0000FF">'
print '<table width=100% bgcolor="'+tablebgcolor+'" border="0"'
print 'cellpadding="10"><tr><td>'
print '<B><font size="4" color="FF0000">MiniWiki</font></B>'
print '<center><font size="6"><a href="./MWbacksearch.py?' + pagename+'">'
print string.join(re.split('([A-Z][a-z]*)',pagename)[1:-1]) + '</a></font><br>'
print '(Click for backsearch)<p></center></td></tr></table>'
f1=open(pagename,'r')
page=f1.readlines()
f1.close()
#
def isin( searchthis, forthis ):	# return a 1 or 0 to control IF statements
	value = 1 + string.find(searchthis, forthis)
	if value > 0:
		return 1
	else:
		return 0
#
#
# Now we try to hash out whether or not a word is a wikiword...
# a WikiWord is in CamelCaps. it has a 'hump' in it.
# ThisIs a WikiWord,
# but ThisISNOT,
# ANDNEITHERISTHIS,
# NorisTHIS. Fun?
#
# these functions build wikiwords.
def buildwikilink(word):		# turns a word into a hyperlink
	word = '<a href="./MW.py?' + word + '">' + word + '</a>'
	# it's a hyperlink anchor, normal html.
	return word
#		
def potentialword(word):	#
	newword=''
	prefix=''
	suffix=''
	index=0
	while 1:			# loop processes the word 1 char at time
		if word[index] in string.ascii_letters:
			prefix, newword, suffix = mainbody(prefix, word[index:])
			return prefix, newword, suffix

		else:
			prefix=prefix+word[index]
			index=index+1
#
def mainbody(prefix, word,):
	newword=''
	suffix=''
	index=0
	for char in word:
		if char in string.ascii_letters:	
			newword=newword+char
		else:
			suffix=suffix+char	#
		index=index+1
	return prefix, newword, suffix
#
def makewikiword(word):				# combines processed link
	prefix=''
	suffix=''
	prefix, word, suffix=potentialword(word)
	return prefix + buildwikilink(word) + suffix
#
#
# these 2 functions determine if a word is a wikiword.
def iswikiword(word):				# tests to see if is wikiword.
	if word:
		if word[0]in string.ascii_uppercase: # ALL start with capital.
			if (len(word)>1): 	# guard for 'A' word
				if word[1] in string.ascii_lowercase:	#
					if (len(word)>3):	# is there any more to process?
						if processword(word[2:]):	# is there another?
							return 1			# YES! it's a wikiword!
	return 0					# it's not a wikiword.
#
def processword(word):
	# it is possible to exhaust a word and not find another capital letter
	if word:
		# wikiwords CAN have several lowercase letters before the next
		# capital, after all...
		if word[0] in string.ascii_lowercase:
			# so keep invoking this function until exhaustion,
			# or until a capital is found
			value=processword(word[1:])
			return value
		else: 					# MIGHT be a capital!
			if word[0] in string.ascii_uppercase:
				if len(word)>1: 	
					if word[1] in string.ascii_lowercase:
						return 1
					else:
						return 0	
				else:
					return 0	
			else:
				return 0		
	else:
		return 0
#
#
#
# this group processes the raw page to convert it to html code- but not
# wikiwords or links.
index=0
for line in page:			# process the line for substrings
	line=string.rstrip(line)			# kill EOL stuff
	line=string.replace(line,'<','&lt;')		# kills html tag opener
	line=string.replace(line,'>','&gt;')		# kills html tag closer
	line=string.replace(line,'----','<hr>')		# create standard <hr>
	line=string.replace(line,'@!','<center>')	# open centering
	line=string.replace(line,'!@','</center>')	# close centering
	line=string.replace(line,"```","<b>")		# open BOLD
	line=string.replace(line,"'''","</b>")		# close BOLD
	line=string.replace(line,"``","<i>")		# open ITALIC
	line=string.replace(line,"''","</i>")		# Close Italic
	line=string.replace(line,"{{{","<pre>")		# open PREFORMATTED TEXT
	line=string.replace(line,"}}}","</pre>\n")	# close PREFORMATTED
	line=string.replace(line,'<br>','')		# remove any BR's.
	line=string.replace(line,'[=','<table border="0" cellpadding="5" \
width=100% bgcolor="'+tablebgcolor+'"><tr><td><B><big>') # create header bar
	line=string.replace(line,'=]','</big></b></td></tr></table>\n')	# end
	# header bar
	line=string.replace(line,'{{{','<pre>')	# start <pre> zone
	line=string.replace(line,'}}}','</pre>')	# END a <pre> zone with </pre>
	line=string.replace(line,'#! ','<img src="../images/')	# start img tag
	line=string.replace(line,' !#','"><br>\n')	# end an image tag
	line=string.replace(line,'-!','<small>')	# insert small tag
	line=string.replace(line,'!-','</small>')	# insert end small tag
	line=string.replace(line,'+!','<big>')	# insert <big> tag
	line=string.replace(line,'!+','</big>')	# insert END of BIG state
	if line == "":					# replace an empty line with <P>
		line='<p>\n'
	page[index]=line				# save resulting line
	index=index+1					# and increase pointer.
#
# wordscanner routine
linecounter=0		# reset the linepointer we will use it again...
for line in page:					# see?
	wordcounter=0					# start the word pointer over at 0 again
	wordlist = string.split(line,' ') # split the line into a list of words.
	for word in wordlist:
		if ((isin(word,'http://')) or (isin(word,'mailto:'))):
		#if link:
			if isin(word,'"'):
			# DO NOT process a "http- it's embedded code!
				pass
				# DO NOT process; leave the word alone!
			else:
				# otherwise, make a hyperlink for them to click.
				wordlist[wordcounter]='<a href="' + word + '">' + word + '</a>'
		else:
			if iswikiword(word):
				wordlist[wordcounter]=makewikiword(word)
			else:
				pass
		wordcounter=wordcounter+1
	line=string.join(wordlist,' ')
	page[linecounter]=line
	linecounter=linecounter+1
#
# print out the final highly modified page.
for line in page:
	print line
#
# Page footer follows
print '<p><table border="0" width=100% cellpadding="10" bgcolor="'+tablebgcolor+'">'
print '<tr>'
print '<td width=30% ><a href="./MWed1.py?'+pagename+'" >Edit this page</a></td>'
print '<td align="center" width=40% ><form method="GET" \
action="./MWbacksearch.py?">'
print '<input type="text" size="20" maxlength="24" name="">&nbsp;<input \
type=submit value="WORDSEARCH">'
print '</form></td>'
print '<td align="right" width=30% ><a href="./MWlistall.py">LIST ALL \
PAGES</a></td></tr>'
print '<tr><td><a href="http://www.tinylist.org/">MiniWiki V:1.3.0<br>&copy;2003 \
Kirk D Bailey</a></td>'
print '<td>&nbsp;</td>'
print '<td align="right"><a \
href="./MW.py?FrontPage">FrontPage</a></td></tr></table>'
print '</body></html>'
------------------------------------------------------------------------------
BEWARE WORD WRAP!

That is the complete wiki browser engine. If you want the entire suite of 
several scripts I am using, I will email it to you off list- email me and ask 
for it.
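
One way the long run of string.replace calls in the script above could be 
tidied is a table of (marker, html) pairs applied in order. This is only a 
sketch under assumptions: MARKUP and render_line are made-up names, only a 
few of the markers are included, and it uses string methods on a current 
Python instead of the string module functions the 2.2 script calls. Order 
matters: the escaping of < and > has to run first, and a longer marker has to 
come before a shorter one it contains.

```python
# (marker, replacement) pairs, applied top to bottom
MARKUP = [
    ('<', '&lt;'),          # kill html tag opener (must run first)
    ('>', '&gt;'),          # kill html tag closer
    ('----', '<hr>'),       # horizontal rule
    ('@!', '<center>'),     # open centering
    ('!@', '</center>'),    # close centering
    ("'''", '</b>'),        # close BOLD (before the 2-quote marker)
    ("''", '</i>'),         # close ITALIC
    ('{{{', '<pre>'),       # open preformatted text
    ('}}}', '</pre>'),      # close preformatted text
]

def render_line(line):
    """Apply every markup replacement to one line of page text."""
    line = line.rstrip()                # kill EOL stuff
    for marker, html in MARKUP:
        line = line.replace(marker, html)
    if line == '':                      # an empty line becomes a paragraph break
        line = '<p>'
    return line

print(render_line('@!a < b!@'))         # <center>a &lt; b</center>
```

Adding a new marker then means adding one line to the table rather than 
another replace call in the loop.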

Anna Ravenscroft wrote:

> On Sunday 12 October 2003 06:52 pm, Alan Gauld wrote:
> 
>>>If s is my list of fieldnames (s='fld1,fld2,fld-bad'), I'm not sure
> 
> 
> Make sure your list of fieldnames has quotes around each item, not just the 
> beginning and end of the list... 
> 
> 
>>what to do next.  I would probably split the string & test it a character
>>at a time, but I am hoping for something better.  Thanks,
> 
> 
> Ouch. Sounds long and unpleasant.
> 
> I like the nifty new sets module (available in 2.3). You can use a list 
> comprehension (as Alan suggested) but if you have more than one type of "bad" 
> punctuation in a particular fieldname, you'll get duplicates. If you use the 
> new sets module, you can eliminate the duplicates. 
> 
> 
>>>>import sets
>>>>fieldnames=['fld1','fld2','fld-bad', 'fld-ba_d']
>>>>bad = ['-', '_']
>>>>badfld = [f for f in fieldnames for b in bad if b in f]
>>>>print badfld                            # will print any duplicates
> 
> ['fld-bad', 'fld-ba_d', 'fld-ba_d']      
> 
>>>>setofbad = sets.Set(badfld)    # removes the duplicates
>>>>print setofbad
> 
> Set(['fld-ba_d', 'fld-bad'])
> 
> 
> Hope this gives you some ideas... Have fun.
> 
> Anna

-- 

end

Cheers!
         Kirk D Bailey

  +                              think                                +
   http://www.howlermonkey.net  +-----+        http://www.tinylist.org
   http://www.listville.net     | BOX |  http://www.sacredelectron.org
   Thou art free"-ERIS          +-----+     'Got a light?'-Prometheus
  +                              kniht                                +

Fnord.