<BR><BR><B><I>Fredrik Lundh &lt;fredrik@pythonware.com&gt;</I></B> wrote: <BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid"> <div>李政 wrote:<BR><BR>> I have a problem with regular expressions (the dot problem). I checked the Python <BR>> Library Reference, but I can't find any useful information.<BR><BR>like what a dot means in a regular expression? you really need to work<BR>on your google fu ;-)<BR><BR>in the meantime, look under "The special characters are" on this page:<BR></div> <div><A href="http://docs.python.org/lib/re-syntax.html">http://docs.python.org/lib/re-syntax.html</A><BR></div> <div><STRONG>Maybe my poor written English confused you. I know what a dot means in a regular expression. In the case where you are forced to use a regular expression this way:</STRONG></div><STRONG></STRONG> <BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid">
<div><STRONG> pattern = 'www.'</STRONG></div> <div><STRONG> if re.compile(pattern).match(string) is not None:</STRONG></div> <div><STRONG> ......</STRONG></div> <div><STRONG></STRONG> </div> <div><STRONG>but not:</STRONG></div> <div><STRONG></STRONG> </div> <div><STRONG> if re.compile(r'www.').match(string) is not None: </STRONG></div> <div>or</div> <div> <STRONG>if re.compile('www\.').match(string) is not None: </STRONG></div> <div><STRONG></STRONG> </div> <div><STRONG>then how do you handle special characters such as the dot?</STRONG></div> <div><STRONG></STRONG><BR>> * if re.compile(pattern).match(urldomain) is not None:*<BR>> return INTERNAL_LINK # match. url is internal link<BR><BR><BR>if you want to check if the url starts with a given prefix, use<BR><BR>if
url.startswith(prefix):</div></BLOCKQUOTE></BLOCKQUOTE> <BLOCKQUOTE class=replbq style="PADDING-LEFT: 5px; MARGIN-LEFT: 5px; BORDER-LEFT: #1010ff 2px solid"> <div><STRONG>Your suggestion is really helpful. I now use both </STRONG>startswith(prefix) <STRONG>and </STRONG>endswith(suffix) <STRONG>in my program, and it works better. Here is the new version:</STRONG></div> <div><STRONG>=====================================================</STRONG></div> <div>def getLinkType(url, sitedomain):<BR>&nbsp;&nbsp;&nbsp;&nbsp;# get the domain which 'url' belongs to<BR>&nbsp;&nbsp;&nbsp;&nbsp;urldomain = urlparse4esa(url)[1]<BR> <BR>&nbsp;&nbsp;&nbsp;&nbsp;# compare against the site domain without a leading 'www.'<BR>&nbsp;&nbsp;&nbsp;&nbsp;tmpsd = sitedomain<BR>&nbsp;&nbsp;&nbsp;&nbsp;if sitedomain.startswith('www.'):<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tmpsd = sitedomain[4:]<BR> <BR>&nbsp;&nbsp;&nbsp;&nbsp;if urldomain.endswith(tmpsd):<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return INTERNAL_LINK # match. url is internal
link<BR>&nbsp;&nbsp;&nbsp;&nbsp;else:<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return EXTERNAL_LINK # doesn't match. url is external link</div> <div><STRONG>=====================================================</STRONG><BR></div> <div>Thanks for your help!</div> <div> </div> <div>Alex, China</div></BLOCKQUOTE>
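<div>For the case raised above, where the pattern arrives in a variable and cannot be rewritten by hand, one option is the standard library's re.escape, which backslash-escapes regex metacharacters so that a dot matches only a literal dot. The sketch below also shows a variant of the domain comparison in getLinkType that requires a match on a whole domain label. The names matches_literal_prefix and same_site are hypothetical helpers used only for illustration; they are not part of the original program.</div>

```python
import re


def matches_literal_prefix(prefix, text):
    """Return True if text starts with prefix, treating prefix as plain text."""
    # re.escape backslash-escapes regex metacharacters such as the dot,
    # so 'www.' matches only a literal 'www.' and not, say, 'wwwx'.
    return re.match(re.escape(prefix), text) is not None


def same_site(urldomain, sitedomain):
    """Hypothetical helper: True if urldomain is sitedomain or one of its subdomains."""
    # Drop a leading 'www.' so 'www.python.org' and 'python.org' compare equal.
    base = sitedomain[4:] if sitedomain.startswith('www.') else sitedomain
    # Require a label boundary so 'notpython.org' does not count as 'python.org'.
    return urldomain == base or urldomain.endswith('.' + base)
```

<div>With these helpers, same_site('docs.python.org', 'www.python.org') is true while same_site('notpython.org', 'www.python.org') is false; a bare endswith('python.org') test would accept both.</div>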