Regular expression - dot problem!

李政 fzhenglee23 at yahoo.com.cn
Thu Jun 8 00:15:16 CEST 2006



Fredrik Lundh <fredrik at pythonware.com> wrote:    李政 wrote:

> I have a problem with regular expressions (the dot problem). I checked the
> Python Library Reference, but I can't find any useful information.

like what a dot means in a regular expression? you really need to work
on your google fu ;-)

in the meantime, look under "The special characters are" on this page:

  http://docs.python.org/lib/re-syntax.html

  Maybe my poor English confused you. I know what a dot means in a regular expression. My question is about the case where the pattern has to be passed in through a variable:

      pattern = 'www.'
      if re.compile(pattern).match(string) is not None:
          ......

  rather than written as a literal in the call:

      if re.compile(r'www.').match(string) is not None:
  or
      if re.compile('www\.').match(string) is not None:

  In that case, how do you handle special characters such as the dot?
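One common way to handle this is the standard library's re.escape(), which backslash-escapes every character that has a special meaning in a pattern, so text held in a variable is matched literally. The sketch below is only an illustration: the variable names and the sample strings are made up, and Python 3 is assumed.

```python
import re

# Literal text that arrives in a variable (made-up sample value).
prefix = 'www.'

# re.escape() turns 'www.' into 'www\.', so the dot only matches a dot.
pattern = re.compile(re.escape(prefix))

matches_literal = pattern.match('www.example.com') is not None
matches_other = pattern.match('wwwxexample.com') is not None

# Without escaping, the dot matches any character, including 'x'.
unescaped_matches_other = re.match(prefix, 'wwwxexample.com') is not None
```

If the only goal is a literal prefix test, str.startswith() avoids the regular expression entirely.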
  
> if re.compile(pattern).match(urldomain) is not None:
>     return INTERNAL_LINK # match. url is internal link


if you want to check if the url starts with a given prefix, use

if url.startswith(prefix):

  Your suggestion is really helpful. I now use both startswith(prefix) and endswith(suffix) in my program, and it works better. Here is the new version:
  =====================================================
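  def getLinkType(url, sitedomain):
    # get the domain which 'url' belongs to
    urldomain = urlparse4esa(url)[1]

    # drop a leading 'www.' so other subdomains of the site also match;
    # otherwise compare against the site domain unchanged (an empty
    # suffix would make endswith() true for every url)
    if sitedomain.startswith('www.'):
        tmpsd = sitedomain[4:]
    else:
        tmpsd = sitedomain

    # accept the domain itself or a dot-separated subdomain of it,
    # so 'notexample.com' is not treated as part of 'example.com'
    if urldomain == tmpsd or urldomain.endswith('.' + tmpsd):
        return INTERNAL_LINK    # match. url is internal link
    else:
        return EXTERNAL_LINK    # doesn't match. url is external link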
  =====================================================
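
For readers who want to try the suffix check on its own, here is a self-contained sketch. It makes several assumptions: Python 3 is used, urllib.parse.urlsplit stands in for the urlparse4esa helper (whose code is not shown in this thread), the function name get_link_type is a renamed variant, and the values of INTERNAL_LINK and EXTERNAL_LINK are placeholders.

```python
from urllib.parse import urlsplit

# Placeholder values; the real constants are not shown in the thread.
INTERNAL_LINK = 0
EXTERNAL_LINK = 1

def get_link_type(url, sitedomain):
    """Classify url as internal or external relative to sitedomain."""
    # hostname is None for URLs without a network location, hence the 'or'.
    urldomain = (urlsplit(url).hostname or '').lower()

    # Compare against the site domain without a leading 'www.'.
    base = sitedomain.lower()
    if base.startswith('www.'):
        base = base[4:]

    # Match the domain itself or any dot-separated subdomain of it.
    if urldomain == base or urldomain.endswith('.' + base):
        return INTERNAL_LINK
    return EXTERNAL_LINK
```

With these assumptions, get_link_type('http://mail.example.com/', 'www.example.com') is classified as internal, while 'http://notexample.com/' is external because the suffix must begin at a dot.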

  Thanks for your help!
   
  Alex, China
