Question about concatenation error
colonel
thecamel at camelrichard.org
Wed Sep 7 12:42:27 EDT 2005
On Wed, 07 Sep 2005 16:34:25 GMT, colonel <thecamel at camelrichard.org>
wrote:
>I am new to Python and I am confused as to why concatenating 3 strings
>isn't working properly.
>
>Here is the code:
>
>------------------------------------------------------------------------------------------
>import string
>import sys
>import re
>import urllib
>
>linkArray = []
>srcArray = []
>website = sys.argv[1]
>
>urllib.urlretrieve(website, 'getfile.txt')
>
>filename = "getfile.txt"
>input = open(filename, 'r')
>reg1 = re.compile('href=".*"')
>reg3 = re.compile('".*?"')
>reg4 = re.compile('http')
>Line = input.readline()
>
>while Line:
>    searchstring1 = reg1.search(Line)
>    if searchstring1:
>        rawlink = searchstring1.group()
>        link = reg3.search(rawlink).group()
>        link2 = link.split('"')
>        cleanlink = link2[1:2]
>        fullink = reg4.search(str(cleanlink))
>        if fullink:
>            linkArray.append(cleanlink)
>        else:
>            cleanlink2 = str(website) + "/" + str(cleanlink)
>            linkArray.append(cleanlink2)
>    Line = input.readline()
>
>print linkArray
>-----------------------------------------------------------------------------------------------
>
>I get this:
>
>["http://www.slugnuts.com/['index.html']",
>"http://www.slugnuts.com/['movies.html']",
>"http://www.slugnuts.com/['ramblings.html']",
>"http://www.slugnuts.com/['sluggies.html']",
>"http://www.slugnuts.com/['movies.html']"]
>
>instead of this:
>
>["http://www.slugnuts.com/index.html",
>"http://www.slugnuts.com/movies.html",
>"http://www.slugnuts.com/ramblings.html",
>"http://www.slugnuts.com/sluggies.html",
>"http://www.slugnuts.com/movies.html"]
>
>The concatenation isn't working the way I expected it to. I suspect
>that I am screwing up by mixing types, but I can't see where...
>
>I would appreciate any advice or pointers.
>
>Thanks.
Okay. It works if I change:
    fullink = reg4.search(str(cleanlink))
    if fullink:
        linkArray.append(cleanlink)
    else:
        cleanlink2 = str(website) + "/" + str(cleanlink)
to
    fullink = reg4.search(cleanlink[0])
    if fullink:
        linkArray.append(cleanlink[0])
    else:
        cleanlink2 = str(website) + "/" + cleanlink[0]
So can anyone tell me why "cleanlink" gets converted to a list? Is it
during the slicing?
Thanks.
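
For what it's worth, here is a minimal sketch of what the slicing seems to be doing. The sample string is made up for illustration and only mimics the kind of text reg3 matches. A slice such as link2[1:2] always returns a list, even when it holds a single item, while an index such as link2[1] returns the item itself. Calling str() on a list then gives the list's printed form, brackets and quotes included.

```python
# Hypothetical sample, shaped like the quoted string reg3 matches in the script.
link = '"index.html"'
link2 = link.split('"')      # ['', 'index.html', '']

sliced = link2[1:2]          # slicing a list always yields a list
indexed = link2[1]           # indexing yields the element itself

print(sliced)                # ['index.html']
print(indexed)               # index.html

# str() of a list is its printed representation, so the brackets
# and quotes end up inside the concatenated string.
website = "http://www.slugnuts.com"
print(website + "/" + str(sliced))   # http://www.slugnuts.com/['index.html']
print(website + "/" + indexed)       # http://www.slugnuts.com/index.html
```

So "cleanlink" is never converted. It is a one-element list from the moment it is sliced, which would explain why cleanlink[0] works.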