Question about concatenation error
Steve Holden
steve at holdenweb.com
Wed Sep 7 13:10:54 EDT 2005
colonel wrote:
> On Wed, 07 Sep 2005 16:34:25 GMT, colonel <thecamel at camelrichard.org>
> wrote:
>
>
>>I am new to python and I am confused as to why when I try to
>>concatenate 3 strings, it isn't working properly.
>>
>>Here is the code:
>>
>>------------------------------------------------------------------------------------------
>>import string
>>import sys
>>import re
>>import urllib
>>
>>linkArray = []
>>srcArray = []
>>website = sys.argv[1]
>>
>>urllib.urlretrieve(website, 'getfile.txt')
>>
>>filename = "getfile.txt"
>>input = open(filename, 'r')
>>reg1 = re.compile('href=".*"')
>>reg3 = re.compile('".*?"')
>>reg4 = re.compile('http')
>>Line = input.readline()
>>
>>while Line:
>>    searchstring1 = reg1.search(Line)
>>    if searchstring1:
>>        rawlink = searchstring1.group()
>>        link = reg3.search(rawlink).group()
>>        link2 = link.split('"')
>>        cleanlink = link2[1:2]
>>        fullink = reg4.search(str(cleanlink))
>>        if fullink:
>>            linkArray.append(cleanlink)
>>        else:
>>            cleanlink2 = str(website) + "/" + str(cleanlink)
>>            linkArray.append(cleanlink2)
>>    Line = input.readline()
>>
>>print linkArray
>>-----------------------------------------------------------------------------------------------
>>
>>I get this:
>>
>>["http://www.slugnuts.com/['index.html']",
>>"http://www.slugnuts.com/['movies.html']",
>>"http://www.slugnuts.com/['ramblings.html']",
>>"http://www.slugnuts.com/['sluggies.html']",
>>"http://www.slugnuts.com/['movies.html']"]
>>
>>instead of this:
>>
>>["http://www.slugnuts.com/index.html",
>>"http://www.slugnuts.com/movies.html",
>>"http://www.slugnuts.com/ramblings.html",
>>"http://www.slugnuts.com/sluggies.html",
>>"http://www.slugnuts.com/movies.html"]
>>
>>The concatenation isn't working the way I expected it to. I suspect
>>that I am screwing up by mixing types, but I can't see where...
>>
>>I would appreciate any advice or pointers.
>>
>>Thanks.
>
>
>
> Okay. It works if I change:
>
>     fullink = reg4.search(str(cleanlink))
>     if fullink:
>         linkArray.append(cleanlink)
>     else:
>         cleanlink2 = str(website) + "/" + str(cleanlink)
>
> to
>
>     fullink = reg4.search(cleanlink[0])
>     if fullink:
>         linkArray.append(cleanlink[0])
>     else:
>         cleanlink2 = str(website) + "/" + cleanlink[0]
>
>
> so can anyone tell me why "cleanlink" gets converted to a list? Is it
> during the slicing?
>
>
> Thanks.

The statement

    cleanlink = link2[1:2]

results in a list of one element, because slicing a list always gives
you another list. If you want to access element one (the second in the
list) then use

    cleanlink = link2[1]

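To make the difference concrete, here is a minimal sketch. The sample
string and the variable names sliced, indexed, bad and good are made up
for illustration and are not from the original script; only the site
URL is taken from the question.

```python
# Hypothetical sample, shaped like what reg3 would match in the question.
link = '"index.html"'
link2 = link.split('"')      # ['', 'index.html', '']

sliced = link2[1:2]          # a slice of a list is itself a list
indexed = link2[1]           # an index returns the element (a string here)

website = "http://www.slugnuts.com"

# str() of a list gives its printable form, brackets and quotes included,
# which is where the odd URLs in the question come from.
bad = website + "/" + str(sliced)
good = website + "/" + indexed

print(bad)
print(good)
```

The first line printed reproduces the unwanted form from the question,
and the second is the plain URL.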
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/