[Pythonmac-SIG] Python Help?

Thu Jun 2 18:53:47 CEST 2005

Charles Hartman wrote:
> On Jun 1, 2005, at 10:33 PM, Matthew S-H wrote:

>>             list[currentWord:currentWord + 1] = [word[:-1], word[-1]]

> You start with a list of strings, but your code replaces one (or more) 
> of them, not with a different string or two strings, but with a tuple 
> whose elements are two strings. The comma is what does that.

Well, no. It's a 2-element list, not a tuple (the [ ] make it a list), 
and he's assigning it to a slice, which should work:

 >>> l = [1,2,3,4]
 >>> l[2:3] = [5,6]
 >>> l
[1, 2, 5, 6, 4]
 >>>
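To show the difference from Charles's reading, here is a small sketch contrasting slice assignment with ordinary item assignment. It is plain Python that should run under 2 or 3, and the variable names are made up:

```python
# Slice assignment splices the *elements* of the right-hand list into the
# list, so the list can grow or shrink.
l = [1, 2, 3, 4]
l[2:3] = [5, 6]
print(l)        # [1, 2, 5, 6, 4]

words = ["Word?"]
words[0:1] = ["Word", "?"]
print(words)    # ['Word', '?']  (two strings; the length is now 2)

# Item assignment, by contrast, stores the right-hand object itself as a
# single element. This is where a nested list (or tuple) would show up.
other = ["Word?"]
other[0] = ["Word", "?"]
print(other)    # [['Word', '?']]  (still length 1)
```

Only item assignment nests the new list inside the old one. Slice assignment splices its elements in and changes the length.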

So what is going on? I wrote a little test, and inserted a print statement:

import string
##Separates words with punctuation into 2 separate words.
def puncSep(list):
     currentWord = -1
     for word in list:
         currentWord = currentWord + 1
         print currentWord, word
         if word[-1] in string.punctuation:
##            list = list[:currentWord] + [word[0:-1], word[-1]] + list[currentWord + 1:]
             list[currentWord:currentWord + 1] = [word[:-1], word[-1]]
             currentWord = currentWord + 1
     return list

#L = "This is a sentence.".split()
L = ["Word?"]
print L
print puncSep(L)

Running it gave me:
cbarker at localhost junk $ ./piglatin.py
['Word?']
0 Word?
2 ?
4
Traceback (most recent call last):
   File "./piglatin.py", line 113, in ?
     print puncSep(L)
   File "./piglatin.py", line 103, in puncSep
     if word[-1] in string.punctuation:
IndexError: string index out of range

Same error, but I got a hint: the last "word" is an empty string, which 
is why you got the IndexError. So I added another print statement:

             print "adding:", [word[:-1], word[-1]]
             list[currentWord:currentWord + 1] = [word[:-1], word[-1]]

cbarker at localhost junk $ ./piglatin.py
['Word?']
0 Word?
adding: ['Word', '?']
2 ?
adding: ['', '?']
4
Traceback (most recent call last):
   File "./piglatin.py", line 114, in ?
     print puncSep(L)
   File "./piglatin.py", line 103, in puncSep
     if word[-1] in string.punctuation:
IndexError: string index out of range

So you are adding an empty string. I'm not totally sure why yet, but I 
see a common trip-up in this code:

Never alter a list while iterating through it with a for loop!

(Actually, it's not never, but don't do it unless you know what you are 
doing.)
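Here is a minimal, generic illustration of why, along with the common workaround of looping over a copy. This is a sketch that is not taken from the original code. The lists are made up, and it should run under Python 2 or 3:

```python
# A for loop steps through a list by position, so an element inserted
# just after the current one gets visited on the next pass.
words = ["a", "b"]
seen = []
for w in words:
    seen.append(w)
    if w == "a":
        words.insert(1, "x")
print(seen)        # ['a', 'x', 'b']

# Looping over a copy (words2[:] makes a shallow copy) keeps the loop on
# the original items, even though the real list changes inside the loop.
words2 = ["a", "b"]
seen2 = []
for w in words2[:]:
    seen2.append(w)
    if w == "a":
        words2.insert(1, "x")
print(seen2)       # ['a', 'b']
print(words2)      # ['a', 'x', 'b']
```

Looping over a copy only fixes which items the loop visits. Any index you keep into the real list still has to account for the items you inserted.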

I'll check if this is the problem by printing the list as we go:
             print "adding:", [word[:-1], word[-1]]
             list[currentWord:currentWord + 1] = [word[:-1], word[-1]]
             print "the list is now:", list

and we get:

cbarker at localhost junk $ ./piglatin.py
['Word?']
0 Word?
adding: ['Word', '?']
the list is now: ['Word', '?']
2 ?
adding: ['', '?']
the list is now: ['Word', '?', '', '?']
4
Traceback (most recent call last):
   File "./piglatin.py", line 115, in ?
     print puncSep(L)
   File "./piglatin.py", line 103, in puncSep
     if word[-1] in string.punctuation:
IndexError: string index out of range

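So what happened? The iteration started with the one word in the list, 
"Word?", which was replaced by two words: ["Word", "?"]. Now the list 
has two elements, so the iteration continues, and the next word in the 
list is "?". That is itself punctuation, so it gets split into 
["", "?"]... whoops, that's not supposed to happen! (And because 
currentWord has been bumped to 2 while "?" actually sits at index 1, the 
pair is appended to the end rather than replacing anything, which is why 
the list grows to ['Word', '?', '', '?'].) On the next pass the for loop 
hands you that empty string, and word[-1] raises the IndexError.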

So what's the solution? There are two options:
1) Manage the index yourself with a while loop, so the loop steps over 
the items you've just inserted:

replace:
     for word in list:
         currentWord = currentWord + 1

with:
     while currentWord < len(list)-1:
         currentWord = currentWord + 1
         word = list[currentWord]

That's a bit ugly, so I'd rather start the counter at 0 and move the 
increment to the end of the loop:

def puncSep(list):
     currentWord = 0
     while currentWord < len(list):
         word = list[currentWord]
         if word[-1] in string.punctuation:
             list[currentWord:currentWord + 1] = [word[:-1], word[-1]]
             currentWord += 1
         currentWord += 1
     return list
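Here is one more in-place variant that does not come from this thread: walk the indices from the end backwards. Splitting element i only shifts the elements after i, and those have already been handled, so no extra bookkeeping is needed. This is a sketch. puncSepReversed is a made-up name, and the len(word) > 1 guard keeps a lone punctuation mark from being split into an empty string:

```python
import string

def puncSepReversed(words):
    # Hypothetical variant: visit indices from last to first. Replacing
    # element i with two elements only moves items after i, which the
    # loop has already processed, so earlier indices stay valid.
    for i in range(len(words) - 1, -1, -1):
        word = words[i]
        # len(word) > 1 skips empty strings and lone punctuation marks
        if len(word) > 1 and word[-1] in string.punctuation:
            words[i:i + 1] = [word[:-1], word[-1]]
    return words
```

Like the while-loop version, this modifies the list you pass in, so the caller's list changes too.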

2) Another option, and the one I'd probably choose, is to create a new 
list rather than altering the one you have in place:

def puncSep(list):
     newList = []
     for currentWord, word in enumerate(list):
         if word[-1] in string.punctuation:
             newList.extend([word[:-1], word[-1]])
         else:
             newList.append(word)
     return newList

But what if there is a punctuation mark by itself? That's arguably an 
error in the input, but it's probably best to check for it:

before: ['two', 'words', '?']
after: ['two', 'words', '', '?']

It adds an empty string, which you don't want. Checking the length of 
the word first fixes that:
         if len(word)> 1 and word[-1] in string.punctuation:

before: ['two', 'words', '?']
after: ['two', 'words', '?']

There, that's fixed it.

Now, a few style issues:

1) Don't use "import *": you can get name clashes, and it's hard to tell 
where things come from when you look at your code later.

2) As Charles pointed out, don't use "list" as a variable name; it hides 
the built-in list type.

3) Use enumerate if you need to loop through a list and keep track of 
the index, though you don't need the index here anymore.

4) Minor point, but I'm not sure there's much point in using 
list.extend() when you are creating the list in the argument anyway, so 
I've just used two append()s.
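Here is a quick illustration of points 2 and 3. This sketch should run under Python 3 and Python 2.6 or later, and shadowed() and pairs are made-up names:

```python
# Style point 2: a local variable named "list" hides the built-in list
# type for the rest of that function, so a later list(...) call breaks.
def shadowed():
    list = ["This", "is"]       # shadows the built-in inside this function
    try:
        return list("abc")      # tries to call the local list object
    except TypeError as err:
        return str(err)         # "'list' object is not callable"

print(shadowed())

# Style point 3: enumerate() hands back (index, item) pairs, so there is no
# need to maintain a counter like currentWord by hand.
pairs = list(enumerate(["This", "is", "a"]))
print(pairs)                    # [(0, 'This'), (1, 'is'), (2, 'a')]
```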

Here's my version now:

import string
##Separates words with punctuation into 2 separate words.
def puncSep(oldList):
     newList = []
     for word in oldList:
         if len(word)> 1 and word[-1] in string.punctuation:
             newList.append(word[:-1])
             newList.append(word[-1])
         else:
             newList.append(word)
     return newList

L = "This is a sentence. Is this another? Here is one with a lone punctuation mark .".split()
#L = ["Word?"]
#L = ["two","words", "?"]
print "before:", L
print "after:", puncSep(L)

By the way, this cries out for unit testing of some sort. Read up about 
it in "Dive Into Python" in print or on the web.
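As a starting point, here is a minimal sketch of what such tests might look like with the standard unittest module. The test names are made up, and the cases are simply the examples used in this thread:

```python
import string
import unittest

def puncSep(oldList):
    # Same logic as the final version above, repeated so this file
    # runs on its own.
    newList = []
    for word in oldList:
        if len(word) > 1 and word[-1] in string.punctuation:
            newList.append(word[:-1])
            newList.append(word[-1])
        else:
            newList.append(word)
    return newList

class PuncSepTests(unittest.TestCase):
    def test_trailing_punctuation_is_split(self):
        self.assertEqual(puncSep(["Word?"]), ["Word", "?"])

    def test_sentence(self):
        self.assertEqual(puncSep("This is a sentence.".split()),
                         ["This", "is", "a", "sentence", "."])

    def test_lone_punctuation_is_left_alone(self):
        self.assertEqual(puncSep(["two", "words", "?"]),
                         ["two", "words", "?"])

    def test_input_list_is_not_modified(self):
        original = ["Word?"]
        puncSep(original)
        self.assertEqual(original, ["Word?"])
```

If you save this as, say, test_puncsep.py (a hypothetical file name), you can run it with "python -m unittest test_puncsep".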

For an additional challenge, could you do this with list comprehensions?
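Here is one possible answer to compare against yours. It is a sketch: puncSepLC is a made-up name, and the conditional expression needs Python 2.5 or later. It builds the list of pieces each word should become, then flattens them with a nested list comprehension:

```python
import string

def puncSepLC(oldList):
    # For each word, pick either [stem, mark] or [word], then flatten all
    # of those small lists into one with the second "for" clause.
    return [piece
            for word in oldList
            for piece in ([word[:-1], word[-1]]
                          if len(word) > 1 and word[-1] in string.punctuation
                          else [word])]
```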

-Chris

> Then (I think) you're expecting the size of your list to adjust itself, so that 
> the index of the last element will be one larger than it was before the 
> substitution. But the list's length -- the number of elements in the 
> list -- hasn't changed; it's just that one of them has been replaced 
> with a tuple, a different kind of object from a string.
> 
> One easy (not necessarily efficient) way to revise it would be (off the 
> top of my head without testing)
> 
>         list[currentWord] = word[:-1]
>         list.insert(currentWord + 1, word[-1])
> 
> (though there are more elegant ways to do it without using the 
> currentWord indexing variable).
> 
> By the way, "list" is a built-in type in Python (calling it turns its argument into 
> a list), so it's a bad idea to use that as the name of a variable.
> 
> Charles Hartman
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Pythonmac-SIG maillist  -  Pythonmac-SIG at python.org
> http://mail.python.org/mailman/listinfo/pythonmac-sig

-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT         (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov