[Tutor] Splitting on punctuation

Dave Angel davea at davea.name
Sun Jun 9 02:21:55 CEST 2013


On 06/08/2013 07:44 PM, Mike Nickey wrote:
> Hey guys,
>
> I'm working on a web-project simply to brush up on skills and build new
> ones.
> One of the tasks is to split on punctuation passed yet I'm having a bit
> more trouble than I expected.
>
> Given the input of "which isn't that surprising I guess.",",'.") where the
> first part passed is the string and the second part is the punctuation to
> split on, I'm having difficulty converting the punctuation to a split
> parameter.
>
> As you'll see I have tried various attempts at replace, strip and split but
> I can't seem to get it just right.
>

There's a lot here that's irrelevant to the problem you're describing.

> Currently I have the following:
> import string

Why?   You don't use it.

> def tokenize_query(query, punctuation):
>      # informational and to be removed
>      print 'Query passed: ', query
>      print 'Punctuation passed:' , punctuation
>      print '-----------------------'
>      punc = punctuation
>      query = query.replace(punc," ")

That's enough.  Just print it and be done.

Try just reading about the replace method itself.

http://docs.python.org/2/library/string.html#deprecated-string-functions

"Return a copy of string s with all occurrences of substring old 
replaced by new. If the optional argument maxreplace is given, the first 
maxreplace occurrences are replaced."

You're trying to use a string as though it were a list of characters, 
while replace is using it as a substring.

Try it with something simple in Python 2.7:

 >>> print "abc cba".replace("ca", "*")
abc cba
 >>> print "abc cba".replace("cb", "*")
abc *a
 >>>
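
The same thing happens with the input from the original post: the 
three-character string ,'. never appears in the query as one contiguous 
substring, so replace hands the query back unchanged.  A small sketch 
(the names unchanged and one_char are only for illustration):

```python
query = "which isn't that surprising I guess."
punctuation = ",'."

# replace() searches for the whole substring ",'." and finds no match,
# so the result is identical to the original query
unchanged = query.replace(punctuation, " ")

# a single character taken from the punctuation string does match
one_char = query.replace(".", " ")
```
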


Probably the simplest way to do it is to write a for loop over all the 
punctuation characters, replacing each of them in turn with a space.
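
A minimal sketch of that loop, reusing the function name and arguments 
from the original post.  The final split() on whitespace is an 
assumption about the token list wanted; it is not something spelled out 
above.

```python
def tokenize_query(query, punctuation):
    # Treat punctuation as a collection of single characters and
    # replace each one in turn with a space.
    for ch in punctuation:
        query = query.replace(ch, " ")
    # Assumption: the caller wants a list of words, so split on
    # runs of whitespace.
    return query.split()


tokens = tokenize_query("which isn't that surprising I guess.", ",'.")
# tokens == ['which', 'isn', 't', 'that', 'surprising', 'I', 'guess']
```

Note that the apostrophe is in the punctuation string, so "isn't" comes 
back as two tokens.
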



-- 
DaveA

