[Tutor] Splitting on punctuation
Dave Angel
davea at davea.name
Sun Jun 9 02:21:55 CEST 2013
On 06/08/2013 07:44 PM, Mike Nickey wrote:
> Hey guys,
>
> I'm working on a web-project simply to brush up on skills and build new
> ones.
> One of the tasks is to split on punctuation passed yet I'm having a bit
> more trouble than I expected.
>
> Given the input of "which isn't that surprising I guess.",",'.") where the
> first part passed is the string and the second part is the punctuation to
> split on, I'm having difficulty converting the punctuation to a split
> parameter.
>
> As you'll see I have tried various attempts at replace, strip and split but
> I can't seem to get it just right.
>
There's a lot here that's irrelevant to the problem you're describing.
> Currently I have the following:
> import string
Why? You don't use it.
> def tokenize_query(query, punctuation):
>     # informational and to be removed
>     print 'Query passed: ', query
>     print 'Punctuation passed:' , punctuation
>     print '-----------------------'
>     punc = punctuation
>     query = query.replace(punc," ")
That's enough. Just print it and be done.
Try just reading about the replace method itself.
http://docs.python.org/2/library/string.html#deprecated-string-functions
"Return a copy of string s with all occurrences of substring old
replaced by new. If the optional argument maxreplace is given, the first
maxreplace occurrences are replaced."
You're trying to use a string as though it were a list of characters,
while replace is using it as a substring.
Try it with something simple in Python 2.7:
>>> print "abc cba".replace("ca", "*")
abc cba
>>> print "abc cba".replace("cb", "*")
abc *a
>>>
Probably the simplest way to do it is to write a for loop over all the
punctuation characters, replacing each of them in turn with a space.
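
A minimal sketch of that loop, reusing your function name and arguments. Returning the list from split() is my assumption about what you want back; the original code doesn't get that far. It avoids the print statement, so it should run unchanged on 2.7 or 3.x.

```python
def tokenize_query(query, punctuation):
    # Treat punctuation as a collection of single characters and
    # replace each one in turn with a space.
    for ch in punctuation:
        query = query.replace(ch, " ")
    # split() with no arguments splits on runs of whitespace, so
    # doubled-up spaces don't produce empty tokens.
    # (Returning a list is an assumption, not part of the original.)
    return query.split()


tokens = tokenize_query("which isn't that surprising I guess.", ",'.")
print(tokens)
```

Note that the apostrophe is in your punctuation string, so "isn't" comes back as two tokens, 'isn' and 't'.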
--
DaveA