[Tutor] FW: wierd replace problem

Roelof Wobben rwobben at hotmail.com
Tue Sep 14 09:28:22 CEST 2010


strip('"'') does not work.
I still get this message: SyntaxError: EOL while scanning string literal

So I think I will go with Bob's suggestion and develop a program which deletes all the ' and " characters by scanning the text character by character.
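
The SyntaxError most likely comes from the quoting rather than from strip() itself: in '"'' the first two single quotes already close the literal, and the third one opens a new string that never ends. Below is a minimal sketch of the character-by-character approach, assuming Python 3. The names QUOTES and remove_quotes are made up for illustration and are not from Bob's message.

```python
# Characters to delete. The single quote is escaped with a backslash so the
# literal is balanced; writing "'\"" instead would work equally well.
QUOTES = '"\''


def remove_quotes(text):
    """Return text with every ' and " removed, one character at a time."""
    kept = []
    for ch in text:
        if ch not in QUOTES:
            kept.append(ch)
    return "".join(kept)


sample = "\"can't\" he said"
cleaned = remove_quotes(sample)
print(cleaned)
```

With a balanced literal, word.strip(QUOTES) is also valid syntax, but it only trims the two ends of the string.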


> ----------------------------------------
>> From: steve at pearwood.info
>> To: tutor at python.org
>> Date: Tue, 14 Sep 2010 09:39:29 +1000
>> Subject: Re: [Tutor] wierd replace problem
>> On Tue, 14 Sep 2010 09:08:24 am Joel Goldstick wrote:
>>> On Mon, Sep 13, 2010 at 6:41 PM, Steven D'Aprano
>> wrote:
>>>> On Tue, 14 Sep 2010 04:18:36 am Joel Goldstick wrote:
>>>>> How about using str.split() to put words in a list, then run
>>>>> strip() over each word with the required characters to be removed
>>>>> ('`")
>>>> Doesn't work. strip() only removes characters at the beginning and
>>>> end of the word, not in the middle:
>>> Exactly, you first split the words into a list of words, then strip
>>> each word
>> Of course, if you don't want to remove ALL punctuation marks, but only
>> those at the beginning and end of words, then strip() is a reasonable
>> approach. But if the aim is to strip out all punctuation, no matter
>> where, then it can't work.
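
The difference between trimming the ends and removing punctuation everywhere can be sketched as follows. This assumes Python 3, where str.maketrans() accepts a third argument listing characters to delete; the variable names are illustrative only.

```python
import string

word = "'can't'"

# strip() only trims the listed characters from both ends of the string,
# so the apostrophe inside the word survives.
ends_only = word.strip("'\"")

# A translation table whose third argument lists characters to delete
# removes them wherever they occur, including the middle of the word.
delete_table = str.maketrans("", "", string.punctuation)
everywhere = word.translate(delete_table)

print(ends_only)
print(everywhere)
```
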
>> Since the aim is to count words, a better approach might be a hybrid --
>> remove all punctuation marks like commas, full stops, etc. no matter
>> where they appear, keep internal apostrophes so that words like "can't"
>> are different from "cant", but remove external ones. Although that
>> loses information in the case of (e.g.) dialect speech:
>> "'e said 'e were going to kill the lady, Mister Holmes!"
>> cried the lad excitedly.
>> You probably want to count the word as 'e rather than just e.
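
One possible sketch of the hybrid approach uses a regular expression that accepts runs of letters joined by single internal apostrophes. The pattern, the name count_words, and the restriction to ASCII letters are assumptions made for this example, not something proposed in the thread.

```python
import re
from collections import Counter

# A run of ASCII letters, optionally followed by groups of an apostrophe
# plus more letters. Leading and trailing apostrophes are never matched.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_words(text):
    """Count lower-cased words, keeping internal apostrophes only."""
    return Counter(m.group(0).lower() for m in WORD.finditer(text))


counts = count_words("\"I can't,\" he said. 'Cant' is a different word.")
print(counts["can't"], counts["cant"])
```

As the dialect example warns, this sketch counts 'e as plain e, so that information is lost.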
>> And hyphenation is tricky too. A lone hyphen - like these - should be
>> deleted. But double-dashes--like these--are word separators, so need to
>> be replaced by a space. Otherwise, single hyphens should be kept. If a
>> word begins or ends with a hyphen, it should be joined up with the
>> previous or next word. But then it gets more complicated, because you
>> don't know whether to keep the hyphen after joining or not.
>> E.g. if the line ends with:
>> blah blah blah blah some-
>> thing blah blah blah.
>> should the joined up word become the compound word "some-thing" or the
>> regular word "something"? In general, there's no way to be sure,
>> although you can make a good guess by looking it up in a dictionary and
>> assuming that regular words should be preferred to compound words. But
>> that will fail if the word has changed over time, such as "cooperate",
>> which until very recently used to be written "co-operate", and before
>> that as "coöperate".
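
The two unambiguous hyphen rules (double dashes separate words, lone hyphens are deleted) could be sketched as below. The function name normalise_hyphens is hypothetical, and the sketch deliberately leaves a hyphen at the end of a line untouched, since the choice between "some-thing" and "something" cannot be made reliably.

```python
import re


def normalise_hyphens(text):
    """Apply the two unambiguous hyphen rules; keep all other hyphens."""
    # Two or more hyphens in a row act as a word separator.
    text = re.sub(r"-{2,}", " ", text)
    # A single hyphen with whitespace (or the text boundary) on both sides
    # is dropped.
    text = re.sub(r"(?:(?<=\s)|^)-(?=\s|$)", "", text)
    return text


print(normalise_hyphens("double-dashes--like these--are separators").split())
print(normalise_hyphens("A lone hyphen - like these - is deleted").split())
```
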
>> --
>> Steven D'Aprano
>> _______________________________________________
>> Tutor maillist - Tutor at python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
