[Tutor] simple text replace

Dave Angel davea at ieee.org
Mon Jul 27 12:37:48 CEST 2009


Albert-Jan Roskam wrote:
> Hi!
>
> Did you consider using a regex?
>
> import re
> re.sub("python\s", "snake ", "python is cool, pythonprogramming...")
>
> Cheers!!
> Albert-Jan
>
>
> --- On Mon, 7/27/09, Dave Angel <davea at ieee.org> wrote:
>
>   
>> From: Dave Angel <davea at ieee.org>
>> Subject: Re: [Tutor] simple text replace
>> To: "j booth" <j8ooth at gmail.com>
>> Cc: tutor at python.org
>> Date: Monday, July 27, 2009, 12:41 AM
>> j booth wrote:
>>     
>>> Hello,
>>>
>>> I am scanning a text file and replacing words with alternatives. My
>>> difficulty is that all occurrences are replaced (even if they are part of
>>> another word!)..
>>>
>>> This is an example of what I have been using:
>>>
>>>     for line in fileinput.FileInput("test_file.txt",inplace=1):
>>>         line = line.replace(original, new)
>>>         print line,
>>>     fileinput.close()
>>>
>>> original and new are variables that have string values from functions..
>>> original finds each word in a text file and old is a manipulated
>>> replacement. Essentially, I would like to replace only the occurrence that
>>> is currently selected-- not the rest. for example:
>>>
>>> python is great, but my python knowledge is limited! regardless, I enjoy
>>> pythonprogramming
>>>
>>> returns something like:
>>>
>>> snake is great, but my snake knowledge is limited! regardless, I enjoy
>>> snakeprogramming
>>>
>>> thanks so much!
>>>
>> Not sure what you mean by "currently selected"; you're
>> processing a line at a time, and there are multiple
>> legitimate occurrences of the word in the line.
>>
>> The trick is to define what you mean by "word." 
>> replace() has no such notion.  So we want to write a
>> function such as:
>>
>> given three strings, line, inword, and outword.  Find
>> all occurrences of inword in the line, and replace all of
>> them with outword.  The definition of word is a group
>> of alphabetic characters (a-z perhaps) that is surrounded by
>> non-alphabetic characters.
>>
>> The approach that I'd use is to prepare a translated copy
>> of the line as follows:   Replace each
>> non-alphabetic character with a space.  Also insert a
>> space at the beginning and one at the end.  Now, take
>> the inword, and similarly add spaces at begin and end. 
>> Now search this modified line for all occurrences of this
>> modified inword, and make a list of the indices where it is
>> found.  In your example line, there would be 2 items in
>> the list.
>>
>> Now, using the original line, use that list of indices to
>> substitute the outword in the appropriate places.  Use
>> slices to do it, preferably from right to left, so the
>> indices will work even though the string is changing. 
>> (The easiest way to do right to left is to reverse() the
>> list.)
>>
>> DaveA
>>
>> _______________________________________________
>> Tutor maillist  -  Tutor at python.org
>> http://mail.python.org/mailman/listinfo/tutor
>>
>>     
(Please don't top-post on this list.  The message then appears out of 
order.  Append new responses to end, or inline when appropriate)

Yes, a regex would make a lot of sense here.  But a person should not 
take on regular expressions till they have lots of experience with the 
rest of the language.  Besides, it's pretty easy to have subtle bugs, 
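even with such a simple case.  For example, your re string would 
erroneously convert the word "newpython", and miss the last two 
occurrences of the real word "python" near the end of the string, 
because one is followed by a comma and the other by the end of the 
string rather than by whitespace.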

import re
# Same pattern as above, written as a raw string; the test text is kept on one line
print(re.sub(r"python\s", "snake ",
             "python is cool, pythonprogramming... newpython becomes python, or python"))

Output:  snake is cool, pythonprogramming... newsnake becomes python, or python

(gives the wrong answer, in three places)
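
If you do want a regex, one way to avoid these three mistakes is to anchor 
the pattern with \b (a word boundary) on both sides instead of matching 
trailing whitespace.  The following is only a sketch: the function name 
replace_word is made up for illustration, and note that \b treats digits 
and underscores as word characters too, which is a slightly different 
definition of "word" than purely alphabetic.

```python
import re

def replace_word(line, inword, outword):
    """Replace inword with outword only where it stands alone as a word.

    replace_word is a hypothetical helper, not part of the original script.
    """
    # \b matches between a word character and a non-word character, or at
    # the start/end of the string.  re.escape protects against any regex
    # metacharacters that happen to be in inword.
    pattern = r"\b" + re.escape(inword) + r"\b"
    # A function replacement is used so that backslashes in outword are
    # inserted literally rather than interpreted as escapes.
    return re.sub(pattern, lambda match: outword, line)

text = "python is cool, pythonprogramming... newpython becomes python, or python"
print(replace_word(text, "python", "snake"))
# snake is cool, pythonprogramming... newpython becomes snake, or snake
```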

DaveA
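
P.S.  For anyone who would rather avoid regular expressions for now, the 
approach described in the quoted message (blank out the non-alphabetic 
characters, pad with spaces, collect the indices, then splice from right 
to left) might look roughly like this.  It is a sketch under assumptions: 
the name replace_whole_word is invented here, inword is assumed to be a 
non-empty string of letters, and isalpha() is used as the test for 
"alphabetic", which accepts more than just a-z.

```python
def replace_whole_word(line, inword, outword):
    """Replace whole-word occurrences of inword without using regexes.

    replace_whole_word is a hypothetical helper for illustration only.
    """
    # Translated copy: every non-alphabetic character becomes a space,
    # with one extra space at the beginning and one at the end.
    shadow = " " + "".join(c if c.isalpha() else " " for c in line) + " "
    target = " " + inword + " "

    # Find every occurrence of the padded word in the translated copy.
    # Thanks to the leading pad space, an index in shadow is also the
    # index where the word itself starts in the original line.
    indices = []
    pos = shadow.find(target)
    while pos != -1:
        indices.append(pos)
        # Step by one so that two matches sharing a single separating
        # space are both found.
        pos = shadow.find(target, pos + 1)

    # Substitute with slices, right to left, so the remaining indices
    # stay valid even though the string is changing.
    for i in reversed(indices):
        line = line[:i] + outword + line[i + len(inword):]
    return line

sample = ("python is great, but my python knowledge is limited! "
          "regardless, I enjoy pythonprogramming")
print(replace_whole_word(sample, "python", "snake"))
```

On the example line from the original question this finds 2 indices, so 
only the two standalone words change and "pythonprogramming" is left alone.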


