Strings in Python
Shawn Milo
Shawn at Milochik.com
Thu Feb 8 11:58:49 EST 2007
On 2/8/07, Gary Herron <gherron at islandtraining.com> wrote:
> Johny wrote:
> > Playing a little more with strings, I found out that string.find
> > function provides the position of
> > the first occurrence of the substring in the string.
> > Is there a way to find the positions of all occurrences of a substring?
> > To explain more,
> > let's suppose
> >
> > mystring='12341'
> > import string
> >
> >
> > >>> string.find(mystring, '1')
> > 0
> >
> > But I need to find the position of the other '1' in mystring too.
> > Is it possible?
> > Or must I use regex?
> > Thanks for help
> > L
> >
> >
> You could use a regular expression. The re module has a function
> "findall" that does what you want.
>
> Also, if you read the documentation for the string find method, you'll find:
>
>     S.find(sub [,start [,end]]) -> int
>
>     Return the lowest index in S where substring sub is found,
>     such that sub is contained within s[start,end]. Optional
>     arguments start and end are interpreted as in slice notation.
>
>     Return -1 on failure.
>
> So put your find in a loop, starting the search one past the previously
> found occurrence.
>
> i = string.find(mystring, '1', i+1)
>
> Gary Herron
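The loop Gary describes can be sketched roughly as follows. It uses the find method on the string itself, which behaves like string.find but also exists in Python 3, where the string.find function was removed. The helper names find_all and find_all_re are hypothetical. The second helper takes the regex route with re.finditer, because findall on its own returns the matched text rather than the positions; re.escape makes the substring match literally.

```python
import re

def find_all(haystack, needle):
    """Return every index where needle occurs in haystack (hypothetical helper)."""
    positions = []
    i = haystack.find(needle)
    while i != -1:
        positions.append(i)
        # Resume the search one character past the previous hit.
        i = haystack.find(needle, i + 1)
    return positions

def find_all_re(haystack, needle):
    """Same idea via re.finditer; each match object reports its start()."""
    return [m.start() for m in re.finditer(re.escape(needle), haystack)]

mystring = '12341'
print(find_all(mystring, '1'))     # [0, 4]
print(find_all_re(mystring, '1'))  # [0, 4]
```

One difference to be aware of: restarting at i + 1 also reports overlapping occurrences (find_all('aaa', 'aa') gives [0, 1]), while finditer only reports non-overlapping ones ([0]).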
Speaking of regex examples, that's basically what I did in the script
below, which James Kim and I worked on yesterday and this morning as a
result of his thread.
It matches not just a literal string but a regex pattern, then loops
through each match to do something to it. I hope this helps. I submitted
this to the list asking for recommendations on how to make it more
Pythonic, but at least it works.
Here are the most important, stripped-down pieces:
#! /usr/bin/python
import re

# Match a comma-delimited date in this format: ,05/MAR/2006,
regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

infile = open('TVA-0316', 'r')
for line in infile:
    matches = regex.findall(line)
    for someDate in matches:
        newDate = someDate  # placeholder: build the replacement date here
        line = line.replace(someDate, newDate)
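On the "more Pythonic" question, one possible approach (a sketch, not what the script actually does) is to let re.sub drive the loop: it accepts a function as the replacement and calls it once per match. Capture groups then pull out the day, month, and year instead of slicing by position. The names MONTHS, DATE_RE and reformat_date and the sample line are made up for illustration; the month table mirrors the one in the full script.

```python
import re

# Same month table as the full script (abbreviation -> month number).
MONTHS = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
          'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

# Capture day, month abbreviation and year inside the comma delimiters.
DATE_RE = re.compile(r",(\d{2})/([A-Z]{3})/(\d{4}),")

def reformat_date(match):
    """Turn ',05/MAR/2006,' into ',2006-03-05,' (hypothetical helper)."""
    day, mon, year = match.groups()
    # %02d zero-pads the month number, replacing formatDatePart.
    return ",%s-%02d-%s," % (year, MONTHS[mon], day)

line = "abc,05/MAR/2006,xyz"
print(DATE_RE.sub(reformat_date, line))  # abc,2006-03-05,xyz
```

Inside the file loop this would become line = DATE_RE.sub(reformat_date, line), with no separate findall or replace step.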
Here is the full script:
#! /usr/bin/python
import re

month = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APR': 4, 'MAY': 5, 'JUN': 6,
         'JUL': 7, 'AUG': 8, 'SEP': 9, 'OCT': 10, 'NOV': 11, 'DEC': 12}

infile = open('TVA-0316', 'r')
outfile = open('tmp.out', 'w')

def formatDatePart(x):
    """Take a number and transform it into a two-character string,
    zero padded."""
    x = str(x)
    while len(x) < 2:
        x = "0" + x
    return x

# Match a comma-delimited date such as ,05/MAR/2006,
regex = re.compile(r",\d{2}/[A-Z]{3}/\d{4},")

for line in infile:
    matches = regex.findall(line)
    for someDate in matches:
        # Slice out the fields; index 0 is the leading comma.
        dayNum = formatDatePart(someDate[1:3])
        monthNum = formatDatePart(month[someDate[4:7]])
        yearNum = formatDatePart(someDate[8:12])
        # Rewrite as ,YYYY-MM-DD, keeping the surrounding commas.
        newDate = ",%s-%s-%s," % (yearNum, monthNum, dayNum)
        line = line.replace(someDate, newDate)
    outfile.write(line)

# Call close() so the output is flushed to disk.
infile.close()
outfile.close()
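As another sketch, assuming every match is a valid calendar date with an English month abbreviation, the standard datetime module can parse and re-emit each date, which removes the need for both the month table and formatDatePart. In strptime, %b matches abbreviated month names case-insensitively (English names in the default C locale), and strftime('%Y-%m-%d') produces the zero-padded result. The function name convert_date is hypothetical.

```python
from datetime import datetime

def convert_date(field):
    """Turn ',05/MAR/2006,' into ',2006-03-05,' (hypothetical helper).

    field is one comma-delimited match from the script's regex.
    """
    parsed = datetime.strptime(field.strip(','), '%d/%b/%Y')
    return ',%s,' % parsed.strftime('%Y-%m-%d')

print(convert_date(',05/MAR/2006,'))  # ,2006-03-05,
```

Inside the script's inner loop this would read newDate = convert_date(someDate). Unlike the dictionary lookup, strptime also rejects impossible dates such as 31/FEB/2006 by raising ValueError.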
More information about the Python-list
mailing list