[Tutor] regexp help needed
János Juhász
janos.juhasz at VELUX.com
Wed Oct 18 22:48:33 CEST 2006
Hi Kent,
Thanks for your response.
> > I have to remove the thousand separator by moving the numbers before
> > it to the right.
> > So the number and char groups have to be left in their original
> > position.
> >
> > I have to make this kind of changes on the problematic lines:
> > MOATOT 79 47.281,680 DKK4
> > MOATOT 79 47281,680 DKK4
> >
> > I have no idea how to make it :(
> Break it up into smaller problems:
> for each line in the data:
>     break the line up into fields
>     fix the field containing the amount
>     rebuild the line
>
> You don't really have to make a regex for the whole line. re.split() is
> useful for splitting the line and preserving the whitespace so you can
> rebuild the line with the same format.
> Kent
I couldn't find a way to preserve the whitespace when splitting,
but I found a way to do it with a regexp:
import glob
import re

# A group of digits, followed by a dot, followed by 3 digits and a comma
pat = re.compile(r'(\d+)\.(\d{3},)')

def replace_thousand_separator(line):
    # When the '.' is removed, a space has to be prepended
    # so that the rest of the line keeps its original position
    return pat.sub(r' \1\2', line)

for filename in glob.glob('*.doc'):
    with open(filename, 'r') as f:
        lines = f.readlines()
    lines = [replace_thousand_separator(line) for line in lines]
    with open(filename, 'w') as f:
        f.writelines(lines)
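
For comparison, the split-based approach Kent describes might look roughly like the sketch below. It relies on the fact that re.split() also returns the separators when the pattern is wrapped in a capturing group, which is what preserves the whitespace. The helper name fix_line and the exact shape of the amount field (digits, a dot, three digits, a comma, more digits) are assumptions made for illustration, not part of the original code.

```python
import re

def fix_line(line):
    # Hypothetical helper, not from the original post.
    # The capturing group makes re.split() return the whitespace runs too,
    # so the line can be rebuilt with its original spacing.
    parts = re.split(r'(\s+)', line)
    for i, part in enumerate(parts):
        # Assumption: an amount looks like digits '.' 3 digits ',' digits
        if re.fullmatch(r'\d+\.\d{3},\d+', part):
            # Drop the dot and pad on the left to keep the field width
            parts[i] = part.replace('.', '').rjust(len(part))
    return ''.join(parts)

print(fix_line('MOATOT 79 47.281,680 DKK4'))
# prints: MOATOT 79  47281,680 DKK4
```

Lines without such an amount field come back unchanged, because joining the split parts reproduces the original string.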
Yours sincerely,
______________________________
Janos Juhasz