[Tutor] Problems with partial string matching

Josep M. Fontana josep.m.fontana at gmail.com
Mon Nov 1 13:48:43 CET 2010

Thanks a lot Dave and Joel,

> You call re.sub(), but don't do anything with the result.
> Where do you call os.rename() ?

Yes, indeed: as you suggested, what was missing was a call to
os.rename() to apply the substitution to the actual file names. I
incorporated that, and I also changed the loop from my first version
because it wasn't doing what it was supposed to do.

Doing that definitely gets me closer to my goal, but I'm encountering a
strange problem. Well, strange to me, that is. I'm sure that more
experienced programmers like the people who hang out on this list will
immediately see what is going on. First, here's the code:

import os, sys, glob, re
# What follows creates a dictionary of the form {'name': 'year'} out
# of a csv file called FileNameYear.txt, which has a string of the form
# 'A-01,1374' on each line. The substring before the comma is the code
# for the text that appears at the beginning of the name of the file
# containing the given text, and the substring after the comma indicates
# the year in which the text was written.
fileNameYear = open(r'/Volumes/DATA/Documents/workspace/MyCorpus/CORPUS_TEXT_LATIN_1/FileNameYear.txt')
name_year = {}
for line in fileNameYear: #File objects have built-in iteration
    name, year = line.strip().split(',')
    name_year[name] = year  # effectively creates the dictionary by
    # creating keys with the element 'name' returned by the loop and
    # assigning them values corresponding to the element 'year'.
    # "d[key] = value" means: set d[key] to value.
file_names = glob.glob('*.txt')
for name in name_year:
    for name_of_file in file_names:
        if name_of_file.startswith(name):
            os.rename(name_of_file, re.sub('__', '__' + year, name_of_file))

What this produces is a change in the names of the files which is not
exactly the desired result. The new names of the files have the
following structure:

'A-01-name1__1499.txt' , 'A-02-name2__1499.txt',
'A-05-name3__1499.txt', ... 'I-01-name14__1499.txt',

That is, only the year '1499' of the many possible years has been
added in the substitution. I can understand that I've done something
wrong in the loop and the iteration over the values of the dictionary
(i.e. the strings representing the years) is not working properly.
What I don't understand is why precisely '1499' is the string that is
obtained in all the cases.

I've been trying to figure out how the loop proceeds and this doesn't
make sense to me because the year '1499' appears as the value for
dictionary item number 34. Because of the order of the dictionary
entries and the way I've designed the loop (which I admit might not be
the most efficient way to process these data), the first match would
correspond to a file that starts with the initial code 'I-02'. The
dictionary value for this key is '1399', not '1499'. '1499' is not
even the value that would correspond to key 'A-01', which is the first
file in the directory in alphabetical order ('A-02', the second file
in the directory, does correspond to value '1499', though).

So besides being able to explain why '1499' is the string that winds
up added to the file name, my question is, how do I set up the loop so
that the string representing the appropriate year is added to each
file name?
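
One possible arrangement, sketched under assumptions rather than
offered as a confirmed fix: iterate over name_year.items() so that each
code arrives together with its own year, and build the new file name
from that pair. The function name add_years, the temporary directory
and the file stem 'I-02-name3__' are made up for the illustration; the
three code/year pairs are the ones mentioned in this message.

```python
import os
import re
import tempfile


def add_years(directory, name_year):
    """Insert the matching year after '__' in each .txt file name."""
    renamed = []
    for file_name in sorted(os.listdir(directory)):
        if not file_name.endswith('.txt'):
            continue
        # items() yields each code together with its own year.
        for name, year in name_year.items():
            if file_name.startswith(name):
                new_name = re.sub('__', '__' + year, file_name, count=1)
                os.rename(os.path.join(directory, file_name),
                          os.path.join(directory, new_name))
                renamed.append(new_name)
                break  # one code per file, so stop after the first match
    return renamed


# Sample entries in the 'code,year' format of FileNameYear.txt.
sample_lines = ['A-01,1374', 'A-02,1499', 'I-02,1399']
name_year = dict(line.strip().split(',') for line in sample_lines)

# Throwaway directory with empty files; the name stems are placeholders.
demo_dir = tempfile.mkdtemp()
for stem in ('A-01-name1__', 'A-02-name2__', 'I-02-name3__'):
    open(os.path.join(demo_dir, stem + '.txt'), 'w').close()

renamed = add_years(demo_dir, name_year)
print(renamed)
```

Because the year is taken from the same dictionary entry whose key
matched the start of the file name, it no longer depends on whatever
value a variable happened to hold when an earlier loop finished.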

Thanks a lot in advance for your help (since it usually takes me a
while to answer).

Josep M.

> On 2:59 PM, Josep M. Fontana wrote:
>> Hi,
>> As I said in another message with the heading "Using contents of a
>> document to change file names", I'm trying to learn Python "by doing"
>> and I was working on a little project where I had to change the names
>> <snip>
>> I run this and I don't get any errors. The names of the files in the
>> directory, however, are not changed. What am I doing wrong?
>> As always, your help is greatly appreciated.
>> Josep M.
> You call re.sub(), but don't do anything with the result.
> Where do you call os.rename() ?
> DaveA
