[Tutor] Problems with partial string matching

Josep M. Fontana josep.m.fontana at gmail.com
Sun Oct 24 20:16:19 CEST 2010


Hi,

As I said in another message with the heading "Using contents of a
document to change file names", I'm trying to learn Python "by doing"
and I was working on a little project where I had to change the names
of the files in a directory according to some codes contained in a CSV
file. With the help of the participants in this list I managed to
overcome some of the first obstacles I found and managed to create a
dictionary out of the structured contents of the CSV file.

Unfortunately, I didn't have much time to continue working on my
project and I didn't get back to the script until now. I have
encountered another hurdle, though, which doesn't allow me to go on.
The problem might be *very* simple to solve but I've spent the last
couple of hours checking manuals and doing searches on the internet
without having much success.

What I'm trying to do now is to use the dictionary I created (with
entries such as {'I-02': '1399', 'I-01': '1374',...}) to iterate over
the file names I want to change and do the necessary string
substitutions. If one of the keys in the dictionary matches the code
found at the beginning of a file name, then the corresponding value,
which represents the year in which the text was written, should be
appended at the end of the file name.
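
To make the intended result concrete, here is a minimal sketch of the
name computation only (nothing is renamed on disk). The function name
add_year and the sample file name are hypothetical, and the sketch
assumes that the code is always followed by a hyphen, as in
'B-13-Viatges_Marco_Polo__.txt'.

```python
import os


def add_year(file_name, name_year):
    """Return file_name with the matching year inserted before the extension."""
    stem, ext = os.path.splitext(file_name)
    for code, year in name_year.items():
        # Assumption: the code at the start of the name is followed by '-',
        # so 'I-02' cannot accidentally match a longer code.
        if file_name.startswith(code + '-'):
            return stem + year + ext
    # No key matches: leave the name unchanged.
    return file_name


# Sample entries taken from the dictionary described above.
name_year = {'I-02': '1399', 'I-01': '1374'}

# Hypothetical file name, used only for illustration.
print(add_year('I-02-Some_Title__.txt', name_year))
```

Checking every key against each file name keeps the lookup independent
of whatever values the variables from an earlier loop happen to hold.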

Here is what I've done so far:
------------------------------
import os, sys, glob, re
fileNameYear = open(r'/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1/FileNameYear.txt',
                    "U").readlines()
name_year = {}
for line in fileNameYear:  # File objects have built-in iteration
    name, year = line.strip().split(',')
    # Effectively creates the dictionary by creating keys with the element
    # 'name' returned by the loop and assigning them values corresponding to
    # the element 'year' --> "d[key] = value" means: set d[key] to value.
    name_year[name] = year
os.getcwd()
os.chdir('/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1')
file_names = glob.glob('*.txt')
for name_of_file in file_names:
    if name_of_file.startswith(name):
        # The files have names such as 'B-13-Viatges_Marco_Polo__.txt', so the
        # first argument in re.sub() is the string '__', which should be
        # replaced by the same string followed by the string corresponding to
        # the year value in the dictionary (also a string).
        re.sub('__', '__' + year, name_of_file)
------------------------------

I run this and I don't get any errors. The names of the files in the
directory, however, are not changed. What am I doing wrong?
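
One likely explanation, offered as a sketch rather than a definitive
diagnosis: re.sub() only returns a new string. It changes neither the
original string nor any file on disk, and the script discards the
result. The second loop also reuses name and year, which at that point
hold only the values from the last line of the CSV file. The example
below shows the first point in a temporary directory; the year '1399'
is paired with this file name purely for illustration.

```python
import os
import re
import tempfile

old_name = 'B-13-Viatges_Marco_Polo__.txt'

# re.sub() returns a new string; old_name itself is left untouched.
new_name = re.sub('__', '__' + '1399', old_name)

# Create an empty file with the old name in a scratch directory.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, old_name), 'w').close()

# Only an explicit os.rename() call changes the name on disk.
os.rename(os.path.join(workdir, old_name),
          os.path.join(workdir, new_name))

renamed = os.listdir(workdir)
print(old_name)
print(new_name)
print(renamed)
```

Trying the rename in a scratch directory first avoids touching the real
corpus files while the script is still being debugged.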

As always, your help is greatly appreciated.


Josep M.

