[Tutor] Using contents of a document to change file names, (was Re: how to extract data only after a certain ...)

Mon Oct 11 14:23:37 CEST 2010

On 11/10/2010 13:46, Josep M. Fontana wrote:
> I tried your suggestion of using .split() to get around the problem 
> but I still cannot move forward. I don't know if my implementation of 
> your suggestion is the correct one but here's the problem I'm having. 
> When I do the following:
>
> -----------------
>
> fileNameCentury = open(r'/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1/FileNamesYears.txt'.split('\r'))
> dct = {}
> for pair in fileNameCentury:
>     key,value = pair.split(',')
>     dct[key] = value
> print dct
>
> --------------
>
> I get the following long error message:
>
>     fileNameCentury = open(r'/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1/FileNamesYears.txt'.split('\n'))
>
>
> TypeError: coercing to Unicode: need string or buffer, list found
>
> ------------

What you should be doing is:

dct = {}
# The with block closes the file automatically when the loop finishes.
with open('/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1/FileNamesYears.txt', 'r') as fileNameCentury:
    for line in fileNameCentury:  # file objects are iterable: one line per step
        key, value = line.strip().split(',')
        dct[key] = value
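The same loop can be tried without the real corpus file. The sketch below is written for Python 3 (the thread itself uses Python 2) and wraps the loop in a hypothetical helper, `load_pairs`, that accepts any iterable of lines. `io.StringIO` stands in for the open file, and the sample file names and years are invented. It also skips blank lines and uses `split(',', 1)`, so a trailing empty line or an extra comma in the value does not make the unpacking fail with `ValueError`.

```python
import io


def load_pairs(lines):
    """Build a dict from 'key,value' lines (hypothetical helper for illustration)."""
    dct = {}
    for line in lines:
        line = line.strip()              # drop the newline and surrounding spaces
        if not line:                     # skip blank lines, e.g. a trailing empty one
            continue
        key, value = line.split(',', 1)  # split on the first comma only
        dct[key] = value
    return dct


# io.StringIO behaves like an open text file: iterating it yields one line at a time.
# The contents are made-up sample data, not the real FileNamesYears.txt.
fake_file = io.StringIO("text1.txt,1250\ntext2.txt,1300\n\n")
print(load_pairs(fake_file))  # {'text1.txt': '1250', 'text2.txt': '1300'}
```

With a real file on disk, Python 3's `open()` in text mode uses universal newlines by default, so lines ending in `\r`, `\n` or `\r\n` are all split correctly. The `.split('\r')` attempt hints that this file may use `\r` endings. Under Python 2 the equivalent is opening the file with mode `'rU'`.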

What you were doing originally was calling .split() on the filename
string rather than on the file's contents, so open() was handed a list
instead of a path (hence the error message `need string or buffer, list
found`). If you want to read in the entire file and then split it into
lines, you can use fileObject.read().splitlines(), but it is more
efficient to create your file object and just iterate over it: that way
only one line at a time is held in memory, and you're not reading in the
entire file first.

It's not a Mac problem; the .split() was just applied in the wrong place.

Hope that helps.

-- 
Kind Regards,
Christian Witts