[Tutor] how to extract data only after a certain condition is met
Emile van Sebille
emile at fenx.com
Sun Oct 10 22:54:16 CEST 2010
On 10/10/2010 12:35 PM Josep M. Fontana said...
<snip>
> OK. Let's start with -b- . My first problem is that I don't really know how
> to go about building a dictionary from the file with the comma separated
> values. I've discovered that if I use a file method called 'readlines' I can
> create a list whose elements would be each of the lines contained in the
> document with all the codes followed by comma followed by the year. Thus if
> I do:
>
> fileNameCentury = open(r
> '/Volumes/DATA/Documents/workspace/GCA/CORPUS_TEXT_LATIN_1/FileNamesYears.txt'
> ).readlines()
>
> Where 'FileNamesYears.txt' is the document with the following info:
>
> A-01, 1278
> A-02, 1501
> ...
> N-09, 1384
>
> I get a list of the form ['A-01,1374\rA-02,1499\rA-05,1449\rA-06,1374\rA-09,
> ...]
>
> Would this be a good first step to creating a dictionary?
Hmmm... It looks like you got a single string. Is that the output from
read() rather than readlines()? I also see you're getting only \r, which
is the classic (pre-OS X) Mac line terminator. Are you on a Mac, or was
'FileNamesYears.txt' created on a Mac? Python can be smart about which
line terminator to expect if you open the file in universal newline mode
('rU' in Python 2), but with a plain 'r' and a mismatched terminator the
whole file can come back as one line. I would have expected you'd get
something more like:
['A-01,1374\r','A-02,1499\r','A-05,1449\r','A-06,1374\r','A-09, ...]
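To see how the line terminator changes what readlines() returns, here is a minimal sketch in Python 3 (not the Python 2 used in this thread). It writes a throwaway temp file with classic-Mac \r endings, using sample values from the quoted listing, and reads it back with three settings of open()'s newline argument. The last setting reproduces the "everything on one line" symptom.

```python
import os
import tempfile

# Write a small file with classic-Mac "\r" line endings.
# newline="" disables translation on write, so the "\r" characters are kept as-is.
sample = "A-01, 1278\rA-02, 1501\rN-09, 1384\r"
with tempfile.NamedTemporaryFile("w", newline="", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Default open() in Python 3 uses universal newlines (newline=None):
    # "\r", "\n" and "\r\n" all end a line and are translated to "\n".
    with open(path) as f:
        universal = f.readlines()

    # newline="" still recognises all three endings but returns them untranslated.
    with open(path, newline="") as f:
        untranslated = f.readlines()

    # newline="\n" only splits on "\n", so a "\r"-only file is one long line.
    with open(path, newline="\n") as f:
        strict = f.readlines()
finally:
    os.remove(path)

print(universal)     # ['A-01, 1278\n', 'A-02, 1501\n', 'N-09, 1384\n']
print(untranslated)  # ['A-01, 1278\r', 'A-02, 1501\r', 'N-09, 1384\r']
print(strict)        # ['A-01, 1278\rA-02, 1501\rN-09, 1384\r']
```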
In any case, since you're getting a single string, you can split it into
pieces yourself; for example, print "1\r2\r3\r4\r5".split("\r") prints
['1', '2', '3', '4', '5']. That way you can force creation of a list of
strings in the format "X-NN,YYYY", each of which can be further split
with xxx.split(",").
Note as well that you can assign the results of split directly to
variable names. For example, ky,val = "A-01, 1278".split(",") sets ky to
'A-01' and val to ' 1278' (with a leading space, which val.strip() or
int(val) takes care of). So you should be able to create an empty dict
and, for each line in your file, set the entry keyed by the code to its
year.
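Putting the split-then-assign steps together, here is a minimal sketch in Python 3 (the thread itself uses Python 2). It starts from the single \r-separated string shown in the question, using sample values from the quoted listing. build_year_index is a hypothetical helper name, and converting the year to int is an assumption about how the years will be used; keep them as strings if that's all you need.

```python
def build_year_index(lines):
    """Map each file code (e.g. 'A-01') to its year as an int.

    `lines` is any iterable of "CODE,YEAR" or "CODE, YEAR" strings;
    blank entries are skipped.
    """
    years = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # e.g. the empty piece left after a trailing "\r"
        code, year = line.split(",", 1)  # split only on the first comma
        years[code.strip()] = int(year)  # int() ignores surrounding whitespace
    return years


# One long string with "\r" separators, like the quoted readlines() output.
blob = "A-01,1374\rA-02,1499\rA-05,1449\rA-06,1374\r"

# split("\r") gives one "X-NN,YYYY" record per piece; the trailing "\r"
# leaves an empty string at the end, which build_year_index skips.
records = blob.split("\r")
index = build_year_index(records)
print(index)  # {'A-01': 1374, 'A-02': 1499, 'A-05': 1449, 'A-06': 1374}
```

If the file is opened in Python 3's default text mode, where \r already counts as a line ending, you could pass the file object itself (e.g. build_year_index(open(path)), with path standing in for your file name), since iterating a file yields its lines.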
Why don't you start there and show us what you get.
HTH,
Emile