[Tutor] creating a corpus from a csv file

Alan Gauld alan.gauld at btinternet.com
Sat May 4 00:05:32 CEST 2013


On 03/05/13 21:48, Treder, Robert wrote:

> I'm very new to python and am trying to figure out how to
> make a corpus from a text file.

Hi, I for one have no idea what a corpus is or looks like
so you will need to help us out a little before we can help you.

> I have a csv file (actually pipe '|' delimited) where each
> row corresponds to a different text document.

> Each row contains a communication note.
> Other columns correspond to categories of types of communications.

> I am able to read the csv file and print the notes column as follows:
>
> import csv
> with open('notes.txt', 'rb') as infile:
>      reader = csv.reader(infile, delimiter = '|')
>      i = 0
>      for row in reader:
>      if i <= 25: print row[8]
>      i = i+1

You don't need to manually manage 'i'.

You could let enumerate() do the counting instead:

import csv

with open('notes.txt', 'rb') as infile:
    reader = csv.reader(infile, delimiter='|')
    for count, row in enumerate(reader):
        if count <= 25: print row[8]  # I assume indented?
        else: break                   # save time if it's a big file
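
For anyone following along on Python 3, here is a rough sketch of the same enumerate() idea. The function name first_notes, its parameters, and the sample rows are all made up for illustration; the only things taken from the thread are the pipe delimiter and the notes being in column 8.

```python
import csv
import io

def first_notes(infile, limit=26, column=8):
    """Collect one column from the first `limit` rows of a pipe-delimited file."""
    reader = csv.reader(infile, delimiter='|')
    notes = []
    for count, row in enumerate(reader):
        if count >= limit:
            break  # stop reading early, which saves time on a big file
        notes.append(row[column])
    return notes

# Invented sample data: nine columns, with the note text last (index 8).
sample = io.StringIO(
    "a|b|c|d|e|f|g|h|first note\n"
    "a|b|c|d|e|f|g|h|second note\n"
    "a|b|c|d|e|f|g|h|third note\n"
)
print(first_notes(sample, limit=2))  # ['first note', 'second note']
```

With a real file on Python 3 you would open it in text mode, e.g. open('notes.txt', newline=''), rather than 'rb'.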

> I would like to convert this to a categorized corpus with
> some of the other columns corresponding to the categories.

You might be able to use a dictionary but for now
I'm still not clear what you mean. Can you show us
some sample input and output data?
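
To make the dictionary idea a bit more concrete, here is a minimal sketch, assuming "categorized" only means grouping each note under the label found in one of the other columns. That may not be what you need. The function name group_notes, the column positions, and the sample rows are invented, and the code is written for Python 3.

```python
import csv
import io
from collections import defaultdict

def group_notes(infile, category_col, note_col):
    """Map each category label to the list of notes filed under it."""
    reader = csv.reader(infile, delimiter='|')
    grouped = defaultdict(list)
    for row in reader:
        grouped[row[category_col]].append(row[note_col])
    return dict(grouped)

# Invented sample data: category | date | note
sample = io.StringIO(
    "email|2013-05-01|Please call me back\n"
    "phone|2013-05-02|Left a voicemail\n"
    "email|2013-05-03|Sent the contract\n"
)
corpus = group_notes(sample, category_col=0, note_col=2)
print(corpus['email'])  # ['Please call me back', 'Sent the contract']
```

Each key is a category and each value is the list of notes in that category, in file order.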

> documentation on how to use csv.reader with PlaintextCorpusReader

I've never heard of the latter. Is it an external module?

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/


