[Tutor] Adding to a CSV file?
aeneas24 at priest.com
Sun Aug 29 20:12:06 CEST 2010
Hi,
I'm learning Python so I can take advantage of the really cool stuff in the Natural Language Toolkit. But I'm having problems with some basic file manipulation stuff.
My basic question: How do I read data in from a csv, manipulate it, and then add it back to the csv in new columns (keeping the manipulated data in the "right row")?
Here's an example of what my data looks like ("test-8-29-10.csv"):
MyWord      Category  Ct     CatCt
!           A         2932   456454
!           B         2109   64451
a           C         7856   90000
a           A         19911  456454
abnormal    C         174    90000
abnormally  D         5      77777
cats        E         1999   886454
cat         B         160    64451
# I want to read the MyWord value from each row, process it, and add the results as new columns. Specifically, I want to "lemmatize" and "stem", which basically means I'll turn "abnormally" into "abnormal" and "cats" into "cat".
import nltk
wnl=nltk.WordNetLemmatizer()
porter=nltk.PorterStemmer()
text=nltk.word_tokenize(TheStuffInMyWordColumn)
textlemmatized=[wnl.lemmatize(t) for t in text]
textPort=[porter.stem(t) for t in text]
# This creates the right info, but I don't really want "textlemmatized" and "textPort" to be independent lists; I want them inside the csv in new columns.
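One way to keep the separate lists tied to the "right row" is to zip them together, since each list comprehension preserves the order of `text`. The following is only a minimal sketch for Python 3: the three lists are hard-coded placeholders whose values come from the description above ("abnormally" to "abnormal", "cats" to "cat"), not from actually running NLTK.

```python
# Placeholder values standing in for text, textlemmatized and textPort.
# They are illustrative only, not real NLTK output.
text = ["abnormally", "cats", "cat"]
textlemmatized = ["abnormal", "cat", "cat"]
textPort = ["abnormal", "cat", "cat"]

# zip() pairs up the i-th item of each list, so every tuple is one row:
# (original word, lemmatized form, stemmed form).
rows = list(zip(text, textlemmatized, textPort))

for word, lemma, stem in rows:
    print(word, lemma, stem)
```

Each tuple in `rows` can then be written out as one line, so the derived forms never drift away from the word they came from.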
# If I didn't want to keep the information in the Category and Counts columns, I would probably do something like this:
for word in text:
    word2=wnl.lemmatize(word)
    word3=porter.stem(word)
    print(word+";"+word2+";"+word3+"\r\n")
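As an alternative to joining the fields by hand, the csv module can write semicolon-separated lines and will quote any field that happens to contain the delimiter. This is a sketch for Python 3 under assumptions: `write_forms` is a hypothetical helper name, and the rows are hard-coded placeholders rather than real NLTK output.

```python
import csv
import io

def write_forms(rows, out_file):
    """Write (word, lemma, stem) tuples as semicolon-separated lines."""
    writer = csv.writer(out_file, delimiter=";", lineterminator="\r\n")
    for word, lemma, stem in rows:
        writer.writerow([word, lemma, stem])

# An in-memory buffer stands in for a real output file.
buf = io.StringIO()
write_forms([("cats", "cat", "cat"), (";", ";", ";")], buf)
output = buf.getvalue()
print(output)
```

The second placeholder row shows why this can be safer than string concatenation: a word that is itself ";" is quoted on output, so a reader using the same delimiter still gets three fields back.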
# Looking through some of the older discussions about the csv module, I found that this code helps identify headers, but I'm still not sure how to use them, or how to write the for-loop so that I iterate through each row in the csv file.
import csv
fp=open(r'c:test-8-29-10.csv', 'r')
inputfile=csv.DictReader(fp)
for record in inputfile:
    print(record)
# Output:
# {'Category': 'A', 'CatCt': '456454', 'MyWord': '!', 'Ct': '2932'}
# {'Category': 'B', 'CatCt': '64451', 'MyWord': '!', 'Ct': '2109'}
# ...
fp.close()
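The DictReader loop above can be extended into a full read, modify, write pass. The sketch below is for Python 3 and rests on several assumptions: `add_columns` and `toy_stem` are hypothetical names, the new column names "Lemma" and "Stem" are invented for illustration, and `toy_stem` is a crude stand-in so the example runs without NLTK. In real use, `wnl.lemmatize` and `porter.stem` from the code above could be passed in instead.

```python
import csv
import io

def toy_stem(word):
    # Hypothetical stand-in for wnl.lemmatize / porter.stem.
    # It only strips a trailing "ly" or "s"; it is NOT a real stemmer.
    for suffix in ("ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[:-len(suffix)]
    return word

def add_columns(in_file, out_file, lemmatize=toy_stem, stem=toy_stem):
    reader = csv.DictReader(in_file)
    # Keep the original columns in order and append the two new ones.
    fieldnames = list(reader.fieldnames) + ["Lemma", "Stem"]
    writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for record in reader:
        # Each record is one row, so the new values stay in the right row.
        record["Lemma"] = lemmatize(record["MyWord"])
        record["Stem"] = stem(record["MyWord"])
        writer.writerow(record)

# In-memory sample using two rows from the data shown earlier.
sample = "MyWord,Category,Ct,CatCt\nabnormally,D,5,77777\ncats,E,1999,886454\n"
out = io.StringIO()
add_columns(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

With real files, the input and output would be two separate files opened with `newline=""`, since a CSV file cannot gain columns in place; the output file can replace the original afterwards if desired.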
# So I feel like I have *some* of the pieces, but I'm just missing a bunch of little connections. Any and all help would be much appreciated!
Tyler