[Tutor] gensim to generate document vectors
Joshua Valdez
jdv12 at case.edu
Mon Aug 17 19:50:54 CEST 2015
Okay, so I'm trying to use Doc2Vec to read in a file that is a list of
sentences like this:
#
The elephant flaps its large ears to cool the blood in them and its body.
A house is a permanent building or structure for people or families to live
in.
....
#
What I want to do is generate two files: one with the unique words from these
sentences, and another with one corresponding vector per line (if there's no
vector for a sentence, I want to output a vector of 0's).
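The two-file output described above can be sketched independently of gensim. This is a minimal illustration under stated assumptions, not a definitive implementation: `vector_for`, `write_outputs`, and the file paths are hypothetical names, and `lookup` stands in for whatever mapping from sentence label to vector the trained model exposes. In the gensim releases I am aware of, document vectors are reached through `model.docvecs` (renamed `model.dv` in 4.x), but check that against the installed version. A sentence with no vector is written as a row of zeros.

```python
def vector_for(lookup, label, size):
    """Return the vector stored under `label`, or `size` zeros if absent."""
    try:
        return list(lookup[label])
    except KeyError:
        # No vector was learned for this sentence: fall back to zeros.
        return [0.0] * size


def write_outputs(lines, lookup, size, vocab_path, vector_path):
    """Write unique words to one file and one vector per sentence to another."""
    # Unique whitespace-separated tokens, sorted for a stable file order.
    vocab = sorted({word for line in lines for word in line.split()})
    with open(vocab_path, "w") as vocab_file:
        for word in vocab:
            vocab_file.write(word + "\n")
    with open(vector_path, "w") as vector_file:
        # Labels follow the SENT_<line number> scheme used in the post.
        for uid, _ in enumerate(lines):
            vec = vector_for(lookup, "SENT_%s" % uid, size)
            vector_file.write(" ".join("%.6f" % x for x in vec) + "\n")
    return vocab
```

Keeping the zero fallback in its own helper means the vector file always has exactly one row per input line, so the two files stay aligned by line number.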
I'm getting the vocab fine with my code, but I can't figure out how to print
out the individual sentence vectors. I have looked through the documentation
and haven't found much help. Here is what my code looks like so far:
sentences = []
for uid, line in enumerate(open(filename)):
    sentences.append(LabeledSentence(words=line.split(),
                                     labels=['SENT_%s' % uid]))

model = Doc2Vec(alpha=0.025, min_alpha=0.025)
model.build_vocab(sentences)
for epoch in range(10):
    model.train(sentences)
    model.alpha -= 0.002
    model.min_alpha = model.alpha

sent_reg = r'[SENT].*'
for item in model.vocab.keys():
    sent = re.search(sent_reg, item)
    if sent:
        continue
    else:
        print item

### I'm not sure how to produce the vectors from here, and this doesn't work ###
sent_id = 0
for item in model:
    print model["SENT_"+str(sent_id)]
    sent_id += 1
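One detail in the vocabulary filter above is worth illustrating: `[SENT]` is a regular-expression character class, so `re.search(r'[SENT].*', item)` matches any token containing an uppercase S, E, N, or T, not only the `SENT_<n>` labels. The following is a minimal sketch of an anchored alternative; the helper name `is_sentence_label` and the sample tokens are hypothetical.

```python
import re

# Matches only whole tokens of the form SENT_<digits>.
LABEL_RE = re.compile(r"^SENT_\d+$")


def is_sentence_label(token):
    """True if `token` is a sentence label such as SENT_0."""
    return LABEL_RE.match(token) is not None


tokens = ["The", "elephant", "SENT_0", "Tutor", "SENT_12"]

# Anchored pattern: drops only the labels.
words = [t for t in tokens if not is_sentence_label(t)]

# Character-class pattern from the post: also drops "The" and "Tutor".
old = [t for t in tokens if not re.search(r"[SENT].*", t)]
```

With the anchored pattern, ordinary words that happen to contain a capital S, E, N, or T stay in the vocabulary listing.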
I'm not sure where to go from here so any help is appreciated.
Thanks!
*Joshua Valdez*
*Computational Linguist : Cognitive Scientist*
(440)-231-0479
jdv12 at case.edu <jdv2 at uw.edu> | jdv2 at uw.edu | joshv at armsandanchors.com
<http://www.linkedin.com/in/valdezjoshua/>