which data structure should I use?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Fri Jan 15 17:25:14 EST 2010
On Fri, 15 Jan 2010 01:56:24 -0300, Eknath Venkataramani
<eknath.iyer at gmail.com> wrote:
> I have a txt file in the following format:
> [code]
> "confident" => {
> count => 4,
> trans => {
> "ashahvasahta" => 0.74918568,
> "atahmavaishahvaasa" => 0.09095465,
> "pahraaram\.nbha" => 0.06990729,
> "mailatae" => 0.02856427,
> "utanai" => 0.01929341,
> "anaa" => 0.01578552,
> "uthaanae" => 0.01403157,
> "jaitanae" => 0.01227762,
> },
> },
> "consumers" => {
> count => 4,
> trans => {
> "upabhaokahtaa" => 0.75144362,
> ...
> and I need to extract "confident" , "ashahvasahta" from the first
> record, "consumers", "upabhaokahtaa" from the second record...
> i.e. "word in english" and the "first word in the probable-translations"
The most robust way would be to write a specific parser for this format.
That should be easy using pyparsing: http://pyparsing.wikispaces.com/
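
To give an idea of what such a parser involves, here is a minimal sketch
written with only the standard re module (it is not pyparsing, just a small
hand-written tokenizer plus a recursive parser). The grammar is guessed from
the sample above: `key => value` pairs separated by commas, where a value is
a number, a word, a quoted string or a nested `{ ... }` block. The names
TOKEN, tokenize, parse_pairs and SAMPLE are made up for illustration, the
sample is abridged and its second record is closed by hand, and taking "the
first translation" relies on dicts keeping insertion order (Python 3.7+).

```python
import re

# Token kinds guessed from the sample: quoted strings, =>, braces/commas,
# numbers, and bare words such as count / trans.
TOKEN = re.compile(r'''
    \s*(?:
        "(?P<string>(?:[^"\\]|\\.)*)"
      | (?P<arrow>=>)
      | (?P<punct>[{},])
      | (?P<number>-?\d+(?:\.\d+)?)
      | (?P<word>\w+)
    )''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, value) tuples; raise ValueError on unexpected input."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


def parse_pairs(tokens, closing=None):
    """Parse `key => value,` pairs until `closing` (or the end of input)."""
    result = {}
    for kind, value in tokens:
        if kind == "punct" and value == closing:
            return result
        if kind == "punct" and value == ",":
            continue
        if kind not in ("string", "word"):
            raise ValueError("expected a key, got %r" % (value,))
        key = value
        kind, value = next(tokens, ("end", None))
        if kind != "arrow":
            raise ValueError("expected => after %r" % (key,))
        kind, value = next(tokens, ("end", None))
        if kind == "punct" and value == "{":
            result[key] = parse_pairs(tokens, "}")   # nested block
        elif kind == "number":
            result[key] = float(value)
        elif kind in ("string", "word"):
            result[key] = value
        else:
            raise ValueError("expected a value after %r =>" % (key,))
    if closing is not None:
        raise ValueError("missing %r" % (closing,))
    return result


# Abridged sample; the second record is closed by hand for illustration.
SAMPLE = r'''
"confident" => {
  count => 4,
  trans => {
    "ashahvasahta" => 0.74918568,
    "atahmavaishahvaasa" => 0.09095465,
  },
},
"consumers" => {
  count => 4,
  trans => {
    "upabhaokahtaa" => 0.75144362,
  },
},
'''

records = parse_pairs(tokenize(SAMPLE))
# English word plus the first listed translation of each record.
pairs = [(word, next(iter(info["trans"]))) for word, info in records.items()]
print(pairs)
```

Because the whole structure is parsed, the same `records` dict also gives
access to the counts and the probabilities if they are needed later.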
If you can guarantee certain properties (e.g. entries like "confident" and
"consumers" are always on a separate line; translations appear one per
line; there are no line breaks before/after the => sign, etc.) then you
could process the file line by line, looking at those separators. But only
do that if you are completely sure the format is fixed (e.g. the file is
computer-generated, not human-written). Anyway, it isn't much easier than
writing a real parser, and the latter is a lot more reliable. Learning how
to use a tool like pyparsing is in no way a waste of time.
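
For comparison, the line-by-line approach could look roughly like the sketch
below. It is only valid under the assumptions listed above (one headword per
line followed by `=> {`, one translation per line, no line breaks around the
=> sign). The names HEADWORD, TRANSLATION and first_translations are
hypothetical, and the regular expressions are guessed from the sample.

```python
import re

# A headword line looks like:      "confident" => {
HEADWORD = re.compile(r'^\s*"([^"]+)"\s*=>\s*\{\s*$')
# A translation line looks like:   "ashahvasahta" => 0.74918568,
TRANSLATION = re.compile(r'^\s*"([^"]+)"\s*=>\s*[-+.\deE]+\s*,?\s*$')


def first_translations(lines):
    """Yield (english_word, first_translation) pairs from an iterable of lines."""
    current = None  # headword still waiting for its first translation
    for line in lines:
        m = HEADWORD.match(line)
        if m:
            current = m.group(1)
            continue
        m = TRANSLATION.match(line)
        if m and current is not None:
            yield current, m.group(1)
            current = None  # skip the remaining translations of this record


sample_lines = [
    '"confident" => {',
    'count => 4,',
    'trans => {',
    '"ashahvasahta" => 0.74918568,',
    '"atahmavaishahvaasa" => 0.09095465,',
    '},',
    '},',
    '"consumers" => {',
    'count => 4,',
    'trans => {',
    '"upabhaokahtaa" => 0.75144362,',
]

result = list(first_translations(sample_lines))
print(result)
```

An open file object can be passed in place of `sample_lines`, since it is
also an iterable of lines. Note that any line not matching either pattern is
silently ignored, which is exactly why this approach is fragile if the
format ever varies.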
--
Gabriel Genellina