[Tutor] language aid (various)

spir denis.spir at free.fr
Sat Feb 6 13:09:37 CET 2010


On Fri, 05 Feb 2010 11:29:30 +0000
Owain Clarke <simbobo at cooptel.net> wrote:

> On Thu, Feb 4, 2010 at 5:43 AM, Owain Clarke <simbobo at cooptel.net> wrote:
> >   
> >> My question is, that if I proceed like this I will end up with a single list
> >> of potentially several hundred strings of the form "frword:engword". In
> >> terms of performance, is this a reasonable way to do it, or will the program
> >> increasingly slow down?
> >>     
> > From: "Carnell, James E" <jecarnell at saintfrancis.com>
> >   
> >> A dictionary (associative array of keys and values) seems a good
> >> datatype to use:
> >>     vocab = {}
> >>     vocab[frenchword] = englishword
> >>     
> > .......
> >   
> >> Cheers!!
> >> Albert-Jan
> >>     
> >
> > Sure, a dict is the obvious choice. For saving into file, if the app is
> > to be used internally, you can even print it in the form of a python
> > dict (with the '{}', ':' & ',') so that reading the dict data is just
> > importing:
> >     import french_english
> >
> > Denis
> >
> > I 3rd the dictionary choice. They (for me at least) aren't as clean on
> > the computer screen as arrays, but once you get good at it you can even
> > have multiple definitions and weights for how relevant that word is. You
> > (in the future when you get comfortable with dictionaries) can take it
> > into networkx or something and draw pictures of it, and really start
> > messing around with it (using subnetworks to try and get context
> > information). Google has some tech talks on  youtube concerning Language
> > Processing using networks etc if that kind of thing interests you.
> >
> > Sincerely,
> >
> > Bad answer man
> >   
> What a helpful forum - much thanks to all who've commented.  Seems to be 
> a bit of a consensus here about dictionaries.  Let me just restate my 
> reluctance, using examples from Spanish.
> 
> esperar = to hope
> esperar = to wait
> tambien = too [i.e. also]
> demasiado = too [i.e. excessive]
> 
> So there are repeats in both languages.  I would like to end up with a 
> file which I can use to generate flash cards, either to or from English, 
> and I suppose I want the flexibility to have 1 word with 1 definition.
> 
> Having said that, I obviously recognise the expertise of the group, so I 
> will probably pursue this option.
> 
> Owain

If you store simple one-to-one pairs, retrieving several matching forms will be both more difficult and less efficient. I think you would do better to store, in a dict, all possible matches for each single form, e.g.:
{..., "esperar": ["to hope", "to wait", ...]}
The issue is how to output that in a clear and practical format. But this issue is much the same whatever you choose for storing (and retrieving) the information. The cleanest way to cope with it may be to subclass dict and define a custom output format using the __str__ "magic" method. You can also use __repr__ for development feedback.
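
A minimal sketch of that idea, under some assumptions: the class name Vocab and its add() and inverted() methods are hypothetical names chosen for illustration, and the "word = translation; translation" layout is only one possible output format. inverted() is an extra that builds the opposite direction, for flash cards going from English.

```python
class Vocab(dict):
    """Hypothetical dict subclass: maps one word to a list of translations."""

    def add(self, word, translation):
        # setdefault creates the list the first time a word is seen
        translations = self.setdefault(word, [])
        if translation not in translations:
            translations.append(translation)

    def inverted(self):
        # build the opposite direction (translation -> list of words)
        other = Vocab()
        for word, translations in self.items():
            for translation in translations:
                other.add(translation, word)
        return other

    def __str__(self):
        # user-facing format: one line per word, sorted alphabetically
        lines = ["%s = %s" % (word, "; ".join(self[word]))
                 for word in sorted(self)]
        return "\n".join(lines)

    def __repr__(self):
        # development feedback: class name plus the raw dict content
        return "Vocab(%s)" % dict.__repr__(self)


vocab = Vocab()
pairs = [("esperar", "to hope"), ("esperar", "to wait"),
         ("tambien", "too"), ("demasiado", "too")]
for word, translation in pairs:
    vocab.add(word, translation)

print(vocab)             # uses __str__
print(vocab.inverted())  # the English-to-Spanish direction
```

With the four pairs from the question, print(vocab) shows "esperar = to hope; to wait" on a single line, and the inverted dict groups both Spanish words under "too".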

Also, your application can be much more complex than it seems at first sight. Look at the hierarchy of senses in a paper dictionary: from part of speech down to style variant. I would reproduce that in my storage and output formats.
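
One possible way to mirror such a hierarchy, as a rough sketch only: the nesting (word, then part of speech, then a numbered list of senses), the key names "gloss" and "style", the "neutral" labels and the function format_entry are all assumptions made for illustration, not a fixed design.

```python
# word -> part of speech -> list of senses, each with a gloss and a style label
# (the "neutral" style values are placeholders, not checked against a dictionary)
entries = {
    "esperar": {
        "verb": [
            {"gloss": "to hope", "style": "neutral"},
            {"gloss": "to wait", "style": "neutral"},
        ],
    },
}


def format_entry(word, entry):
    """Return an indented, paper-dictionary-like text for one word."""
    lines = [word]
    for part_of_speech in sorted(entry):
        lines.append("  " + part_of_speech)
        # number the senses starting at 1, as a paper dictionary would
        for number, sense in enumerate(entry[part_of_speech], 1):
            lines.append("    %d. %s (%s)"
                         % (number, sense["gloss"], sense["style"]))
    return "\n".join(lines)


print(format_entry("esperar", entries["esperar"]))
```

The same nested structure can be written out as a Python literal and read back, so the storage format and the output format stay separate.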


Denis
________________________________

la vita e estrany

http://spir.wikidot.com/

