Aspell and

Kevin Atkinson kevina at gnu.org
Tue Oct 22 00:52:39 CEST 2002


On Mon, 21 Oct 2002, Mike C. Fletcher wrote:

> Sorry if I caused offense by using your code w/out notifying you.  

Not really.  I just like to know how Aspell is being used, that's all.

> I 
> didn't really think you'd be interested in a project that's so early in 
> its life-cycle (I only just set up the SourceForge project in the last 
> hour).  I cancelled a message I wrote to you Saturday night because I 
> figured you'd be too busy to be answering questions from the likes of me 
> before I have anything that's actually working :) .

I would generally be willing to answer questions provided you make your 
intentions clear.  If you said 

> As for compiling Aspell on Win32, I hadn't tried the MingWin32 version 
> of GCC.  I had noticed the post about the VC++ compilation patch, but 
> your comment on it seemed to suggest that it would require quite a bit 
> of work to be acceptable.  Given that I have no great C/C++ skill, it is 
> easier for me to build the infrastructure in Python and only use C/C++ 
> for a few key algorithms than it is to try and modify a complex C/C++ 
> project.

That's fine.  However, as Aspell improves, your module won't unless you 
keep up with Aspell developments ;)

> Too bad about using the *.rws files directly, but in considering it, I'm 
> leaning toward giving (GUI) tools to both dictionary creators and users 
> for generating redistributable files for both dictionaries and 
> word-sets.  

If Aspell is installed you CAN dump the word lists to stdout using 
"aspell dump master <dict>".
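A minimal sketch of driving that command from Python, assuming aspell is 
on the PATH and that its output decodes with the locale's default 
encoding.  The helper names (dump_command, parse_word_list, dump_master) 
are made up for illustration; only the command line itself comes from the 
text above.

```python
import subprocess


def dump_command(dict_name):
    """Build the argument list for 'aspell dump master <dict>'."""
    return ["aspell", "dump", "master", dict_name]


def parse_word_list(text):
    """Split dump output into words, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def dump_master(dict_name):
    """Run aspell and return its master word list.

    Assumes aspell is installed; raises CalledProcessError if the
    command fails (for example, when the dictionary is unknown).
    """
    result = subprocess.run(
        dump_command(dict_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_word_list(result.stdout)
```

Calling dump_master("en") would then return the words as a Python list, 
provided a dictionary by that name is installed.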

> From the sound of it, it should be easy to allow users to 
> generate distributables for either system.  If they have aspell 
> installed we'll offer the word-list-(de)compress functionality, 
> otherwise I'll only accept/generate uncompressed lists.
> 
> I am somewhat at a loss for how you access the "compressed" files.  I'd 
> thought they were using a b-tree or similar index, but it doesn't seem 
> that way when I look at the code for word-list-compress.  Are you 
> loading the whole word-set into memory?  

You misunderstand.  word-list-compress simply compresses and decompresses 
a sorted word list to save space.  It is not used in any way by Aspell 
itself.
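
To illustrate why a sorted word list compresses well without any index 
structure, here is a sketch of front coding: each word is stored as the 
number of leading characters it shares with the previous word, plus the 
remaining suffix.  This is an assumption chosen for illustration and not 
necessarily the byte format word-list-compress writes; the function names 
are hypothetical.

```python
def compress(words):
    """Front-code a sorted word list into (shared_prefix_len, suffix) pairs."""
    pairs = []
    prev = ""
    for word in words:
        # Count how many leading characters match the previous word.
        n = 0
        limit = min(len(prev), len(word))
        while n < limit and prev[n] == word[n]:
            n += 1
        pairs.append((n, word[n:]))
        prev = word
    return pairs


def decompress(pairs):
    """Rebuild the word list from (shared_prefix_len, suffix) pairs."""
    words = []
    prev = ""
    for n, suffix in pairs:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words
```

Decoding has to walk the list from the start, which is why a scheme like 
this suits storage and distribution rather than lookups.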

> I'll have to look at the typo-weighting code, as I'm not sure where to 
> hook it into the leditdistance algorithm.  

leditdistance != typo edit distance.  leditdistance uses a different 
algorithm than the normal edit distance.  The normal edit distance 
algorithm ("editdist.cpp") and the typo edit distance algorithm 
("typo_editdist.cpp") are basically the same except for the weights.

> It would seem that you'd need 
> each "swap" to be a lookup into the typo table.  

Yes, that is correct.  But it is not a "swap" but a replacement.  A swap 
is when two adjacent letters are interchanged, "teh" vs. "the".
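
The distinction can be sketched in Python as an edit distance where each 
replacement is priced by a lookup and a swap of two neighbouring letters 
is a separate operation.  This is not the code in editdist.cpp or 
typo_editdist.cpp; the KEY_NEIGHBOURS table, the cost values and the 
function names are all invented for illustration.

```python
# Hypothetical typo table: pairs of keys that sit next to each other.
KEY_NEIGHBOURS = {("a", "s"), ("s", "a"), ("e", "r"), ("r", "e")}


def typo_cost(x, y):
    """Replacement cost: cheaper when the two keys are neighbours."""
    return 0.5 if (x, y) in KEY_NEIGHBOURS else 1


def edit_distance(a, b, replace_cost=None, swap_cost=1, indel_cost=1):
    """Edit distance with pluggable replacement weights and adjacent swaps."""
    if replace_cost is None:
        replace_cost = lambda x, y: 1  # plain, unweighted replacement
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * indel_cost
    for j in range(1, cols):
        d[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            x, y = a[i - 1], b[j - 1]
            # Replacement: one lookup into the weight function per pair.
            sub = 0 if x == y else replace_cost(x, y)
            best = min(
                d[i - 1][j] + indel_cost,   # delete x
                d[i][j - 1] + indel_cost,   # insert y
                d[i - 1][j - 1] + sub,      # replace x with y
            )
            # Swap: the last two letters of each prefix are interchanged.
            if i > 1 and j > 1 and x != y and x == b[j - 2] and a[i - 2] == y:
                best = min(best, d[i - 2][j - 2] + swap_cost)
            d[i][j] = best
    return d[-1][-1]
```

With the default weights this behaves like an ordinary edit distance; 
passing replace_cost=typo_cost changes only the replacement prices, which 
is the sense in which the two variants differ only by their weights.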

--- 
http://kevin.atkinson.dhs.org





