[Tutor] could someone break this down

Nathan Smith nathan-tech at hotmail.com
Wed Apr 20 10:18:03 EDT 2022


Just to close this one off: I ended up going with the approach DN 
mentioned, in a way.


After another fruitless day of attempts I decided to hell with it and 
built my own DAWG out of fresh data sources, then rewrote parts of the 
TWL module to fit it. In the end it took me three attempts, but it came 
out pretty well, all things considered. Admittedly, though, my data 
sources are much bigger and I've rewritten it for speed rather than 
small size. Sadly, it no longer fits in 500KB! :)
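
For anyone finding this thread later, here is a rough sketch of the 
general idea of building a DAWG: build a plain trie first, then merge 
identical subtrees from the bottom up. This is not the code I ended up 
with, and not how the TWL module stores its data; the names (Node, 
build_trie, minimise, contains, count_nodes) are made up for 
illustration only.

```python
class Node:
    """One state in the trie/DAWG (hypothetical helper class)."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}      # letter -> child Node
        self.final = False   # True if a word ends at this node


def build_trie(words):
    """Build an ordinary (unminimised) trie from an iterable of words."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, Node())
        node.final = True
    return root


def minimise(node, registry):
    """Merge identical subtrees bottom-up, turning the trie into a DAWG.

    `registry` maps a node's signature to the one canonical node kept
    for that signature.
    """
    for ch, child in list(node.edges.items()):
        node.edges[ch] = minimise(child, registry)
    # The children are canonical by now, so id() identifies each subtree.
    key = (node.final,
           tuple(sorted((ch, id(c)) for ch, c in node.edges.items())))
    return registry.setdefault(key, node)


def contains(root, word):
    """Return True if `word` is stored in the trie/DAWG."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_nodes(root):
    """Count the distinct nodes reachable from `root`."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


# Toy data, not a real word list.
words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)
dawg = minimise(build_trie(words), {})
print(count_nodes(trie), count_nodes(dawg))
```

The saving comes from shared endings: "tap(s)" and "top(s)" end the same 
way, so those branches collapse into one. Building the full trie first 
is simple but memory-hungry, so a real word list may call for an 
incremental construction instead.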


It works, though, and that's what matters.

Amusingly, I did make a novice mistake, which hopefully the list will laugh at.

I compiled the British and American word lists, and did a check to see 
if every word in the American one appeared in the British, expecting 
words like "color" and "favor" to not appear.

Actually, all the American words were in the British one! "I've wasted 
my time," I thought, "they're identical!" After about an hour of feeling 
sorry for myself, I then thought to run len(american) and len(british). 
Lo and behold, the American list was 178,000ish, the British 270,000ish. Oh.

OOPS! :)

Turns out the British word list for Scrabble includes the one used by 
TWL, then adds words.
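
In hindsight, a check in both directions would have shown this straight 
away. A small sketch using sets; the function name compare_word_lists 
and the toy word lists are invented for the example and are not the 
real Scrabble lists.

```python
def compare_word_lists(american, british):
    """Compare two word collections in both directions."""
    american, british = set(american), set(british)
    return {
        "american_only": american - british,       # words missing from British
        "british_only": british - american,        # words missing from American
        "american_is_subset": american <= british, # every American word in British?
        "sizes": (len(american), len(british)),
    }


# Toy data standing in for the real lists.
american = {"cat", "color", "favor"}
british = {"cat", "color", "colour", "favor", "favour"}

report = compare_word_lists(american, british)
print(report)
```

Checking only "is every American word in the British list?" cannot tell 
identical lists apart from a subset, which is exactly the trap I fell 
into; the sizes or the british_only set give it away.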


Ah well, problem solved now.

Thanks, everyone, for your help, as ever.

Nathan

On 11/04/2022 00:50, dn wrote:
> On 11/04/2022 11.15, Nathan Smith wrote:
>> Yeah I reached out via email on the Github page but no response.
>> Considering the module was last updated in 2013 I'm not sure if that
>> email is even valid :)
>> Digging to see if anyone else has figured this out though.
> Sad, but likely. GitHub is sometimes a great source of source (hah!) but
> is all too often a place to put 'stuff that I don't want to be bothered
> with any more'...
>
>
> I doubt that reverse-engineering the code will help, because it only
> traverses the DAWG data - it doesn't show how to build the DAWG from a
> (British) dictionary in the first place!
>
> ie head meet brick-wall!
>
>
>> Thanks to the resources provided I think I now have a rough
>> understanding of DAWGs, but breaking down how the one in the module was
>> written to file and such is still a challenge.
> I was thinking that the concept of building a Finite State Machine to
> traverse a DAWG - and to build the DAWG in the first place, would make
> quite a good programming (design and coding) assignment...
>
>
> If GitHub is no help, try 'the cheese shop' aka PyPI (https://pypi.org/).
> Searching for DAWG returned some familiar 'hits'.
>
> It also mentions tries as an alternative data-structure.
>
> Noted "lexpy" which has been updated more recently (but then, once it's
> working, what updates are necessary?). It offers the choice of DAWG or
> trie, and seems to feature both the 'building' of the lexicon, and its
> (later) interrogation.
>
>
> Let us know how you get on...
-- 

Best Wishes,

Nathan Smith, BSC


My Website: https://nathantech.net



