[Tutor] could someone break this down
Nathan Smith
nathan-tech at hotmail.com
Wed Apr 20 10:18:03 EDT 2022
Just to close this one out: I ended up going with the approach dn
mentioned, in a way.
After another fruitless day of attempts I decided to hell with it and
built my own DAWG out of fresh data sources, then rewrote parts of the
TWL module to fit it. In the end it took me three attempts, but it came
out pretty well, all things considered. Admittedly, though, my data
sources are much bigger and I've rewritten it for speed rather than
small size. Sadly, it no longer fits in 500KB! :)
It works, though, and that's what matters.
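For anyone following along, here is a minimal sketch of one way to build a DAWG from a plain word list: build an ordinary trie first, then merge identical subtrees bottom-up. This is an illustration only, not the rewritten TWL module and not its on-disk format; the names Node, build_trie, minimise, contains and count_nodes are made up for the example, and a real Scrabble-sized build would probably want an incremental algorithm instead.

```python
class Node:
    """One state of the word graph."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}      # letter -> Node
        self.final = False   # True if a word ends at this state


def build_trie(words):
    """Insert every word letter by letter; shared prefixes share nodes."""
    root = Node()
    for word in words:
        node = root
        for letter in word:
            node = node.edges.setdefault(letter, Node())
        node.final = True
    return root


def minimise(node, registry):
    """Merge identical subtrees bottom-up, so shared suffixes share nodes too."""
    for letter in list(node.edges):
        node.edges[letter] = minimise(node.edges[letter], registry)
    # The children are already canonical, so their ids identify them uniquely.
    signature = (
        node.final,
        tuple(sorted((letter, id(child)) for letter, child in node.edges.items())),
    )
    # Return the first node seen with this signature, registering it if new.
    return registry.setdefault(signature, node)


def contains(root, word):
    """Walk the graph one letter at a time, like a finite state machine."""
    node = root
    for letter in word:
        node = node.edges.get(letter)
        if node is None:
            return False
    return node.final


def count_nodes(root):
    """Count distinct states reachable from the root."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.edges.values())
    return len(seen)


if __name__ == "__main__":
    toy_words = ["tap", "taps", "top", "tops"]   # toy data, not a real word list
    trie = build_trie(toy_words)
    print("trie nodes:", count_nodes(trie))
    dawg = minimise(trie, {})
    print("dawg nodes:", count_nodes(dawg))
    print("contains 'tops':", contains(dawg, "tops"))
```

The lookup is the same for the trie and the DAWG; only the number of stored nodes changes, which is where the size saving comes from.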
Amusingly, I did make a novice mistake, which hopefully the list will laugh at.
I compiled the British and American word lists, and did a check to see
if every word in the American one appeared in the British, expecting
words like "color" and "favor" to not appear.
Actually, all the American words were in the British one! "I've wasted
my time," I thought, "they're identical!" After about an hour of feeling
sorry for myself, I then thought to run len(american) and len(british).
Lo and behold, american was 178,000ish, british was 270,000ish. Oh.
OOPS! :)
Turns out the British word list for Scrabble includes the one used by
TWL, then adds words.
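In case it saves someone else an hour, a small sketch of the check that would have caught this straight away: "every American word is in the British list" only shows a subset, so compare the sizes and the difference in the other direction too. The function name compare_word_lists and the toy lists are made up for illustration; they are not the real Scrabble data.

```python
def compare_word_lists(american, british):
    """Summarise how two word lists relate, in both directions."""
    american, british = set(american), set(british)
    return {
        "american_size": len(american),
        "british_size": len(british),
        "only_american": american - british,   # words missing from british
        "only_british": british - american,    # words missing from american
        "american_is_subset": american <= british,
        "identical": american == british,
    }


if __name__ == "__main__":
    # Toy data standing in for the two compiled lists.
    american = ["color", "favor", "zebra"]
    british = ["color", "colour", "favor", "favour", "zebra"]
    for key, value in compare_word_lists(american, british).items():
        print(key, "=", value)
```

A subset test passing while "identical" is false is exactly the situation described above.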
Ah well, problem solved now.
Thanks, everyone, for your help as ever.
Nathan
On 11/04/2022 00:50, dn wrote:
> On 11/04/2022 11.15, Nathan Smith wrote:
>> Yeah I reached out via email on the Github page but no response.
>> Considering the module was last updated in 2013 I'm not sure if that
>> email is even valid :)
>> Digging to see if anyone else has figured this out though.
> Sad, but likely. GitHub is sometimes a great source of source (hah!) but
> is all too often a place to put 'stuff that I don't want to be bothered
> with any more'...
>
>
> I doubt that reverse-engineering the code will help, because it only
> traverses the DAWG data - it doesn't show how to build the DAWG from a
> (British) dictionary in the first place!
>
> ie head meet brick-wall!
>
>
>> Thanks to the resources provided I think I now have a rough
>> understanding of DAWGs, but breaking down how the one in the module was
>> written to file and such is still a challenge.
> I was thinking that the concept of building a Finite State Machine to
> traverse a DAWG - and to build the DAWG in the first place, would make
> quite a good programming (design and coding) assignment...
>
>
> If GitHub is no help, try 'the cheese shop' aka PyPI (https://pypi.org/).
> Searching for DAWG returned some familiar 'hits'.
>
> It also mentions tries as an alternative data-structure.
>
> Noted "lexpy" which has been updated more recently (but then, once it's
> working, what updates are necessary?). It offers the choice of DAWG or
> trie, and seems to feature both the 'building' of the lexicon, and its
> (later) interrogation.
>
>
> Let us know how you get on...
--
Best Wishes,
Nathan Smith, BSC
My Website: https://nathantech.net