<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Aug 27, 2014 at 10:32 PM, Chris Angelico <span dir="ltr"><<a href="mailto:rosuav@gmail.com" target="_blank">rosuav@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div style="overflow:hidden">I'm not sure I understand how your 'common' value works, though. Does<br>
the default 0.6 mean you take the 60% most common words? Those above<br>
the 60th percentile of frequency? Something else?</div></blockquote></div><br>Yes, basically. A word has to pass the following hurdles before being deemed "common":</div><div class="gmail_extra"><br></div><div class="gmail_extra">
* length >= 4</div><div class="gmail_extra">* all lower case</div><div class="gmail_extra">* no punctuation</div><div class="gmail_extra">* not already "emitted" (made it to the common list)</div><div class="gmail_extra">
* seen this word at least 10 times</div><div class="gmail_extra">* have seen at least 100 words in total</div><div class="gmail_extra"><br></div><div class="gmail_extra">Then and only then, if its word count places it in the top T percent of all seen words (T defaults to 60%), is it added to the "emitted" (common) word list. Only words in that list are chosen as password material. Further, the "dict" command helps you identify words in the common list which aren't in your computer's words file. You can give any of them (or any other word you don't like) as arguments to the "bad" command.</div>
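<div class="gmail_extra"><br></div><div class="gmail_extra">In code, the rule looks roughly like the sketch below. It is a hypothetical illustration, not Polly's actual implementation: the CommonWords class and its attributes are made up for the example, str.isalpha() stands in for "no punctuation", and "top T percent" is read as "the word's frequency rank falls within the first T fraction of all distinct words seen so far".</div><div class="gmail_extra"><pre>
```python
from collections import Counter


class CommonWords:
    """Hypothetical sketch of the "common word" hurdles (not Polly's code)."""

    def __init__(self, top_fraction=0.6, min_count=10, min_seen=100):
        self.top_fraction = top_fraction  # T as a fraction: 0.6 means 60%
        self.min_count = min_count        # times this word must have been seen
        self.min_seen = min_seen          # total words that must have been seen
        self.counts = Counter()           # word -> occurrences so far
        self.total = 0                    # all words seen so far
        self.emitted = set()              # the "common" list
        self.bad = set()                  # words rejected with the "bad" command

    def add(self, word):
        """Count one occurrence of word and emit it if it now qualifies."""
        self.counts[word] += 1
        self.total += 1
        if self._qualifies(word):
            self.emitted.add(word)

    def _qualifies(self, word):
        # Shape checks: length >= 4, all lower case, letters only.
        if not (len(word) >= 4 and word.islower() and word.isalpha()):
            return False
        if word in self.emitted or word in self.bad:
            return False
        # Frequency checks: this word at least min_count times, and at
        # least min_seen words seen overall.
        if self.min_count > self.counts[word] or self.min_seen > self.total:
            return False
        # Rank distinct words by count (ties keep first-seen order) and
        # require this word to sit inside the top T fraction of the ranking.
        ranked = [w for w, _ in self.counts.most_common()]
        cutoff = max(1, int(len(ranked) * self.top_fraction))
        return word in ranked[:cutoff]
```
</pre></div><div class="gmail_extra">Re-ranking on every word is fine for a sketch but slow for a large corpus, and a word is only re-checked when it is seen again, so a word that drifts into the top T percent waits for its next occurrence.</div>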
<div class="gmail_extra"><br></div><div class="gmail_extra">I won't pretend to understand all that entropy stuff, and I realize that given my 35k+ messages and my somewhat severe constraints, I have only deemed 1057 words from my corpus "worthy" so far. That's about 10 bits of entropy per word? That obviously makes my passwords easier to guess, but I suspect I can lower my T value enough to grow the pool of candidate words to provide whatever amount of entropy you require. I agree, though, it is a bit backwards from how the XKCD 936 thing works.</div>
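<div class="gmail_extra"><br></div><div class="gmail_extra">For what it's worth, the arithmetic is simple if each word is chosen uniformly and independently from the pool: each word contributes log2(N) bits for a pool of N words. log2(1057) is about 10.05, so four words give about 40 bits, versus the 44 bits XKCD 936 gets from four words drawn from 2048 (11 bits each). A small sketch, assuming uniform independent choice (the function names are mine, purely illustrative):</div><div class="gmail_extra"><pre>
```python
import math


def bits_per_word(pool_size):
    """Entropy in bits of one word drawn uniformly from pool_size words."""
    return math.log2(pool_size)


def words_needed(pool_size, target_bits):
    """Fewest independently drawn words that reach at least target_bits."""
    return math.ceil(target_bits / bits_per_word(pool_size))


def pool_needed(n_words, target_bits):
    """Smallest pool size for which n_words words reach target_bits."""
    return math.ceil(2 ** (target_bits / n_words))


print(round(bits_per_word(1057), 2))  # 10.05
print(words_needed(1057, 44))         # 5
print(pool_needed(4, 44))             # 2048
```
</pre></div><div class="gmail_extra">So with 1057 words, five words match a four-word XKCD password; for four words to reach 44 bits, the pool has to grow to 2048 words.</div>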
<div class="gmail_extra"><br></div><div class="gmail_extra">I just realized something. To keep it from taking forever to start up before I had a pickle save file, I limited the messages to those since 2014-08-22. Not too many. Not sure how to deal with that, but for the moment, I initialize Polly.latest to 2014-05-01 in my sandbox (not checked in). That will considerably increase the number of messages scanned. While it's doing that (in a separate thread), I can watch the progress with the stat command at the ? prompt:</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">? stat</div><div class="gmail_extra">messages: 0</div><div class="gmail_extra">all words: 0</div><div class="gmail_extra">common words: 0</div><div class="gmail_extra">
'bad' words: 0</div><div class="gmail_extra">... time passes ...</div><div class="gmail_extra">? </div><div class="gmail_extra">messages: 716</div><div class="gmail_extra">all words: 4532</div><div class="gmail_extra">
common words: 725</div><div class="gmail_extra">'bad' words: 0</div><div class="gmail_extra">? bad flaskapp luofeiyu lilypond</div><div class="gmail_extra">? stat</div><div class="gmail_extra">messages: 1013</div>
<div class="gmail_extra">all words: 5637</div><div class="gmail_extra">common words: 994</div><div class="gmail_extra">'bad' words: 3</div><div class="gmail_extra">? </div><div class="gmail_extra">messages: 1361</div>
<div class="gmail_extra">all words: 6545</div><div class="gmail_extra">common words: 1251</div><div class="gmail_extra">'bad' words: 3</div><div class="gmail_extra">? password</div><div class="gmail_extra">formatted overlap relation itself</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">... and so on. After a while I should approach 2000 common words.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">Hmmm... I realize now that I'm probably not seeing all messages. So much to learn about IMAP...</div>
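<div class="gmail_extra"><br></div><div class="gmail_extra">One thing worth checking on the IMAP side: per RFC 3501, SEARCH SINCE matches on each message's internal date on the server, while SENTSINCE matches on the Date: header, and either one only covers the currently selected mailbox. Here is a hypothetical sketch of date-limited fetching plus a background scanner whose counts a "stat" command could read while the scan runs. The names imap_date, fetch_since, body_text and Scanner are made up for illustration; only the select/search/fetch calls follow the real imaplib API, and the scanner just counts distinct words rather than applying the common-word rules.</div><div class="gmail_extra"><pre>
```python
import datetime
import email
import re
import threading

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(day: datetime.date) -> str:
    """Format a date as IMAP SEARCH expects (dd-Mon-yyyy), locale-independent."""
    return "%02d-%s-%d" % (day.day, _MONTHS[day.month - 1], day.year)


def fetch_since(conn, mailbox: str, since: datetime.date):
    """Yield raw messages from mailbox whose internal date is on/after since.

    conn is assumed to be a logged-in imaplib.IMAP4 or IMAP4_SSL object.
    """
    conn.select(mailbox, readonly=True)
    typ, data = conn.search(None, "SINCE", imap_date(since))
    for num in data[0].split():
        # BODY.PEEK[] fetches the whole message without setting \Seen.
        typ, parts = conn.fetch(num, "(BODY.PEEK[])")
        yield parts[0][1]


def body_text(msg):
    """Concatenate the decoded text/plain parts of an email.message.Message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                chunks.append(payload.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name
                chunks.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(chunks)


class Scanner:
    """Scan messages in a background thread; stat() may be called meanwhile."""

    def __init__(self, messages):
        self.messages = messages  # any iterable of raw message bytes
        self.lock = threading.Lock()
        self.n_messages = 0
        self.words = set()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        for raw in self.messages:
            text = body_text(email.message_from_bytes(raw))
            words = re.findall(r"[A-Za-z']+", text)
            with self.lock:  # keep stat() from seeing a half-updated state
                self.n_messages += 1
                self.words.update(words)

    def stat(self):
        with self.lock:
            return {"messages": self.n_messages, "distinct words": len(self.words)}
```
</pre></div><div class="gmail_extra">Starting a scan would look something like scanner = Scanner(fetch_since(conn, "INBOX", datetime.date(2014, 5, 1))) followed by scanner.thread.start(), with conn an imaplib.IMAP4_SSL connection after login(); the "?" prompt can keep calling scanner.stat() while the thread runs. The connection should only be used from that one thread while the scan is in progress.</div>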
<div class="gmail_extra"><br></div><div class="gmail_extra">Skip<br></div><div class="gmail_extra"><br></div></div>