Automation

Chris Angelico rosuav at gmail.com
Tue Nov 19 10:26:18 CET 2013


On Tue, Nov 19, 2013 at 7:53 PM, Ian Kelly <ian.g.kelly at gmail.com> wrote:
> Aoilegpos for aidnoptg a cdocianorttry vwpiienot but, ttoheliacrley
> spkeaing, lgitehnneng the words can mnartafucue an iocnuurgons
> samenttet that is vlrtiauly isbpilechmoenrne.

isbpilechmoenrne. I totally want to find an excuse to use that word
somewhere. It just looks awesome.

Paradoxically, it's actually more likely that a computer can figure
out what you're saying here. In fact, I could easily write a little
script that reads /usr/share/dict/words (or equivalent) and attempts
to decode your paragraph. Hmm. You know what, I think I will. It's now
0958 UTC, let's see how long this takes me.

Meh. I did something stupid and decided to use a regular expression.
It's now 1020 UTC, so that's 21 minutes of figuring out what I was
doing wrong with the regex and 1 minute solving the original problem.
But here's your translated paragraph:

-- cut --
Interestingly I'm studying this controversial phenomenon at the
Department of Linguistics at Absytrytewh University and my
extraordinary discoveries wholeheartedly contradict the picsbeliud
findings regarding the relative difficulty of instantly translating
sentences. My researchers developed a convenient contraption at
hnasoa/tw.nartswdbvweos/utrtek:p./il that demonstrates that the
hypothesis uniquely warrants credibility if the assumption that the
preponderance of your words is not extended is unquestionable.
Apologies for adopting a contradictory viewpoint but, theoretically
speaking, lengthening the words can manufacture an incongruous
statement that is virtually incomprehensible.
-- cut --

It couldn't figure out "Absytrytewh", "picsbeliud", or
"hnasoa/tw.nartswdbvweos/utrtek:p./il". That's not a bad result. (And
as a human, I'm guessing that the second one isn't an English word -
maybe it's Scots?) Here's the code:

# Map each "first letter + sorted middle letters + last letter" key to
# the set of dictionary words sharing it (lowercase and capitalized).
words = {}
for word in open("/usr/share/dict/words"):
    word=word.strip().lower()
    transformed = word if len(word)==1 else (
        word[0]+''.join(sorted(word[1:-1]))+word[-1])
    words.setdefault(transformed,set()).add(word)
    words.setdefault(transformed.capitalize(),set()).add(word.capitalize())

import re
for line in open("input"):
    line=line.strip()
    # The capturing group makes re.split return the separators as well.
    for word in re.split(r"(\W+)",line):
        try:
            transformed = word if len(word)==1 else (
                word[0]+''.join(sorted(word[1:-1]))+word[-1])
            realword=words[transformed]
            # Several candidates: show the whole set rather than guess.
            if len(realword)>1: realword=repr(realword)
            else: realword=next(iter(realword))
            line=line.replace(word,realword)
        except LookupError:
            # catches three errors, all of which mean we shouldn't
            # translate anything
            pass
    print(line)


Yeah, it's not the greatest code, but it works :)
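
For anyone who wants to try the idea without a system dictionary, here
is a minimal self-contained sketch of the same approach. The names
key, build_index, unscramble and the tiny VOCAB list are made up for
illustration and are not part of the script above. It also rebuilds
each line from its tokens instead of calling line.replace(), which is
an assumption about what a tidier version might do, not a claim about
how the original behaves.

```python
import re

def key(word):
    """Keep the first and last letters, sort everything in between."""
    if len(word) <= 2:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def build_index(vocabulary):
    """Map each key to the set of real words that produce it."""
    index = {}
    for word in vocabulary:
        word = word.strip().lower()
        if word:
            index.setdefault(key(word), set()).add(word)
    return index

def unscramble(text, index):
    """Decode text token by token; unknown tokens pass through unchanged."""
    pieces = []
    # The capturing group makes re.split keep the separators too.
    for token in re.split(r"(\W+)", text):
        candidates = index.get(key(token.lower()))
        if not candidates:
            pieces.append(token)
        elif len(candidates) > 1:
            # Ambiguous: list every candidate rather than guess.
            pieces.append("{" + "|".join(sorted(candidates)) + "}")
        else:
            (word,) = candidates
            pieces.append(word.capitalize() if token[:1].isupper() else word)
    return "".join(pieces)

# Hypothetical mini vocabulary standing in for /usr/share/dict/words.
VOCAB = ["apologies", "for", "adopting", "a", "contradictory",
         "viewpoint", "but", "form", "from"]
INDEX = build_index(VOCAB)

print(unscramble("Aoilegpos for aidnoptg a cdocianorttry vwpiienot but", INDEX))
```

With a real word list, build_index(open("/usr/share/dict/words")) should
work the same way, since each line is stripped before it is indexed.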

ChrisA


