Using Python for a demonstration in historical linguistics

Sat Nov 27 08:25:05 EST 2010

2010/11/27 Dax Bloom <bloom.dax at gmail.com>:
> On Nov 6, 6:41 am, Vlastimil Brom <vlastimil.b... at gmail.com> wrote:
>> 2010/11/6 Dax Bloom <bloom.... at gmail.com>:
>> ...
>> Rask_Grimm_re = ur"[bdgptk]ʰ?"
>> Rask_Grimm_dct = {u"b":u"p", u"bʰ": u"b", u"t": u"þ", } # ...
>>
>> def repl_fn(m):
>>     return Rask_Grimm_dct.get(m.group(), m.group())
>>
>> ie_txt = u" bʰrāter ... "
>> almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
>> print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD
>>
>> ########################################
>>
>>  bʰrāter ...  >>  brāþer ...
>>
>> hth,
>>   vbr
> ...
> Hello Vlastimil,
>
> Could you please explain what the variables %s and % mean and how to
> implement this part of the code in a working python program? I can't
> fully appreciate Peter's quote on rules
>
>
> Best regards,
>
> Dax Bloom
>
Hi, the mentioned part is called string interpolation. %s and % are not
variables: %s is a placeholder inside the string, and % is the
formatting operator. The last line,
print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD
is equivalent to the simple string concatenation:
print ie_txt + u" >> " + almost_germ_txt
see:
http://docs.python.org/library/stdtypes.html#string-formatting-operations

The values of the tuple (or possibly a dict or another mapping) given
after the modulo operator % are inserted at the respective positions
(here %s) of the preceding string (or unicode string);
some more advanced adjustments or conversions are also possible here,
but they aren't needed in this simple case.
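
As a small illustration of the above, here is a sketch in Python 3
syntax (print() as a function, no u"" prefixes; this is an assumption,
the code in this thread is Python 2). The sample values are the ones
from the example; the names by_position, by_name and by_concat are made
up for this sketch.

```python
# Python 3 syntax; the sample strings come from the example in this thread.
ie_txt = " bʰrāter ... "
almost_germ_txt = " brāþer ... "

# Tuple after %: the values fill the %s placeholders in order.
by_position = "%s >> %s" % (ie_txt, almost_germ_txt)

# Dict after %: each value is looked up by the name given in %(name)s.
by_name = "%(ie)s >> %(germ)s" % {"ie": ie_txt, "germ": almost_germ_txt}

# Plain concatenation builds the same string.
by_concat = ie_txt + " >> " + almost_germ_txt

print(by_position)
```

All three variants should produce the same text; the dict form is mostly
useful when a value appears several times or when the order of the
placeholders may change.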

(There is also another string formatting mechanism, the format()
method of strings, in the newer versions of Python (2.6 and later):
http://docs.python.org/library/string.html#formatstrings
which may be more suitable for more complex tasks.)
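
For comparison, a minimal sketch of the same line written with the
format() method, again in Python 3 syntax (an assumption, as above).
The names by_index, by_keyword and padded are only illustrative.

```python
# Python 3 syntax; the sample strings come from the example in this thread.
ie_txt = " bʰrāter ... "
almost_germ_txt = " brāþer ... "

# {0} and {1} refer to the positional arguments of format().
by_index = "{0} >> {1}".format(ie_txt, almost_germ_txt)

# Named fields are filled from keyword arguments.
by_keyword = "{ie} >> {germ}".format(ie=ie_txt, germ=almost_germ_txt)

# A format spec after the colon: left-align the first value in 6 columns.
padded = "{0:<6}>> {1}".format("b", "p")

print(by_index)
print(padded)
```

The explicit {0}/{1} numbering is used here because it also works in
Python 2.6; the empty {} form needs a later version.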

The implementation depends on the rest of your program and on the
input/output of the data you wish to have. To be able to print the
output with rather non-trivial characters, you will need a
unicode-enabled console (Idle is a basic one available with Python).
Otherwise the sample is self-contained and should be runnable as is
(given "import re" at the top);
you can add other needed items to Rask_Grimm_dct, and all substrings
matching Rask_Grimm_re will be replaced in one pass.
You can also add a series of such replacements (each a re pattern and a
dict of ie: germ pairs), of course only for context-free changes.
On the other hand, I have no simple idea how to deal with Verner's Law
and the like (even if you passed the accents in the PIE forms), well,
besides a lexicographic approach, where you would have to identify the
word stems to decide which changes to apply.
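
The series of context-free replacements mentioned above could be
sketched roughly as follows. This is only a sketch in Python 3 syntax,
not a worked-out solution: make_pass and apply_passes are hypothetical
helper names, the consonant table is a simplified textbook version of
the shift, and the single vowel rule is just a placeholder for the
"vowel changes etc." still to be done. Verner's Law is not handled.

```python
import re

def make_pass(pattern, mapping):
    """Build a function that applies one context-free replacement pass."""
    regex = re.compile(pattern)

    def apply(text):
        # Each match is looked up once; matches missing from the mapping
        # are returned unchanged, as in repl_fn from the earlier sample.
        return regex.sub(lambda m: mapping.get(m.group(), m.group()), text)

    return apply

# Pass 1: consonant shift (simplified, illustrative table).
consonant_shift = make_pass("[bdgptk]ʰ?", {
    "bʰ": "b", "dʰ": "d", "gʰ": "g",
    "b": "p", "d": "t", "g": "k",
    "p": "f", "t": "þ", "k": "h",
})

# Pass 2: a single vowel rule, included only to show a second pass.
vowel_shift = make_pass("ā", {"ā": "ō"})

def apply_passes(text, passes):
    """Apply the passes one after another, in the given order."""
    for one_pass in passes:
        text = one_pass(text)
    return text

ie_txt = " bʰrāter ... "
result = apply_passes(ie_txt, [consonant_shift, vowel_shift])
print("%s >> %s" % (ie_txt, result))
```

Within one pass every match is replaced exactly once, so the "b"
produced from "bʰ" is not shifted again to "p"; that would happen with
a chain of simple str.replace() calls. The order of the passes matters
only where the output of one pass can match the pattern of a later one.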

hth,
  vbr