[Tutor] Absolute newbie - Transliteration
David Rogers
davidrogers@telus.net
Thu May 22 23:47:01 2003
Thank you, Mr. Lyckå, for your very helpful and detailed answer. I'll
chew on it for a while and see what happens.
> I have a feeling, it might not be completely trivial to do this at all.
> But that depends...
I see how much complication one could get into. My goal here is to
make it faster for myself to transcribe the text of songs, which are
usually not too long - that means (I think) that I can have the script
do the obvious stuff, and leave the "special" things to do myself. Not
a perfect solution, but a drudgery-remover nonetheless.
> Your main problem is that little softing symbol (that looks a bit
> like 'b'). Somehow, you need to look ahead, to see if that's coming
> after the current consonant, or perhaps it's easier to handle that
> when it comes, and make a correction after the fact.
Can I put some letter-combinations that include the 'soft' symbol at
the beginning of my dictionary, and have them evaluated first, thus
bypassing the single-letter entries that come later? Or does a
dictionary work in non-sequential order?
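A Python dictionary is looked up by key, so the position of an entry
does not decide which one matches; the script itself has to try the
longer combinations before the single letters. Here is a minimal sketch
of that idea, not a finished transliterator. The table contents, the
Latin spellings, and the names TABLE, KEYS_BY_LENGTH and transliterate
are all made up for illustration.

```python
# Hypothetical, incomplete table; the Latin spellings are illustrative only.
TABLE = {
    "ль": "ly",   # consonant + soft sign treated as one unit
    "ть": "ty",
    "л": "l",
    "т": "t",
    "о": "o",
    "с": "s",
    "ь": "'",     # soft sign on its own
}

# Sort the keys so that two-letter combinations are tried before single letters.
KEYS_BY_LENGTH = sorted(TABLE, key=len, reverse=True)

def transliterate(text):
    out = []
    i = 0
    while i < len(text):
        for key in KEYS_BY_LENGTH:
            if text.startswith(key, i):
                out.append(TABLE[key])
                i += len(key)
                break
        else:
            # No entry matched: keep the character for manual correction.
            out.append(text[i])
            i += 1
    return "".join(out)
```

With this table, transliterate("соль") gives "soly" because the "ль"
entry is tried before the plain "л" entry, and any character missing
from the table is passed through unchanged.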
> I think most other languages are much, much harder than Russian. :(
>
> English is hopeless. Laugh, Garage, Women... Swedish is fairly
> hopeless as well.
Clearly, I'm only going to be able to use this on languages I already
know at least a little, so that I can correct the results afterward.
(btw, I think Laugh Garage Women would make an excellent name for a
band...)
> I think you realize by now (if not before) that the amount of shared
> code for a thing like this is fairly small. From Russian seems to be
> truly trivial compared to transliteration from most western European
> languages. For English, you would need to build in a major
> understanding of the language. I don't know if the information you
> need to include can be described in a much shorter format than the
> output you would generate from a really big word list. And if that's
> the case, it's obviously rather futile... I assume there is linguistic
> research done in that sector though.
> Danny Yoo usually knows these things...
I think you're right about the general futility of a project like this,
for use by real translators or anything like that. For my little
project of transcribing song texts to make them easier for non-native
speakers of those languages, I think it will save me some time, since I
only have to do the simple stuff once, in the dictionary, and can then
concentrate on fixing the exceptions.
For me, not knowing how to use Python yet, the "shared code" amounts to
(1) seeing examples of the possible dictionary formats, and (2) samples
of the incantations required to get stuff back out of them. :-)
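For reference, the two items just listed might look roughly like this.
It is only a sketch: the sample entries and the names table and
transliterate_simple are invented here, not taken from the earlier
answer.

```python
# Hypothetical sample table; a real one would cover the whole alphabet.
table = {"д": "d", "а": "a", "н": "n", "е": "e", "т": "t"}

print(table["д"])           # direct lookup; raises KeyError if the key is missing
print(table.get("ж", "?"))  # lookup with a fallback value for missing keys
print("а" in table)         # membership test

def transliterate_simple(text, table):
    # Replace each character that has an entry; leave everything else unchanged.
    return "".join(table.get(ch, ch) for ch in text)
```

Calling transliterate_simple("да нет", table) returns "da net"; the
space has no entry, so it is copied through as it is.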
You've given me that and much more besides, and now it's time for me to
experiment and see if I can get it to work. I'll post again with
details if I get a half-decent result.
Again, thank you very much.
David