Help to find a regular expression to parse po file

gialloporpora "sandrodll[remove]" at googlemail.com
Mon Jul 6 13:42:46 EDT 2009


In reply to MRAB's message:

> gialloporpora wrote:
>> Hi all,
>> I would like to extract strings from a PO file. To do this I have written
>> a small Python function that parses the PO file and extracts the strings:
>>
>> import re
>> regex=re.compile("msgid (.*)\\nmsgstr (.*)\\n\\n")
>> m=regex.findall(s)
>>
>> where s is a po file like this:
>>
>> msgctxt "write ubiquity commands.description"
>> msgid "Takes you to the Ubiquity<a
>> href=\"chrome://ubiquity/content/editor.html\">command editor</a>  page."
>> msgstr "Apre l'<a href=\"chrome://ubiquity/content/editor.html\">editor
>> dei comandi</a>  di Ubiquity."
>>
>>
>> #. list ubiquity commands command:
>> #. use | to separate multiple name values:
>> msgctxt "list ubiquity commands.names"
>> msgid "list ubiquity commands"
>> msgstr "elenco comandi disponibili"
>>
>> msgctxt "list ubiquity commands.description"
>> msgid "Opens<a href=\"chrome://ubiquity/content/cmdlist.html\">the
>> list</a>\n"
>> "      of all Ubiquity commands available and what they all do."
>> msgstr "Apre una<a
>> href=\"chrome://ubiquity/content/cmdlist.html\">pagina</a>\n"
>> "      in cui sono elencati tutti i comandi disponibili e per ognuno
>> viene spiegato in breve a cosa serve."
>>
>>
>>
>> #. change ubiquity settings command:
>> #. use | to separate multiple name values:
>> msgctxt "change ubiquity settings.names"
>> msgid "change ubiquity settings|change ubiquity preferences|change
>> ubiquity skin"
>> msgstr "modifica impostazioni di ubiquity|modifica preferenze di
>> ubiquity|modifica tema di ubiquity"
>>
>> msgctxt "change ubiquity settings.description"
>> msgid "Takes you to the<a
>> href=\"chrome://ubiquity/content/settings.html\">settings</a>  page,\n"
>> "      where you can change your skin, key combinations, etc."
>> msgstr "Apre la pagina<a
>> href=\"chrome://ubiquity/content/settings.html\">delle impostazioni</a>
>> di Ubiquity,\n"
>> "     dalla quale è possibile modificare la combinazione da tastiera
>> utilizzata per richiamare Ubiquity, il tema, ecc."
>>
>>
>>
>> but, obviously, with the code above the last string is not matched. If
>> I use re.DOTALL so that "." also matches newline characters, it does not
>> work because it matches the entire file. I would like the match to stop
>> when "msgstr" is found.
>>
>> regex=re.compile("msgid (.*)\\nmsgstr (.*)\\n\\n\\n",re.DOTALL)
>>
>> Is this possible or not?
>>
> You could try:
>
> regex = re.compile(r'^msgid (.*(?:\n".*")*)\nmsgstr (.*(?:\n".*")*)$', re.MULTILINE)
>
> and then, if necessary, tidy what you get.
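
A minimal sketch of the approach suggested above, not a full PO parser. It assumes every continuation line is itself a quoted string, as in a well-formed PO file (the sample quoted earlier was re-wrapped by the mail client, so it would not match as shown). The pattern is written as a single-quoted raw string so the embedded double quotes do not end the literal, and re.MULTILINE is used so that "$" matches at the end of each line. SAMPLE, tidy and extract_pairs are hypothetical names chosen for illustration.

```python
import re

# Hypothetical sample: two entries, the second with continuation lines.
# "\\n" puts a literal backslash-n in the text, as it appears in a PO file.
SAMPLE = '''msgctxt "list ubiquity commands.names"
msgid "list ubiquity commands"
msgstr "elenco comandi disponibili"

msgctxt "list ubiquity commands.description"
msgid "Opens the list\\n"
"      of all Ubiquity commands."
msgstr "Apre una pagina\\n"
"      con tutti i comandi."
'''

# Each value is the rest of the keyword line plus any following lines
# that consist of a quoted string. With re.MULTILINE, ^ and $ match at
# the start and end of every line, so each entry is found separately.
ENTRY = re.compile(
    r'^msgid (.*(?:\n".*")*)\nmsgstr (.*(?:\n".*")*)$',
    re.MULTILINE,
)


def tidy(raw):
    """Drop the surrounding quotes of each line and join the pieces."""
    chunks = [line.strip()[1:-1] for line in raw.split("\n")]
    return "".join(chunks)


def extract_pairs(text):
    """Return a list of (msgid, msgstr) tuples found in text."""
    return [(tidy(a), tidy(b)) for a, b in ENTRY.findall(text)]


if __name__ == "__main__":
    for msgid, msgstr in extract_pairs(SAMPLE):
        print(repr(msgid), "->", repr(msgstr))
```

The tidy step only removes the quotes and joins the lines; escape sequences such as \" and \n are left exactly as they appear in the file, so a further unescaping pass would be needed if the decoded text is wanted.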


MRAB, thank you for your help. I tried the code posted by Hallvard first,
because I saw it earlier, and it works. I'll now check your suggestion
as well.
Sandro

-- 
*Pink Floyd – The Great Gig in the Sky* - http://sn.im/kggo7
*FAQ* for /it-alt.comp.software.mozilla/: http://bit.ly/1MZ04d


