Help to find a regular expression to parse po file
gialloporpora
"sandrodll[remove]" at googlemail.com
Mon Jul 6 12:32:47 EDT 2009
In reply to the message from Hallvard B Furuseth:
>
> I don't know the syntax of a po file, but this works for the
> snippet you posted:
>
> import re
>
> arg_re = r'"[^\\\"]*(?:\\.[^\\\"]*)*"'
> arg_re = r'%s(?:\s+%s)*' % (arg_re, arg_re)
> find_re = re.compile(
>     r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)
>
> However, can \ quote a newline? If so, replace \\. with \\[\s\S] or
> something.
> Can there be other keywords between msgid and msgstr? If so,
> add something like (?:\w+\s+<arg_re>\s*\n)*? between them.
> Can msgstr come before msgid? If so, forget using a single regexp.
> Anything else to the syntax to look out for? Single quotes, maybe?
>
> Is it a problem if the regexp isn't quite right and doesn't match all
> cases, yet doesn't report an error when that happens?
>
> All in all, it may be a bad idea to squeeze this into a single regexp.
> It gets ugly real fast. Might be better to parse the file in a more
> regular way, maybe using regexps just to extract each (keyword, "value")
> pair.
>
Thank you very much, Hallvard, it seems to work. There is a strange
match in the file header, but I can skip the first match.
PO files have this structure:
http://bit.ly/18qbVc
msgid "string to translate"
" second string to match"
" n string to match"
msgstr "translated string"
" second translated string"
" n translated string"
One or more blank lines come before the next group.
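
For illustration, here is a minimal sketch of how the regular expression
from the quoted reply handles this multi-line structure. The join_parts
helper and the sample text are made up for this example, and backslash
escapes inside the strings are left undecoded.

```python
import re

# Pattern from the quoted reply: one double-quoted string with backslash escapes.
quoted = r'"[^\\"]*(?:\\.[^\\"]*)*"'
# One or more quoted strings separated by whitespace (covers continuation lines).
arg_re = r'%s(?:\s+%s)*' % (quoted, quoted)
find_re = re.compile(
    r'^msgid\s+(' + arg_re + r')\s*\nmsgstr\s+(' + arg_re + r')\s*\n', re.M)


def join_parts(arg):
    """Concatenate the contents of adjacent quoted strings (hypothetical helper)."""
    return ''.join(part[1:-1] for part in re.findall(quoted, arg))


# Made-up sample following the structure described above.
sample = (
    'msgid "string to translate"\n'
    '" second string to match"\n'
    'msgstr "translated string"\n'
    '" second translated string"\n'
    '\n'
)

pairs = [(join_parts(i), join_parts(s)) for i, s in find_re.findall(sample)]
print(pairs)
```

Each match yields the raw msgid and msgstr blocks, quotes included, so the
pieces still have to be joined afterwards, as join_parts does here.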
In the past I wrote a Python script to parse PO files where msgid
and msgstr are on two consecutive lines, for example:
msgid "string to translate"
msgstr "translated string"
Now the problem is how to also match the (optional) continuation strings
that follow msgid and msgstr.
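
One possible approach, following the suggestion in the quoted reply to
extract each (keyword, "value") pair line by line, is sketched below. The
names line_re and parse_po are hypothetical. The sketch assumes only the
msgid and msgstr keywords, double-quoted strings, and that the header is
the entry with an empty msgid; comments, other keywords, and escape
decoding are ignored.

```python
import re

# An optional msgid/msgstr keyword followed by one double-quoted string.
# A line with no keyword is treated as a continuation of the previous field.
line_re = re.compile(r'^(?:(msgid|msgstr)\s+)?"((?:[^"\\]|\\.)*)"\s*$')


def parse_po(text):
    """Return a list of (msgid, msgstr) pairs; escapes are left undecoded."""
    entries = []
    current = {}
    field = None  # field that a continuation line would extend
    for line in text.splitlines():
        m = line_re.match(line)
        if not m:
            # Blank line, comment, or unsupported keyword: end any continuation.
            field = None
            continue
        keyword, value = m.groups()
        if keyword == 'msgid':
            if 'msgid' in current:  # the previous entry is complete
                entries.append((current['msgid'], current.get('msgstr', '')))
            current = {'msgid': value}
            field = 'msgid'
        elif keyword == 'msgstr':
            current['msgstr'] = value
            field = 'msgstr'
        elif field is not None:
            current[field] += value  # continuation line
    if 'msgid' in current:
        entries.append((current['msgid'], current.get('msgstr', '')))
    return entries


# Made-up sample: a header entry followed by two translations.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain\\n"\n'
    '\n'
    'msgid "a"\n'
    '" b"\n'
    'msgstr "x"\n'
    '" y"\n'
    '\n'
    'msgid "one"\n'
    'msgstr "uno"\n'
)

# Skip the header by dropping the entry whose msgid is empty.
translations = [e for e in parse_po(sample) if e[0]]
print(translations)
```

Filtering on an empty msgid also takes care of the strange match in the
file header, since that is the only entry without a source string.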
Sandro