problem negative lookahead assertion

sjoerd siebinga ssiebinga at fa.knaw.nl
Tue Apr 16 08:20:01 EDT 2002


Thanks, Geoff and Amk, for your advice.

Sadly, my input data requires nested {} because of the specialist
diacritics it uses. So the ([^}]*?) option gives more errors when
the data is processed by the LaTeX interpreter. I used the
([\s,\W]) regex to grab all the curly braces in the emph.

Now it is time to ask a silly question. I am a trained linguist and
not a programmer.

> Note that this will break in a different situation, if you have {} inside  
> the contents of \emph, as for example in \emph{ab^{cd}}, because this 
> problem really needs a full parser to handle the general case.

What do you mean by a 'full parser'?
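
For illustration, one common reading of the quoted remark is that nested
braces have to be counted explicitly, because a single regex cannot track
arbitrary nesting depth. The following is only a minimal sketch under that
assumption; find_emph_spans is a hypothetical name and is not part of the
script discussed in this thread.

```python
def find_emph_spans(text, opener="\\emph{"):
    """Return (start, end) slices covering each \\emph{...} with balanced braces.

    Hypothetical helper for illustration only.
    """
    spans = []
    pos = 0
    while True:
        start = text.find(opener, pos)
        if start == -1:
            return spans
        i = start + len(opener)  # first character after the opening brace
        depth = 1                # we are inside one open brace
        while i < len(text) and depth > 0:
            ch = text[i]
            if ch == "\\":
                i += 2           # skip a backslash and the character it escapes
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            i += 1
        if depth != 0:
            return spans         # unbalanced braces: stop rather than guess
        spans.append((start, i))
        pos = i


sample = r"OHG \emph{gi]welt\={\i}g}, \emph{ab^{cd}} and \emph{vald}"
for begin, end in find_emph_spans(sample):
    print(sample[begin:end])
```

Counting depth this way copes with any level of nesting, which is the part
a pattern such as ([^}]*?) cannot do.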

As Geoff asked, I will send a couple of samples. My Python script is
rather too large to be sent to the newsgroup. If either of you wants to
see it, I can send it to you with a couple of sample lemmata. Another
point is that the substitutions are performed in turn, much like a sed
script. The emph regex is processed last.


Input before processing.

\begin{lemma}\begin{entry}wald\end{entry}\begin{seman}approval\end{seman}\begin{grammar}subst.f.\end{grammar}
\begin{level}PIE\end{level} \begin{misc} On the Ds B \emph{weld(e)},
cf. Aofr. 143-6.\end{misc} \begin{pfris}*wald\end{pfris}
\begin{germdata} ON \emph{vald}, OE \emph{weald}, OS \emph{gi]wald},
OHG \emph{gi]walt}, MHG \emph{(ge)walt}, MLG \emph{(ge)walt}, 
\emph{(ge)w\=o lt}, ODu. \emph{ge]walt}, MDu. \emph{(ge)wout},
\emph{(ge)wolt}, \emph{(ge)walt}, \emph{ge]welt} `power', cf.  also \#
str. VII: Goth. \emph{waldan}, ON \emph{valda}, OE \emph{wealdan}, OS
\emph{waldan}, OHG \emph{waltan},  MHG \emph{walten}, MLG
\emph{w\textbrevemacron{a}lden}, \emph{w\=olden}, OFris. \emph{walda}
(q.v.), ODu.  \emph{waldan}, MDu. \emph{wouden} `rule'; \# *waldiga-:
OS \emph{gi]weldig}, OHG \emph{gi]welt\={\i}g}, \emph{gi]walt\={\i}g},
MHG \emph{(ge)waltec}, \emph{-waltic}, \emph{-weltic}, MLG
\emph{(ge)w\textbrevemacron{a}ldich}, \emph{-weldich},
\emph{-w\=oldich}, \emph{-w\"oldich}, OFris.  \emph{weldech} (q.v.),
MDu. \emph{(ge)weldich} `powerful', with denom. OFris. 
\emph{weldegia} (q.v.), MLG \emph{weldigen}, \emph{waldigen}
`adjudge', etc.\end{germdata} \begin{pgerm}*waldi-\end{pgerm}  
\begin{pie}*ulh$_2$-d\textsuperscript{h}-\end{pie} \begin{pokpage}
1111-2\end{pokpage} \begin{piemisc} Toch. A \emph{w\"al}, B
\emph{walo} `king', Lat. \emph{val\=e\=o} `be strong'; with a dental
extension as  in Gmc. Lith. \emph{v\'{e}ldu}, \emph{veld\d{e}?ti}
`rule'\end{piemisc} \begin{biblio}Seebold 536-7\end{biblio}
\begin{seealso} walda ,  weldich ,  weldegia \end{seealso}  
\end{lemma}

Input after processing without the emph substitution

\begin{lemma}
\begin{entry}wada \index{r1~wada} \end{entry}
\begin{seman}wade\end{seman}
\begin{grammar}str.vb.VI\end{grammar}
 \begin{level}PIE [?]\end{level}  
\begin{pfris}*wada \index{pfris~*wada} \end{pfris} 
\begin{germdata} ON \emph{va\symbol{'360}a}
\index{on~va\symbol{'360}a}`wade, rush, walk through', OE
\emph{wadan}, \index{oe~wadan} OHG, MHG \emph{watan} 
\index{mhg~watan}`wade, stride', MLG \emph{w\=aden},
\index{mlg~w\=aden} MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen}
`wade, go'\end{germdata}
\begin{pgerm}*wadanaN \index{pgerm~*wadanaN} \end{pgerm}  
\begin{pie}*uh$_2$d\textsuperscript{h}- [?]
\index{pie~*uh$_2$d\textsuperscript{h}- [?]} \end{pie}
\begin{piemisc} Semantically close is Lat. \emph{vad\=are} 
\index{lat~vad\=are}`to wade through, ford', \emph{vadum} `ford'. It
cannot be established with certainty whether Alb. \emph{va} 
\index{alb~va}`ford' is historically cognate with \emph{vadum} or is a
borrowing of this Lat. form (Demiraj 1997: 405). The Latin forms seem
to be cognate with \emph{v\=adere} `go, walk, rush'. If we allow a
connection with words denoting a less specific way of moving than
wading in particular, like in this Lat. instance, Arm. \emph{gam} 
\index{arm~gam}`come' could furthermore be compared, pointing to a PIE
heritage [POK 1109]. Note also that less specific meanings are
attested within Gmc., where the ON verb can also mean `rush, walk
through' and the OE, OHG, MHG, MLG and MDu. verbs can also denote the
notion `go, stride'.\end{piemisc}
\begin{biblio}Schrijver 170; Seebold 530-1\end{biblio}   
\end{lemma}

After processing with the emph regex

\begin{lemma}
\begin{entry}wald \index{r1~wald} \end{entry}
\begin{seman}approval\end{seman}
\begin{grammar}subst.f.\end{grammar}
 \begin{level}PIE\end{level}
 \begin{misc} On the Ds Toch. B \emph{weld(e)}, \index{unl~weld(e)}
\index{tochb~weld(e)} \index{tochb~weld(e)} cf. Aofr. 143-6.\end{misc}
 \begin{pfris}*wald \index{pfris~*wald} \end{pfris}
 \begin{germdata} ON \emph{vald} \index{on~vald} OE \emph{weald},
\index{unl~weald} \index{oe~weald} OS \emph{gi]wald},
\index{unl~gi]wald} \index{os~gi]wald} OHG \emph{gi]walt},
\index{unl~gi]walt} \index{ohg~gi]walt} MHG \emph{(ge)walt},
\index{unl~(ge)walt} \index{mhg~(ge)walt} MLG \emph{(ge)walt},
\index{unl~(ge)walt} \index{mlg~(ge)walt} \emph{(ge)w\=o lt},
\index{unl~(ge)w\=o lt} ODu. \emph{ge]walt}, \index{unl~ge]walt}
\index{odu~ge]walt} MDu. \emph{(ge)wout}, \index{unl~(ge)wout}
\index{mdu~(ge)wout} \emph{(ge)wolt}, \index{unl~(ge)wolt}
\emph{(ge)walt}, \index{unl~(ge)walt} \emph{ge]welt} 
\index{unl~ge]welt}`power', cf. also \# str. VII: Goth. \emph{waldan},
\index{unl~waldan} \index{goth~waldan} ON \emph{valda}
\index{on~valda} OE \emph{wealdan}, \index{unl~wealdan}
\index{oe~wealdan} OS \emph{waldan}, \index{unl~waldan}
\index{os~waldan} OHG \emph{waltan}, \index{unl~waltan}
\index{ohg~waltan} MHG \emph{walten}, \index{unl~walten}
\index{mhg~walten} MLG \emph{w\textbrevemacron{a}lden},
\index{mlg~w\textbrevemacron{a}lden} \emph{w\=olden},
\index{unl~w\=olden} OFris. \emph{walda}  \index{unl~walda}
\index{ofris~walda} \index{ofris~walda}(q.v.), ODu. \emph{waldan},
\index{unl~waldan} \index{odu~waldan} MDu. \emph{wouden} 
\index{unl~wouden} \index{mdu~wouden}`rule'; \# *waldiga-: OS
\emph{gi]weldig}, \index{unl~gi]weldig} \index{os~gi]weldig} OHG
\emph{gi]welt\={\i}g}, \index{ohg~gi]welt\={\i}g}
\emph{gi]walt\={\i}g}, MHG \emph{(ge)waltec}, \index{unl~(ge)waltec}
\index{mhg~(ge)waltec} \emph{-waltic}, \index{unl~-waltic}
\emph{-weltic}, \index{unl~-weltic} MLG
\emph{(ge)w\textbrevemacron{a}ldich},
\index{mlg~(ge)w\textbrevemacron{a}ldich} \emph{-weldich},
\index{unl~-weldich} \emph{-w\=oldich}, \index{unl~-w\=oldich}
\emph{-w\"oldich}, \index{unl~-w\"oldich} OFris. \emph{weldech} 
\index{unl~weldech} \index{ofris~weldech} \index{ofris~weldech}(q.v.),
MDu. \emph{(ge)weldich}  \index{unl~(ge)weldich}
\index{mdu~(ge)weldich}`powerful', with denom. OFris. \emph{weldegia} 
\index{unl~weldegia} \index{ofris~weldegia}
\index{ofris~weldegia}(q.v.), MLG \emph{weldigen},
\index{unl~weldigen} \index{mlg~weldigen} \emph{waldigen} 
\index{unl~waldigen}`adjudge', etc.\end{germdata}
 \begin{pgerm}*waldi- \index{pgerm~*waldi-} \end{pgerm}  
\begin{pie}*ulh$_2$-d\textsuperscript{h}-
\index{pie~*ulh$_2$-d\textsuperscript{h}-} \end{pie}
\begin{pokpage} 1111-2\end{pokpage} 
\begin{piemisc} Toch. A \emph{w\"al}, \index{unl~w\"al}
\index{tocha~w\"al} Toch. B \emph{walo}  \index{unl~walo}
\index{tochb~walo} \index{tochb~walo}`king', Lat. \emph{val\=e\=o} 
\index{unl~val\=e\=o} \index{lat~val\=e\=o}`be strong'; with a dental
extension as in Gmc. Lith. \emph{v\'{e}ldu}, \index{lith~v\'{e}ldu}
\index{lith~v\'{e}ldu} \emph{veld\d{e}? \index{unl~veld\d{e}ti}
`rule'\end{piemisc}
\begin{biblio}Seebold 536-7\end{biblio} 
\begin{seealso} walda , weldich , weldegia \end{seealso}  
\end{lemma}

and what I wanted to see was

\begin{lemma}
\begin{entry}wada \index{r1~wada} \end{entry}
\begin{seman}wade\end{seman}
\begin{grammar}str.vb.VI\end{grammar}
 \begin{level}PIE [?]\end{level}  
\begin{pfris}*wada \index{pfris~*wada} \end{pfris} 
\begin{germdata} ON \emph{va\symbol{'360}a}
\index{on~va\symbol{'360}a}`wade, rush, walk through', OE
\emph{wadan}, \index{oe~wadan} OHG, MHG \emph{watan} 
\index{mhg~watan}`wade, stride', MLG \emph{w\=aden},
\index{mlg~w\=aden} MDu. \emph{waden}, \index{mdu~waden} \emph{waeyen}
<<<\index{unl~waeyen}>>> `wade, go'\end{germdata}
\begin{pgerm}*wadanaN \index{pgerm~*wadanaN} \end{pgerm}  
\begin{pie}*uh$_2$d\textsuperscript{h}- [?]
\index{pie~*uh$_2$d\textsuperscript{h}- [?]} \end{pie}
\begin{piemisc} Semantically close is Lat. \emph{vad\=are} 
\index{lat~vad\=are}`to wade through, ford', \emph{vadum}
<<<\index{unl~vadum}>>>`ford'. It cannot be established with certainty
whether Alb. \emph{va}  \index{alb~va}`ford' is historically cognate
with \emph{vadum} <<<\index{unl~vadum}>>> or is a borrowing of this
Lat. form (Demiraj 1997: 405). The Latin forms seem to be cognate with
\emph{v\=adere} <<<\index{unl~v\=adere}>>> `go, walk, rush'. If we
allow a connection with words denoting a less specific way of moving
than wading in particular, like in this Lat. instance, Arm. \emph{gam}
 \index{arm~gam}`come' could furthermore be compared, pointing to a
PIE heritage [POK 1109]. Note also that less specific meanings are
attested within Gmc., where the ON verb can also mean `rush, walk
through' and the OE, OHG, MHG, MLG and MDu. verbs can also denote the
notion `go, stride'.\end{piemisc}
\begin{biblio}Schrijver 170; Seebold 530-1\end{biblio}   
\end{lemma}

The forms that should have been substituted are enclosed between <<<
and >>>.
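
The wanted behaviour can be sketched with a negative lookahead assertion:
add \index{unl~...} only after an \emph{...} that is not already followed
by an \index entry. This is a rough sketch, not the regex from my script.
The names EMPH_WITHOUT_INDEX and add_missing_index are hypothetical, and
the pattern assumes at most one level of nested braces inside \emph.

```python
import re

# \emph{...} allowing one level of nested braces, plus an optional comma,
# but only when no \index{ follows (after optional commas and whitespace).
EMPH_WITHOUT_INDEX = re.compile(
    r"(\\emph\{((?:[^{}]|\{[^{}]*\})*)\},?)"
    r"(?![,\s]*\\index\{)"
)


def add_missing_index(text):
    """Append an unl index entry to each \\emph{...} that lacks an \\index."""
    def repl(match):
        # group(1) is the whole \emph{...} (with comma), group(2) its contents
        return match.group(1) + " \\index{unl~" + match.group(2) + "}"
    return EMPH_WITHOUT_INDEX.sub(repl, text)


sample = (r"\emph{wadan}, \index{oe~wadan} MDu. \emph{waden}, "
          r"\index{mdu~waden} \emph{waeyen} `wade, go'")
print(add_missing_index(sample))
```

Because forms that already carry an \index are skipped, running the
substitution a second time leaves the text unchanged.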

Regards, Sjoerd


