[Tutor] Re: Regex (almost solved, parser problem remaining)

Andrei project5 at redrival.net
Sat Aug 30 19:39:35 EDT 2003


Erik Price wrote:
> 
> On Friday, August 29, 2003, at 01:31  PM, Andrei wrote:
> 
>> With my tests it works OK except for one thing: the SGML parser chokes 
>> on "&". The original says:
>>
>>    go to news://bl_a.com/?ha-ha&query=tb for more info
>>
>> and it's modified to:
>>
>>    go to <a href="news://bl_a.com/?ha-ha">news://bl_a.com/?ha-ha</a>&query=tb
>>    for more info
>
> I thought that it was invalid to have a "&" character that is not being 
> used as an entity reference in SGML or XML documents.  In other words, 
> the document should have "&amp;", not "&".  Most browsers don't enforce 
> it in HTML, though.

I can't be sure that the source is valid HTML; in fact, I know that some 
of the strings I process are plain-text URLs. Anyway, I eventually solved 
the issue by pre-processing the string: every & is replaced with a 
longish placeholder string that is extremely unlikely to appear in the 
text, and then the normal manipulation runs. At the end, of course, every 
instance of that magic string is converted back to &. Not very elegant, 
but it works :).
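
Here is a minimal sketch of that protect/restore idea, not the actual 
code from my script. The placeholder value, the simplified URL regex and 
the names linkify() and process() are all made up for illustration, and 
linkify() is only a regex stand-in for the real SGML-parser-based step, 
so it does not reproduce the choking on "&" itself.

```python
import re

# Hypothetical placeholder: any string that cannot occur in the input will do.
AMP_PLACEHOLDER = "__AMPERSAND_PLACEHOLDER_7f3a9c__"

# Simplified URL pattern, for illustration only (not the regex from this thread).
URL_RE = re.compile(r'(?:https?|ftp|news)://[^\s<>"]+')


def linkify(text):
    """Stand-in for the 'normal manipulation': wrap each URL in an <a> tag."""
    def wrap(match):
        url = match.group(0)
        return '<a href="%s">%s</a>' % (url, url)
    return URL_RE.sub(wrap, text)


def process(text):
    """Hide every '&', run the manipulation, then put the '&' back."""
    if AMP_PLACEHOLDER in text:
        # The trick only works if the placeholder is absent from the input.
        raise ValueError("placeholder already present in input")
    protected = text.replace("&", AMP_PLACEHOLDER)
    linked = linkify(protected)
    return linked.replace(AMP_PLACEHOLDER, "&")


print(process("go to news://bl_a.com/?ha-ha&query=tb for more info"))
```

The explicit check for the placeholder is an extra safety net I added to 
the sketch; "extremely unlikely" is not the same as "impossible".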

Andrei

=====
Mail address in header catches spam. Real contact info (decode with rot13):
cebwrpg5 at bcrenznvy.pbz. Fcnz-serr! Cyrnfr qb abg hfr va choyvp cbfgf. V 
ernq gur yvfg, fb gurer'f ab arrq gb PP.




