a simple unicode question

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Oct 22 05:23:32 EDT 2009


On Wed, 21 Oct 2009 15:14:32 -0300, <rurpy at yahoo.com> wrote:

> On Oct 21, 4:59 am, Bruno Desthuilliers <bruno.
> 42.desthuilli... at websiteburo.invalid> wrote:
>> beSTEfar wrote:
>> (snip)
>>  > When parsing strings, use Regular Expressions.
>>
>> And now you have _two_ problems <g>
>>
>> For some simple parsing problems, Python's string methods are powerful
>> enough to make REs overkill. And for any complex enough parsing (any
>> recursive construct for example - think XML, HTML, any programming
>> language etc), REs are just NOT enough by themselves - you need a full
>> blown parser.
>
> But keep in mind that many XML, HTML, etc parsing problems
> are restricted to a subset where you know the nesting depth
> is limited (often to 0 or 1), and for that large set of
> problems, RE's *are* enough.

I don't think so. Nesting isn't the only problem. REs cannot handle
comments, for example. And you must support unquoted attributes, single and
double quotes, any attribute ordering, empty tags, arbitrary whitespace...
If you don't, you are not reading XML (or HTML), only a specific file
format that resembles XML but actually isn't.
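
To make that concrete, here is a minimal sketch comparing a naive RE with
the standard library's html.parser on a made-up sample. The sample text,
the pattern, and the names links_by_regex, LinkCollector and
links_by_parser are all hypothetical, chosen only for illustration. A more
elaborate RE could cover more of these cases, but each case needs its own
special handling.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: four legal spellings of a link, plus one link
# that has been commented out and should not be reported at all.
SAMPLE = """
<a href="one.html">1</a>
<a class=x href='two.html'>2</a>
<A HREF=three.html>3</A>
<a
   href="four.html" >4</a>
<!-- <a href="commented-out.html">old</a> -->
"""

# Naive pattern: assumes lowercase names, href as the first attribute,
# double quotes, and exactly one space after the tag name.
NAIVE = re.compile(r'<a href="([^"]*)">')


def links_by_regex(text):
    """Return every href the naive pattern happens to match."""
    return NAIVE.findall(text)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags seen by the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips quotes.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def links_by_parser(text):
    """Return href values found by a real HTML parser."""
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


print(links_by_regex(SAMPLE))
print(links_by_parser(SAMPLE))
```

The RE finds only the first link and also reports the commented-out one;
the parser reports the four real links and skips the comment.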

-- 
Gabriel Genellina



