a simple unicode question

rurpy at yahoo.com
Thu Oct 22 16:08:21 EDT 2009


On 10/22/2009 03:23 AM, Gabriel Genellina wrote:
> On Wed, 21 Oct 2009 15:14:32 -0300, <rurpy at yahoo.com> wrote:
>
>> On Oct 21, 4:59 am, Bruno Desthuilliers <bruno.42.desthuilli... at websiteburo.invalid> wrote:
>>> beSTEfar wrote:
>>> (snip)
>>>  > When parsing strings, use Regular Expressions.
>>>
>>> And now you have _two_ problems <g>
>>>
>>> For some simple parsing problems, Python's string methods are powerful
>>> enough to make REs overkill. And for any complex enough parsing (any
>>> recursive construct for example - think XML, HTML, any programming
>>> language, etc.), REs are just NOT enough by themselves - you need a
>>> full-blown parser.
>>
>> But keep in mind that many XML, HTML, etc. parsing problems
>> are restricted to a subset where you know the nesting depth
>> is limited (often to 0 or 1), and for that large set of
>> problems, REs *are* enough.
>
> I don't think so. Nesting isn't the only problem. REs cannot handle
> comments, for example. And you must support unquoted attributes, single and
> double quotes, any attribute ordering, empty tags, arbitrary whitespace...
> If you don't, you are not reading XML (or HTML), only a specific file
> format that resembles XML but actually isn't.
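
To make the objection above concrete, here is a small Python sketch comparing a naive regex with the standard library's xml.etree.ElementTree. The sample document and the ITEM_RE pattern are hypothetical, chosen only for illustration; the input is valid XML but uses a comment, single quotes, a different attribute order and an empty tag:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive pattern: assumes a double-quoted "name" attribute
# comes first, no comments, no empty tags, no nesting.
ITEM_RE = re.compile(r'<item name="([^"]*)">([^<]*)</item>')

def items_by_regex(text):
    """Return (name, value) pairs found by the naive regex."""
    return ITEM_RE.findall(text)

def items_by_parser(text):
    """Return (name, value) pairs using a real XML parser."""
    root = ET.fromstring(text)  # the default parser discards comments
    return [(el.get("name"), el.text or "") for el in root.iter("item")]

doc = """<items>
  <item name="a">1</item>
  <!-- <item name="old">0</item> -->
  <item name='b'>2</item>
  <item id="3" name="c">3</item>
  <item name="d"/>
</items>"""

print(items_by_regex(doc))   # picks up the commented-out item, misses b, c, d
print(items_by_parser(doc))  # all four real items, comment ignored
```

Each failing case could be patched into the pattern, but that is exactly the growing list of special cases described above.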

OK, then let me rephrase my point as: in the real world it is often
not necessary to parse XML in its full generality; parsing, as you
put it, "a specific file format that resembles XML" is all that is
really needed.
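
A minimal sketch of that approach, assuming a hypothetical format produced by a program you control: one <entry key="...">value</entry> per line, double quotes only, no comments or nesting. The regex is only safe because the format is fixed, so the (hypothetical) parse_entries function raises on any line outside that format instead of silently skipping it:

```python
import re

# Hypothetical restricted format: one <entry> per line, a double-quoted
# "key" attribute only, no nesting, no comments. This is not general XML.
ENTRY_RE = re.compile(r'\s*<entry key="([^"]*)">([^<]*)</entry>\s*')

def parse_entries(text):
    """Parse the restricted format into a dict, rejecting anything outside it."""
    result = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are allowed
        m = ENTRY_RE.fullmatch(line)
        if m is None:
            # Fail loudly on input the regex was never designed to handle.
            raise ValueError(f"line {lineno}: not in the expected format: {line!r}")
        key, value = m.groups()
        result[key] = value
    return result

sample = """<entry key="host">example.org</entry>
<entry key="port">8080</entry>
"""
print(parse_entries(sample))
```

Note that it does not decode entity references such as &amp;; that is one more piece of real XML this restricted format simply leaves out.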
