Trying to parse a HUGE(1gb) xml file

Sherm Pendley sherm.pendley at gmail.com
Tue Dec 28 10:29:43 EST 2010


"BartC" <bc at freeuk.com> writes:

>> Roy Smith, 28.12.2010 00:21:
>>> To go back to my earlier example of
>>>
>>>          <Parental-Advisory>FALSE</Parental-Advisory>
>>>
>
> Isn't it possible for XML to define a shorter alias for these tags? Isn't
> there a shortcut available for </Parental-Advisory> in simple examples like
> this (I seem to remember something like this)?

Yes, you can define your own entities in a DTD:

  <!ENTITY paf "<Parental-Advisory>FALSE</Parental-Advisory>">
  <!ENTITY pat "<Parental-Advisory>TRUE</Parental-Advisory>">

Later, in your document:

  &paf;
  &pat;
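
For anyone who wants to check this from Python, here is a minimal
sketch. It assumes the entity declarations sit in the document's
internal DTD subset and that the standard library's
xml.etree.ElementTree (which uses expat) is the parser. The "movies"
root element and the "title" attributes are made up for the
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the two entities are declared in the internal
# DTD subset, so the parser can expand them without fetching anything.
DOC = """<!DOCTYPE movies [
  <!ENTITY paf "<Parental-Advisory>FALSE</Parental-Advisory>">
  <!ENTITY pat "<Parental-Advisory>TRUE</Parental-Advisory>">
]>
<movies>
  <movie title="Bambi">&paf;</movie>
  <movie title="Scarface">&pat;</movie>
</movies>
"""

root = ET.fromstring(DOC)

# Each entity reference has been replaced by a real child element,
# so ordinary tree navigation works on the result.
advisories = {
    movie.get("title"): movie.findtext("Parental-Advisory")
    for movie in root.iter("movie")
}
print(advisories)
```

The application never sees &paf; or &pat; here; it sees the expanded
elements, so the saving is in the file on disk, not in the parsed tree.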

The entity example is a bit contrived, though. If space is such a
major concern, one wouldn't be so wasteful of it to begin with, but
might instead use a short empty tag whose "value" attribute defaults
to "FALSE":

  <!ELEMENT advisory EMPTY>
  <!ATTLIST advisory value (TRUE | FALSE) "FALSE">

Later, in your document:

  <movie title="Bambi"><advisory/></movie>
  <movie title="Scarface"><advisory value="TRUE"/></movie>

To save even more space, one could instead define a "pa" attribute as
part of the "movie" element, with a default value that would then take
no space at all:

  <!ATTLIST movie pa (TRUE | FALSE) "FALSE">

Later, in your document:

  <movie title="Bambi"/>
  <movie title="Scarface" pa="TRUE"/>

When you see someone doing stupid things with a tool, it's usually not
the tool's fault. Far more often, it's someone using the wrong tool for
the task at hand, or using the right tool the wrong way.

> And why not use 1 and 0 for TRUE and FALSE?

Sounds reasonable in general, although a parental advisory is more
often one of several possible ratings (G, PG, R, MA, etc.) than a
boolean.

sherm--

-- 
Sherm Pendley
                                   <http://camelbones.sourceforge.net>
Cocoa Developer


