[Tutor] Trying to parse a HUGE(1gb) xml file in python

David Hutto smokefloat at gmail.com
Tue Dec 21 12:02:23 CET 2010


On Tue, Dec 21, 2010 at 5:45 AM, Alan Gauld <alan.gauld at btinternet.com> wrote:
>
> "David Hutto" <smokefloat at gmail.com> wrote
>
>> That's what I was saying above: XML seems to be the hog because of
>> its user-defined tags. Is that something of a confirmation of my hunch
>> that it's the length of the user's predefined tags that adds to the
>> above mess, and that maybe a reduced tag set in accordance with
>> XML might be better, or that simple <a> and <b> tags in the XML (other
>> files), with an index pointing to a and b, would be better?
>
> Shorter tags reduce the data volume by a bit (and it can be a
> big bit if the names are all 20 characters long!), but the inherent tag
> structure, even with single-char names, will still often surpass the
> data content.
>
> <i>
> 5
> </i>
>
> 8 bytes to describe an int which could be represented in
> a single byte in binary (or even in CSV).

But that byte can't describe the tag (Google, hold my hand). I'll get
this eventually, but my iostream is long on content and hard on
parsing. So many languages and technologies, yet so little time.
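
For what it's worth, the byte counts quoted above can be checked with a
rough sketch like this. It assumes the element is written on one line as
<i>5</i> with no whitespace, and it uses the standard struct module for
the binary form; the variable names are only for illustration.

```python
import struct

value = 5

# XML form with a one-character tag name and no whitespace (assumed layout)
xml_bytes = "<i>{0}</i>".format(value).encode("ascii")

# Binary form: one unsigned byte ("B") is enough for values 0..255
bin_bytes = struct.pack("B", value)

# CSV form: just the digit; a separator would add one more byte per field
csv_bytes = str(value).encode("ascii")

print("xml=%d binary=%d csv=%d" % (len(xml_bytes), len(bin_bytes), len(csv_bytes)))
# prints: xml=8 binary=1 csv=1
```

The single binary byte carries no name at all, so a reader has to know
from somewhere else (a schema, a fixed record layout) what that byte means.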

> Even if the int were
> a 64-bit binary value (8 bytes), the minimal tag structure still
> consumes the same data width. Of course, if the data
> content is a long string then simple tags become
> cost-effective (think <p> in XHTML)...
>
> HTH,
>
>
> --
> Alan Gauld
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>
>
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
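
To put rough numbers on the points about tag length and long strings, here
is a small sketch that computes what fraction of an element's bytes is
markup. The helper name markup_share and the sample tag names are made up
for illustration, and it assumes a bare <name>payload</name> element with
no attributes or whitespace.

```python
def markup_share(name, payload):
    """Fraction of the bytes in <name>payload</name> that is markup."""
    element = "<{0}>{1}</{0}>".format(name, payload).encode("utf-8")
    payload_len = len(payload.encode("utf-8"))
    return (len(element) - payload_len) / float(len(element))

cases = [
    ("i", "5"),          # one-character tag around a one-digit value
    ("t" * 20, "5"),     # 20-character tag name around the same value
    ("p", "x" * 500),    # one-character tag around a long string
]

for name, payload in cases:
    print("%-20s %5d payload bytes  markup share %.3f"
          % (name, len(payload), markup_share(name, payload)))
```

With a short value nearly all of the element is markup whatever the tag
length, while the long string brings the share down to a percent or two.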



-- 
They're installing the breathalyzer on my email account next week.

