[XML-SIG] Re: ElementTree
Greg Wilson
gvwilson at cs.utoronto.ca
Wed Mar 16 22:08:27 CET 2005
Hi everyone. I posted a problem with ElementTree to c.l.py yesterday.
Fredrik sent me a one-line patch (included below). I applied it, but
ElementTree still fails in the same place, the same way, so I switched to
cElementTree. It parses half of my input document, but fails on the
fourth occurrence of &rquot; --- it handles the previous three, and
occurrences of &lquot; and &ldots;, just fine. As before, xml.dom.minidom
parses the document without complaint. Any ideas? My file, the DTD, and
my script are attached; validate.py is dying on line 130 of the input
file.
Thanks,
Greg
On Wed, 16 Mar 2005, Fredrik Lundh wrote:
> hi greg,
>
> > Hi Fredrik. I added ElementTree to the data crunching book, then went
> > back and started revising my PSF-funded course material tools to use it.
> > Immediately ran into a problem with DTDs (described in the first
> > attachment, which I posted to c.l.python y'day).
>
> did you see my reply?
>
> http://article.gmane.org/gmane.comp.python.general/392915
-------------- next part --------------
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE lec SYSTEM "swc.dtd">
<lec title="Introduction" id="intro" svn="$Id: intro.swc 21 2005-03-16 18:08:45Z gvwilson $">
<topic title="Motivation" summary="motivation for course">
<slide>
<b1>Computers are as important to scientists as telescopes and test tubes
<b2>Analyze problems that are too complex for traditional means</b2>
<b2>Simulate things that can't be studied in laboratories</b2>
</b1>
<b1>Many scientists now spend much of their professional lives writing and maintaining software
<b2>A quarter of graduate students in science and engineering spend 25-50% of their time programming</b2>
</b1>
<b1>But most scientists have never been taught how to do this efficiently
<b2>It's a long way from the loops and arrays of first year to simulating bone development in foetal marsupials&ldots;</b2>
<b2>Like being shown how to differentiate polynomials, then expected to invent the rest of calculus</b2>
</b1>
<b1>This course will teach you how to design, build, maintain, and share programs more efficiently
<b2>Focus: tools and techniques appropriate for half a dozen people working together for a year
<b3>Everything you do at that scale will also make you more productive when you're working on your own for a week</b3>
</b2>
<b2>Will <e>not</e> turn you into a computer scientist
<b3>Far too many of them around anyway</b3>
</b2>
<b2>Instead, goal is to teach you the equivalent of good laboratory technique for computational science
<b3>The 20% of ideas that account for 80% of real world use</b3>
<b3>Software carpentry, rather than software engineering</b3>
</b2>
</b1>
</slide>
</topic>
<topic title="Meeting Standards" summary="need to improve quality as well as efficiency">
<slide>
<b1>Experimental results are only publishable if they are believed to be <e>correct</e> and <e>reproducible</e>
<b2>Equipment calibrated, samples uncontaminated, relevant steps recorded</b2>
<b2>In practice, almost always rely on the professionalism of the people doing the work</b2>
</b1>
<b1>How well do computational scientists meet these standards?
<b2>Correctness of code rarely questioned
<b3>We all know programs are buggy, but when was the last time you saw a paper rejected because of concerns over the quality of the software used to produce the results?</b3>
</b2>
<b2>Reproducibility often nonexistent
<b3>How many people can reproduce, much less trace, each result in their thesis?</b3>
</b2>
</b1>
<b1>Quality expectations can change overnight
<b2>Like the American car market when German and Japanese imports appeared in the 1970s</b2>
</b1>
</slide>
</topic>
<topic title="Who You Are" summary="target audience">
<slide>
<b1>User stories
<b2>Important part of designing user interfaces for mass-distribution software</b2>
<b2>Helps make discussion of features and usability more concrete</b2>
</b1>
<b1>Bhargan Basepair
<b2>27; B.Sc. in zoology</b2>
<b2>Did an introductory Fortran course nine years ago, and attended a workshop on web-based bioinformatics tools when he started his job</b2>
<b2>Now developing fuzzy pattern-matching algorithms for Genes'R'Us, a biotech firm with labs in four countries</b2>
</b1>
<b1>Harald Helmet
<b2>23; B.Eng in mechanical engineering, now doing an M.Sc. part time</b2>
<b2>Did C in first year; has been using MATLAB ever since</b2>
<b2>Modeling thermal degradation (a.k.a. &lquot;melting&rquot;) of firefighters' helmets</b2>
</b1>
<b1>Rachel Rotor
<b2>34; Ph.D. in physics</b2>
<b2>Took two courses on C and two on numerical analysis as an undergrad, and a computer graphics course as a graduate student</b2>
<b2>Now in charge of the 5-person flywheel braking group at Yoyodyne Inc.</b2>
</b1>
<b1>Sally Synthesis
<b2>22; finished a B.Eng. in chemical engineering last year, now doing an M.Sc. in chemistry</b2>
<b2>Did Java in first year, taught herself C, and has built a personal web site (static HTML only)</b2>
<b2>Thesis topic is improving the yield of fullerene production processes</b2>
</b1>
</slide>
</topic>
<topic title="A Quick Self-Test" summary="self-test">
<slide>
<b1>Adapted from Joel Spolsky <cite ref="spolsky-joel-on-software"/>
<b2>0 for &lquot;no&rquot;, 1 for &lquot;yes&rquot;</b2>
<b2>-1 if you don't know what the term means, or how to tell</b2>
</b1>
<b1>So:
<b2>Do you use version control?</b2>
<b2>Can you rebuild everything in one step?</b2>
<b2>Do you have an automated test suite?
<b3>Bonus marks if the tests report how much of the code they exercise</b3>
</b2>
<b2>Do you build the software, and run the test suite, daily?</b2>
<b2>Do you have a bug database?</b2>
<b2>Do you use a symbolic debugger?</b2>
<b2>Is your code written in a uniform, readable way?
<b3>Bonus marks if you use a style checker to check this automatically</b3>
</b2>
<b2>Is there a searchable archive of project-related communication?</b2>
<b2>Do you have an up-to-date schedule with binary milestones?</b2>
<b2>Can you trace everything you release back to the software that produced it?</b2>
<b2>Do you do code reviews?</b2>
<b2>Is time set aside in the schedule for infrastructure development and training?</b2>
</b1>
<b1>And your score is?</b1>
</slide>
</topic>
<topic title="Learn by Building" summary="course philosophy">
<slide>
<b1>So why are we where we are?
<b2>It's difficult to learn these things from academic computer scientists
<b3>CS research is more concerned with rapid prototyping than with reliability</b3>
</b2>
<b2>People are naturally sceptical of innovation
<b3>Particularly after they've seen a few bandwagons roll through</b3>
<b3>Glass's Law <cite ref="glass-software-engineering-facts"/>: any new way of doing things initially slows you down</b3>
</b2>
<b2>You only have to be as good as the competition
<b3>American auto makers in the 1970s</b3>
</b2>
</b1>
<b1>This course's approach:
<b2>Introduce some basic tools
<b3>Students immediately see benefit of taking the course</b3>
<b3>Tools can be used to manage the course itself</b3>
</b2>
<b2>Show students how to build tools like these
<b3>Where &lquot;how&rquot; includes both what goes into the software, and how to create it</b3>
<b3>Solidifies understanding of tools' capabilities and limitations</b3>
<b3>Makes discussion of technique more concrete</b3>
</b2>
<b2>Show students what else they can do with their new skills
<b3>The right way to tackle issues that come up over and over again</b3>
</b2>
</b1>
<b1>Key point: avoid overload
<b2>People who already know these things tend to underestimate how hard they are to learn</b2>
<b2>No point preaching to the top 10%</b2>
<b2>Try instead to move the middle of the bell curve to the right</b2>
</b1>
</slide>
</topic>
<topic title="Topics" summary="topics">
<slide>
<b1>Three Tools
<b2><ref sec="shell" text="title"/></b2>
<b2><ref sec="version" text="title"/></b2>
<b2><ref sec="make" text="title"/></b2>
</b1>
<b1>Programming
<b2><ref sec="py01" text="title"/></b2>
<b2><ref sec="py02" text="title"/></b2>
<b2><ref sec="py03" text="title"/></b2>
<b2><ref sec="ads" text="title"/></b2>
<b2><ref sec="py04" text="title"/></b2>
</b1>
<b1>Individual Practices
<b2><ref sec="test01" text="title"/></b2>
<b2><ref sec="test02" text="title"/></b2>
<b2><ref sec="debugger" text="title"/></b2>
<b2><ref sec="debugging" text="title"/></b2>
<b2><ref sec="style" text="title"/></b2>
<b2><ref sec="team" text="title"/></b2>
</b1>
<b1>Data Crunching
<b2><ref sec="re" text="title"/></b2>
<b2><ref sec="xml01" text="title"/></b2>
<b2><ref sec="xml02" text="title"/></b2>
<b2><ref sec="sql01" text="title"/></b2>
<b2><ref sec="sql02" text="title"/></b2>
</b1>
<b1>The Web
<b2><ref sec="http" text="title"/></b2>
<b2><ref sec="cgi01" text="title"/></b2>
<b2><ref sec="security" text="title"/></b2>
<b2><ref sec="cgi02" text="title"/></b2>
</b1>
<b1>Putting It All Together
<b2><ref sec="proj101" text="title"/></b2>
<b2><ref sec="proj102" text="title"/></b2>
<b2><ref sec="summary" text="title"/></b2>
</b1>
</slide>
</topic>
<topic title="Setting Up" summary="what you will need">
<slide>
<b1>Some previous programming experience
<b2><c>for</c> loops, <c>if</c>/<c>then</c>/<c>else</c></b2>
<b2>Function calls</b2>
<b2>Arrays</b2>
<b2>File I/O</b2>
<b2>Compilation</b2>
</b1>
<b1>Individual setup
<b2>Python (version 2.4 or higher)</b2>
<b2>Cygwin (on Windows)</b2>
<b2>DrPython
<b3>Or Komodo</b3>
<b3>At least get a smart editor</b3>
</b2>
</b1>
<b1>Course setup
<b2>Subversion</b2>
<b2>Trac
<b3>Apache</b3>
<b3>SQLite or PostgreSQL</b3>
<b3>PySVN</b3>
</b2>
</b1>
<b1>Time
<b2>Expect to spend 2-3 hours outside class for each lecture</b2>
</b1>
</slide>
</topic>
<topic title="Recommended Reading" summary="recommended reading">
<slide>
<b1>Books
<b2><cite ref="hunt-thomas-pragmatic-programmer"/></b2>
<b2><cite ref="glass-software-engineering-facts"/></b2>
<b2><cite ref="spolsky-joel-on-software"/></b2>
<b2><cite ref="lutz-ascher-learning-python"/></b2>
<b2><cite ref="wilson-data-crunching"/></b2>
</b1>
<b1>Web resources
<b2><fixme>Create list of useful links</fixme></b2>
</b1>
</slide>
</topic>
<topic title="The Rules" summary="rules of programming">
<slide format="enum">
<b1>A week of hard work can sometimes save you an hour of thought.</b1>
<b1>If it's worth doing again, it's worth automating.</b1>
<b1>Anything repeated in two or more places will eventually be wrong in at least one.</b1>
<b1>The three chief virtues of a programmer are laziness, impatience, and hubris.</b1>
<b1>It's not what you know, it's what you can.</b1>
<b1>The deadline isn't when you're supposed to finish; the deadline is when it starts to be late.</b1>
<b1>Never debug standing up.</b1>
<b1>Tools are signposts, not destinations.</b1>
<b1>Not everything worth doing is worth doing well.</b1>
<b1>Code unto others as you would have others code unto you.</b1>
<b1>Every complex file format eventually turns into a badly-designed programming language.</b1>
</slide>
</topic>
</lec>
-------------- next part --------------
<!-- $Id: swc.dtd 22 2005-03-16 18:09:19Z gvwilson $ -->
<!ENTITY ldots "…">
<!ENTITY lquot "“">
<!ENTITY rquot "”">
-------------- next part --------------
#!/usr/bin/env python
import sys, os
import cElementTree as ElementTree

# Parse each file named on the command line; a parse error raises an exception.
for filename in sys.argv[1:]:
    ElementTree.parse(filename)
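
The entities that trip the parser (&ldots;, &lquot;, &rquot;) are declared only in the external swc.dtd, which the expat-based ElementTree parsers do not fetch. One possible workaround, sketched here with the xml.etree.ElementTree module from current Python standard libraries rather than the standalone cElementTree used above, is to hand the parser the same three definitions through its entity dictionary. This is an assumption-laden sketch, not a diagnosis of the failure reported in this thread: the helper name parse_with_entities, the SWC_ENTITIES table, and the short SAMPLE document are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: these mirror the three declarations in swc.dtd.
SWC_ENTITIES = {
    "ldots": "\u2026",  # horizontal ellipsis
    "lquot": "\u201c",  # left double quotation mark
    "rquot": "\u201d",  # right double quotation mark
}

def parse_with_entities(xml_text, entities=SWC_ENTITIES):
    """Parse xml_text, resolving the named entities from a Python dict."""
    parser = ET.XMLParser()
    # XMLParser.entity maps entity names to replacement text for entities
    # that expat itself does not know about.
    parser.entity.update(entities)
    return ET.fromstring(xml_text, parser=parser)

# Hypothetical stand-in for the attached lecture file. The DOCTYPE line
# matters: it tells expat that an external subset exists, so unknown
# entities are passed to ElementTree instead of being rejected outright.
SAMPLE = (
    '<!DOCTYPE lec SYSTEM "swc.dtd">\n'
    '<lec><b2>0 for &lquot;no&rquot;, 1 for &lquot;yes&rquot;&ldots;</b2></lec>'
)

root = parse_with_entities(SAMPLE)
text = root.find("b2").text
print(text)
```

Parsing SAMPLE with a plain ET.fromstring(SAMPLE) call, without the entity table, raises ET.ParseError for the first undefined entity, which is the behaviour the table is meant to avoid.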