Python for large projects
Bill Rubenstein
wsr2 at swbell.net
Sun Mar 28 09:49:58 EST 2004
In article <mailman.357.1080147061.742.python-list at python.org>, gabor at z10n.net
says...
> On Wed, 2004-03-24 at 15:16, Bill Rubenstein wrote:
> > ...snip...
> > > > other thing is, that in the projects i work on, there seems to be
> > > > very hard to do unit tests
> > ...snip...
> >
> > The ability to do unit testing should not be an afterthought. It should be
> > considered as a major influence on the architecture of a project.
> >
> > If one cannot do proper unit testing, the architecture of the project is
> > questionable.
>
> ok, so let's use a specific example:
>
> imagine you're building a library, which fetches webpages.
>
> you have a library which can fetch one webpage at a time, but it is a
> synchronous library (like wget): you call it, and it returns the page.
>
> but you want an async one.
>
> so you decide to build a threadpool, where every thread will do this:
> look into a queue, and if there is a new URL to fetch, fetch it with
> the wget-like library, and save the html page somewhere (and maybe
> signal something).
>
> and now the user of your library simply adds the URL to fetch, and can
> check later, asynchronously, whether the pages have been fetched or
> not.
>
> could you tell me what unit tests you would create for this example?
>
>
> (a more generic request: is there on the internet a webpage with
> something like this? one where they have some complex
> modules/programs/algorithms, and they show how to write unittests for
> them?)
>
> thanks,
> gabor
>
>
>
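Before getting to the testing question, the queue-plus-threadpool design gabor describes can be sketched as follows. This is a minimal illustration in modern Python, not anyone's actual library; the class and function names are invented, and `fetch` is a stand-in for the synchronous, wget-like call.

```python
import queue
import threading

def fetch(url):
    # stand-in for the synchronous, wget-like fetch; returns page text
    return "<html>%s</html>" % url

class AsyncFetcher:
    """Workers pull URLs from a queue and store the fetched pages."""
    def __init__(self, workers=4):
        self.tasks = queue.Queue()
        self.results = {}            # url -> html, guarded by self.lock
        self.lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            url = self.tasks.get()
            html = fetch(url)
            with self.lock:
                self.results[url] = html
            self.tasks.task_done()

    def add(self, url):
        # the user simply adds a URL; fetching happens asynchronously
        self.tasks.put(url)

    def result(self, url):
        # returns the page if it has been fetched, else None
        with self.lock:
            return self.results.get(url)
```

The user-facing surface is just `add` and `result`, which is what makes the question of how to test the asynchronous machinery interesting.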
OK, I think I understand what the job is, so here is an attempt.
I'm assuming that this async wget's job is to start at a url, fetch it, track
down and fetch any links and such, get them, and make all of that available on
the local system for later viewing.
To make it testable, I'd design it so that the application part of the system
(described above) has as limited a knowledge of its surroundings as possible --
except for the actual work performed. It should have no knowledge of a GUI, for
instance.
Instead, it should know about an object which represents a 'job'. This object
should have attributes and/or functions which can be accessed to find out the
base URL and the current status or state of the specific job (not started, in
progress (with various sub-states), ..., complete). There should be a log
associated with the job object where both normal and abnormal events can be
kept. It should also be able to provide information about the user if there is
one, instructions about the base URL, where in the local file system to store
the results, etc. During the development phase this job object is going to be a
bit dynamic, as new needs for it are discovered.
There should probably be one object which can keep track of all of the job
objects and is responsible for creating new ones and deleting old ones.
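A minimal sketch of such a job object and its manager might look like this. The attribute and method names (`base_url`, `dest_dir`, `record`, `create`, `delete`) are my own invention for illustration, not a fixed interface:

```python
import enum

class State(enum.Enum):
    NOT_STARTED = "not started"
    IN_PROGRESS = "in progress"
    COMPLETE = "complete"
    FAILED = "failed"

class Job:
    """One fetch job; the application sees only this interface."""
    def __init__(self, base_url, dest_dir):
        self.base_url = base_url
        self.dest_dir = dest_dir       # where the results land locally
        self.state = State.NOT_STARTED
        self.log = []                  # normal and abnormal events

    def record(self, message):
        self.log.append(message)

class JobManager:
    """Keeps track of all Job objects; creates new ones, deletes old ones."""
    def __init__(self):
        self.jobs = {}
        self.next_id = 0

    def create(self, base_url, dest_dir):
        self.next_id += 1
        job = Job(base_url, dest_dir)
        self.jobs[self.next_id] = job
        return self.next_id, job

    def delete(self, job_id):
        del self.jobs[job_id]
```

The point is that both the application code and any GUI talk only to these two objects, never to each other.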
All of the interfaces to the job management object and the job object need to be
formalized and properly documented. This whole subsystem can then be tested by a
test driver that requests services via the documented interfaces, changes the
state of a job via those same interfaces, and verifies that the state
transitions are as expected. There is no need to fetch any real URLs to do this;
just pretend you did. This test driver also needs to exercise the interfaces
intended for use by a GUI.
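Such a state-transition test might look like the following. The tiny `Job` class is defined inline so the example is self-contained; in a real project the test driver would import the documented job object instead. Note that nothing here touches the network:

```python
import unittest

class Job:
    # minimal stand-in for the documented job object (hypothetical)
    def __init__(self, base_url):
        self.base_url = base_url
        self.state = "not started"
        self.log = []

    def start(self):
        if self.state != "not started":
            raise ValueError("cannot start job in state %r" % self.state)
        self.state = "in progress"
        self.log.append("started")

    def finish(self):
        if self.state != "in progress":
            raise ValueError("cannot finish job in state %r" % self.state)
        self.state = "complete"
        self.log.append("finished")

class TestJobStates(unittest.TestCase):
    def test_normal_transition(self):
        job = Job("http://example.invalid/")
        self.assertEqual(job.state, "not started")
        job.start()                  # no real URL is fetched anywhere here
        self.assertEqual(job.state, "in progress")
        job.finish()
        self.assertEqual(job.state, "complete")
        self.assertIn("finished", job.log)

    def test_cannot_finish_unstarted_job(self):
        job = Job("http://example.invalid/")
        self.assertRaises(ValueError, job.finish)

if __name__ == "__main__":
    unittest.main()
```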
Now, as to testing the actual application code -- I'd think you'd need a set of
URLs which return known and stable results, plus a number of error situations
(bad links and such) to test against. Then a test driver would be written to
use the standard interfaces to the job management object and the job object to
schedule work against those URLs, determine when that work is done, verify that
the results are as expected, highlight the differences between a prior run
against a particular URL and the current run, etc.
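One way to get "known and stable" URLs without depending on the outside network is to serve fixture pages from a local HTTP server inside the test itself. This is only a sketch of that idea; the fixture paths and page contents are invented, and a real harness would drive the job interfaces rather than `urllib` directly:

```python
import http.server
import threading
import urllib.error
import urllib.request

class FixtureHandler(http.server.BaseHTTPRequestHandler):
    # known, stable pages to test against; /missing.html is a deliberate
    # "bad link" error situation
    PAGES = {"/good.html": b"<html>ok</html>"}

    def do_GET(self):
        page = self.PAGES.get(self.path)
        if page is None:
            self.send_error(404)
        else:
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(page)

    def log_message(self, *args):
        pass  # keep test output quiet

# bind to port 0 so the OS picks a free port
server = http.server.HTTPServer(("127.0.0.1", 0), FixtureHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

# the good URL returns the expected, stable result
html = urllib.request.urlopen(base + "/good.html").read()
assert html == b"<html>ok</html>"

# the bad link produces the expected error
try:
    urllib.request.urlopen(base + "/missing.html")
    raise RuntimeError("expected a 404")
except urllib.error.HTTPError as err:
    assert err.code == 404

server.shutdown()
```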
I've been retired for years but that was pretty much how we did it. There were
two small programming teams -- one writing application code against the formal
interface documentation and one writing test scaffolding against the same
documentation and building test cases. Things worked, the bug rate was very low,
implementation changes were localized and testable...
Anyway, it worked for us, and we never had to claim that we just couldn't test
something except in production.
Bill