Python for large projects

Bill Rubenstein wsr2 at swbell.net
Sun Mar 28 09:49:58 EST 2004


In article <mailman.357.1080147061.742.python-list at python.org>, gabor at z10n.net 
says...
> On Wed, 2004-03-24 at 15:16, Bill Rubenstein wrote:
> > ...snip... 
> > > > another thing is that, in the projects i work on, it seems to be
> > > > very hard to do unit tests
> > ...snip...
> > 
> > The ability to do unit testing should not be an afterthought.  It should be 
> > considered as a major influence on the architecture of a project.
> > 
> > If one cannot do proper unit testing, the architecture of the project is 
> > questionable.
> 
> ok, so let's use a specific example:
> 
> imagine you're building a library which fetches webpages.
> 
> you have a library which can fetch one webpage at a time, but it is a
> synchronous library (like wget): you call it, and it returns the page.
> 
> but you want an async one.
> 
> so you decide to build a threadpool, where every thread will do this:
> look into a queue, and if there is a new URL to fetch, fetch it with
> the wget-like library, and save the html page somewhere (and maybe
> signal something).
> 
> and now the user who uses your library simply adds the URLs to fetch,
> and can check back later, asynchronously, whether they have been
> fetched or not.
> 
> could you tell me what unit tests you would create for this example?
> 
> 
> (a more generic request: is there a webpage somewhere with something
> like this? one where they have some complex
> modules/programs/algorithms, and they show how to write unit tests
> for them?)
> 
> thanks,
> gabor
> 
> 
> 
OK, I think I understand what the job is, so here is an attempt.

I'm assuming that this async wget's job is to start at a URL, fetch it, track 
down and fetch any links it contains, and make all of that available on the 
local system for later viewing.
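
(For reference, the threadpool arrangement gabor describes would be roughly 
this in Python -- only a sketch, with fetch_page standing in for the wget-like 
call and every name invented for illustration:)

    import queue
    import threading

    def fetch_page(url):
        # Stand-in for the synchronous, wget-like library call.
        return "<html>...</html>"

    def worker(jobs, results):
        while True:
            url = jobs.get()
            if url is None:              # sentinel: shut this worker down
                break
            results[url] = fetch_page(url)

    jobs = queue.Queue()
    results = {}                         # url -> fetched page
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(4)]
    for t in threads:
        t.start()

    jobs.put("http://example.com/")      # the user just adds URLs...
    # ...and later polls `results` to see which pages have arrived.
    for t in threads:
        jobs.put(None)                   # tell every worker to stop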

To make it testable, I'd design it so that the application part of the system 
(described above) has as limited a knowledge of its surroundings as possible -- 
except for the actual work performed.  It should have no knowledge of a GUI, 
for instance.

Instead it should know about an object which represents a 'job'.  This object 
should have attributes and/or functions which can be accessed to find out the 
base URL and the current status or state of the specific job (not started, in 
progress (various states here), ..., complete).  There should be a log 
associated with the job object where both normal and abnormal events can be 
recorded.  It should also be able to provide information about the user if 
there is one, instructions about the base URL, where in the local file system 
to store the results, etc.  During the development phase this job object is 
going to be a bit dynamic as new needs for it are discovered.
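
A first cut at the job object might look something like this (just a sketch; 
the state names and attributes are invented for illustration and would surely 
change as those new needs turn up):

    import threading

    # Possible job states; "in progress" would probably grow sub-states.
    NOT_STARTED, IN_PROGRESS, COMPLETE, FAILED = range(4)

    class Job:
        """One fetch job: a base URL plus its state, log and results."""

        def __init__(self, base_url, output_dir, user=None):
            self.base_url = base_url
            self.output_dir = output_dir     # where results go on disk
            self.user = user
            self.state = NOT_STARTED
            self.log = []                    # normal and abnormal events
            self._lock = threading.Lock()

        def set_state(self, new_state, message=""):
            with self._lock:
                self.state = new_state
                if message:
                    self.log.append(message)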

There should probably be one object which can keep track of all of the job 
objects and is responsible for creating new ones and deleting old ones.
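
Again only a sketch, building on the Job class above:

    class JobManager:
        """Creates, tracks and deletes Job objects."""

        def __init__(self):
            self._jobs = {}
            self._next_id = 0

        def create_job(self, base_url, output_dir):
            job_id = self._next_id
            self._next_id += 1
            self._jobs[job_id] = Job(base_url, output_dir)
            return job_id

        def get_job(self, job_id):
            return self._jobs[job_id]

        def delete_job(self, job_id):
            del self._jobs[job_id]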

All of the interfaces to the job management object and the job object need to 
be formalized and properly documented.  This whole subsystem can then be tested 
by a test driver requesting services via the documented interfaces, changing 
the state of a job via the documented interfaces, and determining that the 
state transitions are as expected.  There is no need to fetch any real URLs to 
do this -- just pretend you did.  This test driver also needs to exercise the 
interfaces intended for use by a GUI.
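
For example, using the sketches above, a test driver can walk a job through 
its states without ever touching the network (unittest is the standard library 
module; everything else is the made-up code from the sketches):

    import unittest

    class JobStateTest(unittest.TestCase):
        def test_state_transitions(self):
            mgr = JobManager()
            job_id = mgr.create_job("http://example.com/", "/tmp/out")
            job = mgr.get_job(job_id)
            self.assertEqual(job.state, NOT_STARTED)

            # Pretend a worker picked the job up and finished it;
            # nothing is actually fetched.
            job.set_state(IN_PROGRESS, "worker started")
            job.set_state(COMPLETE, "fetched 12 pages")

            self.assertEqual(job.state, COMPLETE)
            self.assertIn("worker started", job.log)

    if __name__ == "__main__":
        unittest.main()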

Now, as to testing the actual application code -- I'd think that you'd need a 
set of URLs which would return known and stable results, plus a number of error 
situations (bad links and such) to test against.  Then a test driver would be 
written to use the standard interfaces to the job management object and the job 
object to schedule work against those URLs, determine when that work is done, 
and test that the results are as expected, highlight the differences between a 
prior run against a particular URL and the current run, etc.
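
One way to get "known and stable" URLs without depending on the outside world 
is to start a throwaway local server inside the test itself.  A sketch, with 
urllib standing in for whatever the real fetch library turns out to be (the 
real driver would schedule a job through the job management interfaces 
instead):

    import http.server
    import threading
    import unittest
    import urllib.request

    PAGES = {
        "/index.html": b"<html><a href='/a.html'>a</a></html>",
        "/a.html": b"<html>leaf page</html>",
    }

    class StubHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            body = PAGES.get(self.path)
            if body is None:
                self.send_error(404)          # a deliberate bad-link case
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):         # keep test output quiet
            pass

    class KnownPagesTest(unittest.TestCase):
        def test_fetch_known_page(self):
            server = http.server.HTTPServer(("127.0.0.1", 0), StubHandler)
            threading.Thread(target=server.serve_forever,
                             daemon=True).start()
            base = "http://127.0.0.1:%d" % server.server_port
            try:
                got = urllib.request.urlopen(base + "/index.html").read()
                self.assertEqual(got, PAGES["/index.html"])
            finally:
                server.shutdown()

    if __name__ == "__main__":
        unittest.main()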

I've been retired for years, but that was pretty much how we did it.  There 
were two small programming teams -- one writing application code against the 
formal interface documentation, and one writing test scaffolding against the 
same documentation and building test cases.  Things worked: the bug rate was 
very low, implementation changes were localized and testable...

Anyway, it worked for us and we never had to claim that we just couldn't test 
something except in production.

Bill