[Tutor] Sun says: Don't use Java, use Python!

Jeff Shannon jeff@ccvcorp.com
Mon Feb 10 20:45:02 2003


Paul Tremblay wrote:

>On Mon, Feb 10, 2003 at 12:18:59PM -0800, Jeff Shannon wrote:
>
>>If it's a short, quick script, then it's probably not going to make 
>>enough of a difference to worry about.  The simplicity of distributing a 
>>single file would be more valuable than the speed gain that you'd get 
>>from having the bytecode precompiled.  (Precompilation only affects 
>>startup speed, *not* actual running speed.)
>>    
>>
>
>The script will be over 3,000 lines long. In order to parse the RTF, it
>has to make a dozen or so passes through the file. The reason for so
>many passes is that RTF is very dense. I had written a successful perl
>script that parsed RTF in one pass, but the script is almost unreadable.
>
>The newer perl script (not the one above) makes a module for each pass
>through the file. I thought this was bad design, because I had a dozen
>or so modules.
>
>But you are telling me that it is perhaps best to keep the script
>modular? 
>

Yes.  The more modular it is, the easier it will be to understand what's 
going on.  A module for each pass through the file sounds fairly 
reasonable, since it's performing a completely different process each 
time.  You might also want to have a separate module for utility 
functions that are used in more than one pass, if there are any.  

But rather than simply separating functions by the pass that they're 
used on, think about your script in units of related functionality.  If 
you're converting RTF to XML, then you want to have the code that parses 
and makes sense of the RTF separate from the code that creates the XML. 
 Ideally, you should be able to point to each module and name a single 
task that it performs -- if you need to use "and" more than once in 
describing a module, then it's a good candidate for splitting into two 
or more separate modules.  Then you can describe the overall control 
flow of the program in terms of module interactions.  "The main module 
uses the parser module to organize the input, then processes that with 
the analyzer module.  The results of this are used by the configuration 
module to drive the xml_output module."  (I just made all this up, and 
have absolutely no idea if your code could reasonably be broken down 
into chunks in that sort of way, but the point is to show that each 
module should have a separate, easily identifiable task.)  That way, you 
can work on one small subset of the problem at a time (just basic 
parsing, or just analyzing your parse tree, or just writing XML) and not 
worry so much about keeping the entire process in your head at once.
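The control-flow description above can be sketched in miniature as a single file. Everything here is hypothetical -- the function names (parse_rtf, analyze, write_xml) and the toy transformations stand in for whole modules, just to show each unit doing one nameable task and main() describing the flow as module interactions:

```python
def parse_rtf(text):
    """Parser 'module': turn raw input into a token stream."""
    return text.split()

def analyze(tokens):
    """Analyzer 'module': make sense of the parsed tokens.

    (Here just uppercasing, as a stand-in for real analysis.)"""
    return [t.upper() for t in tokens]

def write_xml(items):
    """xml_output 'module': serialize the analysis as XML."""
    body = "".join("<item>%s</item>" % item for item in items)
    return "<doc>%s</doc>" % body

def main(text):
    """Main 'module': the whole program in one readable line."""
    return write_xml(analyze(parse_rtf(text)))
```

In a real program each of those functions would be its own module (parser.py, analyzer.py, xml_output.py), but the shape of main() stays the same: you can read the entire pipeline without holding any one stage's details in your head.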

>Setting up the distribution is not such an easy task. With the perl
>script, I thought it would be a piece of cake, but I am really struggling
>with writing the set-up script. For that reason, I wanted to make sure I
>knew how to do this with Python *before* the script was finished, and
>before I realized that I should have done x, y or z--but realized it was
>going to be really hard to do so at this point.
>

If you can set your program up as a package, or possibly even a nested 
package with several subpackages (depending on how many layers of task 
hierarchy you can reasonably separate), then distribution in Python 
shouldn't be very difficult.  If just copying the package directory to a 
new machine's site-packages doesn't work, then look into distutils -- 
it's a little daunting at first, but it's really pretty easy to use, 
especially if your package is all Python.  (Things get more complicated 
if you've got a C extension that needs to be compiled on the target 
machine, but it doesn't sound like you're intending to do that.)
but it doesn't sound like you're intending to do that.)
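For a pure-Python package, the distutils setup script really is short. Here's a minimal, made-up example -- the package name "rtf2xml" and all the metadata are invented for illustration, not taken from your project:

```python
# setup.py -- hypothetical script for a pure-Python package.
# Assumes a directory layout like:
#     setup.py
#     rtf2xml/
#         __init__.py
#         parser.py
#         xml_output.py
from distutils.core import setup

setup(
    name="rtf2xml",
    version="0.1",
    description="Convert RTF to XML",
    # Each listed package is a directory containing an __init__.py;
    # subpackages (e.g. "rtf2xml.output") would be listed here too.
    packages=["rtf2xml"],
)
```

Then `python setup.py sdist` builds a source distribution, and `python setup.py install` on the target machine copies the package into its site-packages. Since there's nothing to compile, that's the whole job.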

Jeff Shannon
Technician/Programmer
Credit International