Extending/embedding versus separation

John Machin sjmachin at lexicon.net
Thu Mar 28 18:03:34 EST 2002


bokr at oz.net (Bengt Richter) wrote in message news:<a7vokr$qdo$0 at 216.39.172.122>...
> On Thu, 28 Mar 2002 11:12:09 -0000, "skoria" <shomon at softhome.net> wrote:
> 
> >Hi
> >
> >Thanks for your help. No, I didn't write the hash tables. I'm writing
> >a stats program based on webalizer. Webalizer and Analog were the
> >fastest and least memory hungry of all the programs I know of, and
> >webalizer is GPL so I was able to use it as a basis for my work, so I
> >get to make free software at work! The hash tables are already written
> >there. 

I've just downloaded the webalizer source and inspected its hashtab.c:

-- 5 or more different hashtables, each with its own constructor,
   destructor, insert, etc. methods [maintenance nightmare]
-- the hash function effectively uses only the last 6 to 7 bytes of
   the key on a 32-bit machine [hashval is multiplied by 31 for each
   byte in the key]
-- it uses chaining for collision resolution, but without the
   move-to-front heuristic [IOW it could use less memory or go faster!]

Reaction: slam lid shut, nail it down, ... aarrgghh, how do we stop
the timbot from seeing this lest it collapse with an
ExcessiveSnortCount exception?
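
For anyone who hasn't met the move-to-front heuristic: on a chained
table, a key that has just been looked up is moved to the head of its
chain, so frequently seen keys are found after fewer comparisons. A
minimal sketch in Python, purely illustrative; this is not webalizer's
code, and ChainedTable and bump are hypothetical names:

```python
class ChainedTable:
    """Toy chained hash table of counters with move-to-front on lookup."""

    def __init__(self, size=2048):
        # One Python list per bucket stands in for a linked chain.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def bump(self, key):
        """Increment the count for key and return the new count."""
        bucket = self._bucket(key)
        for i, entry in enumerate(bucket):
            if entry[0] == key:
                entry[1] += 1
                if i:
                    # Move-to-front: a hit migrates to the chain head.
                    bucket.insert(0, bucket.pop(i))
                return entry[1]
        # New key: start its count at the head of the chain.
        bucket.insert(0, [key, 1])
        return 1
```

In real Python code you would simply use a dict; the sketch only shows
what the heuristic does to a chain.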

[snip]
> OTTOMH, I would guess that you may need to generate some special
> representations of visitor paths directly from the logs, unless
> the webalizer already does that. But if you have to process the
> raw logs in a new way, it will most likely be easier to get right
> in Python than by cannibalizing existing C.

Especially *that* existing C. Look at its parser.c; do you want to
have to understand that, let alone maintain it?

And saving memory??? You don't need the following in Python:

#define MAXHASH  2048      /* Size of our hash tables          */
#define BUFSIZE  4096      /* Max buffer size for log record   */
#define MAXHOST  128       /* Max hostname buffer size         */
#define MAXURL   1024      /* Max HTTP request/URL field size  */
#define MAXURLH  128       /* Max URL field size in htab       */
#define MAXREF   1024      /* Max referrer field size          */
#define MAXREFH  128       /* Max referrer field size in htab  */
#define MAXAGENT 64        /* Max user agent field size        */
#define MAXCTRY  48        /* Max country name size            */
#define MAXSRCH  256       /* Max size of search string buffer */
#define MAXSRCHH 64        /* Max size of search str in htab   */
#define MAXIDENT 64        /* Max size of ident string (user)  */

Bottom line: do it in Python.
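
To make that concrete, here is a rough sketch of the counting side,
assuming Common Log Format input and crude whitespace splitting.
count_fields and the sample lines are made up for illustration and
have nothing to do with webalizer's own parser. Dicts grow as needed
and strings are as long as they need to be, so there are no MAXHASH-
or MAXURL-style limits to declare:

```python
from collections import Counter


def count_fields(lines):
    """Count hits per host and per URL from Common Log Format lines."""
    hosts, urls = Counter(), Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 7:
            continue  # skip lines too short to hold a request field
        hosts[parts[0]] += 1  # first field: remote host
        urls[parts[6]] += 1   # seventh field: requested URL
    return hosts, urls


# Hypothetical sample records, invented for this example.
sample = [
    '10.0.0.1 - - [28/Mar/2002:18:03:34 +1000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [28/Mar/2002:18:03:35 +1000] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.1 - - [28/Mar/2002:18:03:36 +1000] "GET /a/very/long/path HTTP/1.0" 404 0',
]

hosts, urls = count_fields(sample)
print(hosts.most_common(1))
print(urls.most_common(1))
```

A real log parser would need to cope with quoted fields containing
spaces; the split() above is only good enough for a demonstration.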


