Extending/embedding versus separation
John Machin
sjmachin at lexicon.net
Thu Mar 28 18:03:34 EST 2002
bokr at oz.net (Bengt Richter) wrote in message news:<a7vokr$qdo$0 at 216.39.172.122>...
> On Thu, 28 Mar 2002 11:12:09 -0000, "skoria" <shomon at softhome.net> wrote:
>
> >Hi
> >
> >Thanks for your help. No, I didn't write the hash tables. I'm writing
> >a stats program based on webalizer. Webalizer and Analog were the
> >fastest and least memory hungry of all the programs I know of, and
> >webalizer is GPL so I was able to use it as a basis for my work, so I
> >get to make free software at work! The hash tables are already written
> >there.
I've just downloaded the webalizer source and inspected its hashtab.c:
- five or more different hash tables, each with its own constructor, destructor, insert, etc. methods [maintenance nightmare];
- the hash function effectively uses only the last 6 to 7 bytes of the key on a 32-bit machine [hashval is multiplied by 31 for each byte in the key];
- collisions are resolved by chaining, but without the move-to-front heuristic [in other words, it could use less memory or go faster!].
Reaction: slam the lid shut, nail it down, ... aarrgghh, how do we stop the timbot from seeing this lest it collapse with an ExcessiveSnortCount exception?
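For comparison, here is a minimal Python sketch of separate chaining with the move-to-front heuristic. It is an illustration, not webalizer's code. The class name `MTFHashTable`, the default of 2048 buckets (borrowed from webalizer's MAXHASH) and the 31-multiplier string hash masked to 32 bits are all assumptions. A lookup hit moves the entry to the head of its chain, so frequently seen keys such as popular URLs or hosts are found after fewer comparisons.

```python
class MTFHashTable:
    """Hypothetical chained hash table with move-to-front on lookup hits."""

    def __init__(self, nbuckets=2048):
        self.nbuckets = nbuckets
        # Each bucket is a Python list used as a chain of [key, count] pairs;
        # index 0 is the front of the chain.
        self.buckets = [[] for _ in range(nbuckets)]

    def _hash(self, key):
        # 31-multiplier string hash kept to 32 bits, then reduced to a bucket index.
        h = 0
        for byte in key.encode("utf-8"):
            h = (h * 31 + byte) & 0xFFFFFFFF
        return h % self.nbuckets

    def increment(self, key):
        """Add 1 to key's count, inserting it if absent; return the new count."""
        chain = self.buckets[self._hash(key)]
        for i, entry in enumerate(chain):
            if entry[0] == key:
                entry[1] += 1
                if i:
                    # Move-to-front: the next lookup of this key hits on the first comparison.
                    chain.insert(0, chain.pop(i))
                return entry[1]
        chain.insert(0, [key, 1])  # new keys also start at the front of the chain
        return 1

    def count(self, key):
        """Return key's count, or 0 if it has never been inserted."""
        for entry in self.buckets[self._hash(key)]:
            if entry[0] == key:
                return entry[1]
        return 0
```

In Python itself, none of this needs to be hand-written. A plain dict is already a tuned hash table and would do the same job.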
[snip]
> OTTOMH, I would guess that you may need to generate some special
> representations of visitor paths directly from the logs, unless
> the webalizer already does that. But if you have to process the
> raw logs in a new way, it will most likely be easier to get right
> in Python than by cannibalizing existing C.
Especially *that* existing C. Look at its parser.c; do you want to
have to understand that, let alone maintain it?
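To make the comparison with parser.c concrete, here is a minimal sketch that parses one log line with a single regular expression. It assumes the NCSA Common/Combined Log Format, although webalizer also handles other log types. The names `COMBINED` and `parse_line` are hypothetical. No field needs a fixed maximum length.

```python
import re

# NCSA Combined Log Format (assumed); the trailing referrer/agent pair is optional,
# so plain Common Log Format lines also match:
# host ident authuser [date] "request" status bytes "referrer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # The request is usually "METHOD URL PROTOCOL"; keep the URL if present.
    parts = rec["request"].split()
    rec["url"] = parts[1] if len(parts) >= 2 else None
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    rec["status"] = int(rec["status"])
    return rec
```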
And saving memory??? You don't need any of the following fixed-size limits in Python:
#define MAXHASH 2048 /* Size of our hash tables */
#define BUFSIZE 4096 /* Max buffer size for log record */
#define MAXHOST 128 /* Max hostname buffer size */
#define MAXURL 1024 /* Max HTTP request/URL field size */
#define MAXURLH 128 /* Max URL field size in htab */
#define MAXREF 1024 /* Max referrer field size */
#define MAXREFH 128 /* Max referrer field size in htab */
#define MAXAGENT 64 /* Max user agent field size */
#define MAXCTRY 48 /* Max country name size */
#define MAXSRCH 256 /* Max size of search string buffer */
#define MAXSRCHH 64 /* Max size of search str in htab */
#define MAXIDENT 64 /* Max size of ident string (user) */
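As a rough illustration of this point, here is a minimal sketch of per-host and per-URL hit counting in Python. `collections.Counter` grows as needed and strings are as long as the data, so nothing corresponds to MAXHASH, MAXHOST or MAXURL. The function name `tally` is hypothetical. The sketch also assumes Common Log Format lines, with the host as the first token and the URL as the second word of the quoted request.

```python
from collections import Counter

def tally(lines):
    """Count hits per host and per URL; no table or buffer sizes to choose."""
    hosts, urls = Counter(), Counter()
    for line in lines:
        fields = line.split('"')
        # Common Log Format assumed: the host is the first token before the
        # quoted request, and the URL is the second word inside the quotes.
        if len(fields) < 2:
            continue  # skip malformed lines
        head = fields[0].split()
        request = fields[1].split()
        if not head or len(request) < 2:
            continue
        hosts[head[0]] += 1
        urls[request[1]] += 1
    return hosts, urls
```

`hosts.most_common(10)` then gives the ten busiest hosts, and a 2000-character URL is counted intact rather than truncated to some MAXURLH.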
Bottom line: do it in Python.