[Spambayes] CRM114 in November breaks 99.9%. :-)
Bill Yerazunis
wsy@merl.com
Mon Dec 2 15:57:52 2002
From: Matt Sergeant <msergeant@startechgroup.co.uk>
> CRM114's learn and classify stuff looks really interesting, but it has a
> really freaky syntax to someone who is used to regular procedural or OO
> languages like Perl, Python, C, etc.
It _is_ procedural, it's just extremely high level. Perhaps higher-level
than APL if you count statements rather than operators.
And sorry about the syntax. I was being playful, and reading a book
on Latin at the time, which is why it uses symmetric declensional parsing
rather than something more sane, like recursive descent. (*)
> Is there *any* chance the library
> in crm114 for learning and classifying can be extracted into a plain
> .so? That would be tremendous, and I'd be willing to build a perl XS
> library for it in a heartbeat.
Yes, it's not difficult to get at the code.
Pop the .gz open, emacs the file crm114.c, and look for the case
headers "CRM_LEARN" and "CRM_CLASSIFY" respectively. The code there
is _not_ generated, but executed in-line, so cut and paste will work.
The current code requires a null-terminated string as input, but
that's because of the GNU regex library limits (when TRE gives me a
new library, that requirement will go away). You _will_ need to link
it against a regex library (of your choice; CRM114 uses the standard
POSIX regcomp/regexec calling sequence), and the OS itself needs to
support stat() [for file existence/length] and mmap() [to map a file
into virtual memory without actually reading it in a byte at a time;
this is just for efficiency and can be worked around].
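The requirements above (stat() for existence and length, mmap() for reading, and a null-terminated string for regcomp/regexec) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not code taken from crm114.c: `file_matches` is a hypothetical helper name. Because a mapped region is not NUL-terminated, the sketch copies it into a terminated buffer before calling regexec(), which gives back some of mmap's efficiency gain; a length-aware regex library would avoid that copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `pattern` (POSIX extended regex)
   matches the contents of `path`, 0 if it does not, -1 on any error. */
int file_matches(const char *path, const char *pattern)
{
    struct stat st;
    if (stat(path, &st) != 0)            /* file existence and length */
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t len = (size_t)st.st_size;
    char *text = malloc(len + 1);
    if (text == NULL) {
        close(fd);
        return -1;
    }

    if (len > 0) {                       /* mmap() rejects zero-length maps */
        void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (map == MAP_FAILED) {
            free(text);
            close(fd);
            return -1;
        }
        memcpy(text, map, len);          /* mapped bytes are not NUL-terminated */
        munmap(map, len);
    }
    close(fd);
    text[len] = '\0';                    /* regexec() needs a C string */

    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        free(text);
        return -1;
    }
    int rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    free(text);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```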
How bad do you want it? :-)
> If not, we'll just have to try and copy the sparse binary polynomial
> hash idea ;-)
Always legitimate. It's GPLware, no problemo.
-Bill Yerazunis
(*) All in all, I like the way it ended up; one can just type programs
on the command line and they do useful things. But hindsight is always
20/20, and "less weirdass" might be better in the long run.
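The sparse binary polynomial hash idea that Matt mentions can be illustrated with a rough sketch. As generally described for CRM114, a window of five consecutive tokens yields every subphrase that keeps the newest token, which gives 2^4 = 16 features per window, and a skipped position still counts, so "a b" and "a _ b" hash differently. Everything concrete here is an assumption for illustration, not CRM114's actual code: the FNV-1a token hash, the per-position multipliers in `coeff`, and the helper names `token_hash` and `sbph_features`. The sketch also assumes a full five-token window, ignoring the start of a message.

```c
#include <stddef.h>
#include <stdint.h>

#define SBPH_WINDOW 5
#define SBPH_FEATURES (1u << (SBPH_WINDOW - 1))   /* 16 features per window */

/* FNV-1a hash of a single token (an illustrative choice, not CRM114's). */
static uint32_t token_hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical per-position multipliers (arbitrary odd constants). */
static const uint32_t coeff[SBPH_WINDOW] = {1u, 3u, 5u, 11u, 23u};

/* window[0] is the newest token; window[i] lies i tokens further back.
   Every feature includes window[0]; bit (i - 1) of `mask` decides whether
   window[i] is also included. Weighting by position keeps skips visible. */
void sbph_features(const char *window[SBPH_WINDOW],
                   uint32_t out[SBPH_FEATURES])
{
    uint32_t th[SBPH_WINDOW];
    for (size_t i = 0; i < SBPH_WINDOW; i++)
        th[i] = token_hash(window[i]);

    for (uint32_t mask = 0; mask < SBPH_FEATURES; mask++) {
        uint32_t h = th[0] * coeff[0];
        for (size_t i = 1; i < SBPH_WINDOW; i++)
            if (mask & (1u << (i - 1)))
                h += th[i] * coeff[i];
        out[mask] = h;
    }
}
```

Each of the 16 hashes would then index a learned feature table during LEARN and be looked up during CLASSIFY.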