OT: MoinMoin and Mediawiki?
ianb at colorstudy.com
Wed Jan 12 15:41:02 EST 2005
Paul Rubin wrote:
> Paul Rubin <http://phr.cx@NOSPAM.invalid> writes:
>>>>How does it do that? It has to scan every page in the entire wiki?!
>>>>That's totally impractical for a large wiki.
>>>So you want to say that c2 is not a large wiki? :-)
>>I don't know how big c2 is. My idea of a large wiki is Wikipedia.
>>My guess is that c2 is smaller than that.
> I just looked at c2; it has about 30k pages (I'd call this medium
> sized) and finds incoming links pretty fast. Is it using MoinMoin?
> It doesn't look like other MoinMoin wikis that I know of. I'd like to
> think it's not finding those incoming links by scanning 30k separate
> files in the file system.
c2 is the Original Wiki, i.e., the first one ever, and the system that
coined the term. It's written in Perl. It's definitely not an
advanced Wiki, and it has generally relied on social rather than
technical solutions to problems, which might be a Wiki principle in itself.
I believe it used full-text searches for things like backlinks in the
past, but that it uses some kind of index now.
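I don't know what c2's index actually looks like, so the following is only a sketch of the general idea: keep a reverse map from each page to the pages that link to it, and update it whenever a page is saved, so that finding incoming links never requires scanning every page. The class name, the method names, and the CamelCase link pattern are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: links are CamelCase WikiWords, as on c2-style wikis.
WIKIWORD = re.compile(r'\b(?:[A-Z][a-z]+){2,}\b')


class BacklinkIndex:
    """Hypothetical in-memory reverse-link index (not c2's actual code)."""

    def __init__(self):
        self.outgoing = {}                # page -> set of pages it links to
        self.incoming = defaultdict(set)  # page -> set of pages linking to it

    def update(self, page_name, text):
        """Re-index a single page after it has been saved."""
        new_links = set(WIKIWORD.findall(text)) - {page_name}
        old_links = self.outgoing.get(page_name, set())
        # Only touch the entries whose link status actually changed.
        for target in old_links - new_links:
            self.incoming[target].discard(page_name)
        for target in new_links - old_links:
            self.incoming[target].add(page_name)
        self.outgoing[page_name] = new_links

    def backlinks(self, page_name):
        """Return the pages linking to page_name, without scanning any text."""
        return sorted(self.incoming.get(page_name, ()))
```

With something like this, the cost of a save is proportional to the number of links on the saved page, and a backlink lookup is a single dict access.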
> Sometimes I think a wiki could get by with just a few large files.
> Have one file containing all the wiki pages. When someone adds or
> updates a page, append the page contents to the end of the big file.
> That might also be a good time to pre-render it, and put the rendered
> version in the big file as well. Also, take note of the byte position
> in the big file (e.g. with ftell()) where the page starts. Remember
> that location in an in-memory structure (Python dict) indexed on the
> page name. Also, append the info to a second file. Find the location
> of that entry and store it in the in-memory structure as well. Also,
> if there was already a dict entry for that page, record a link to the
> old offset in the 2nd file. That means the previous revisions of a
> file can be found by following the links backwards through the 2nd
> file. Finally, on restart, scan the 2nd file to rebuild the in-memory
That sounds like you'd be implementing your own filesystem ;)
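For concreteness, the quoted scheme could be sketched roughly as below. This is a minimal illustration under several assumptions of mine: the class name, the file layout, and the fixed-size record format are all hypothetical, pre-rendering is left out, and there is no locking or crash recovery.

```python
import os
import struct

# Hypothetical index record: data offset, data length, offset of the
# previous index record for the same page (-1 if none), and the length
# of the UTF-8 page name that follows the record.
RECORD = struct.Struct('<qqqH')


class AppendOnlyWiki:
    def __init__(self, data_path, index_path):
        self.data_path = data_path
        self.index_path = index_path
        self.latest = {}  # page name -> offset of its newest index record
        self._rebuild()

    def _rebuild(self):
        """On restart, scan the index file to rebuild the in-memory dict."""
        if not os.path.exists(self.index_path):
            return
        with open(self.index_path, 'rb') as idx:
            while True:
                pos = idx.tell()
                head = idx.read(RECORD.size)
                if len(head) < RECORD.size:
                    break
                name_len = RECORD.unpack(head)[3]
                name = idx.read(name_len).decode('utf-8')
                self.latest[name] = pos  # later records overwrite earlier ones

    def save(self, name, text):
        """Append the page body to the data file and a record to the index."""
        body = text.encode('utf-8')
        with open(self.data_path, 'ab') as data:
            data.seek(0, os.SEEK_END)
            data_pos = data.tell()
            data.write(body)
        raw_name = name.encode('utf-8')
        prev = self.latest.get(name, -1)
        with open(self.index_path, 'ab') as idx:
            idx.seek(0, os.SEEK_END)
            idx_pos = idx.tell()
            idx.write(RECORD.pack(data_pos, len(body), prev, len(raw_name)))
            idx.write(raw_name)
        self.latest[name] = idx_pos

    def revisions(self, name):
        """Yield revisions newest first by following the back-links."""
        idx_pos = self.latest.get(name, -1)
        while idx_pos != -1:
            with open(self.index_path, 'rb') as idx:
                idx.seek(idx_pos)
                data_pos, length, idx_pos, _ = RECORD.unpack(
                    idx.read(RECORD.size))
            with open(self.data_path, 'rb') as data:
                data.seek(data_pos)
                yield data.read(length).decode('utf-8')

    def load(self, name):
        """Return the newest revision of a page, or None if it is unknown."""
        return next(self.revisions(name), None)
```

Even in this small form it has to deal with record formats, offsets, and rebuilding state on startup, which are the kinds of jobs a filesystem normally does for you.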
If you are just trying to avoid too many files in a directory, another
option is to put files in subdirectories like:
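    import os, struct
    base = struct.pack('i', hash(page_name))
    # base64 output can contain '/', which must not end up in a directory name
    base = base.encode('base64').strip().strip('=').replace('/', '_')
    filename = os.path.join(base, page_name)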
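The snippet above relies on Python 2 idioms: `str.encode('base64')` no longer exists in Python 3, and the built-in `hash()` of a string varies between runs there by default, so it cannot be used to locate files written by an earlier process. A rough Python 3 equivalent, assuming a stable digest from `hashlib` is an acceptable substitute, might look like this. The function name `page_path` and the `root` and `width` parameters are hypothetical.

```python
import hashlib
import os


def page_path(root, page_name, width=2):
    """Return root/<bucket>/<page_name>, with bucket derived from a hash."""
    # Assumption: md5 stands in for hash() purely as a stable, well-spread
    # digest; nothing here depends on it being cryptographically strong.
    digest = hashlib.md5(page_name.encode('utf-8')).hexdigest()
    # Hex digits never contain a path separator; 2 of them give at most
    # 256 subdirectories.
    bucket = digest[:width]
    return os.path.join(root, bucket, page_name)
```

Before writing a page you would create the bucket with `os.makedirs(os.path.dirname(path), exist_ok=True)`. The sketch does not sanitize `page_name` itself, so names containing path separators would need separate handling.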
Ian Bicking / ianb at colorstudy.com / http://blog.ianbicking.org