[lm@bitmover.com: Re: FW: Staging System - Requirements Meeting Recap]

hey there, two things: (a) is mailman better than majordomo? I assume it must be right, or you wouldn't be putting all this effort into it. I'm using majordomo now and I keep meaning to look for something that has better spam filters. Forgive the stupid questions, but should I switch?
(b) JC has told me that you are considering maybe using BitKeeper for mailman development. I had a call from a large company today and they sent in their requirements and I spent a bunch of time answering them. It might be worth it for you to skim through the questions and answers, I suspect there is some overlap.
I'd love it if it turned out that BK was useful to you (and I'd be equally bummed to find out it didn't work for you; if that's the case I want to know why: if there are technical reasons for you not using it, those will almost certainly be fixed, you guys have similar issues to everyone else so that means if it doesn't work for you, it doesn't work for very many other people).
Finally, if you have questions, you can mail to me and/or dev@bitmover.com . The dev alias is all the people hacking on BK right now.
Cheers,
--lm
-----Forwarded message-----
Hi folks, here are BitMover's responses to your requirements. We think BitKeeper might be a good fit for you, let us know what you think.
Open Discussion - Requirements
a.. Direct FTP access
We provide SSH (secure remote shell), RSH (BSD remote shell), and SMTP (email) as transports for moving stuff around. We could do an FTP transport but it seems questionable given that ssh is faster and more secure. If the issue is Windows, no problem, we support ssh on Windows. Use it all the time, in fact, between Windows and Unix.
Oh, I forgot, we also work in local and networked file systems (NFS, SMB) and do resyncs between Unix and Windows using SMB.
b.. Graphical view of the file tree -or- dynamic tree structure
We actually don't have this right now. But see the checkin response. We have it on Windows because we integrate with their GUI glop, but that's not much help.
If we get far enough along and this is a go/no go thing for you, and you're buying 10,000 seats (:-) then we'll write one. Seriously, it's on the list.
c.. Check-in and check-out facilities
Got 'em. Both command line and graphical. The command line stuff is like so:
    bk new file    - check a file into the system for the first time
                   - aliases: bk delta -i, bk ci -i, bk admin -i
    bk edit file   - check out a file locked for modification
                   - aliases: bk co -l, bk get -e
    bk ci file     - check in modifications to a file
                   - aliases: bk delta
You can do all of the above on the entire tree, a subtree, or an explicit list:
    bk -r new           - checks in everything it finds
    bk -r edit          - locks everything
    bk -r ci -yComment  - checks in everything
But the best tool is citool, which finds everything you've modified,
lists in a graphical tool, and shows you the diffs as you're doing the
check in. See http://www.bitkeeper.com/citool.html
That's exactly the right answer. Use any editor you want. What you get with BK is the revision history files locally. If you've worked with RCS or SCCS, in many respects, BK is like that with all the added stuff you need to have multiple copies of the revision history. So every playpen is a repository, you can lock, edit, checkin, browse, etc., everything locally. No need for network connectivity except when you want to update your tree or the other tree.
Oh, and you can update "sideways". One developer can slurp in the changes in another developer's tree directly, without going through the "master" or "shared" tree. In CVS terms, I don't have to go back to the master repository, I can talk directly to you. The hierarchical nature of the system is strictly a convention thing, you can resync from anywhere to anywhere.
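To make the "sideways" idea concrete, here is a minimal Python sketch, not BitKeeper code. It assumes a repository can be modeled as a dict mapping a changeset id to its comment; the ids, the `pull` helper, and the tree names are all hypothetical.

```python
def pull(dst, src):
    """Copy every changeset that src has and dst lacks; return the ids copied."""
    missing = [cid for cid in src if cid not in dst]
    for cid in missing:
        dst[cid] = src[cid]
    return missing


# Hypothetical trees: everyone starts from the same master.
master = {"cset-1": "initial import"}
alice = dict(master)
bob = dict(master)

# Alice commits locally, then Bob pulls straight from Alice.
alice["cset-2"] = "new search engine"
copied = pull(bob, alice)

print(copied)              # ids that moved from alice to bob
print("cset-2" in master)  # master was never involved
```

Nothing in `pull` knows which tree is the "master"; that is the sense in which the hierarchy is only a convention.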
e.. Ability to cluster files into "Change Sets" for version control and configuration management
Chuckle. We have that in spades. We don't let you move data between playpens unless you've put it in a changeset (yeah, you could get the diffs and apply them with patch, we can't stop you, but if you want the system to do it, you have to "commit" whatever you have done to a changeset).
The normal way you do this is when you pop into citool, there will be N+1 entries, where N are the files with mods, and the Nth+1 entry is the ChangeSet file. If you comment the ChangeSet file, you are creating a ChangeSet. The "diffs" shown for the ChangeSet file are the comments you have just typed in on all the other files like so
src/foo.c
Fix a bug in arg processing
src/foo.h
Include getopt.h
with the idea being that the ChangeSet comments should be the idea or concept you just did, while the per file comments are implementation details.
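As a rough illustration of how the ChangeSet entry is assembled from the per-file comments, here is a small Python sketch. It is not citool's code; the `changeset_diffs` helper and the two-space indent are assumptions made for the example, and the file names and comments are the ones shown above.

```python
def changeset_diffs(file_comments):
    """Render per-file comments as the text shown for the ChangeSet entry:
    each path on its own line, followed by its indented comment."""
    lines = []
    for path, comment in file_comments:
        lines.append(path)
        lines.append("  " + comment)
    return "\n".join(lines)


file_comments = [
    ("src/foo.c", "Fix a bug in arg processing"),
    ("src/foo.h", "Include getopt.h"),
]

# The developer reads this summary, then types the one-line "idea" comment
# for the ChangeSet itself.
print(changeset_diffs(file_comments))
```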
f.. Allow user selectable update reports on one or more branches of the development system
I think this is what we call a LOD, but I need clarification. What exactly do you want?
g.. Ability for CGI to be accessible and scriptable from a remote shell - should support Mac/PC/Linux/Unix/etc.
Is this a source management system issue? I don't understand. And one huge bummer - we don't support the Mac; the system depends pretty heavily on various Unix tools (sed, sh, tr, diff, etc), which we provide for Windows but haven't worked up the oomph to do on the Mac. If the Mac is a requirement, we can support MacOS X no problem, that's Unix, but MacOS 8 is a real drag. Let me know, that might be a show stopper. There are ways we can work around this, using AppleTalk - you could do the source management crud on Linux and the editing on a Mac. Definitely a hack but I can relate to the requirement - the Mac has a bunch of useful tools for Web folks.
Long term we will do a port. But that isn't likely to happen in the next few months.
h.. Change Logs
You get these for free, it's the ChangeSet comment history.
i.. Concurrent Development of files - requires intelligent merging agent for coordinating updates from multiple developers (CVS does this now)
We do this better than anyone. Period. Everyone else screws up your history. Here's how: you have two developers A and B. They both do CVS update and have the same tree. They both modify file foo.c. A checks in first, so B "lost the race" and has to merge. What CVS will check in is not B's changes, it's B's changes plus whatever B had to do to merge. That is **two events** collapsed into one. Why is this a big deal? Undo. Suppose B's stuff was good and A's stuff wasn't. We can reconstruct B's stuff without the merge. CVS can't. It lost the information.
Look at http://www.bitkeeper.com/sccstool.html - what you are seeing is the race and the merge. I'm "lm" and "awc" is my Windows guy. We were working parallel and the graph that you are seeing is the revision history after we merged. 1.89 is the merge delta, 1.86.1.1 and 1.86.1.2 are my deltas. If I want to reconstruct just my work, I do an undo and tell it which branch I want gone.
And we do all this automagically. That tree of changes is a straight line, no branches tree as far as the user is concerned. The branches are created automatically when changes are merged in, you never have to do that.
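The difference between "merge as its own event" and "merge folded into B's checkin" can be sketched as a small graph. This is an illustrative Python model only, not BitKeeper's data structure. The revision numbers 1.86.1.1, 1.86.1.2 and 1.89 come from the description above; the trunk deltas 1.87 and 1.88 and the exact parent links are assumptions.

```python
# delta id -> list of parent delta ids (parent links are assumed for the sketch)
history = {
    "1.86": [],
    "1.87": ["1.86"],
    "1.88": ["1.87"],
    "1.86.1.1": ["1.86"],
    "1.86.1.2": ["1.86.1.1"],
    "1.89": ["1.88", "1.86.1.2"],  # the merge delta has two parents
}


def ancestors(history, tip):
    """Return tip plus every delta reachable by following parent links."""
    seen = set()
    stack = [tip]
    while stack:
        delta = stack.pop()
        if delta not in seen:
            seen.add(delta)
            stack.extend(history[delta])
    return seen


# Because the merge is a separate node, the branch work can be rebuilt
# without the merge and without the other developer's trunk deltas.
print(sorted(ancestors(history, "1.86.1.2")))
```

If the merge had been folded into 1.86.1.2, there would be no node to stop at: the branch tip would already contain the other side's changes.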
j.. Access Control for each user of the development environment: Per user basis and Per file basis
We don't do this because this is an operating system issue. How you achieve the same effect is to restrict access using standard Unix file permissions on the master repository.
k.. Provide secure remote access to environment either through secure CRT, SSL, domain restrictions, etc.
SSH.
l.. Ability to differentiate gear, products and templates for exclusive use (for one private label) and public use (all private labels)
I don't understand this requirement.
This done through the OS. If you have an account and you can read and write the files, then you can read and write the files.
b.. Utilize a versioning source control system (SCS) for updates
We provide something which is file format compatible with SCCS, AT&T's original revision control system from the 70's. A lot of people question this, there are claims that the SCCS file format is worse than RCS. That's not true, in fact, the opposite is true. RCS has one potential advantage that they don't even use: they store the most recent delta as a clear file and all the previous ones as diffs (backward diffs for going up the trunk and forward diffs for going down the branches). If RCS stored an offset and a size in the file, then they could seek to the clear text and write the file out in one system call. They don't do that so they end up reading the whole file anyway (or most of it, it depends). Whatever. Someone could modify RCS to do what I said and it would still suck. Here's why:
. No checksum. SCCS checksums the file and verifies the checksum every
time you get the file. BK adds an additional per delta checksum.
The point is that you put your IP into a system; that system should
guard that IP very carefully. RCS makes a performance vs integrity
tradeoff which will end up screwing you in the long run.
. Annotation. SCCS can trivially give you a copy of the file with each line prefixed by any combination of the revision/user/date which added that line. In addition, BK can check out a copy of the file with every line in every revision in the file. Think about that. You know that somebody changed something and you know the string they changed but you don't know when. In BK you can find out by doing this
bk sccscat -mu foo.c | grep string
. ChangeSets. SCCS is a changeset engine, people just don't know it. What that means is that I can edit a file on one branch and say "I also want that delta over there which is on a different branch". SCCS will happily include it. You can create new deltas and include/exclude any arbitrary list of deltas. It may not be obvious, but this is really cool and has far reaching ramifications. Not only can RCS not do this, if you tried to kludge it in, it would grow the file linearly for each include. In SCCS, including or excluding a delta costs you about 4 bytes per included delta. In other words, you can construct multiple different views of your data and it doesn't cost you. Ping me in the conference call about this if I did a poor job explaining it, it is profoundly important. It's what makes SCCS a ChangeSet engine and what makes RCS not a ChangeSet engine.
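The annotation and include/exclude points above both follow from storing every line once, tagged with the delta that added it. Here is a deliberately simplified Python sketch of that idea. It is not the SCCS file format (real SCCS also records deletions, which this model omits), and the file contents, delta numbers, and helper names are made up for illustration.

```python
# Each stored line is tagged with the delta that added it (hypothetical data).
weave = [
    ("1.1", "int main(void)"),
    ("1.1", "{"),
    ("1.2", "    parse_args();"),
    ("1.1.1.1", "    init_log();"),
    ("1.1", "    return 0;"),
    ("1.1", "}"),
]


def render(weave, included):
    """Build one view of the file: keep lines whose delta is included."""
    return [text for delta, text in weave if delta in included]


def annotate(weave, included):
    """Same view, with each line prefixed by the delta that added it."""
    return [f"{delta}\t{text}" for delta, text in weave if delta in included]


def find_string(weave, needle):
    """Search every line of every delta, like grepping the full weave."""
    return sorted({delta for delta, text in weave if needle in text})


trunk = {"1.1", "1.2"}
# Pulling in a branch delta only changes the set; no file text is copied.
trunk_plus_branch = trunk | {"1.1.1.1"}

print(render(weave, trunk))
print(annotate(weave, trunk_plus_branch))
print(find_string(weave, "init_log"))
```

In this model a view costs one entry in a set per included delta, which is the intuition behind the "about 4 bytes per included delta" remark; the actual on-disk encoding is different.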
c.. Each engineer works in an independent workspace (currently known as playpens)
Yup. Each engineer can have as many playpens as s/he wants. Each is fully independent and get this: NO NO NO environment variables. You work in a playpen by saying
cd ~/playpen/src
bk vi foo.c
If you want to work in a different playpen, you just say
cd ~/different_playpen/src
bk vi bar.c
Environment variables suck. Just say no. Dare to break the environment variable habit :-)
d.. Depending on the engineer's privileges - able to push edits to staging server at the branch or global levels
If they have a login and write permissions on the staging area, they can. I really felt no need to reinvent Unix file permissions. Yeah, this screws the NT people but they can emulate the same thing by limiting access to the machine which has the staging area.
Same as above.
We're weak here in some regards. We don't support partial repositories. In other words, when you do a resync, you get the whole tree. To deal with this, you will naturally split your project up into chunks. Each of these chunks is a repository. So far so good, but the one bummer is when you want to share data between two repositories. We currently don't have any way for data to be in two different "chunks" (we call 'em projects) at the same time.
This needs more explanation so please ping me during the conference call.
b.. Ability to rollback change sets and individual files
Chuckle. You bet. The easiest (but somewhat slow) way to roll back is like so: suppose you want to roll back to ChangeSet 1.123 (which was conveniently tagged with alpha2).
$ bk resync -r..alpha2 master alpha2-test
That will create a repository which is identical to the master repository when alpha2 went in (by the way, the tag is just for clarity, anywhere you use a tag you can use a revision).
The other way is suppose you had a tree with ... alpha2 - 1.124 - 1.125 and you wanted alpha2. You can do this
$ bk undo 1.124,1.125
and that will DESTROY those two changesets. You'd better have a copy of them somewhere if you want 'em back.
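The two rollback routes differ in what happens to the original tree. A minimal Python sketch of that difference follows; it models a repository as an ordered list of changeset revisions, which is an assumption for illustration, and `clone_through` and `undo` are hypothetical helpers, not the bk commands themselves.

```python
def clone_through(changesets, rev_or_tag, tags):
    """Return a new list holding everything up to and including rev_or_tag."""
    rev = tags.get(rev_or_tag, rev_or_tag)  # a tag is just a name for a revision
    return changesets[: changesets.index(rev) + 1]


def undo(changesets, revs):
    """Remove the named changesets in place; nothing keeps a copy."""
    changesets[:] = [c for c in changesets if c not in revs]


master = ["1.122", "1.123", "1.124", "1.125"]
tags = {"alpha2": "1.123"}

# Route 1: build a separate tree as of alpha2; master keeps all four.
alpha2_test = clone_through(master, "alpha2", tags)
print(alpha2_test, master)

# Route 2: strip the two later changesets from the tree itself.
undo(master, {"1.124", "1.125"})
print(master)
```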
c.. Tie in builds with bug tracking system to complete a closed loop tracking process
Busted. We want to do this but this is a 2.0 or perhaps 3.0 before we have it. We have a plan for doing it but right now we don't have diddly.
d.. Test and submit builds for Go Live!
???
Seems OK.
I don't know what these are.
One thing you forgot to ask about is file renames. This is another place where people fall down. We don't (surprise!). We handle file pathname changes identically to file content changes. Pathnames are revisioned and propagate. Consider a couple of test cases:
    You            Me              Resync you to me
    mv foo bar     nothing         moves foo to bar in my tree
    change foo     mv foo bar      applies change in foo to bar in my tree
    mv foo bar     mv foo blech    prompts you with name conflict
This is not a big deal until you need it to work but then it is a huge deal. It can bring your development to a halt for days while your engineers unscramble the mess. We just make that problem go away.
You also forgot file permissions. We pick up the permissions as of the time that the file was originally checked in with "bk new". After that, if you want to change them, you just say "bk chmod 755 foo.sh" and it saves the modes. On Windows, only the top bits (owner permissions) are used, but it doesn't stomp on the lower bits.
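The three rename cases can be written down as one small decision function. This is a Python sketch of the outcomes listed in the table, under the assumption that each side's rename is known as "new name or none"; the `resync_rename` helper is hypothetical and says nothing about how BitKeeper tracks pathnames internally.

```python
def resync_rename(path, your_rename, my_rename):
    """Decide where `path` lives in my tree after resyncing you to me.

    your_rename / my_rename: the new name on that side, or None if that
    side did not rename the file. Content changes travel with the file.
    """
    if your_rename and my_rename and your_rename != my_rename:
        return ("conflict", None)  # both renamed, to different names
    return ("ok", your_rename or my_rename or path)


# You: mv foo bar; me: nothing -> foo becomes bar in my tree.
print(resync_rename("foo", "bar", None))
# You: change foo; me: mv foo bar -> your change lands in my bar.
print(resync_rename("foo", None, "bar"))
# You: mv foo bar; me: mv foo blech -> name conflict, ask the user.
print(resync_rename("foo", "bar", "blech"))
```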
Also consider the following:
CVS (it's free and I can show you what you could do to sort of have some of these features, not all, but some).
Perforce, http://www.perforce.com - I know Chris, great guy, nice little tool. It doesn't do what you want but it is quite popular and you should know it exists.
TrueChange from TrueSoft, http://www.truesoft.com - this is the only commercially available ChangeSet engine other than BitKeeper.
Cheers,
Larry McVoy President, BitMover, Inc.
-----End of forwarded message-----
--
Larry McVoy lm@bitmover.com http://www.bitmover.com/lm

This is probably better on mailman-users...
Anyhow, I think mailman and majordomo each have their own place. Majordomo is much more flexible and, in my opinion, powerful. The basic structure is simplistic, but it's very easy to change things. However, it is also pretty admin-intensive and has a braindead digest scheme.
mailman takes administrative tasks away from the admin and puts them in the hands of the listowner. It saves time and is a lot easier to configure. Users also find the Web interface easier than an email one. However, adding features is a pain in the butt, especially if you're not familiar with Python (e.g. I doubt that I'll ever see support for quoted addresses containing spaces, since Mailman splits on whitespace).
So if you want to pass administrative tasks to the listowner and have a pretty Web interface, use mailman. If you need to do heavy-duty funky stuff as an admin, use majordomo.
Don't get me wrong though -- I like mailman and find it very useful when I want to offload list responsibility. In fact, we're trying to implement it here at NCSA for a majority of our less complicated lists.
Chris

"LM" == Larry McVoy <lm@bitmover.com> writes:
LM> hey there, two things: (a) is mailman better than majordomo?
I still firmly believe so. We had never-ending problems with Majordomo on python.org, which is why (more than the philosophical discomfort :) we made a strong push to get Mailman in operational shape. I've since been able to completely wax Majordomo from my hard drives :)
Yes, Mailman needs work. But there are some really great hackers on mailman-developers and I think we just need to structure the project better so that those folks can contribute more directly, without the frustration when the core 5 of us are busy with Real Work.
LM> Oh, and you can update "sideways". One developer can slurp in
LM> the changes in another developer's tree directly, without
LM> going through the "master" or "shared" tree. In CVS terms, I
LM> don't have to go back to the master repository, I can talk
LM> directly to you. The hierarchical nature of the system is
LM> strictly a convention thing, you can resync from anywhere to
LM> anywhere.
This really fires me up. I would love to have Mailman's doco team announce "hey, we've got a new revision, please sync with our repository to check it out". Meanwhile the archivists say "we've got the new search engine ready for those who'd like to take a look". So then at some point I get an email saying that there's been enough testing and people are feeling confident about the changes. Until then, maybe I don't even look at the stuff, and only suck it into the master when there's been enough of that sideways development to make things stable. If I got the vision right, that's exactly what I'm looking for.
LM> with the idea being that the ChangeSet comments should be the
LM> idea or concept you just did, while the per file comments are
LM> implementation details.
Another very cool idea, because this is exactly how I'd like to work. Currently I put both levels of detail into the individual file log msgs, but I'd rather do it this way. Is there an equivalent of citool for Emacs?
>> h.. Change Logs
LM> You get these for free, it's the ChangeSet comment history.
One important thing for GNU projects (although I've been lax about it) is the ability to generate GNU ChangeLogs. Emacs has tools for extracting this info out of RCS/CVS log msgs. It would be nice to have the same capability with BK. If there are aliases for "cvs log" then it might be a no-brainer.
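For what it's worth, turning changeset data into a GNU-style ChangeLog entry is mostly string formatting. Here is a hedged Python sketch. It assumes the date, author, and per-file comments have already been pulled out of the history somehow (how to get them out of BK is not shown and not known here); the `gnu_changelog_entry` helper, the date, and the author details are invented for the example.

```python
def gnu_changelog_entry(date, author, email, file_comments):
    """Format one ChangeLog entry: a header line, a blank line, then one
    tab-indented '* file: comment' line per changed file."""
    header = f"{date}  {author}  <{email}>"
    body = [f"\t* {path}: {comment}" for path, comment in file_comments]
    return "\n".join([header, ""] + body)


entry = gnu_changelog_entry(
    "2000-01-01",              # hypothetical date
    "A. Developer",            # hypothetical author
    "dev@example.org",         # hypothetical address
    [
        ("src/foo.c", "Fix a bug in arg processing."),
        ("src/foo.h", "Include getopt.h."),
    ],
)
print(entry)
```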
LM> We do this better than anyone. Period. Everyone else screws
LM> up your history. Here's how: you have two developers A and B.
LM> They both do CVS update and have the same tree. They both
LM> modify file foo.c. A checks in first, so B "lost the race"
LM> and has to merge. What CVS will check in is not B's changes,
LM> it's B's changes plus whatever B had to do to merge. That is
LM> **two events** collapsed into one. Why is this a big deal?
LM> Undo. Suppose B's stuff was good and A's stuff wasn't. We
LM> can reconstruct B's stuff without the merge. CVS can't. It
LM> lost the information.
LM> Look at http://www.bitkeeper.com/sccstool.html - what you are
LM> seeing is the race and the merge. I'm "lm" and "awc" is my
LM> Windows guy. We were working parallel and the graph that you
LM> are seeing is the revision history after we merged. 1.89 is
LM> the merge delta, 1.86.1.1 and 1.86.1.2 are my deltas. If I
LM> want to reconstruct just my work, I do an undo and tell it
LM> which branch I want gone.
Um, drool.
LM> . ChangeSets. SCCS is a changeset engine, people just don't
LM> know it. What that means is that I can edit a file on one
LM> branch and say "I also want that delta over there which is on
LM> a different branch".
Indeed, very cool, IIUC. Another thing that sucks in CVS, but which we need surprisingly often, is the ability to include a file in multiple projects. Example, in the Python project we've got a file called Lib/smtplib.py which Guido will edit and change as bug reports come in via the Python channel. Mailman usually just wants to use the latest smtplib.py file, but it lives under the Mailman project in Mailman/pythonlib/smtplib.py. When I see changes happen to the Python tree, I usually check out the latest version, and commit it to the Mailman tree. This sucks because I've now lost the revision history to the file under Mailman. I can kludge around this by evil tricks like symlinking or <shudder> hardlinking the ,v file in the repository. That sucks for any number of reasons (rsync to the anonCVS, what if I want to do this for lots of files all over the place, etc.)
Does BK provide any kind of support for this?
LM> We're weak here in some regards. We don't support partial
LM> repositories. In other words, when you do a resync, you get
LM> the whole tree. To deal with this, you will naturally split
LM> your project up into chunks. Each of these chunks is a
LM> repository. So far so good, but the one bummer is when you
LM> want to share data between two repositories. We currently
LM> don't have any way for data to be in two different "chunks"
LM> (we call 'em projects) at the same time.
Ah, so I think my answer to above is "no". Well, I guess I'm no worse off :)
LM> and that will DESTROY those two changesets. You'd better have
LM> a copy of them somewhere if you want 'em back.
Hmm, it might be nice if some day those were re-doable.
LM> One thing you forgot to ask about is file renames.
Equally important are directory renames. Can't be done in CVS, so what you end up doing is creating the new directory, moving all the ,v files in the repository, then doing an update -d -P. The old dir doesn't go away, it's just pruned out 'cause it's empty. There's gotta be a better way.
Thanks for all the info Larry, -Barry

participants (3): Barry A. Warsaw, Christopher Lindsey, Larry McVoy