Adding Optik to the standard library

I think it's time to declare the work of the getopt-sig finished: several competing proposals were put forward, but Optik appears to be the only really complete, documented, field-tested (by someone other than its author) library. Not everyone on the getopt-sig will agree, but I think that's the broad consensus. Take this with a grain of salt, though -- I'm biased. ;-)

Anyways, I think further consensus is needed on how precisely to add Optik to the standard library. The only constraint I've heard from Guido is to give it a less-cutesy name, which is fine by me.

First, background: Optik consists of three modules: optik.option, optik.option_parser, and optik.errors, but that detail is hidden from users -- Optik applications normally just do this:

    from optik import OptionParser

although there are a handful of other names that are occasionally useful to import from the 'optik' package: Option, SUPPRESS_HELP, OptionValueError, etc. Optik's __init__.py file is here:

    http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/optik/optik/lib/__init__.py?rev=1.11&content-type=text/plain

It's only about 1100 lines, including docs and comments and blanks, so they could easily be merged into one module if people think that's preferable.

So the main issues are 1) where those names should be imported from (interface), and 2) how the standard library should be rearranged to make this interface unobtrusive and efficient (implementation). I'm going to toss out ideas at random until I get bored. Please provide feedback and/or extra ideas on getopt-sig@python.org.

IDEA #1:
  interface:
    from getopt import OptionParser   # and the other Optik names
  implementation:
    * turn the getopt module into a package
    * put the current getopt.py into, say, getopt/classic_getopt.py
    * make getopt/__init__.py import everything from classic_getopt.py and Optik's three modules, so that either interface is there for the asking
  pros:
    * dead simple
  cons:
    * applications using just the classic getopt interface suddenly find themselves importing lots more code than they used to

IDEA #2:
  interface:
    from getopt.option_parser import OptionParser, ...
  implementation:
    * as before, getopt.py becomes getopt/classic_getopt.py
    * getopt/__init__.py consists solely of "from classic_getopt import *"
    * Optik's three modules are copied into getopt, with the right imports added to getopt/option_parser.py so that applications don't have to worry about where Optik's other names come from
  pros:
    * only slightly more burden on apps now using classic getopt
  cons:
    * interface is a tad clunkier

IDEA #3:
  interface:
    from getopt.option_parser import OptionParser, SUPPRESS_HELP, ...
    from getopt.option import Option
    from getopt.errors import OptionValueError
  implementation:
    * classic getopt handled the same as #2
    * just dump Optik's three modules into getopt/ and be done with it
  pros:
    * dead simple
  cons:
    * clunky interface -- imports expose a lot of implementation detail

IDEA #4:
  interface:
    same as #1
  implementation:
    * same as #1, except use some funky import-time magic to ensure that the Optik code is only imported if someone actually needs it. Barry Warsaw provided a patch to do this:
      http://sourceforge.net/tracker/index.php?func=detail&aid=544175&group_id=38019&atid=421099
  pros:
    * more efficient for apps using classic getopt
  cons:
    * more complicated; apparently Guido expressed distaste for the import-time magic. I'm a little leery of it myself, although I have not carefully read the code.

Preferences? Other ideas? Surely the right solution is out there somewhere, just beyond my reach...

        Greg
--
Greg Ward - Unix geek    gward@python.net
http://starship.python.net/~gward/
"Passionate hatred can give meaning and purpose to an empty life." -- Eric Hoffer
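
The "import-time magic" of IDEA #4 can be illustrated with a small sketch. This is not Barry's patch, which is not reproduced in this thread; it swaps in a module-level `__getattr__` hook (PEP 562, Python 3.7+), a mechanism that did not exist in 2002. The package name `getopt_demo` and its two stub submodules are hypothetical stand-ins, written to a temporary directory so the example runs on its own.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

PKG = "getopt_demo"  # hypothetical package name, not the real getopt

# __init__.py for the demo package: the classic name is imported eagerly,
# the Optik-style name only when somebody actually asks for it.
INIT_SOURCE = textwrap.dedent('''
    import importlib

    from .classic import getopt  # classic interface, always loaded

    _LAZY = {"OptionParser": ".option_parser"}  # public name -> submodule

    def __getattr__(name):
        # Called only when normal attribute lookup on the package fails.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name], __name__)
            value = getattr(module, name)
            globals()[name] = value  # cache for later lookups
            return value
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')


def build_demo_package(root: Path) -> None:
    """Write the stub package (stand-ins for classic getopt and Optik)."""
    pkg = root / PKG
    pkg.mkdir()
    (pkg / "__init__.py").write_text(INIT_SOURCE)
    (pkg / "classic.py").write_text(
        "def getopt(args, shortopts):\n    return [], list(args)\n")
    (pkg / "option_parser.py").write_text("class OptionParser:\n    pass\n")


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(PKG)
            heavy = PKG + ".option_parser"
            loaded_before = heavy in sys.modules  # not imported yet
            parser_class = pkg.OptionParser       # triggers the lazy import
            loaded_after = heavy in sys.modules
            return {"before": loaded_before, "after": loaded_after,
                    "name": parser_class.__name__}
        finally:
            sys.path.remove(tmp)
            for mod in [m for m in sys.modules
                        if m == PKG or m.startswith(PKG + ".")]:
                del sys.modules[mod]


result = demo()
print(result)
```

Applications that only call the classic function never pay for the second submodule, which is the efficiency argument listed under "pros"; the extra indirection in `__init__.py` is the complexity listed under "cons".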

I think it's better to pick a new name and leave the existing getopt module alone. I think keeping it a package is fine. I prefer to have little or no magic in __init__.py though (the email package's __init__.py is borderline :-). I think that "options" is a fine package name. Yes, there are other things that one could consider options. No, I don't think that will cause confusion. After all "getopt" isn't much more specific. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

"guido" == Guido van Rossum <guido@python.org> writes:
guido> I think it's better to pick a new name and leave the existing getopt
guido> module alone.

How about "OptParser" (alternatives: OptionsParser, OptsParser) as an analogue to the existing ConfigParser? They do go together, both conceptually and in practice, after all.

That would leave the more general "options" free for something, well, more general. :>

Best,
Kendall Clark

A decent guideline is to use the dominant class name as the module name. That would be OptionParser. Then, instead of

    from optik import OptionParser

we'd be writing

    from OptionParser import OptionParser

I like this!

This works best if it is a single file; making OptionParser a package would just complicate things. So maybe I should take Greg up on his offer to refactor the code into a single .py file.

(Barry prefers that there's only one class per file; fortunately I don't have that hangup. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

On 30 May 2002, Guido van Rossum said:
I think the BDFL has spoken. I can live with this, although I prefer lower-case module names. Whatever. Greg -- Greg Ward - geek-at-large gward@python.net http://starship.python.net/~gward/ Paranoia is simply an optimistic outlook on life.

I prefer lower-case module names for most situations, but I make an exception for modulename == classname. --Guido van Rossum (home page: http://www.python.org/~guido/)

"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> I prefer lower-case module names for most situations, but I
GvR> make an exception for modulename == classname.

Agreed! That's what the style guide says too. :)

-Barry

[Guido]
from OptionParser import OptionParser
I like this!
[Greg]
I think the BDFL has spoken. I can live with this, although I prefer lower-case module names. Whatever.
[Guido]
I prefer lower-case module names for most situations, but I make an exception for modulename == classname.
But there's more than just the one class (OptionParser) in the module, and the other classes (Option, Values) *are* used. Barry's rule may hold for 1 class per module; that's not the case here.

+1 for "options"
-1 for "OptionParser"

--
David Goodger <goodger@users.sourceforge.net>
Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/ (includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/

"DG" == David Goodger <goodger@users.sourceforge.net> writes:
DG> But there's more than just the one class (OptionParser) in the
DG> module, and the other classes (Option, Values) *are* used.
DG> Barry's rule may hold for 1 class per module; that's not the
DG> case here.

If that's so, then I'd prefer to see each class in its own module inside a parent package.

| +1 for "options"
| -1 for "OptionParser"

I still think getopt-as-package could be made to work, but I'd be fine with `options'. Whatever though; Guido's weighed in and Greg should just decide and go for it!

-Barry

"DG" == David Goodger <goodger@users.sourceforge.net> writes:
DG> We all know *your* bias! ;-)

You've sat next to me at a Python conference then? My biological interference with acoustic systems is soooo embarrassing.

http://www.acronymfinder.com/af-query.asp?p=dict&String=exact&Acronym=BIAS

-Barry

getopt-as-package is definitely out. I'll leave it to Greg what to make of the remaining two alternatives (options or OptionParser). --Guido van Rossum (home page: http://www.python.org/~guido/)

On 30 May 2002, Guido van Rossum said:
getopt-as-package is definitely out. I'll leave it to Greg what to make of the remaining two alternatives (options or OptionParser).
I strongly prefer OptionParser, because that's the main class; it's the one that's always used (ie. directly instantiated). There are always instances of Option, OptionValues, and the various exception classes floating around -- but most Optik applications don't have to import those names directly. So in spite of David G.'s -1 on OptionParser, that's what I'm going with... Greg -- Greg Ward - Python bigot gward@python.net http://starship.python.net/~gward/
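
Greg's point that applications usually import only the main class can be shown with a short sketch. It assumes the `optparse` module, the name under which the Optik code is available in the standard library today; there, the value-holder class the thread calls OptionValues is named `Values`. The option names and the sample command line are made up for illustration.

```python
from optparse import OptionParser  # the only name this script imports


def parse(argv):
    parser = OptionParser(usage="%prog [options] file")
    # add_option() creates and returns an Option instance behind the scenes.
    verbose_opt = parser.add_option("-v", "--verbose", action="store_true",
                                    dest="verbose", default=False,
                                    help="print extra output")
    parser.add_option("-o", "--output", dest="output", metavar="FILE",
                      help="write result to FILE")
    # parse_args() returns a Values instance plus the leftover arguments.
    options, args = parser.parse_args(argv)
    return verbose_opt, options, args


verbose_opt, options, args = parse(["-v", "-o", "out.txt", "input.txt"])
print(type(verbose_opt).__name__, type(options).__name__)
print(options.verbose, options.output, args)
```

Instances of the supporting classes do appear at run time, but the script never has to name them, which is the argument for calling the module after the dominant class.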

[Barry A. Warsaw]
If that's so, then I'd prefer to see each class in its own module inside a parent package.
Without trying to open a can of worms here, is there any sort of consensus on the use of packages with multiple smaller modules vs. one module containing everything? I'm asking about the Python standard library, specifically. According to the one-class-per-module rule of thumb, there are some Python modules that could be refactored into packages. Weighing against that is the convenience of importing a single module.

I'm just wondering if there are any guidelines that should frame one's thinking beyond the fairly obvious ones? For example, is the standard library an exceptional case because it must appeal to new users as well as experts? Does a good part of this issue come down to personal preference? Or are there advantages and disadvantages that should be documented? (Maybe they already have.)

Is the current library configuration considered healthy? There is a mix of packages and single modules. Are these implementations pretty optimal, or would they be organized differently if one had the chance to do it all over again?

Just curious.

---
Patrick K. O'Brien
Orbtech

Barry is the proponent of the one-class-per-module rule. I don't mind having more classes per module at all; even having the module named after the "dominant" class doesn't bother me. A single module containing several classes makes for shorter imports, e.g. you can write

    from damodule import DaClass

rather than

    from dapackage.DaClass import DaClass

I realize that you can put magic in dapackage's __init__.py that lets you write

    from dapackage import DaClass

but it's not pretty, and if DaClass is still defined in a module DaClass.py inside dapackage, there's an unpleasant ambiguity: on the one hand dapackage.DaClass is a module, on the other hand it's a class! Barry's email package avoids the __init__ nastiness but you end up having to write the rather verbose

    from email.Message import Message

The standard library grew organically and represents many points of view.

If I had to do it all over again, I'd probably organize it differently, but I don't see a point in attempting a massive reorganization -- it will only upset the users without real benefits (humans are pretty good at dealing with semi-disorganized data).

--Guido van Rossum (home page: http://www.python.org/~guido/)
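
The module-versus-class ambiguity Guido describes can be reproduced with a throwaway package. This is a minimal sketch under stated assumptions: it reuses the hypothetical names `dapackage` and `DaClass` from the message, writes them to a temporary directory, and runs on Python 3 rather than the Python of 2002.

```python
import importlib
import sys
import tempfile
import types
from pathlib import Path


def demo() -> dict:
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "dapackage"
        pkg_dir.mkdir()
        # The class lives in a module that has the same name as the class.
        (pkg_dir / "DaClass.py").write_text("class DaClass:\n    pass\n")
        # The "magic" in __init__.py: re-export the class at package level.
        (pkg_dir / "__init__.py").write_text(
            "from dapackage.DaClass import DaClass\n")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("dapackage")
            # Asking the import system for the dotted name yields the module...
            submodule = importlib.import_module("dapackage.DaClass")
            # ...while attribute access on the package yields the class.
            attribute = pkg.DaClass
            return {
                "attribute_is_class": isinstance(attribute, type),
                "submodule_is_module": isinstance(submodule, types.ModuleType),
                "same_object": attribute is submodule,
                "class_lives_in_submodule": submodule.DaClass is attribute,
            }
        finally:
            sys.path.remove(tmp)
            for mod in [m for m in sys.modules
                        if m == "dapackage" or m.startswith("dapackage.")]:
                del sys.modules[mod]


result = demo()
print(result)
```

The same dotted name, dapackage.DaClass, resolves to two different objects depending on how it is looked up, which is the unpleasantness a single-file module avoids.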

On Thu, 30 May 2002, Patrick K. O'Brien wrote:
Interesting question! Python Style Guide doesn't tell much about this.

I think that if there is (more or less) natural hierarchy, then monolith module is better to partition into smaller ones.

For example, os module. Even docs are mentioning 4 aspects of it. But only os.path is (kinda) separate. os.fd could have their own, as well as os.proc os.fs ... This way they are remembered better...

xml-packages is another good example.

Otherwise there is no point in adding hierarchy in standard libs.

I think, modularizing is very similar to constructing good object class hierarchy. With the main fear - not to overdesign it.

Sincerely yours, Roman Suzi
--
\_ Russia \_ Karelia \_ Petrozavodsk \_ rnd@onego.ru \_
\_ Thursday, May 30, 2002 \_ Powered by Linux RedHat 7.2 \_
\_ "How do I love thee? My accumulator overflows." \_

[Trimming recipients to just python-dev... BAW]
"PKO" == Patrick K O'Brien <pobrien@orbtech.com> writes:
PKO> Without trying to open a can of worms here, is there any sort
PKO> of consensus on the use of packages with multiple smaller
PKO> modules vs. one module containing everything?

It's an interesting topic, to be sure, and no doubt will generate some nice cooked (charred?) worms. I'll only describe my own thoughts on the matter.

I personally like files in bite-sized chunks, which means when they get to be more than one or a few tall emacs screenfuls, I start to get the urge to split things up. I'm probably somewhat influenced too by my early C++ days when we adopted a one class per .h file (and one class implementation per .cc file). IIRC, Objective-C also encouraged this granularity of organization. Even earlier influences include the FORTH convention of organizing everything into 1024 byte blocks, and that's it!

For whatever reason, I definitely prefer to edit more smaller files than fewer large files (but all in moderation!), and the class seems to be a good organizational structure around which to split things.

PKO> I'm just wondering if there are any guidelines that should
PKO> frame one's thinking beyond the fairly obvious ones? For
PKO> example, is the standard library an exceptional case because
PKO> it must appeal to new users as well as experts? Does a good
PKO> part of this issue come down to personal preference? Or are
PKO> there advantages and disadvantages that should be documented?
PKO> (Maybe they already have.)

I think most of the standard modules are special because they were written before Python had a good (or any!) package system. For legacy modules that export lots of classes, there's probably little benefit to refactoring them into packages.

That might change if we want to start separating things out into separately installable distutilized packages though. The package seems to be the smallest convenient unit for distutils.

PKO> Is the current library configuration considered healthy?
PKO> There is a mix of packages and single modules. Are these
PKO> implementations pretty optimal, or would they be organized
PKO> differently if one had the chance to do it all over again?

Some of the newer packages are designed as packages because they're complex and big: distutils, xml, email, compiler. That makes perfect sense to me. It's possible something like Cookie.py if re-done /might/ make sense as a package, but I'm not sure. I doubt it makes much sense for something like smtplib.py to ever be a package, and besides other than exceptions, it only exports one main class.

BTW, exceptions are...exceptions! I don't have much problem lumping all of a package's exception classes in one module.

so-yeah-it-probably-is-personal-preference-ly y'rs,
-Barry

Let me just clarify that this is Barry Warsaw's opinion. --Guido van Rossum (home page: http://www.python.org/~guido/)

barry@zope.com (Barry A. Warsaw):
Deciding how to split things up into files is not such a big issue in C-related languages, because file organisation is not tied to naming. You can change your mind about it without having to change any of the code which refers to the affected items.

In Python, one is encouraged to put more thought into the matter, because it affects how things are named. One-class-per-module is convenient for editing, but it introduces an extra unneeded level into the naming hierarchy.

It's unfortunate that editing convenience and naming convenience seem to be in conflict in Python. Maybe a folding editor is the answer...

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Or a literate programming tool that separates these concerns nicely. --- Greg Ewing <greg@cosc.canterbury.ac.nz> wrote:
===== -- S. Lott, CCP :-{) S_LOTT@YAHOO.COM http://www.mindspring.com/~slott1 Buccaneer #468: KaDiMa Macintosh user: drinking upstream from the herd. __________________________________________________ Do You Yahoo!? Yahoo! - Official partner of 2002 FIFA World Cup http://fifaworldcup.yahoo.com

The class isn't really the unit of reuse. The old one-class-per-file rules from C++ aren't helpful for good reusable design. They are for optimizing compiling and making.

This is a great book on large-scale design considerations. Much of it is C++ specific, but parts apply to Python.

    Large-Scale C++ Software Design, John Lakos
    Addison-Wesley, Paperback, Published July 1996, 845 pages, ISBN 0201633620.

The module of related classes is the unit of reuse. A cluster of related modules can make sense for a large, complex reusable component, like an application program.

As a user, anything in a module file that is not class definition (or the odd occasional convenience function) is a show-stopper. If there is some funny business to implement submodules, that ends my interest. Part of the open source social contract is that if I'm going to use it, I'd better be able to support it. Even if you win the lottery and retire to a fishing boat in the Caribbean.

The question of <was>Optik</was><is>options</is> having several reusable elements pushes my envelope. If its job is to parse command line arguments, how many different reusable elements can there really be? Perhaps there are several candidate modules here. It seems difficult to justify putting them all into a library. The problem doesn't seem complex enough to justify a complex solution.

--- "Patrick K. O'Brien" <pobrien@orbtech.com> wrote:
===== -- S. Lott, CCP :-{) S_LOTT@YAHOO.COM http://www.mindspring.com/~slott1 Buccaneer #468: KaDiMa Macintosh user: drinking upstream from the herd.

On 31 May 2002, Steven Lott said:
I think I agree with everything you said. There are only two important classes in Optik: OptionParser and Option. Together with one trivial support class (OptionValue) and some exception classes, that is the module -- the unit of reusability, in your terms.

For convenience while developing, I split Optik into three source files -- optik/option_parser.py, optik/option.py, and optik/errors.py. There's not that much code; about 1100 lines. And it's all pretty tightly related -- the OptionParser class is useless without Option, and vice-versa.

If you just want to use the code, it doesn't much matter if optik (or OptionParser) is a package with three sub-modules or a single file. If you just want to read the code, it's probably easier to have a single file. If you're hacking on it, it's probably easier to split the code up. I think Optik is now moving into that long, happy phase where it is mostly read and rarely hacked on, so I think it's time to merge the three separate source files into one. I very much doubt that it's too complex for this -- I have worked hard to keep it tightly focussed on doing one thing well.

        Greg
--
Greg Ward - nerd    gward@python.net
http://starship.python.net/~gward/
I appoint you ambassador to Fantasy Island!!!

If you're hacking on it, it's probably easier to split the code up.
Hm, that's not how I tend to hack on things (except when working with others who like that style). Why do you find hacking on several (many?) small files easier for you than on a single large file? Surely not because loading a large file (in the editor, or in Python) takes too long? That was in the 80s. :-) Is it because multiple Emacs buffers allow you to maintain multiple current positions, with all the context that that entails? Or is it something else? --Guido van Rossum (home page: http://www.python.org/~guido/)

On 01 June 2002, Guido van Rossum said:
Actually, Optik started out in one file; I split it up somewhere around 600 or 700 lines of code expecting it to grow more. It only grew to around 1100 lines, which I suppose is a good thing.

I think having small modules makes me more comfortable about adding code -- I don't feel at all hemmed-in adding 50 lines to a 300-line module, but adding 50 lines to an 800-line module makes me nervous. I think it all boils down to having things in easily-digested chunks, rather than concerns about stressing Emacs out.

(OTOH and wildly OT: since I gave in a couple years ago and started using Emacs syntax-colouring, it *does* take a lot longer to load modules up -- eg. ~2 sec for the 1000-line rfc822.py. But that's probably just because Emacs is a great shaggy beast of an editor ("Eight(y) Megs and Constantly Swapping", "Eventually Mallocs All Core Storage", you know...). I'm sure if I got a brain transplant so that I could use vim, it would be different.)

        Greg
--
Greg Ward - programmer-at-big    gward@python.net
http://starship.python.net/~gward/
Gee, I feel kind of LIGHT in the head now, knowing I can't make my satellite dish PAYMENTS!

"GW" == Greg Ward <gward@python.net> writes:
GW> (OTOH and wildly OT: since I gave in a couple years ago and
GW> started using Emacs syntax-colouring, it *does* take a lot
GW> longer to load modules up -- eg. ~2 sec for the 1000-line
GW> rfc822.py. But that's probably just because Emacs is a great
GW> shaggy beast of an editor ("Eight(y) Megs and Constantly
GW> Swapping", "Eventually Mallocs All Core Storage", you
GW> know...). I'm sure if I got a brain transplant so that I
GW> could use vim, it would be different.)

Actually, I've found jed to be a very nice quick-in-quick-out alternative to XEmacs (the one true Emacs :). Its default bindings and operation are close enough that I never notice the difference, for simple quick editing jobs.

-Barry

BAW> (add-hook 'font-lock-mode-hook 'turn-on-fast-lock)

Whoa! What a difference!

After seeing your response to /F, I assume his solution was for GNU Emacs and yours is for XEmacs, right?

Skip

"SM" == Skip Montanaro <skip@pobox.com> writes:
BAW> (add-hook 'font-lock-mode-hook 'turn-on-fast-lock)

SM> Whoa! What a difference!

SM> After seeing your response to /F, I assume his solution was
SM> for GNU Emacs and yours is for XEmacs, right?

What's "GNU Emacs"? <wink>

-Barry

On Mon, Jun 03, 2002, Barry A. Warsaw wrote:
Which proves that vi[m] is the true Pythonic editor -- there's only one way. baiting-ly y'rs -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In the end, outside of spy agencies, people are far too trusting and willing to help." --Ira Winkler

On 03 June 2002, Barry A. Warsaw said:
Neither one worked for me (XEmacs 21.4.6). You're *never* going to believe what did work:

    load a font-locked file
    go to Options menu
    go to "Syntax Highlighting"
    select "Lazy lock"
    back to Options menu
    select "Save Options ..."
    restart XEmacs

XEmacs seems to have added this bit of gibberish, err sorry line of Lisp code, to my ~/.xemacs/custom.el:

    '(lazy-lock-mode t nil (lazy-lock))

The wonderful thing about (X)Emacs is that there are so very many ways for it not to do what you want it to do, and every one of those ways just might work in some version of (X)Emacs somewhere...

        Greg
--
Greg Ward - Python bigot    gward@python.net
http://starship.python.net/~gward/
Vote anarchist.

"GW" == Greg Ward <gward@python.net> writes:
GW> Neither one worked for me (XEmacs 21.4.6).

Well of course (wink) you also have to (require 'fast-lock).

| You're *never* going to believe what did work:
|     load a font-locked file
|     go to Options menu
|     go to "Syntax Highlighting"
|     select "Lazy lock"
|     back to Options menu
|     select "Save Options ..."
|     restart XEmacs

Oh, but I do believe it.

GW> XEmacs seems to have added this bit of gibberish, err sorry
GW> line of Lisp code, to my ~/.xemacs/custom.el:

GW>     '(lazy-lock-mode t nil (lazy-lock))

Pshhh. D'oh. Obvious.

GW> The wonderful thing about (X)Emacs is that there are so very
GW> many ways for it not to do what you want it to do, and every
GW> one of those ways just might work in some version of (X)Emacs
GW> somewhere...

how-many-more-would-you-like?-ly y'rs,
-Barry

On Sat, Jun 01, 2002, Guido van Rossum wrote:
s/Emacs/vi sessions/ Yes. -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "In the end, outside of spy agencies, people are far too trusting and willing to help." --Ira Winkler

"GvR" == Guido van Rossum <guido@python.org> writes:
GvR> we'd be writing
GvR>     from OptionParser import OptionParser
GvR> I like this!

GvR> This works best if it is a single file; making OptionParser a
GvR> package would just complicate things. So maybe I should take
GvR> Greg up on his offer to refactor the code into a single .py
GvR> file.

GvR> (Barry prefers that there's only one class per file;
GvR> fortunately I don't have that hangup. :-)

If OptionParser were to be a package with lots of public classes then yah, I'd be (more of a) neurotic noodge about it, but as it is... +1 ! :)

-Barry

participants (12)
- Aahz
- barry@zope.com
- David Goodger
- Fredrik Lundh
- Greg Ewing
- Greg Ward
- Guido van Rossum
- Kendall Grant Clark
- Patrick K. O'Brien
- Roman Suzi
- Skip Montanaro
- Steven Lott