making a module callable

While I don't ordinarily endorse this use case, it'd be nice not to require hacks involving sys.modules monkeying, such as https://github.com/has207/flexmock/pull/89 (specifically https://github.com/has207/flexmock/commit/bd47fa8189c7dff349de257c0e061b9fce...), to make a module callable. This obviously touches on the larger question of what a module is vs. a class, and why they are different given that they're "just" namespaces. (Sorry, not offering ideas myself, just hoping others have them.) -gps

2013/11/19 Gregory P. Smith <greg@krypto.org> While I don't ordinarily endorse this use case it'd be nice not to require
hmm, interesting thought. there are some modules that have just one single main use (pprint) and could profit from that. imho it would simplify the situation. currently, everything that has a __call__ property which is itself callable is callable:

    class Foo:
        def __call__(self):
            pass
        def bar(self):
            pass

    foo = Foo()
    foo()
    foo.bar()

    def baz():
        pass
    foo.baz = baz
    foo.baz()

except modules. which feels imho like an artificial limitation. – phil

On 19 November 2013 18:09, Philipp A. <flying-sheep@web.de> wrote:
Not quite true: Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information.
This is why module objects are not callable even if they have a __call__: they are *instances* of ModuleType, and the __call__ method is looked up on their type, not on the instance itself. So modules not being callable even when they have a __call__ is not an anomaly, even if it is not convenient sometimes. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html
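Michael's point about type-level lookup can be demonstrated with a short, self-contained sketch (the module names here are invented):

```python
import types

# A __call__ set as an *instance* attribute is ignored by the call protocol,
# because special methods are looked up on the type, not the instance.
mod = types.ModuleType("demo")
mod.__call__ = lambda: "hi"
try:
    mod()
except TypeError as err:
    print(err)  # 'module' object is not callable

# Looked up on the type, it works: a ModuleType subclass with __call__.
class CallableModule(types.ModuleType):
    def __call__(self):
        return "called!"

cm = CallableModule("demo2")
print(cm())  # called!
```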

2013/11/19 Michael Foord <fuzzyman@gmail.com> On 19 November 2013 18:09, Philipp A. <flying-sheep@web.de> wrote:
you’re right, apologies. so the hack consists of switching a module’s class at runtime… there’s also another hack, calldules <https://pypi.python.org/pypi/calldules>, which makes that automatic (funnily, via implicit side effects when doing import calldules). note that it isn’t serious! just a programming exercise.

there are some modules who just have one single main use (pprint) and could profit from that.
A million times this!

    pprint.pprint()
    time.time()
    random.random()
    copy.copy()
    md5.md5()
    timeit.timeit()
    glob.glob()
    cStringIO.cStringIO()
    StringIO.StringIO()

On Tue, Nov 19, 2013 at 1:01 PM, Philipp A. <flying-sheep@web.de> wrote:

Maybe the solution would be to make it possible to "return" something other than a module object, similar to how node.js does it? In node.js:

callme.js:

    module.exports = function (a) {
        console.log("a:", a);
    };

A possible way to do it in Python *already*:

callme.py:

    import sys
    sys.modules[__name__] = lambda a: print("a:", a)

Maybe not very nice, but is there a reason not to do this (except for its ugliness)? On 11/19/2013 10:39 PM, Haoyi Li wrote:
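For completeness, here is a runnable version of that trick; the callme module is a throwaway created in a temp directory just so the sketch is self-contained:

```python
import os
import sys
import tempfile
import textwrap

# A made-up module that replaces itself in sys.modules while being imported.
src = textwrap.dedent("""
    import sys
    sys.modules[__name__] = lambda a: "a: " + repr(a)
""")
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, "callme.py"), "w") as f:
    f.write(src)
sys.path.insert(0, tmpdir)

import callme  # the name binds to whatever ended up in sys.modules
print(callme(42))  # a: 42
```

The import statement binds whatever object is in sys.modules after the module body runs, which is why the replacement takes effect for the importer.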

On 20 Nov 2013 08:06, "Mathias Panzenböck" <grosser.meister.morti@gmx.net> wrote:
Maybe the solution would be to make it possible to "return" something other than a module object, similar to how node.js does it?

Already supported: modules just have to replace themselves with an instance of a custom class in sys.modules. PEP 451 makes it even easier to write custom finders and loaders that return custom module types. Cheers, Nick.

On 11/19/2013 4:39 PM, Haoyi Li wrote:
In order to make modules callable, ModuleType must have a __call__ method. In order to make the call execute code in the module, that method should delegate to a callable in the module instance that has a known special name, such as __main__.

    class ModuleType:
        def __call__(self, *args, **kwds):
            return self.__main__(*args, **kwds)

Doc: "The __main__ object of a module is its main callable, the one that is called if the module is called without specifying anything else."

If this were done, then for

    pprint.pprint()

adding __main__ = pprint to pprint should make the following work:

    import pprint; pprint(ob)

etc -- Terry Jan Reedy
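Terry's proposal can be simulated today with a ModuleType subclass; note that the delegating __call__ below is hypothetical (real ModuleType has no such method), and the module contents are stand-ins:

```python
import types

# Hypothetical: a module type whose __call__ delegates to a module-level
# __main__ callable, as Terry sketches above.
class MainCallingModule(types.ModuleType):
    def __call__(self, *args, **kwds):
        return self.__main__(*args, **kwds)

mod = MainCallingModule("fakepprint")
mod.pprint = lambda ob: repr(ob)  # stand-in for pprint.pprint
mod.__main__ = mod.pprint         # designate the module's main callable
print(mod([1, 2]))                # '[1, 2]'
```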

On Wed, Nov 20, 2013 at 9:37 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Hmm. Classes allow you to control the metaclass. Should modules allow such a declaration? That would make this sort of thing fully customizable. But is there any way to avoid the chicken-and-egg problem of trying to logically put that into the same source file as the module whose metaclass is being changed? Considering that the creation of a class involves building up its dictionary of contents and _then_ calling type(), it could in theory be possible to build up a dictionary of module contents, possibly find something with a magic name like __metamodule__, and then use that as the module's type. But this might become rather convoluted. ChrisA
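Chris's two-step idea (build the namespace dict first, then pick the type) can be mocked up with exec; the __metamodule__ name and everything else here is hypothetical, not an existing protocol:

```python
import types

# Pretend this string is a module's source file.
source = """
def greet(name):
    return "hello " + name

import types

class _Meta(types.ModuleType):
    def __call__(self, *args, **kwds):
        return self.greet(*args, **kwds)

__metamodule__ = _Meta
"""

ns = {}
exec(source, ns)                                   # step 1: run the module code
meta = ns.get("__metamodule__", types.ModuleType)  # step 2: find the magic name
mod = meta("demo")                                 # step 3: build module via it
mod.__dict__.update(ns)
print(mod("world"))                                # hello world
```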

On Nov 19, 2013 10:52 AM, "Gregory P. Smith" <greg@krypto.org> wrote:
What's the use case for a callable module? In the flexmock example, is it just so they can do an import instead of a from..import? As Georg said, modules are just top-level namespaces, API containers. Importing the callable you want out of a module is easy.

However, the underlying idea is something that has come up before and may be worth more consideration. tl;dr: __metamodule__ (pre-bikeshedding) would be a good way to go, but isn't worth it and may be an attractive nuisance.

If we are going to support customization of module classes, I'd rather we do it via a general API (e.g. Chris's __metamodule__) than piecemeal (via special-casing __call__, etc.). However, you can already use a custom module type in the two ways that Nick mentioned, the first of which flexmock is doing (and Django does IIRC).

Sticking something into sys.modules to replace the currently executing module is indeed a hack. The import system accommodates this not by design (unless someone is willing to come forward and admit guilt <wink>) but mostly as an incidental implementation artifact of the import machinery from many releases ago. [1]

As Nick mentioned, PEP 451 introduces an optional create_module() method on loaders that returns the module object to use during loading. This is nice if you are already writing a loader. Otherwise it's a pain (the import hook machinery isn't exactly simple) and usually won't be worth your time. Furthermore, your loader will probably be applied to multiple modules (which may be what you need). It certainly isn't a one-off, add-something-to-the-affected-module sort of thing. Basically, having to write a loader and plug it in is like having to use a metaclass just to implement a __prepare_class__() that returns an OrderedDict, all so you can have an ordered class namespace (only more complicated). Loader.create_module() is a good addition, but is too low level to use as a replacement for the sys.modules hack.
In contrast, something like __metamodule__ would be an effective replacement. It would be similar in spirit and in syntax to __init_class__ in PEP 422 (and __metaclass__ in Python 2), defined at the top of the module and used for the module. The thing that appeals to me is that we could deprecate the sys.modules hack. :)

The big question is: is having a custom module type a common enough need? To me the desire for it usually implies a misunderstanding of the purpose of modules. If we had an easier API, would it be an attractive nuisance? Unless it's a big win I don't think it's a good idea, and I'm not convinced it's common enough a need. -eric

[1] A module replacing itself in sys.modules came up during the importlib bootstrap integration, where it required adding yet another special-case backward-compatibility pain point to the importlib implementation. I can't find the actual email, but I refer to what happened in http://bugs.python.org/msg166630, note "[3]". It certainly surprised us that you could do it and that people actually were. At this point I guess the latter shouldn't have been surprising. :)

On 11/20/2013 12:14 PM, Eric Snow wrote:
Actually, it is intentional. An excerpt from https://mail.python.org/pipermail/python-ideas/2012-May/014969.html
-- ~Ethan~

Yeah, and the fact that people are jumping through these hoops and doing "nasty hacks" despite their nastiness means that there's a real need for the functionality. If it was easy and people did it, then we don't learn anything; same if it's difficult and people don't do it. On the other hand, if a feature is easy and people don't do it, then maybe that feature deserves to be deprecated/made less easy. Similarly, if it's difficult/nasty/hacky and you find people doing it anyway, then the functionality probably deserves to be made easier to use. The hackiness is an artifact of the way things are now, but this whole thread is about changing the way things are now. We should be shaping the machinery to fit what people do, rather than trying to shape people to fit the machinery which was arbitrarily designed a long time ago. On Wed, Nov 20, 2013 at 12:23 PM, Ethan Furman <ethan@stoneleaf.us> wrote:

On Nov 20, 2013, at 12:14, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Given that __metaclass__ was removed in Python 3, doesn't "this is an exact parallel to __metaclass__" argue against the idea, rather than for? Or at least against the name? (Maybe __init_module__?) Anyway, I think a module replacing itself with something callable is both more flexible and more in line with the way people are actually doing things today, so maybe a "less hacky" way to do the sys.modules hack is what people actually want here.

On 21 Nov 2013 13:02, "Ethan Furman" <ethan@stoneleaf.us> wrote:
It potentially causes problems for module reloading and it definitely causes problems for the import engine PEP (since sys.modules is process global state). I expect we'll revisit this later in the 3.5 development cycle (we haven't even merged the accepted PEP 451 for 3.4 yet), but formalising the current module replacement idiom is already a more likely outcome than making module instances callable. Cheers, Nick.

On Wed, Nov 20, 2013 at 7:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Agreed, formalizing how to do the replacement trick sounds good. Ideally, a module would not need to know its own name within the code doing the replacement, and would not need to reimplement common interface bits in a replacement class so that it quacks like a module. Making it callable? Well, that does just seem silly, so I'm not actually worried about making that specifically easier.

On 22 November 2013 16:45, Gregory P. Smith <greg@krypto.org> wrote:
Well, it just has to use __name__. Any formalisation that doesn't get passed the name is going to have to use frame trickery to get the name which is just smelly. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

It'd be nice to formalize a way to get rid of the __name__ == '__main__' idiom as well in the long long run. Sure everyone's editor types that for them now but it's still a wart. Anyways, digressing... ;) -- blame half the typos on my phone.

On Fri, 22 Nov 2013 22:02:56 +0100 "Philipp A." <flying-sheep@web.de> wrote:
we’re all accustomed to it, but objectively, it’s horribly implicit and unobvious.
It's funny, when I first learned Python, I actually found it quite simple and elegant (leveraging the power of built-in introspection metadata). Regards Antoine.

Steven D'Aprano <steve@pearwood.info> writes:
Even if the executable only does one thing, it's still good to be able to *rename* the program and not have to change the usage and error messages::

    import os
    import sys

    progname = os.path.basename(__file__)
    # …
    sys.stdout.write(
        "{progname}: Couldn't frobnicate the spangule.\n".format(
            progname=progname))

So, definitely ‘sys.argv’ needs to continue having all command-line arguments, including the command name used to invoke the program. -- \ “We must respect the other fellow's religion, but only in the | `\ sense and to the extent that we respect his theory that his | _o__) wife is beautiful and his children smart.” —Henry L. Mencken | Ben Finney

Ben Finney wrote:
So, definitely ‘sys.argv’ needs to continue having all command-line arguments, including the command name used to invoke the program.
That doesn't necessarily mean it has to be passed along with the arguments to a __main__() function, though. You can always extract it from sys.argv if you need it. Arguably it's more convenient to get it from there, since you're most often going to want it for things like formatting error messages, which are probably not happening right inside the main function. -- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Yet the ‘__main__’ function needs to get the arguments as a parameter::

    def __main__(argv):

(or at least, that's how I've seen it done most commonly, and I agree that it makes a good interface for ‘__main__’ functions). Now you're saying there is one command-line parameter which should not come through that interface? Why the special case?
It's more convenient to look in “the sequence of command-line parameters” for all the command-line parameters, without special-casing the command name. -- \ “If [a technology company] has confidence in their future | `\ ability to innovate, the importance they place on protecting | _o__) their past innovations really should decline.” —Gary Barnett | Ben Finney

From: Ben Finney <ben+python@benfinney.id.au>
Or to get the arguments as separate parameters:

    def __main__(*argv):
        …

or, more realistically:

    def __main__(inpath, outpath):
        …

Then:

    if __name__ == '__main__':
        __main__(*sys.argv[1:])

The benefit is that the names document what the arguments mean, and also give you better error messages if the script is called wrong. Obviously any serious script is going to have a real usage error, etc., but then any serious script is going to use argparse anyway. For quick & dirty scripts, the first error below is obviously nicer than the second, and no more work.

    $ ./script.py
    TypeError: __main__() missing 2 required positional arguments: 'inpath' and 'outpath'

    $ ./script.py
    IndexError: list index out of range

There's no reason you _couldn't_ write the idiom with main(argv0, inpath, outpath); I just haven't seen it that way. Scripts that explode argv always seem to do *argv[1:], while those that use it as a list usually seem to do all of argv.

Ben Finney wrote:
But you hardly ever want to process argv[0] the same way as the rest of the arguments, so you end up treating it as a special case anyway. It seems to me we only think of it as a command line argument because C traditionally presents it that way. I don't think it's something that would naturally come to mind otherwise. I know I found it quite surprising when I first encountered it. -- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
This isn't about *processing* that argument; it's about *receiving* it in the first place in the function. Having it omitted by default means there's a special case just to *get at* the first command-line argument::

    def __main__(argv=None):
        # special case: argv here omits the command name
        if argv is None:
            argv = sys.argv[1:]
        first_arg = sys.argv[0]

Which still sucks, because how do I then pass a different command name to ‘__main__’, since it now expects to get it from elsewhere? Much better to have the interface just accept the *whole* sequence of command line arguments:

    def __main__(argv=None):
        if argv is None:
            argv = sys.argv

Now it's completely up to the caller what the command line looks like, which means the ‘__main__’ code needs no special cases for using the module as a library or for unit tests etc. You just construct the command line as you need it to look, and pass it in to the function.
It seems to me we only think of it as a command line argument because C traditionally presents it that way.
What C does isn't relevant here. I think of the whole command line as a sequence of arguments because that's how the program receives the command line from the Python interpreter. Mangling it further just makes a common use case more difficult for no good reason.
Many useful things are surprising when one first encounters them :-) -- \ “I don't accept the currently fashionable assertion that any | `\ view is automatically as worthy of respect as any equal and | _o__) opposite view.” —Douglas Adams | Ben Finney

On 25 November 2013 03:58, Ben Finney <ben+python@benfinney.id.au> wrote:
The name of the script is not an argument *to* the script. Having it there in the first place is the special case, not removing it. It's only an old C convention (and now an old Python convention) that makes you think it is. Michael Foord
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

Michael Foord <fuzzyman@gmail.com> writes:
No, the command name is part of the command line arguments, as I pointed out earlier. That is, when any program is started with::

    foo bar baz

the command line arguments are “foo”, “bar”, “baz”. That says nothing about what those arguments *mean* yet; they're all available to the program, for it to figure out their significance.
It's only an old C convention (and now an old Python convention) that makes you think it is.
Not at all. Giving the command-line arguments to the program as a sequence of strings is a standard cross-language interface; the operating system hands the command line to the running program without caring what language it was implemented in. This is all prior to talking about what those arguments *mean*; the discussion of “‘foo’ is a command, ‘bar’ and ‘baz’ are its parameters” all comes afterward and is irrelevant to the process of *getting at* the command line.

If the program's main code wants to discard the first argument, as many programs do for good reasons, that's up to the programmer to decide explicitly. Many other programs make use of the whole command line, and they should not need some different way to get at the contents of the command line. If we're going to make the command line sequence a parameter to the main code, there should be one interface, no special cases. As it stands, that conventional interface in Python code is::

    def main(argv=None):
        if argv is None:
            argv = sys.argv

and then it's up to the rest of the ‘main’ function (or however it's spelled) to process the full command line sequence that was received. So, while the name “argv” is a C convention, the handling of the command line as a homogeneous sequence of strings is language-agnostic. Automatically discarding the first argument, on the assumption that the program doesn't care about it, is making a false assumption in many cases and makes a common use case needlessly difficult. -- \ “We must find our way to a time when faith, without evidence, | `\ disgraces anyone who would claim it.” —Sam Harris, _The End of | _o__) Faith_, 2004 | Ben Finney
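The conventional interface Ben describes, written out as a runnable sketch (the program name and message format here are invented for illustration):

```python
import sys

def main(argv=None):
    # receive the whole command line, argv[0] included
    if argv is None:
        argv = sys.argv
    progname = argv[0]
    args = argv[1:]
    return "%s called with %r" % (progname, args)

# the caller constructs whatever command line it likes:
print(main(["frobnicate", "--verbose", "spangule"]))
```

Because the function takes the full sequence, tests and library callers can pass any command name they want without touching sys.argv.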

Ben Finney wrote:
If you're talking about doing different things based on argv[0], I wouldn't call it a *common* use case. The last time I saw it done was on an early version of SunOS that didn't have shared libraries, so they linked all the gui tools into one big executable to reduce disk and memory usage. Now that we have shared libraries, there's much less need for that kind of trick. -- Greg

On Tue, Nov 26, 2013 at 10:43 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Upstart has a set of symlinks to initctl called "start", "stop", "reload", etc. They're shortcuts for "initctl start", "initctl stop", etc. Also, I've often fetched up argv[0] as part of a usage message, which isn't strictly "doing different things", but it does mean that renaming the program won't leave an old name inside its help message. ChrisA

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
The common use case I'm referring to is using the program name in output messages (e.g. help, errors) without needing to change the code when the program file is renamed, or when a different command line is constructed by the caller. Doing different things based on how the program is invoked is another common use case, yes. Both of these use cases argue for retaining the full command line sequence (or whatever replacement command line sequence the caller chooses to construct) as input to the main code, and allowing the main code to decide which parts are important. -- \ “I am too firm in my consciousness of the marvelous to be ever | `\ fascinated by the mere supernatural …” —Joseph Conrad, _The | _o__) Shadow-Line_ | Ben Finney

Greg Ewing writes:
busybox is still standard in many Linux distros, though I don't really know why (embedded systems, "small" rescue media?), and surely you've seen fgrep/grep/egrep (hardlinked to the same file on Mac OS X as of "Snow Leopard"), even if you personally use POSIX-standard "grep -F" and "grep -E". Linking /bin/sh to bash or zsh is a common trick on GNU systems, and typically invokes strict POSIX conformance (well, as strictly as any GNU program ever conforms to another organization's standard :-( ). So I rather suspect you "see" it frequently, even today. You just don't recognize it when you see it. Could we do without such trickery? Sure. However, the point about usage messages still stands: it's useful to fetch the actual command line token used to invoke the program, because program names and invocation methods do change.

Hi! On Tue, Nov 26, 2013 at 12:43:25PM +1300, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
sys.argv[0] is used for:

1) Setting up sys.path; something like:

    lib_dir = os.path.dirname(sys.argv[0]) + os.sep + 'lib'
    sys.path.append(lib_dir)

2) Setting up relative path(s) (to start a helper script, e.g.); something like:

    Subprocess("%s/Robots/sub.py" % os.path.dirname(sys.argv[0]))

3) Reporting usage:

    sys.stderr.write("Usage: %s [-o|--old]\n" % sys.argv[0])

4) Changing behaviour based on the script's name.

I am -1 on removing sys.argv[0] from main(argv). Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Fri, Nov 22, 2013 at 10:02:56PM +0100, Philipp A. wrote:
we’re all accustomed to it, but objectively, it’s horribly implicit and unobvious.
Certainly you are correct that it is unobvious, but the "if __name__" idiom is anything but implicit. It's the opposite: you are *explicitly* testing whether the module is being run as the main module (__name__ == "__main__"), and if so, you explicitly run some code. Of course you can also run code only when *not* the main module::

    if __name__ != '__main__':
        print "Module is being imported"
    else:
        print "Module is being executed"

And you aren't limited to a single "main function"; you can dot such tests all throughout your code, including inside functions.

Aside: perhaps it would have been better to have an explicit ismain() function that returns True when running as the main module; that's more discoverable. -- Steven
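Steven's ismain() aside could be sketched with frame inspection; the name and the approach are assumptions, not an existing stdlib function:

```python
import inspect

def ismain():
    """Return True when the *caller's* module is being run as the main module.

    Hypothetical helper: it peeks at the calling frame's globals, since the
    answer depends on where the call appears, not where ismain is defined.
    """
    caller = inspect.currentframe().f_back
    return caller.f_globals.get("__name__") == "__main__"

# behaves like the explicit __name__ check at whatever call site it is used:
print(ismain())
```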

On Fri, Nov 22, 2013 at 9:07 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I do think it's a bad idea. It would be like replacing a builtin module in sys.modules, which is inadvisable (particularly key ones like sys). Like builtins, __main__ is a special module, one created during interpreter startup. It plays a special part in the REPL. Various parts of the stdlib have special-casing for __main__, which could be affected by replacement. Replacing __main__ in sys.modules is, to me, just as inadvisable as replacing sys.

The catch is that a script is exec'ed into the __main__ module's namespace, so during execution (nearly) all the import-related attributes relate to __main__. In contrast, the equivalent module from the same file would be loaded into its own namespace, with its own import-related attributes, and cached independently at sys.modules[module_name]. This duality causes all sorts of grief (PEP 395 is a response to some of the pain points). A key hangup is that __name__ is different depending on run-as-script or imported-as-module.

That brings us back to the idea of a more formal replace-module-in-sys-modules API. Any solution to that which uses __name__ to determine the module's name has to take into account that it may have been run as a script (where __name__ will be "__main__"). If we simply used __name__ straight up we might end up replacing __main__ in sys.modules, which I suggest is a bad idea. Hence the point of special-casing __main__. Sorry I wasn't clear. Hopefully this was more so. -eric

On 24 November 2013 02:53, Haoyi Li <haoyi.sg@gmail.com> wrote:
The main thing that makes __main__ special is that it's a builtin module, but we then use its namespace to run Python code. Various parts of the interpreter assume that __main__ will always be the same module that was initialized during interpreter startup, so they don't have to keep re-initializing it (or checking if it has been replaced). It's not quite as intertwined with the interpreter internals as sys, since there's no direct reference to it from the interpreter state, but the case can certainly be made that there *should* be such a reference if we're going to assume consistency over the lifetime of the process. However, while I can't vouch for earlier versions, replacing __main__ in 3.3+ shouldn't cause any major issues, although it does mean certain things may not behave as expected (such as the -i switch and the PYTHONINSPECT option). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2013/11/19 Gregory P. Smith <greg@krypto.org> While I don't ordinarily endorse this use case it'd be nice not to require
hmm, interesting thought. there are some modules who just have one single main use (pprint) and could profit from that. imho it would simplify the situation. currently, everything is callable that has a __call__ property which is itself callable: class Foo(): __call__(self): pass def bar(): pass foo = Foo() foo() foo.bar() def baz(): pass foo.baz = baz foo.baz() except modules. which feels is imho like an artificial limitation. – phil

On 19 November 2013 18:09, Philipp A. <flying-sheep@web.de> wrote:
Not quite true: Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin Type "help", "copyright", "credits" or "license" for more information.
This is why module objects are not callable even if they have a __call__. They are *instances* of ModuleType and the __call__ method is looked up on their type, not the instance itself. So modules not being callable even when they a __call__ is not an anomaly, even if it is not convenient sometimes. Michael
-- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html

2013/11/19 Michael Foord <fuzzyman@gmail.com> On 19 November 2013 18:09, Philipp A. <flying-sheep@web.de> wrote:
you’re right, apologies. so the hack consists of switching a module’s class during runtime… there’s also another hack, calldules<https://pypi.python.org/pypi/calldules>, making that automatic (funnily via implicits effects when doing import calldules). note that it isn’t serious! just a programming exercise.

there are some modules who just have one single main use (pprint) and could profit from that.
A milion times this! pprint.pprint() time.time() random.random() copy.copy() md5.md5() timeit.timeit() glob.glob() cStringIO.cStringIO() StringIO.StringIO() On Tue, Nov 19, 2013 at 1:01 PM, Philipp A. <flying-sheep@web.de> wrote:

Maybe the solution would be to make it possible to "return" something else than a module object, similar to how node.js does it? In node.js: callme.js: module.exports = function (a) { console.log("a:",a); };
Possible way to do it in Python *already*: callme.py: import sys sys.modules[__name__] = lambda a: print("a:",a)
Maybe not very nice, but is there a reason why not to do this (except for it's ugliness)? On 11/19/2013 10:39 PM, Haoyi Li wrote:

On 20 Nov 2013 08:06, "Mathias Panzenböck" <grosser.meister.morti@gmx.net> wrote:
Maybe the solution would be to make it possible to "return" something
else than a module object, similar to how node.js does it? Already supported, modules just have to replace themselves with an instance of a custom class in sys.modules. PEP 451 makes it even easier to write custom finders and loaders that return custom module types. Cheers, Nick.
it's ugliness)?
On 11/19/2013 10:39 PM, Haoyi Li wrote:
there are some modules who just have one single main use (pprint) and
could profit from that.

On 11/19/2013 4:39 PM, Haoyi Li wrote:
In order to make modules callable, ModuleType must have a __call__ method. In order to make the call execute code in the module, that method should delegate to a callable in the module instance that has a known special name, such as __main__. class ModuleType(): def __call__(self, *args, **kwds): return self.__main__(*args, **kwds) Doc: "The __main__ object of a module is its main callable, the one that is called if the module is called without specifying anything else. If this were done...
pprint.pprint()
then adding __main__ = pprint to pprint should make the following work: import pprint; pprint(ob)
etc -- Terry Jan Reedy

On Wed, Nov 20, 2013 at 9:37 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Hmm Classes allow you to control the metaclass. Should modules allow such a declaration? That would make this sort of thing fully customizable. But is there any way to avoid the chicken-and-egg problem of trying to logically put that into the same source file as the module whose metaclass is being changed? Considering that the creation of a class involves building up its dictionary of contents and _then_ calling type(), it could in theory be possible to build up a dictionary of module contents, possibly find something with a magic name like __metamodule__, and then use that as the module's type. But this might become rather convoluted. ChrisA

On Nov 19, 2013 10:52 AM, "Gregory P. Smith" <greg@krypto.org> wrote:
What's the use case for a callable module? In the flexmock example, is it just so they can do an import instead of a from..import? As Georg said, modules are just top-level namespaces, API containers. Importing the callable you want out of a module is easy. However, the underlying idea is something that has come up before and may be worth more consideration. tl;dr: __metamodule__ (pre-bikeshedding) would be a good way to go, but isn't worth it and may be an attractive nuisance. If we are going to support customization of module classes, I'd rather we do it via a general API (e.g. Chris's __metamodule__) than piecemeal (via special-casing __call__, etc.). However, you can already use a custom module type in the two ways that Nick mentioned, the first of which flexmock is doing (and Django does IIRC). Sticking something into sys.modules to replace the currently executing module is indeed a hack. The import system accommodates this not by design (unless someone is willing to come forward and admit guilt <wink>) but mostly as an incidental implementation artifact of the import machinery from many releases ago. [1] As Nick mentioned, PEP 451 introduces an optional create_module() method on loaders that returns the module object to use during loading. This is nice if you are already writing a loader. Otherwise it's a pain (the import hook machinery isn't exactly simple) and usually won't be worth your time. Furthermore, your loader will probably be applied to multiple modules (which may be what you need). It certainly isn't a one-off, add-something-to-the-affected-module sort of thing. Basically, having to write a loader and plug it in is like (only more complicated) having to use a metaclass just to implement a __prepare_class__() that returns an OrderedDict, all so you can have an ordered class namespace. Loader.create_module() is a good addition, but is too low level to use as a replacement for the sys.modules hack. 
In contrast, something like __metamodule__ would be an effective replacement. It would be similar in spirit and in syntax to __init_class__ in PEP 422 (and __metaclass__ in Python 2), defined at the top of the module and used for the module. The thing that appeals to me is that we could deprecate the sys.modules hack. :)

The big question is, is having a custom module type a common enough need? To me the desire for it usually implies a misunderstanding of the purpose of modules. If we had an easier API would it be an attractive nuisance? Unless it's a big win I don't think it's a good idea, and I'm not convinced it's common enough a need.

-eric

[1] A module replacing itself in sys.modules came up during the importlib bootstrap integration, where it required adding yet another special-case backward-compatibility pain point to the importlib implementation. I can't find the actual email, but I refer to what happened in http://bugs.python.org/msg166630, note "[3]". It certainly surprised us that you could do it and that people actually were. At this point I guess the latter shouldn't have been surprising. :)

On 11/20/2013 12:14 PM, Eric Snow wrote:
Actually, it is intentional. An excerpt from https://mail.python.org/pipermail/python-ideas/2012-May/014969.html
-- ~Ethan~

Yeah, and the fact that people are jumping through these hoops and doing "nasty hacks" despite their nastiness means that there's a real need for the functionality.

If it was easy and people did it, then we don't learn anything; same if it's difficult and people don't do it. On the other hand, if a feature is easy and people don't do it, then maybe that feature deserves to be deprecated/made less easy. Similarly, if it's difficult/nasty/hacky and you find people doing it anyway, then the functionality probably deserves to be made easier to use.

The hackiness is an artifact of the way things are now, but this whole thread is about changing the way things are now. We should be shaping the machinery to fit what people do, rather than trying to shape people to fit the machinery which was arbitrarily designed a long time ago.

On Wed, Nov 20, 2013 at 12:23 PM, Ethan Furman <ethan@stoneleaf.us> wrote:

On Nov 20, 2013, at 12:14, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Given that __metaclass__ was removed in Python 3, doesn't "this is an exact parallel to __metaclass__" argue against the idea, rather than for? Or at least against the name? (Maybe __init_module__?) Anyway, I think a module replacing itself with something callable is both more flexible and more in line with the way people are actually doing things today, so maybe a "less hacky" way to do the sys.modules hack is what people actually want here.

On 21 Nov 2013 13:02, "Ethan Furman" <ethan@stoneleaf.us> wrote:
It potentially causes problems for module reloading and it definitely causes problems for the import engine PEP (since sys.modules is process global state). I expect we'll revisit this later in the 3.5 development cycle (we haven't even merged the accepted PEP 451 for 3.4 yet), but formalising the current module replacement idiom is already a more likely outcome than making module instances callable. Cheers, Nick.

On Wed, Nov 20, 2013 at 7:44 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Agreed, formalizing how to do the replacement trick sounds good. It'd be ideal for a module to not need to know its own name within the code doing the replacement, and for it to not need to reimplement common interface bits in a replacement class so that it quacks like a module. Making it callable? Well, that does just seem silly, so I'm not actually worried about making that specifically easier itself.

On 22 November 2013 16:45, Gregory P. Smith <greg@krypto.org> wrote:
Well, it just has to use __name__. Any formalisation that doesn't get passed the name is going to have to use frame trickery to get the name which is just smelly. Michael

It'd be nice to formalize a way to get rid of the __name__ == '__main__' idiom as well in the long long run. Sure everyone's editor types that for them now but it's still a wart. Anyways, digressing... ;) -- blame half the typos on my phone.

On Fri, 22 Nov 2013 22:02:56 +0100 "Philipp A." <flying-sheep@web.de> wrote:
we’re all accustomed to it, but objectively, it’s horribly implicit and unobvious.
It's funny, when I first learned Python, I actually found it quite simple and elegant (leveraging the power of built-in introspection metadata). Regards Antoine.

Steven D'Aprano <steve@pearwood.info> writes:
Even if the executable only does one thing, it's still good to be able to *rename* the program and not have to change the usage and error messages::

    import os
    import sys

    progname = os.path.basename(__file__)
    # …
    sys.stdout.write(
        "{progname}: Couldn't frobnicate the spangule.\n".format(
            progname=progname))

So, definitely ‘sys.argv’ needs to continue having all command-line arguments, including the command name used to invoke the program. -- \ “We must respect the other fellow's religion, but only in the | `\ sense and to the extent that we respect his theory that his | _o__) wife is beautiful and his children smart.” —Henry L. Mencken | Ben Finney

Ben Finney wrote:
So, definitely ‘sys.argv’ needs to continue having all command-line arguments, including the command name used to invoke the program.
That doesn't necessarily mean it has to be passed along with the arguments to a __main__() function, though. You can always extract it from sys.argv if you need it. Arguably it's more convenient to get it from there, since you're most often going to want it for things like formatting error messages, which are probably not happening right inside the main function. -- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
Yet the ‘__main__’ function needs to get the arguments as a parameter::

    def __main__(argv):

(or at least, that's how I've seen it done most commonly, and I agree that it makes a good interface for ‘__main__’ functions). Now you're saying there is one command-line parameter which should not come through that interface? Why the special case?
It's more convenient to look in “the sequence of command-line parameters” for all the command-line parameters, without special-casing the command name. -- \ “If [a technology company] has confidence in their future | `\ ability to innovate, the importance they place on protecting | _o__) their past innovations really should decline.” —Gary Barnett | Ben Finney

From: Ben Finney <ben+python@benfinney.id.au>
Or to get the arguments as separate parameters::

    def __main__(*argv):
        …

or, more realistically::

    def __main__(inpath, outpath):
        …

Then::

    if __name__ == '__main__':
        __main__(*sys.argv[1:])

The benefit is that the names document what the arguments mean, and also give you better error messages if the script is called wrong. Obviously any serious script is going to have a real usage error, etc., but then any serious script is going to use argparse anyway. For quick & dirty scripts, the first error below is obviously nicer than the second, and no more work::

    $ ./script.py
    TypeError: main() missing 2 required positional arguments: 'inpath' and 'outpath'

    $ ./script.py
    IndexError: list index out of range

There's no reason you _couldn't_ write the idiom with main(argv0, inpath, outpath); I just haven't seen it that way. Scripts that explode argv always seem to do *argv[1:], while those that use it as a list usually seem to do all of argv.

Ben Finney wrote:
But you hardly ever want to process argv[0] the same way as the rest of the arguments, so you end up treating it as a special case anyway. It seems to me we only think of it as a command line argument because C traditionally presents it that way. I don't think it's something that would naturally come to mind otherwise. I know I found it quite surprising when I first encountered it. -- Greg

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
This isn't about *processing* that argument; it's about *receiving* it in the first place to the function. Having it omitted by default means there's a special case just to *get at* the first command-line argument::

    def __main__(argv_without_first_arg=None):
        if argv_without_first_arg is None:
            argv_without_first_arg = sys.argv[1:]
        first_arg = sys.argv[0]

Which still sucks, because how do I then pass a different command name to ‘__main__’, since it now expects to get it from elsewhere? Much better to have the interface just accept the *whole* sequence of command line arguments::

    def __main__(argv=None):
        if argv is None:
            argv = sys.argv

Now it's completely up to the caller what the command line looks like, which means the ‘__main__’ code needs no special cases for using the module as a library or for unit tests etc. You just construct the command line as you need it to look, and pass it in to the function.
It seems to me we only think of it as a command line argument because C traditionally presents it that way.
What C does isn't relevant here. I think of the whole command line as a sequence of arguments because that's how the program receives the command line from the Python interpreter. Mangling it further just makes a common use case more difficult for no good reason.
Many useful things are surprising when one first encounters them :-) -- \ “I don't accept the currently fashionable assertion that any | `\ view is automatically as worthy of respect as any equal and | _o__) opposite view.” —Douglas Adams | Ben Finney

On 25 November 2013 03:58, Ben Finney <ben+python@benfinney.id.au> wrote:
The name of the script is not an argument *to* the script. Having it there in the first place is the special case, not removing it. It's only an old C convention (and now an old Python convention) that makes you think it is. Michael Foord

Michael Foord <fuzzyman@gmail.com> writes:
No, the command name is part of the command line arguments, as I pointed out earlier. That is, when any program is started with::

    foo bar baz

the command line arguments are “foo”, “bar”, “baz”. That says nothing about what those arguments *mean* yet; they're all available to the program, for it to figure out their significance.
It's only an old C convention (and now an old Python convention) that makes you think it is.
Not at all. Giving the command-line arguments to the program as a sequence of strings is a standard cross-language interface; the operating system hands the command line to the running program without caring what language it was implemented in. This is all prior to talking about what those arguments *mean*; the discussion of “‘foo’ is a command, ‘bar’ and ‘baz’ are its parameters” all comes afterward and is irrelevant to the process of *getting at* the command line.

If the program's main code wants to discard the first argument, as many programs do for good reasons, that's up to the programmer to decide explicitly. Many other programs make use of the whole command line, and they should not need some different way to get at the contents of the command line. If we're going to make the command line sequence a parameter to the main code, there should be one interface, no special cases. As it stands, that conventional interface in Python code is::

    def main(argv=None):
        if argv is None:
            argv = sys.argv

and then it's up to the rest of the ‘main’ function (or however it's spelled) to process the full command line sequence that was received.

So, while the name “argv” is a C convention, the handling of the command line as a homogeneous sequence of strings is language-agnostic. Automatically discarding the first argument, on the assumption that the program doesn't care about it, is making a false assumption in many cases and makes a common use case needlessly difficult. -- \ “We must find our way to a time when faith, without evidence, | `\ disgraces anyone who would claim it.” —Sam Harris, _The End of | _o__) Faith_, 2004 | Ben Finney

Ben Finney wrote:
If you're talking about doing different things based on argv[0], I wouldn't call it a *common* use case. The last time I saw it done was on an early version of SunOS that didn't have shared libraries, so they linked all the gui tools into one big executable to reduce disk and memory usage. Now that we have shared libraries, there's much less need for that kind of trick. -- Greg

On Tue, Nov 26, 2013 at 10:43 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Upstart has a set of symlinks to initctl called "start", "stop", "reload", etc. They're shortcuts for "initctl start", "initctl stop", etc. Also, I've often fetched up argv[0] as part of a usage message, which isn't strictly "doing different things", but it does mean that renaming the program won't leave an old name inside its help message. ChrisA
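[Editor's note: the multi-name dispatch Chris describes could be sketched like this in Python. The command names and behaviour are invented for illustration; real tools like initctl obviously do far more.]

```python
# Sketch of dispatching on the name a program was invoked under, in the
# spirit of the initctl start/stop symlinks mentioned above.
import os


def start(args):
    return "starting %s" % " ".join(args)


def stop(args):
    return "stopping %s" % " ".join(args)


COMMANDS = {"start": start, "stop": stop}


def dispatch(argv):
    # If invoked via a symlink named after a subcommand, use that name;
    # otherwise fall back to "tool <subcommand> args..." style.
    name = os.path.basename(argv[0])
    if name in COMMANDS:
        return COMMANDS[name](argv[1:])
    return COMMANDS[argv[1]](argv[2:])


print(dispatch(["start", "cups"]))            # starting cups
print(dispatch(["initctl", "stop", "cups"]))  # stopping cups
```

Note that this only works because argv[0] survives intact; it is exactly the kind of use case that an argv-without-argv[0] interface would break.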

Greg Ewing <greg.ewing@canterbury.ac.nz> writes:
The common use case I'm referring to is to use the program name in output messages (e.g. help, errors) without needing to change the code when the program file is renamed, or when a different command line is constructed by the caller. Doing different things based on how the program is invoked is another common use case, yes. Both of these use cases argue for retaining the full command line sequence (or whatever replacement command line sequence the caller chooses to construct) as input to the main code, and allowing the main code to decide which parts are important. -- \ “I am too firm in my consciousness of the marvelous to be ever | `\ fascinated by the mere supernatural …” —Joseph Conrad, _The | _o__) Shadow-Line_ | Ben Finney

Greg Ewing writes:
busybox is still standard in many Linux distros, though I don't really know why (embedded systems, "small" rescue media?), and surely you've seen fgrep/grep/egrep (hardlinked to the same file on Mac OS X as of "Snow Leopard"), even if you personally use POSIX-standard "grep -F" and "grep -E". Linking /bin/sh to bash or zsh is a common trick on GNU systems, and typically invokes strict POSIX conformance (well, as strictly as any GNU program ever conforms to another organization's standard :-( ). So I rather suspect you "see" it frequently, even today. You just don't recognize it when you see it. Could we do without such trickery? Sure. However, the point about usage messages still stands: it's useful to fetch the actual command line token used to invoke the program, because program names and invocation methods do change.

Hi! On Tue, Nov 26, 2013 at 12:43:25PM +1300, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
sys.argv[0] is used for:

1) Setup sys.path; something like::

    lib_dir = os.path.dirname(sys.argv[0]) + os.sep + 'lib'
    sys.path.append(lib_dir)

2) Setup relative path(s) (to start a helper script, e.g.); something like::

    Subprocess("%s/Robots/sub.py" % os.path.dirname(sys.argv[0]))

3) Report usage::

    sys.stderr.write("Usage: %s [-o|--old]\n" % sys.argv[0])

4) Change behaviour based on the script's name.

I am -1 on removing sys.argv[0] from main(argv).

Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On Fri, Nov 22, 2013 at 10:02:56PM +0100, Philipp A. wrote:
we’re all accustomed to it, but objectively, it’s horribly implicit and unobvious.
Certainly you are correct that it is unobvious, but the "if __name__" idiom is anything but implicit. It's the opposite: you are *explicitly* testing whether the module is being run as the main module (__name__ == "__main__") and if so, you explicitly run some code.

Of course you can also run code only when *not* the main module::

    if __name__ != '__main__':
        print "Module is being imported"
    else:
        print "Module is being executed"

And you aren't limited to a single "main function"; you can dot such tests all throughout your code, including inside functions.

Aside: perhaps it would have been better to have an explicit ismain() function that returns True when running as the main module; that's more discoverable.

-- Steven

On Fri, Nov 22, 2013 at 9:07 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I do think it's a bad idea. It would be like replacing a builtin module in sys.modules, which is inadvisable (particularly key ones like sys). Like builtins, __main__ is a special module, one created during interpreter startup. It plays a special part in the REPL. Various parts of the stdlib have special-casing for __main__, which could be affected by replacement. Replacing __main__ in sys.modules is, to me, just as inadvisable as replacing sys.

The catch is that a script is exec'ed into the __main__ module's namespace, so during execution (nearly) all the import-related attributes relate to __main__. In contrast, the equivalent module from the same file would be loaded into its own namespace, with its own import-related attributes, and cached independently at sys.modules[module_name]. This duality causes all sorts of grief (PEP 395 is a response to some of the pain points). A key hangup is that __name__ is different depending on run-as-script or imported-as-module.

That brings us back to the idea of a more formal replace-module-in-sys-modules API. Any solution to that which uses __name__ to determine the module's name has to take into account that it may have been run as a script (where __name__ will be "__main__"). If we simply used __name__ straight up we might end up replacing __main__ in sys.modules, which I suggest is a bad idea. Hence the point of special-casing __main__.

Sorry I wasn't clear. Hopefully this was more so. -eric

On 24 November 2013 02:53, Haoyi Li <haoyi.sg@gmail.com> wrote:
The main thing that makes __main__ special is that it's a builtin module, but we then use its namespace to run Python code. Various parts of the interpreter assume that __main__ will always be the same module that was initialized during interpreter startup, so they don't have to keep re-initializing it (or checking if it has been replaced). It's not quite as intertwined with the interpreter internals as sys, since there's no direct reference to it from the interpreter state, but the case can certainly be made that there *should* be such a reference if we're going to assume consistency over the lifetime of the process.

However, while I can't vouch for earlier versions, replacing __main__ in 3.3+ shouldn't cause any major issues, although it does mean certain things may not behave as expected (such as the -i switch and the PYTHONINSPECT option).

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (19)
-
Andrew Barnert
-
Antoine Pitrou
-
Ben Finney
-
Chris Angelico
-
Eric Snow
-
Ethan Furman
-
Georg Brandl
-
Greg Ewing
-
Gregory P. Smith
-
Haoyi Li
-
Mathias Panzenböck
-
Michael Foord
-
Nick Coghlan
-
Oleg Broytman
-
Philipp A.
-
random832@fastmail.us
-
Stephen J. Turnbull
-
Steven D'Aprano
-
Terry Reedy