[niemeyer@conectiva.com: Re: [Python-Dev] Python's footprint]
Hi everyone!
Now that 2.2 is history (well, kind of ;-), would it be the time to
think about this again?
Thank you!
----- Forwarded message from Gustavo Niemeyer
It means that about 10% of python's executable is documentation. [...] Anyways, that sounds like a useful idea. It would probably be a big patch that touches lots of files, so it's unlikely to get into Python 2.2. You might consider whipping up a patch now to get it under consideration early in 2.3's life-cycle.
Ok. The patch is ready (attached). It's very simple. Just introducing two new macros: Py_DOCSTR() to be used in usual doc strings, and WITH_DOC_STRINGS, for more complex ones (sys module's doc string comes into my mind).

I'd just like to know the moment when it is going to be applied, so I can change every documentation string accordingly and submit the patch. I could do this right now, for sure. But if it's going to be applied just for 2.3, the patch will certainly be broken at that time.

Thanks!

-- 
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]

--- Python-2.2.orig/pyconfig.h.in  Wed Nov 14 17:54:31 2001
+++ Python-2.2/pyconfig.h.in  Wed Nov 14 19:08:08 2001
@@ -765,3 +765,13 @@
 #define STRICT_SYSV_CURSES /* Don't use ncurses extensions */
 #endif

+/* Define if you want to have inline documentation. */
+#undef WITH_DOC_STRINGS
+
+/* Define macro for inline documentation. */
+#ifdef WITH_DOC_STRINGS
+#define Py_DOCSTR(x) x
+#else
+#define Py_DOCSTR(x) ""
+#endif
+
--- Python-2.2.orig/configure.in  Wed Nov 14 17:54:31 2001
+++ Python-2.2/configure.in  Wed Nov 14 19:20:07 2001
@@ -1305,6 +1305,20 @@
 fi
 AC_MSG_RESULT($with_cycle_gc)

+# Check for --with-doc-strings
+AC_MSG_CHECKING(for --with-doc-strings)
+AC_ARG_WITH(doc-strings,
+[  --with(out)-doc-strings      disable/enable documentation strings])
+
+if test -z "$with_doc_strings"
+then with_doc_strings="yes"
+fi
+if test "$with_doc_strings" != "no"
+then
+    AC_DEFINE(WITH_DOC_STRINGS)
+fi
+AC_MSG_RESULT($with_doc_strings)
+
 # Check for Python-specific malloc support
 AC_MSG_CHECKING(for --with-pymalloc)
 AC_ARG_WITH(pymalloc,

----- End forwarded message -----

-- 
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
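For readers following the patch, here is a minimal sketch of how the one-argument Py_DOCSTR() from the diff above might be used in a C source file. The names spam_doc and spam_doc_length are hypothetical, and WITH_DOC_STRINGS is defined by hand here, where configure would normally set it through pyconfig.h.

```c
#include <string.h>

/* Assumption: stands in for what configure writes into pyconfig.h
 * when --with-doc-strings is left at its default of "yes". */
#define WITH_DOC_STRINGS 1

/* Same definition as in the patch. */
#ifdef WITH_DOC_STRINGS
#define Py_DOCSTR(x) x
#else
#define Py_DOCSTR(x) ""
#endif

/* Hypothetical docstring; it collapses to "" when the macro is disabled. */
static char spam_doc[] = Py_DOCSTR("spam(x) -> eggs\n\nReturn eggs.");

/* Small accessor so the effect of the macro can be observed. */
size_t spam_doc_length(void)
{
    return strlen(spam_doc);
}
```

Removing the WITH_DOC_STRINGS definition leaves spam_doc as an empty string, so code that refers to the variable still compiles while the text itself no longer ends up in the binary.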
Now that 2.2 is history (well, kind of ;-), would it be the time to think about this again?
By "consideration early in 2.3's life cycle", the OP probably meant that a patch should be posted to SF. Are you willing to implement the complete change (i.e. create a patch that changes each and every source file)? If so, please post one to SF. You may want to start this slowly, first creating only the infrastructure and touching a single file (say, stringobject.c).

I'd personally like to see opportunities for more magic used. E.g. in a compiler that uses sections, putting all doc strings into a single section might be desirable. They will be a contiguous fragment of the python executable, which helps on demand-paged systems to reduce the startup time. Going further, it might be possible to strip off "unused sections" from the binary after it has been linked, deferring the choice of doc string presence to the installation time.

For that to work, we'd first need to know what compilers offer what syntax to implement such magic, then generalize it to the right macro. If that is a desirable goal, I'd be willing to investigate how to achieve things with gcc, on ELF systems.

Regards,
Martin
Hi Martin!
Now that 2.2 is history (well, kind of ;-), would it be the time to think about this again?
By "consideration early in 2.3's life cycle", the OP probably meant that a patch should be posted to SF. Are you willing to implement the complete change (i.e. create a patch that changes each and every source file)? If so, please post one to SF. You may want to start this slowly, first creating only the infrastructure and touching a single file (say, stringobject.c)
Yes, I'm going to implement it. I'd just like to know if there was interest in the patch. Implementing it slowly looks like a nice idea as well. I'll post a patch there. Thanks!
I'd personally like to see opportunities for more magic used. E.g. in a compiler that uses sections, putting all doc strings into a single section might be desirable. They will be a contiguous fragment of the python executable, which helps on demand-paged systems to reduce the startup time. Going further, it might be possible to strip off "unused sections" from the binary after it has been linked, deferring the choice of doc string presence to the installation time.
Interesting. I know it's possible to discard a section. OTOH, I don't know what happens if somebody refers to discarded data. I'll have a look at this.
For that to work, we'd first need to know what compilers offer what syntax to implement such magic, then generalize it to the right macro. If that is a desirable goal, I'd be willing to investigate how to achieve things with gcc, on ELF systems.
This is something pretty easy with gcc. When reading your email, I remembered that the kernel uses this magic to discard a section with code used just when initializing. Looking in the kernel code, I found this in include/linux/init.h:

/*
 * Mark functions and data as being only used at initialization
 * or exit time.
 */
#define __init          __attribute__ ((__section__ (".text.init")))
#define __exit          __attribute__ ((unused, __section__(".text.exit")))
#define __initdata      __attribute__ ((__section__ (".data.init")))
#define __exitdata      __attribute__ ((unused, __section__ (".data.exit")))
#define __initsetup     __attribute__ ((unused,__section__ (".setup.init")))
#define __init_call     __attribute__ ((unused,__section__ (".initcall.init")))
#define __exit_call     __attribute__ ((unused,__section__ (".exitcall.exit")))

After surrounding doc strings with a macro, this will be easy to achieve.

Thanks!

-- 
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
    Gustavo> Yes, I'm going to implement it. I'd just like to know if there
    Gustavo> was interest in the patch. Implementing it slowly looks like a
    Gustavo> nice idea as well. I'll post a patch there. Thanks!

Gustavo, I recommend you do the whole patch thing through SourceForge. Just post a link to your patch to python-dev.

Skip
#define __init __attribute__ ((__section__ (".text.init"))) [...] After surrounding doc strings with a macro, this will be easy to achieve.
Unfortunately, not with the doc string you propose. Apparently, your macro is going to be used as

  char foo__doc__[] = Py_DocString("this is foo");

However, with the attribute, the resulting code should read

  char foo__doc__[] __attribute__((__section__("docstring"))) = "this is foo";

You cannot define the macro so that it comes out as expanding to __attribute__, at least not with that specific macro.

Regards,
Martin
Martin v. Loewis wrote:
#define __init __attribute__ ((__section__ (".text.init")))
[...]
After surrounding doc strings with a macro, this will be easy to achieve.
Unfortunately, not with the doc string you propose. Apparently, your macro is going to be used as
char foo__doc__[] = Py_DocString("this is foo");
However, with the attribute, the resulting code should read
char foo__doc__[] __attribute__((__section__("docstring"))) = "this is foo";
You cannot define the macro so that it comes out as expanding to __attribute__, at least not with that specific macro.
Why don't you use a macro which only takes the name of the static array and the doc-string itself as arguments? This could then be expanded to whatever needs to be done for a particular case/platform, e.g.

  Py_DefineDocString(foo__doc__, "foo does bar");

(I use such an approach in the mx stuff and it works great.)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
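To make the difference between the two macro shapes concrete, here is a rough sketch of a name-taking macro in the style of Py_DefineDocString(). Because the macro controls the whole declaration, it can place the attribute between the declarator and the initializer, which a macro that only wraps the string value cannot do. Py_DOC_SECTION, get_foo_doc and foo_doc_length are hypothetical names, and the section name "docstring" is borrowed from Martin's example; this is an illustration under those assumptions, not what mx or the eventual patch does.

```c
#include <string.h>

/* Assumption: only GCC-compatible compilers on ELF targets get the
 * section attribute; everywhere else the macro expands to nothing. */
#if defined(__GNUC__) && defined(__ELF__)
#define Py_DOC_SECTION __attribute__((__section__("docstring")))
#else
#define Py_DOC_SECTION
#endif

/* The macro owns the full declaration, so the attribute can sit
 * between the array declarator and the initializer. */
#define Py_DefineDocString(name, str) \
    static char name[] Py_DOC_SECTION = str

Py_DefineDocString(foo__doc__, "foo does bar");

/* Hypothetical accessors, present only so the string is referenced. */
const char *get_foo_doc(void)
{
    return foo__doc__;
}

size_t foo_doc_length(void)
{
    return strlen(foo__doc__);
}
```

Whether such a section can then be removed from the linked binary without breaking references to it is the open question raised earlier in the thread; the sketch only shows that the declaration itself can be generated.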
Why don't you use a macro which only takes the name of the static array and the doc-string itself as arguments? This could then be expanded to whatever needs to be done for a particular case/platform, e.g.
Py_DefineDocString(foo__doc__, "foo does bar");
(I use such an approach in the mx stuff and it works great.)
Yes, it's a nice idea!

I'm looking for some way to "discard" the string using a macro. Let me explain with code:

[...]
#define Py_DOCSTR(name, str) static char *name = str
#ifdef WITH_DOC_STRINGS
#define Py_DOCSTR_START(name) Py_DOCSTR(name,)
#define Py_DOCSTR_END ;
#else
#define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what follows somehow */
#define Py_DOCSTR_END /* Stop discarding */
#endif
[...]

This would make it possible to do something like this:

Py_DOCSTR(simple_doc, "This is a simple doc string.");

...and also...

Py_DOCSTR_START(complex_doc)
"This is a complex doc string"
#ifndef MS_WIN16
"like the one in sysmodule.c"
#endif
"Something else"
Py_DOCSTR_END

This seems to be the most elegant way to allow these complex strings. But unfortunately, I haven't found any way so far to do this "discarding thing", besides including another "#if" in the documentation itself.

Any good ideas?

-- 
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer wrote:
Why don't you use a macro which only takes the name of the static array and the doc-string itself as arguments? This could then be expanded to whatever needs to be done for a particular case/platform, e.g.
Py_DefineDocString(foo__doc__, "foo does bar");
(I use such an approach in the mx stuff and it works great.)
Yes, it's a nice idea!
I'm looking for some way to "discard" the string using a macro. Let me explain with code:
[...]
#define Py_DOCSTR(name, str) static char *name = str
#ifdef WITH_DOC_STRINGS
#define Py_DOCSTR_START(name) Py_DOCSTR(name,)
#define Py_DOCSTR_END ;
#else
#define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what follows somehow */
#define Py_DOCSTR_END /* Stop discarding */
#endif
[...]
This would make it possible to do something like this:
Py_DOCSTR(simple_doc, "This is a simple doc string.");
...and also...
Py_DOCSTR_START(complex_doc)
"This is a complex doc string"
#ifndef MS_WIN16
"like the one in sysmodule.c"
#endif
"Something else"
Py_DOCSTR_END
This seems to be the most elegant way to allow these complex strings. But unfortunately, I haven't found any way so far to do this "discarding thing", besides including another "#if" in the documentation itself.
Any good ideas?
Wouldn't it be much simpler to wrap the complete Py_DOCSTR() into #ifdefs?

BTW, I don't think we'll ever need to #ifdef doc-strings for platforms; you can just as well put the information for all platforms into the doc-string -- after all, the recipient is a human with enough non-AI to parse the doc-string into meaningful sections ;-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
Wouldn't it be much simpler to wrap the complete Py_DOCSTR() into #ifdefs ?
Yes, it's going to be wrapped! I took this code out of a file I was using to show the #ifdef problem.
BTW, I don't think we'll ever need to #ifdef doc-strings for platforms;
This would make things pretty easy, but note that we are *already* #ifdef'ing doc-strings for platforms. Python/sysmodule.c is an example of such.
you can just as well put the information for all platforms into the doc-string -- after all, the recipient is a human with enough non-AI to parse the doc-string into meaningful sections ;-)
Cool! Are we going to change the existing doc strings then?

-- 
Gustavo Niemeyer
[ 2AAC 7928 0FBF 0299 5EB5 60E2 2253 B29A 6664 3A0C ]
Gustavo Niemeyer wrote:
Wouldn't it be much simpler to wrap the complete Py_DOCSTR() into #ifdefs ?
Yes, it's going to be wrapped! I took this code out of a file I was using to show the #ifdef problem.
BTW, I don't think we'll ever need to #ifdef doc-strings for platforms;
This would make things pretty easy, but note that we are *already* #ifdef'ing doc-strings for platforms. Python/sysmodule.c is an example of such.
Hmm, I wasn't aware of such doc-strings.
you can just as well put the information for all platforms into the doc-string -- after all, the recipient is a human with enough non-AI to parse the doc-string into meaningful sections ;-)
Cool! Are we going to change the existing doc strings then?
Well, can't speak for PythonLabs, but I don't see any benefit from making doc-strings complicated by introducing #ifdefs. It doesn't buy us anything, IMHO. Even worse: it makes translating the doc-strings harder.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
Well, can't speak for PythonLabs, but I don't see any benefit from making doc-strings complicated by introducing #ifdefs. It doesn't buy us anything, IMHO. Even worse: it makes translating the doc-strings harder.
If there is platform-specific functionality, the docstring should document that only on the platform where it applies.

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
Well, can't speak for PythonLabs, but I don't see any benefit from making doc-strings complicated by introducing #ifdefs. It doesn't buy us anything, IMHO. Even worse: it makes translating the doc-strings harder.
If there is platform-specific functionality, the docstring should document that only on the platform where it applies.
Just to make sure... I was talking about something like:

open__doc__ = \
"Open the file. On Windows, the MBCS encoding is assumed, "\
"on all other systems, the file name must be given in ASCII.";

vs.

#ifdef MS_WINDOWS
open__doc__ = \
"Open the file, assuming the filename is given in the MBCS "\
"encoding.";
#else
open__doc__ = \
"Open the file, assuming the filename is given in ASCII.";
#endif

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
Just to make sure... I was talking about something like:
open__doc__ = \
"Open the file. On Windows, the MBCS encoding is assumed, "\
"on all other systems, the file name must be given in ASCII.";
vs.
#ifdef MS_WINDOWS
open__doc__ = \
"Open the file, assuming the filename is given in the MBCS "\
"encoding.";
#else
open__doc__ = \
"Open the file, assuming the filename is given in ASCII.";
#endif
Given the main use case for docstrings, I'd prefer the latter. The library manual should contain the "all-platforms" documentation.

--Guido van Rossum (home page: http://www.python.org/~guido/)

    >> If there is platform-specific functionality, the docstring should
    >> document that only on the platform where it applies.

    mal> Just to make sure... I was talking about something like:

    mal> open__doc__ = \
    mal> "Open the file. On Windows, the MBCS encoding is assumed, "\
    mal> "on all other systems, the file name must be given in ASCII.";

+1

    mal> vs.

    mal> #ifdef MS_WINDOWS
    mal> open__doc__ = \
    mal> "Open the file, assuming the filename is given in the MBCS "\
    mal> "encoding.";
    mal> #else
    mal> open__doc__ = \
    mal> "Open the file, assuming the filename is given in ASCII.";
    mal> #endif

-1

I agree w/ MAL. I happen to be developing an application on Linux right now, but I'm interested in where I might encounter problems when it migrates to Windows. I would much prefer the documentation make it eas(y|ier) to identify platform differences. This holds true for docstrings, because they are the most readily available documentation format.

Skip
I agree w/ MAL. I happen to be developing an application on Linux right now, but I'm interested in where I might encounter problems when it migrates to Windows. I would much prefer the documentation make it eas(y|ier) to identify platform differences. This holds true for docstrings, because they are the most readily available documentation format.
But what about optional features that are only available on platform X? Do you really want those to clutter up the docstring on platforms where they aren't available? On the platform where they *are*, their docstring should have a "(Platform X only)" note.

--Guido van Rossum (home page: http://www.python.org/~guido/)

    >> I would much prefer the documentation make it eas(y|ier) to identify
    >> platform differences. This holds true for docstrings, because they
    >> are the most readily available documentation format.

    Guido> But what about optional features that are only available on
    Guido> platform X? Do you really want those to clutter up the docstring
    Guido> on platforms where they aren't available? On the platform where
    Guido> they *are*, their docstring should have a "(Platform X only)"
    Guido> note.

Perhaps I should take a half-step back under Guido's withering stare. That's probably why I've been feeling a chill all day... ;-)

I don't think it's necessary for the docstring to contain all the excruciating detail available in the library reference manual, but I think a quick help(open) at the interpreter prompt or a docstring popped up in PyCrust or other IDE-like thing should give you an indication that there are semantic differences for that function across platforms.

Ideally, these differences would only be documented at the highest level they can come into play. For example, if a class or module exhibits some platform-dependency, its docstring would indicate that, not the docstring of every one of its methods.

Also, consider time.strptime. It's not always available, so the time module's docstring should mention its possible absence depending on platform. On platforms where it's not supported, putting a "platform x only" note in strptime's docstring won't help much to the confused programmer wondering where it disappeared to.

Of course, it's easy for me to spout platitudes here. Adjusting to such a convention will probably add a fair amount of work to somebody's already full schedule.

Skip
What's the current thinking about making docstrings optional? Does everybody agree on Gustavo's patch?

http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505375&group_id=5470

--Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
What's the current thinking about making docstrings optional?
Does everybody agree on Gustavo's patch?
10% space saving? That doesn't seem to be worth the effort. OTOH, I'm not dealing with any platforms that are memory constrained right now.

Neil
"NS" == Neil Schemenauer writes:

    >> What's the current thinking about making docstrings optional?
    >> Does everybody agree on Gustavo's patch?

    NS> 10% space saving? That doesn't seem to be worth the effort.
    NS> OTOH, I'm not dealing with any platforms that are memory
    NS> constrained right now.

Personally I don't care either for the same reasons. I'll just note that what Emacs used to do (maybe it still does, I dunno), is extract all its inlined docstrings into a separate file which could be thrown away if you didn't want to pay for the bloat. All that complexity was built in a time when 300KB or so of docstrings really could make a huge difference for download times or storage resources.

-Barry
"Barry A. Warsaw" wrote:
"NS" == Neil Schemenauer writes:
>> What's the current thinking about making docstrings optional?
>> Does everybody agree on Gustavo's patch?
NS> 10% space saving? That doesn't seem to be worth the effort.
NS> OTOH, I'm not dealing with any platforms that are memory
NS> constrained right now.
Personally I don't care either for the same reasons. I'll just note that what Emacs used to do (maybe it still does, I dunno), is extract all its inlined docstrings into a separate file which could be thrown away if you didn't want to pay for the bloat. All that complexity was built in a time when 300KB or so of docstrings really could make a huge difference for download times or storage resources.
You should also consider the possibility of using the macros for translating the doc-strings. They are a form of markup.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
You should also consider the possibility of using the macros for translating the doc-strings. They are a form of markup.
While that is true, most of the current strings are marked up already, by means of having a __doc__ suffix. I have an extractor that understands this form of markup, and the Python .pot file in CVS has those strings extracted.

Regards,
Martin
Guido van Rossum wrote:
What's the current thinking about making docstrings optional?
Does everybody agree on Gustavo's patch?
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505375&group_id=5470
+1.

This will help Python embedders and porters to embedded systems a lot.

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                   http://www.egenix.com/files/python/
On Friday, January 18, 2002, at 04:23 , M.-A. Lemburg wrote:
Guido van Rossum wrote:
What's the current thinking about making docstrings optional?
Does everybody agree on Gustavo's patch?
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=505375&group_id=5470
+1.
This will help Python embedders and porters to embedded systems a lot.
+1. Same reasoning.
--
- Jack Jansen
The following is the solution that comes to mind for me. My other idea was creating a static char* or a static function with the char* inside it, in the hopes it would be discarded as unused, but gcc doesn't seem to do that.

Seems to me that, compared to this, rewriting those docstrings that are already victims of preprocessor definitions is certainly better for readability of the docstrings in the source code...

Jeff Epler
jepler@inetnebr.com

On Mon, Jan 14, 2002 at 09:30:53AM -0200, Gustavo Niemeyer wrote:
I'm looking for some way to "discard" the string using a macro. Let me explain with code:
[...]
#define Py_DOCSTR(name, str) static char *name = str
#ifdef WITH_DOC_STRINGS
#define Py_DOCSTR_START(name) Py_DOCSTR(name,)
#define Py_DOCSTR_END ;
#define Py_DOCSTR_PART(s) s
#else
#define Py_DOCSTR_START(name) Py_DOCSTR(name, ""); /* Also discards what follows somehow */
#define Py_DOCSTR_END /* Stop discarding */
#define Py_DOCSTR_PART(s) /* (nothing) */
#endif
[...]
This would make it possible to do something like this:
Py_DOCSTR(simple_doc, "This is a simple doc string.");
...and also...
Py_DOCSTR_START(complex_doc)
Py_DOCSTR_PART( "This is a complex doc string")
#ifndef MS_WIN16
Py_DOCSTR_PART( "like the one in sysmodule.c")
#endif
Py_DOCSTR_PART( "Something else")
Py_DOCSTR_END
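As a compilable illustration of the Py_DOCSTR_PART() idea, the sketch below builds a multi-part doc string with doc strings switched off, so every part is discarded by the preprocessor. It deviates slightly from the macros quoted above: Py_DOCSTR_START() is written out directly instead of going through Py_DOCSTR(name,), which avoids relying on an empty macro argument. The #undef and the accessor get_complex_doc are assumptions added for the example.

```c
#include <string.h>

/* Assumption: simulate a build configured with --without-doc-strings. */
#undef WITH_DOC_STRINGS

#ifdef WITH_DOC_STRINGS
#define Py_DOCSTR_START(name) static char *name =
#define Py_DOCSTR_PART(s) s
#define Py_DOCSTR_END ;
#else
/* The declaration is completed with "" right away; every following
 * part expands to nothing, which is the "discarding" being asked for. */
#define Py_DOCSTR_START(name) static char *name = "";
#define Py_DOCSTR_PART(s)
#define Py_DOCSTR_END
#endif

Py_DOCSTR_START(complex_doc)
Py_DOCSTR_PART("This is a complex doc string ")
#ifndef MS_WIN16
Py_DOCSTR_PART("like the one in sysmodule.c. ")
#endif
Py_DOCSTR_PART("Something else.")
Py_DOCSTR_END

/* Hypothetical accessor so the variable is referenced. */
const char *get_complex_doc(void)
{
    return complex_doc;
}
```

Defining WITH_DOC_STRINGS instead of undefining it turns the same source into a single concatenated string literal, at the cost of wrapping every fragment in Py_DOCSTR_PART(), which is the readability trade-off noted in the message above.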
participants (9)
- barry@zope.com
- Guido van Rossum
- Gustavo Niemeyer
- Jack Jansen
- jepler@inetnebr.com
- M.-A. Lemburg
- Martin v. Loewis
- Neil Schemenauer
- Skip Montanaro