[Twisted-Python] Re: Twisted 1.0.4 - Heavy search for `plugins.tml'

Hi. This is a resend of a message which was rejected, because too big. Wise mailing list system! :-). So, I removed 1255 lines from the middle of the last transcript, showing the first 20 and the last 20 only. You'll merely have to believe me that those lines I removed were there! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard

On 27 Apr 2003 17:26:32 -0400 Francois Pinard <pinard@iro.umontreal.ca> wrote:
Thank you! We will see about fixing this ASAP. -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting

On Sunday, April 27, 2003, at 04:44 PM, Itamar Shtull-Trauring wrote:
Whoooops! Since all the mailing list messages I get to OK are spam, I just scan the headers to make sure it's from someone valid. This time, that meant I OK'd a huuge message (as I rejected 8 other spams...). Smart mailing list systems don't help if you have a dumb mailing list admin.
Thank you! We will see about fixing this ASAP.
This is a subtle issue... right now we're going to fix it by hard-coding /usr/bin and /usr/local/bin as paths that should not be checked, but the real problem is an interaction between Python's default path behavior, UNIX's filesystem layout, and the way twisted.python.plugins works. Itamar and I had a few ideas for fixing it "for real" but the caveats that we would have to put in the documentation are so ugly that we discarded them. Hard coding '/usr/bin' should fix most cases.
While we were in there, we also removed all paths with a '.' in them: since these can't be valid Python packages anyway, it doesn't affect its correctness, and it has the benefit of making mktap at least 30% faster, even without filtering /usr/bin.
Thanks for bringing 'strace mktap' to our attention. I'm sure everybody will appreciate mktap being much less sluggish :).
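The two filters described in this message can be sketched roughly as follows. This is not the code in twisted.python.plugins; `find_dropins`, `SKIPPED_DIRS` and `PLUGIN_FILE` are hypothetical names, and reading "paths with a '.' in them" as "directory entries whose names contain a dot" is an assumption. The sketch only shows why every entry of a large directory such as /usr/bin costs a filesystem probe unless it is filtered out first.

```python
import os

# Assumption: the directories skipped outright, as named in the message above.
SKIPPED_DIRS = frozenset({"/usr/bin", "/usr/local/bin"})
PLUGIN_FILE = "plugins.tml"


def find_dropins(search_path, skipped=SKIPPED_DIRS):
    """Return plugins.tml files found one level below each search path entry."""
    found = []
    for entry in search_path:
        # An empty entry in sys.path stands for the current directory.
        directory = os.path.abspath(entry or ".")
        if directory in skipped or not os.path.isdir(directory):
            continue
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue  # unreadable directory: nothing to find there
        for name in names:
            # A name containing '.' cannot be an importable package name,
            # so it is dropped before any filesystem probe is made.
            if "." in name:
                continue
            candidate = os.path.join(directory, name, PLUGIN_FILE)
            if os.path.isfile(candidate):  # one probe per remaining entry
                found.append(candidate)
    return found
```

Calling `find_dropins(sys.path)` would then probe only the entries that survive both filters.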

On 2003.04.27 17:26, Francois Pinard wrote:
It's not using the shell load path. /usr/bin just happens to be in your sys.path.
No, it is not. How can you be so assertive?
Because I know :-) The default sys.path isn't there, but your `mktap' is located in /usr/bin, isn't it? :-) You probably thought I was being over-assertive because I wasn't explaining myself fully. I apologize for that. But I did the research, and found out why this was happening. (more below)
Stated where? The book does not clearly indicate (or I did not read that part yet) where `.tml' files are exactly looked for.
The "Writing Twisted Plugins" section, IIRC.
Trolls like this aren't necessary. If you have a suggestion for improvement (a special-case for leaving out /usr/bin despite its being in the python path, for example), then I'll gladly hear it out. But I'm fairly certain that there's no debate on what the *current* behavior is.
OK then. I am now reporting a bug. The default Python path for this Python 2.2.1, which I installed over a SuSE 8.0 Linux distribution, is:
As I stated above, /usr/bin is in your sys.path because that's probably where your `mktap' is located. If you don't believe me, modify your local Twisted installation to add a `print sys.path' to twisted.scripts.mktap.
Yeah, I examined the output of that strace that you quoted in your original mail; I discovered, as I had expected, that /usr/bin was in my python path because that's where `mktap' is located. -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://twistedmatrix.com/users/radix.twistd/

[Christopher Armstrong]
The default sys.path isn't there, but your `mktap' is located in /usr/bin, isn't it? :-)
It is. Twisted was installed using `root', and with --prefix=/usr.
And inventing a new environment variable is contrary to the stated goal, which is the lack of a registration process [...]
Stated where? The book does not clearly indicate (or I did not read that part yet) where `.tml' files are exactly looked for.
The "Writing Twisted Plugins" section, IIRC.
I have "3.3 - Writing a New Plug-In for Twisted", and I presume this is what you are referring to. I read that section many times, and once more before sending the above question, and once more now. The documentation says that:
    Twisted finds its plug-ins by using pre-existing Python concepts; the load path, and packages. Every top-level Python package[4] [...] can potentially contain some number of plug-ins. [...] The only difference between a package and a drop-in is the existence of a file named `plugins.tml' [...] that contains some special Python expression to identify the location of sub-packages or modules which can be loaded.
This is the most precise description I could find. Maybe there is a linguistic barrier (I'm not an English speaker), but to me, this is a bit generic. It is not said that plug-ins _are_ `plugins.tml' files, nor that those TML files are necessarily at the same place the plug-ins are (like next to `__init__.py', say); only their existence is asserted. Nor is it clearly stated that it is a goal that TML files shall never be themselves registered. Maybe you could use my naive reading as a hint that the documentation could be more precise on these points, as other readers might have the same interpretation difficulties as me.
[...] Being receptive to feedback is a good way to keep it coming, you know.
Trolls like this aren't necessary.
Hopefully, you'll come to know me better, and discover that I never troll. I am a peaceful and joyful man, and try to speak in all honesty, always. Oh, I may tease in a friendly way, but then, there are smileys not far.
If you have a suggestion for improvement [...] then I'll gladly hear it out.
For now, being a Twisted newcomer who knows very little about it, my contributions may be timid for a good while. I'm just sharing the mere existence of problems I see with my naive eyes, although I know that in the area of free software, patches and precise suggestions are usually quite welcome. I hope to acquire a bit of competence, and to become a better contributor as time passes. Let me apologise for all the false notes I may sing in the meantime.
If you don't believe me, modify your local Twisted installation to add a `print sys.path' to twisted.scripts.mktap.
I'm quite willing to believe you, you know! :-)
[...] /usr/bin is in your sys.path because that's probably where your `mktap' is located.
Is Twisted modifying `sys.path' to include the first part of argv[0]? Or is it standard Python behaviour? I never noticed this yet in my own scripts, but it might just have escaped my attention, and I might not know. -- François Pinard http://www.iro.umontreal.ca/~pinard

On 2003.04.27 18:46, Francois Pinard wrote:
Yeah, it's a common misconception about Python, actually. Yes, Python always adds the location of the main script to sys.path (the misconception is that people think it's "the current directory" rather than "the location of the main script", because people so often run their main scripts from the directory that they're located in). -- Twisted | Christopher Armstrong: International Man of Twistery Radix | Release Manager, Twisted Project ---------+ http://twistedmatrix.com/users/radix.twistd/
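The behaviour described here can be checked with a small experiment: write a script into one directory, run it with a different working directory, and look at the first entry of sys.path. The sketch below is written for a current Python 3 (the thread concerns Python 2.2.1); `script_dir_seen_by_python` and `show_path.py` are hypothetical names. It removes the PYTHONSAFEPATH environment variable from the child process because, on Python 3.11 and later, that setting disables this very behaviour.

```python
import os
import subprocess
import sys
import tempfile


def script_dir_seen_by_python():
    """Run a tiny script from another working directory.

    Returns (script directory, working directory, sys.path[0] seen by the script),
    all passed through os.path.realpath so they can be compared directly.
    """
    with tempfile.TemporaryDirectory() as script_dir, \
            tempfile.TemporaryDirectory() as cwd:
        script = os.path.join(script_dir, "show_path.py")
        with open(script, "w") as handle:
            handle.write("import sys\nprint(sys.path[0])\n")
        # Drop PYTHONSAFEPATH so the default behaviour is what gets observed.
        env = {k: v for k, v in os.environ.items() if k != "PYTHONSAFEPATH"}
        result = subprocess.run(
            [sys.executable, script],
            cwd=cwd, env=env, capture_output=True, text=True, check=True,
        )
        first_entry = result.stdout.strip()
        return (
            os.path.realpath(script_dir),
            os.path.realpath(cwd),
            os.path.realpath(first_entry),
        )
```

The third element of the returned tuple should match the first (where the script lives), not the second (where it was run from), which is the same reason a `mktap` installed in /usr/bin ends up with /usr/bin on its path.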

[Christopher Armstrong]
Is Twisted modifying `sys.path' to include the first part of argv[0]? Or is it standard Python behaviour?
Yes, Python always adds the location of the main script to sys.path [...]
Thanks for enlightening me. For the case of the heaviness for `plugins.tml', at least on Unix, there might be something we could do that would be reasonable on average, but I'm not fully sure, so I'll let those who are grown up judge! :-)
Usually, `/usr/bin/' and such do not have subdirectories, so if a directory contains 1000 entries, one may immediately check if one of these is `plugins.tml' by a direct try. Then, if the number of links of that directory -- itself -- is exactly 2, we can conclude that it has no sub-directories, and that there is nothing more to check for this one. We could then get more speed even for directories other than `/usr/bin/'.
This trick is heavily used in GNU find as a way to avoid `stat'ing all entries in a directory to find sub-directories, when it can be proven in advance that there are no such sub-directories. When the trick was added, the speed-up that resulted was judged spectacular. But I do not remember if Linux existed at the time. As Linux seems to beat many other systems at properly caching disk accesses, the benefits of the trick might be more hidden. A year ago, maybe, I tried the same trick on Linux in the hope of increasing the performance of `os.path.walk', and the speed-up was not so significant. This is part of my hesitation.
Moreover, maybe the above trick is meaningless on Windows or on MacOS. I would be surprised if there are Unix filesystems where the link count of 2 for directories (`/' excepted) is not dependable; this might be another thing to check. With some luck, GNU find sources would tell us what the problems may be. The maintainers surely met them all! :-) -- François Pinard http://www.iro.umontreal.ca/~pinard

On Sun, Apr 27, 2003 at 08:28:48PM -0400, Francois Pinard wrote:
The important thing is to make that optimization disabled if link count != 2. That is, in any other case you really need to go through the stat calls.
A Linux box with enough memory should have really good stat performance with a hot cache. For /usr/bin, it may be that the cache is mostly hot anyway. But even Linux will suffer the same performance problems with a cold cache or too little RAM to cache much. -- :(){ :|:&};:
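A minimal sketch of the link-count shortcut, with the fallback this reply insists on: the per-entry stat calls are skipped only when the directory's own link count is exactly 2, and in every other case the full scan runs. `subdirectories` is a hypothetical helper, not anything in Twisted. The shortcut assumes the traditional Unix convention for directory link counts; GNU find has a `-noleaf` option precisely because some filesystems do not follow it, so treat this as an illustration rather than a portable optimization.

```python
import os
import stat


def subdirectories(path):
    """List sub-directory names, skipping the per-entry stat when st_nlink == 2."""
    if os.stat(path).st_nlink == 2:
        # Traditional convention: one link from the parent, one from ".",
        # and one more per sub-directory ("..").  Exactly 2 means none.
        return []
    # Any other link count: the shortcut is off, stat every entry.
    result = []
    for name in sorted(os.listdir(path)):
        mode = os.lstat(os.path.join(path, name)).st_mode
        if stat.S_ISDIR(mode):
            result.append(name)
    return result
```

On a filesystem that reports some other fixed count for directories, the condition is simply never true and the function degrades to the plain scan.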

[Tommi Virtanen]
[Francois Pinard]
If I remember well, GNU `find' keeps the link count for a given directory. Every time it stats an entry which is a sub-directory, it decreases the link count. When the link count reaches 2, the loop is exited (provided `stat' calls are only needed for directory recursion -- `find' sometimes needs them for other purposes).
One may consider that Twisted may also be used on lesser Unices than Linux -- I'm not going to name any! :-) -- where the performance hit might hurt more. As a good rule, I think one should not overly rely on various systems' disk cache, and be on the parsimonious side while accessing disks. -- François Pinard http://www.iro.umontreal.ca/~pinard
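The countdown recalled from memory in this message might look like the sketch below. It is not taken from the GNU find sources; `subdirs_with_countdown` is a hypothetical name, and the second return value (the number of stat calls made) is there only to make the saving visible. As before, it assumes the traditional link-count convention, and falls back to a full scan when the count is below 2.

```python
import os
import stat


def subdirs_with_countdown(path):
    """Return (sub-directory names, number of lstat calls made)."""
    # Under the traditional convention, st_nlink - 2 is the number of
    # sub-directories still to be found.
    remaining = os.stat(path).st_nlink - 2
    trust_count = remaining >= 0  # a count below 2 cannot be trusted
    found = []
    stat_calls = 0
    for name in sorted(os.listdir(path)):
        if trust_count and remaining == 0:
            break  # every sub-directory is accounted for; stop stat'ing
        mode = os.lstat(os.path.join(path, name)).st_mode
        stat_calls += 1
        if stat.S_ISDIR(mode):
            found.append(name)
            remaining -= 1
    return found, stat_calls
```

For a directory like /usr/bin with no sub-directories, the loop exits before the first stat, which is the case the thread cares about.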

On Sun, 27 Apr 2003 19:20:57 -0400 Christopher Armstrong <radix@twistedmatrix.com> wrote:
Of course, mktap also adds "the current directory" (".") to sys.path. -- Itamar Shtull-Trauring http://itamarst.org/ http://www.zoteca.com -- Python & Twisted consulting

[Christopher Armstrong]
Interesting. Thanks! -- François Pinard http://www.iro.umontreal.ca/~pinard

participants (5): Christopher Armstrong, Francois Pinard, Glyph Lefkowitz, Itamar Shtull-Trauring, Tommi Virtanen