Make Difflib example callable as module __main__

Hey, I was reading over the difflib docs this morning and when I got to the bottom, I expected, probably due to lack of coffee, that the example would be callable as the module from the command line. There are already a number of modules which export command-line functionality, e.g. unittest, and I thought it would be great if the difflib module offered the same. The code is pretty much there in the example from the documentation. It would just need to be included in the module itself. --Dan
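To make the proposal concrete, here is a minimal sketch of what a command-line front end built on difflib could look like. The function names `diff_lines` and `main` and the argument layout are assumptions made for illustration; they are not the interface of the example in the documentation or of Tools/Scripts/diff.

```python
import argparse
import difflib
import sys


def diff_lines(from_lines, to_lines, fromfile="a", tofile="b", context=3):
    """Return a unified diff of two lists of lines as one string."""
    return "".join(
        difflib.unified_diff(from_lines, to_lines, fromfile, tofile, n=context)
    )


def main(argv=None):
    # Hypothetical option layout, chosen only for this sketch.
    parser = argparse.ArgumentParser(description="Unified diff of two files")
    parser.add_argument("fromfile")
    parser.add_argument("tofile")
    parser.add_argument("-n", "--context", type=int, default=3)
    args = parser.parse_args(argv)

    with open(args.fromfile) as f:
        from_lines = f.readlines()
    with open(args.tofile) as f:
        to_lines = f.readlines()

    sys.stdout.write(
        diff_lines(from_lines, to_lines, args.fromfile, args.tofile, args.context)
    )
    return 0

# A module offering this via -m would finish with the usual guard:
#     if __name__ == "__main__":
#         sys.exit(main())
```

Keeping the diffing in `diff_lines` and the argument handling in `main` means the logic can be exercised from a test without touching the file system.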

On 2/22/2012 10:49 AM, Dan Colish wrote:
This is slightly garbled, but after looking, I see what you mean. As the doc says, the 'example' is available as Tools/Scripts/diff. Tools/Scripts/ndiff is another command-line front end for difflib. I believe difflib was extracted from the original version of ndiff.
If you run difflib directly, it runs difflib._test, which runs a doctest on difflib. Most modules do something similar. Having a real command-line interface in the module itself is unusual.
> The code is pretty much there in the example from the documentation. It would just need to be included in the module itself.
I don't immediately see it as worth the trouble. I bet someone somewhere has a script that uses the interface in its current location. -- Terry Jan Reedy

On 2/22/12 1:40 PM, Terry Reedy wrote:
Yes, I realized shortly after sending how unintelligible that sounded. Yes, even though those tools exist, they are not installed as part of the Python build.
Oh, I was unaware of that behavior. That's really good to know. Is this behavior documented?
I didn't think it would be that much trouble. It would be simple to install the scripts from Tools/Scripts. Either way I liked the idea of providing a cli frontend to difflib as part of the python install. --Dan

On Thu, Feb 23, 2012 at 7:40 AM, Terry Reedy <tjreedy@udel.edu> wrote:
That's largely a historical artifact though - prior to -m, direct execution was a pain, so the only time it really happened was in a source checkout during development. (Plus, I don't believe regrtest always had selective test execution, so running the library directly was a good way to only run some of the tests.) If there's useful functionality that can be provided via -m, I'm a fan of moving tests out of the way to make room for it (it's also a good opportunity to make sure regrtest is covering whatever __main__ execution tests). I think there's also an open tracker issue suggesting the creation of a dedicated section in the standard library docs that summarises all the modules that offer useful -m functionality. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2012-02-22, at 23:08 , Nick Coghlan wrote:
Last time this popped up, Raymond Hettinger noted that undocumented command-line interfaces to stdlib modules are mostly intentional: http://mail.python.org/pipermail/docs/2011-February/003171.html Maybe things have changed since, but at the time the sentiment Raymond expressed was pretty much "not going to happen". If you want a list, there's one at http://www.reddit.com/r/Python/comments/fofan/suggestion_for_a_python_blogge... It is for Python 2 and may be out of date, but it's a starting point.

On Thu, Feb 23, 2012 at 8:24 AM, Masklinn <masklinn@masklinn.net> wrote:
In my view, the most important points in Raymond's email are the first and the last:

* Many of the undocumented command-line interfaces are intentionally undocumented -- they were there for the convenience of the developer for exercising the module as it was being developed and are not part of the official API. Most are not production quality and would have been done much differently if that had been the intent.
* All that being said, there are some exceptions and it may make sense to document the interface in some cases where we really do want a command-line app. I'll look at any patches you want to submit, but try to not go wild turning the library into a suite of applications. For the most part, that is not what the standard library is about.

What I'm envisioning is a dedicated section along the lines of:

X. Command Line Functionality in the Standard Library

X.1 Supported Command Line Interfaces

This section would list modules that provide a command line interface as detailed in the module documentation. A brief description would be given here, along with a link to the relevant section of the module docs. It would mainly consist of Python specific utilities for dumping diagnostic information about the interpreter's own state or analysing Python programs. Any CLIs in this section should also have associated unittests in their regression test suites.

Interpreter Diagnostics
- site
- platform
- locale

Execution and Analysis of Python Code
- runpy
- unittest
- doctest
- pydoc
- timeit
- dis
- tokenize
- pdb
- profile
- pstats
- modulefinder

X.2 Unsupported Command Line Interfaces

This section would list modules that offer command line functionality that is *not* designed to be production quality, but rather exists primarily as an interactive testing tool for sanity checking when working on the modules themselves. The only documentation of the functionality would be the brief descriptions here and the module's own interactive help (if any). It should be made clear that these interfaces are *not* covered by the regression test suite and they may break without warning. All the simple cross-platform file processing, networking and protocol handling utilities would be listed here.

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 2/22/12 4:27 PM, Nick Coghlan wrote:
That sounds like a good guide to getting started. I like the idea of only supporting modules which help with python development. I am also wondering if libraries which are not going to be supported should have their cli removed? I've come around to see difflib is probably not that critical for that since we're all using hg these days. Finally, I tried a number of searches in the bug tracker to see if a ticket for something like this existed and I found nothing. Nick had mentioned that a ticket might already exist? --Dan

On Thu, 23 Feb 2012 08:08:08 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
+1 for moving self-tests to the regular test suite. Nobody, and especially not the buildbots, runs self-tests included in __main__ sections. (and, as a matter of fact, many of those may be broken without anyone noticing) Regards Antoine.
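One way to move a doctest-based self-test into the regular suite is unittest's `load_tests` protocol combined with `doctest.DocTestSuite`. This is a minimal sketch, assuming it lives in a separate test module; that placement is an assumption made for illustration.

```python
import difflib
import doctest
import unittest


def load_tests(loader, tests, pattern):
    # Called by unittest when it loads this test module: add the
    # docstring examples from difflib to the tests it will run.
    tests.addTests(doctest.DocTestSuite(difflib))
    return tests
```

With this hook in place the docstring examples run whenever the test module is loaded by unittest, so they no longer depend on someone executing the library file by hand.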

Can I put in a plea that postings to this list try to minimise the use of acronyms and jargon that may not be universally intelligible? This list is often read with interest by non-specialists such as myself. I have no idea for example what "VCS" means. Thanks Rob Cliffe On 23/02/2012 10:56, anatoly techtonik wrote:

I don't think it was an actual question, and clearly for Rob it's not a sustainable approach to be expanding acronyms on request. I'd suggest an acronym FAQ but that also isn't sustainable, and Google won't always help. Status: Won't fix, maintain status quo. On Feb 23, 2012 7:25 PM, "Paul Moore" <p.f.moore@gmail.com> wrote:

On Thu, Feb 23, 2012 at 9:32 PM, Matt Joiner <anacrolix@gmail.com> wrote:
> Status: Won't fix, maintain status quo.
But also, since language-related discussions *will* occasionally encounter domain-specific discussions, people shouldn't be afraid to ask that such jargon be clarified. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Thu, Feb 23, 2012 at 2:22 PM, Paul Moore <p.f.moore@gmail.com> wrote:
Bazaar and Mercurial in this case. Mercurial's differ: http://selenic.com/hg/file/816211dfa3a5/mercurial/pure/bdiff.py Bazaar's: http://bazaar.launchpad.net/~bzr-pqm/bzr/bzr.dev/view/head:/bzrlib/diff.py -- anatoly t.

Rob Cliffe wrote:
> Can I put in a plea that postings to this list try to minimise the use of acronyms and jargon that may not be universally intelligible?
"Universally intelligible" is an awfully big request. There are English speakers who don't know what you mean by either "postings" or "list", since both of those are themselves jargon. (My parents, for two.) To say nothing of children or non-English speakers who may not know what "acronym" means.
> This list is often read with interest by non-specialists such as myself. I have no idea for example what "VCS" means.
While I sympathise, this is a list aimed at programmers, and while non-specialists are welcome, they are not the primary audience. I think you will be better off trying to learn programmer's jargon than asking programmers not to use common, if specialised, words in their technical conversations. You wouldn't expect (say) car enthusiasts to stop using the word "torque", or doctors not to use "dialysis", just because a non-specialist might wander by and be listening in. -- Steven

I am a programmer, of some 30-odd years full-time. But that doesn't mean I understand every acronym of every specialised field under the sun. "Version Control System" instead of "VCS" is perfectly comprehensible and only takes a little longer to type. "VCS" meant nothing to me. I follow the postings on python-dev and python-ideas with keen interest. On 23/02/2012 12:31, Steven D'Aprano wrote:

On Thu, Feb 23, 2012 at 6:00 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
+1 to this advice. I don't even sympathize. I have to look up the new jargon invented by the youngsters *all the time*. But using a search engine to educate myself is much more effective than asking around. And yes, if the search engine somehow doesn't help, just ask for an explanation of a specific term. Not every problem can be fixed by asking everyone else to change their behavior. This is a technical list and technical jargon will be flouted. Deal with it. -- --Guido van Rossum (python.org/~guido)

On 2/23/2012 9:35 AM, Ned Batchelder wrote:
Googling either "vcs git" or "vcs python" shows "Version Control System" clearly highlighted right on the search results page.
Googling just vcs returns as third hit "Version Control System" and a Wikipedia link. Alternatives like Verified Carbon Standard and Veterans Canteen Service are easily rejected in the context of this list ;-). -- Terry Jan Reedy

On Thu, Feb 23, 2012 at 02:24:35PM -0500, Terry Reedy wrote:
http://www.acronymfinder.com/VCS.html lists VCS at the second place. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

Am 23.02.2012 11:56, schrieb anatoly techtonik:
"Every single" makes it sound like there are dozens... Apart from that: a diff/patch algorithm is such an integral part of version control that I would *not* expect them to use difflib, but something more sophisticated/optimized/etc. Georg

Georg Brandl writes:
But Anatoly isn't talking about the algorithm. He's talking about the output, and actually, I would expect them to use something diff(1) and diff3(1) compatible for hunk-oriented changes.[1] My experience with home-grown diff functions suggests that very few produce output as good as that of diff(1), and only git seems to be an improvement (but it's not backward compatible, as the tracker/review tool maintainers regularly mention). It's true that there are better algorithms than the one used by diff(1) (such as the "patience diff" Bazaar uses, and git offers as an option), but there's no need to change the hunk format as far as I have seen, and the file headers could easily be standardized I would think.

Footnotes:
[1] Darcs for one allows non-hunk-based changes, specifically a token-replace patch. And there are binary diffs such as xdelta, and word diffs like wdiff, which necessarily use a different format since they are not line-oriented.
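The distinction between the matching algorithm and the output format can be seen in difflib itself. The sketch below uses made-up sample lines: `SequenceMatcher` produces the edit operations, and the same two inputs are then rendered in two different line-oriented formats.

```python
import difflib

# Hypothetical sample data for the illustration.
old = ["one\n", "two\n", "three\n"]
new = ["one\n", "2\n", "three\n"]

# The algorithm side: edit operations that turn old into new.
matcher = difflib.SequenceMatcher(None, old, new)
opcodes = matcher.get_opcodes()

# The output side: two renderings of the same change.
unified = list(difflib.unified_diff(old, new, "old", "new"))
context = list(difflib.context_diff(old, new, "old", "new"))

print(opcodes)
print("".join(unified))
print("".join(context))
```

A tool could swap in a different matching algorithm and still emit the same hunk format, which is the point made above about output compatibility.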

participants (17)
- anatoly techtonik
- Antoine Pitrou
- Dan Colish
- Ethan Furman
- Georg Brandl
- Guido van Rossum
- Mark Lawrence
- Masklinn
- Matt Joiner
- Ned Batchelder
- Nick Coghlan
- Oleg Broytman
- Paul Moore
- Rob Cliffe
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy