End of the mystery "@README.txt Mercurial bug"

Hi,

One month ago, unit tests were added to IDLE (cool!) along with a file called @README.txt. The @ was used so that the name would sort to the top of a directory listing.

Some developers began to get strange Mercurial errors like:

  "abort: data/Lib/idlelib/idle_test/@README.txt.i at 7573717b9e6f: no match"
  " 83941: empty or missing Lib/idlelib/idle_test/@README.txt "

etc. Senthil reported the issue on the python-committers mailing list:
http://mail.python.org/pipermail/python-committers/2013-May/002565.html

The @ character was discussed. Replacing it with "_" was proposed. Guido asked to simply rename the file to "README.txt"; he wrote: "I think we have a zen rule about this: Special cases aren't special enough to break the rules."

Senthil was asked to upgrade his Mercurial version. Someone suggested that it was a disk issue. Anyway, the issue disappeared with a fresh clone.

I also had the issue on 3 different computers, so I reported it upstream:
http://bz.selenic.com/show_bug.cgi?id=3954

I discussed it with a Mercurial developer, Matt, on IRC. He asked how the server is managed, whether we are using only one physical server, whether repositories are copied with rsync, etc. I was unable to answer; I don't have access to the hg.python.org server. The issue was closed, but 20 days later (today) I reproduced it again.

I cloned the repository at a specific revision and tried to update to another specific revision: no bug. I also played with hg bisect, because I suspected a bug in bisect: no bug. I tried to update to each revision between 83900 and 84348 to check whether @README.txt disappears from .hg/store: still no bug. I also ran fsck: no error (but the disk was mounted, so I don't know if the report is reliable).

And then I ran "make distclean"...

Victor

On Tue, Jun 25, 2013 at 5:58 PM, Benjamin Peterson <benjamin@python.org> wrote:
Yeah, the final part is here: http://bz.selenic.com/show_bug.cgi?id=3954#c4 But I still have a question as to why hg complained about @README in the first place. Also, I hope make distclean is not working "inside" the .hg folder. -- Senthil

On 6/25/2013 9:33 PM, Senthil Kumaran wrote:
I think that's exactly what's happening.
From the bug report:
  find $(srcdir) '(' -name '*.fdc' -o -name '*~' \
                     -o -name '[@,#]*' -o -name '*.old' \
                     -o -name '*.orig' -o -name '*.rej' \
                     -o -name '*.bak' ')' \
                     -exec rm -f {} ';'

Will find files beginning with '@' inside subdirectories of $(srcdir)/.hg. Just this week I saw someone use the logical equivalent of:

  find $(srcdir)/* ...

to avoid this problem. It won't expand the .hg top-level directory.

-- Eric.
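The difference between the two start points can be checked without deleting anything. The following is only a sketch: the directory layout, the file names, and the helper name list_matches are invented for the demonstration, and -print stands in for the -exec rm of the real Makefile rule.

```shell
#!/bin/sh
# Throwaway tree mimicking the layout from the bug report.
# Every path below is invented for this demonstration.
workdir=$(mktemp -d)
mkdir -p "$workdir/.hg/store/data/Lib" "$workdir/Lib"
touch "$workdir/.hg/store/data/Lib/@README.txt.i" \
      "$workdir/Lib/@README.txt" \
      "$workdir/Lib/module.py.orig"

# The name tests from the Makefile rule, with -print in place of
# -exec rm so that nothing is deleted. (Hypothetical helper.)
list_matches() {
    find "$@" '(' -name '*.fdc' -o -name '*~' \
        -o -name '[@,#]*' -o -name '*.old' \
        -o -name '*.orig' -o -name '*.rej' \
        -o -name '*.bak' ')' -print
}

# Starting at the top of the tree descends into .hg as well.
matches_all=$(cd "$workdir" && list_matches . | sort)

# Starting at ./* lets the shell choose the start points; the glob
# skips dot entries, so .hg is never visited.
matches_glob=$(cd "$workdir" && list_matches ./* | sort)

printf 'find .:\n%s\n\nfind ./*:\n%s\n' "$matches_all" "$matches_glob"
rm -rf "$workdir"
```

In this tree the first listing should include the revlog file under .hg/store, while the second should only contain the two files under Lib.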

On Tue, Jun 25, 2013 at 10:11:04PM -0400, "Eric V. Smith" <eric@trueblade.com> wrote:
Or

  find \( -type d -name .hg -prune \) -o ...

Oleg.
--
Oleg Broytman http://phdru.name/ phd@phdru.name
Programmers don't die, they just GOSUB without RETURN.
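Spelled out in full, the -prune variant might look like the sketch below. It is an illustration, not the fix that was committed: the tree is invented, .git and .svn are added to the pruned names as an assumption, and -print replaces -exec rm so the run is harmless. Note that the right-hand side of -o needs an explicit action; otherwise find would also print the pruned directories themselves.

```shell
#!/bin/sh
# Invented tree with two kinds of VCS metadata directories.
workdir=$(mktemp -d)
mkdir -p "$workdir/.hg/store/data" "$workdir/.git/objects" "$workdir/Lib"
touch "$workdir/.hg/store/data/@README.txt.i" \
      "$workdir/.git/objects/@blob" \
      "$workdir/Lib/@README.txt" \
      "$workdir/Lib/module.py.rej"

# Left of -o: stop descending into VCS directories.
# Right of -o: the Makefile's name tests, with an explicit -print.
pruned=$(cd "$workdir" && find . \
    '(' -type d '(' -name .hg -o -name .git -o -name .svn ')' -prune ')' \
    -o '(' -name '*.fdc' -o -name '*~' \
           -o -name '[@,#]*' -o -name '*.old' \
           -o -name '*.orig' -o -name '*.rej' \
           -o -name '*.bak' ')' -print | sort)

printf '%s\n' "$pruned"
rm -rf "$workdir"
```

Only the two files under Lib should be listed; nothing below .hg or .git is visited.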

On 6/26/2013 6:43 AM, a.cavallo@cavallinux.eu wrote:
.. or having hg "purging" unwanted build artifact (probably cleaning up the .hgignore file first)
How would that work? How could hg purge the .bak, .orig, .rej, .old, etc. files?
I'm torn. Yours is more obvious, but we'd likely need to add .svn, .git, etc. Maybe find $(srcdir)/[a-zA-Z]* ... would be good enough to ignore all dot directories/files? -- Eric.
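As a rough check of what the two globs would hand to find as start points, the sketch below compares * with [a-zA-Z]* in an invented directory. The entry names are made up, and LC_ALL=C is set as an assumption so that the bracket range behaves the same everywhere.

```shell
#!/bin/sh
# Use the C locale so that [a-zA-Z] means exactly the ASCII letters.
LC_ALL=C
export LC_ALL

# Invented top-level entries: a dot directory, two ordinary names,
# one name starting with '_' and one starting with '@'.
workdir=$(mktemp -d)
mkdir -p "$workdir/.hg" "$workdir/Lib" "$workdir/_build"
touch "$workdir/@notes" "$workdir/configure"

# Start points find would receive from each glob.
star=$(cd "$workdir" && printf '%s\n' *)
alpha=$(cd "$workdir" && printf '%s\n' [a-zA-Z]*)

printf 'glob *:\n%s\n\nglob [a-zA-Z]*:\n%s\n' "$star" "$alpha"
rm -rf "$workdir"
```

Both globs leave out .hg. The narrower one also leaves out top-level entries that start with a digit, '_', '@', ',' or '#', so the '[@,#]*' test could then only ever match files below the top level.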

On Wed, Jun 26, 2013 at 08:18:27AM -0400, "Eric V. Smith" <eric@trueblade.com> wrote:
How many of those dot-files/directories are there beside .*ignore?
etc. Maybe find $(srcdir)/[a-zA-Z]* ... would be good enough to ignore all dot directories/files?
On the other hand yes, I think it'd be enough. Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.

On 6/26/2013 9:02 AM, Eric V. Smith wrote:
I created http://bugs.python.org/issue18312 to track this. -- Eric.

On 26 Jun, 2013, at 14:18, Eric V. Smith <eric@trueblade.com> wrote:
Is the find command in the distclean target still needed? The comment for the distclean target says it is used to clean up the tree for distribution, but that's easier to accomplish by using a clean checkout. The target is still useful to get a clean tree when you're building with srcdir == builddir, but you don't need the find command for that. Ronald

On 6/26/2013 8:57 AM, Ronald Oussoren wrote:
I run 'make distclean' fairly often, but maybe it's just out of habit. If I'm adding/deleting modules, I want to make sure there are no build artifacts. And since I have modified files, a clean checkout won't help (easily, at least). But me running distclean is not the same as answering your question about the find command being needed, I realize. -- Eric.

On 26 Jun, 2013, at 15:39, Barry Warsaw <barry@python.org> wrote:
Sure, but is it necessary to run the find command for removing backup files in make distclean? When the find command is removed you'd still end up with a tree that's clean enough to perform a build from scratch, although the tree won't be perfectly clean. BTW, I usually build in a separate directory; that makes cleaning up even easier :-) Ronald

On Wed, 26 Jun 2013 09:39:54 -0400, Barry Warsaw <barry@python.org> wrote:
We also sometimes ask someone reporting an issue to do a make distclean and recompile, and many of these reporters will be working from a tarball rather than a checkout. Sure, they could re-unpack the tarball (if they haven't deleted it already), but make distclean is easier. --David

Le Wed, 26 Jun 2013 15:12:45 +0200, a.cavallo@cavallinux.eu a écrit :
distclean removes only what we want to remove, while hg purge will remove any untracked file, including any files that you may have wanted to keep (notes, work-in-progress patches, personal data files, etc.). Regards Antoine.

On Wed, Jun 26, 2013 at 8:12 AM, <a.cavallo@cavallinux.eu> wrote:
I've recently discovered purge and have started using it on Windows since there is no `make distclean`, and it is very effective. `hg purge -p` shows what will be removed (which should match anything with a ? in `hg status`), `hg purge` removes it, and `hg purge --all` clears out everything that's not tracked (including things listed in .hgignore) giving a fresh checkout without having to re-download. Very convenient, especially since it's a built-in extension.

2013/6/26 Eric V. Smith <eric@trueblade.com>:
In my opinion, make distclean should only remove files generated by configure and a build. It should not remove random files.

*~, .orig, .rej, .bak should be kept. They are generated neither by configure nor by make.

What are these "@*", ",*" and "#*" files? Why does "make distclean" remove them?

"make distclean" also removes the "tags" file, which is generated by the ctags program, a useful tool to browse the C source code (e.g. in vim). Why does "make distclean" remove it?

In short, the whole "find ... -exec rm -f {} ';'" command should be removed from "make distclean". (There are other commands to remove Makefile, "*.o" files, etc.) If someone really needs such cleanup, another Makefile rule should be added.

Victor

*~, .orig, .rej, .bak should be kept. They are generated neither by configure nor by make.
Ideally they should be left untracked, not ignored. While devs can certainly add them to the .hgignore list to make life easier, a repository should be clean of extra files (or show them as untracked). I'd add that generated files (like the ones generated by pgen) shouldn't be part of the default make target but kept as fully tracked files (and regenerated on demand through a special make target). I hope this helps.

Victor Stinner writes:
In my opinion, make distclean should only remove files generated by configure and a build. It should not remove random files.
FWIW, the GNU standard for these targets is something like:

## make clean or make mostlyclean
##     Delete all files from the current directory that are normally
##     created by building the program. Don't delete the files that
##     record the configuration. Also preserve files that could be
##     made by building, but normally aren't because the distribution
##     comes with them.
##     Delete `.dvi' files here if they are not part of the
##     distribution.
## make distclean
##     Delete all files from the current directory that are created by
##     configuring or building the program. If you have unpacked the
##     source and built the program without creating any other files,
##     `make distclean' should leave only the files that were in the
##     distribution.
## make realclean
##     Delete everything from the current directory that can be
##     reconstructed with this Makefile. This typically includes
##     everything deleted by distclean, plus more: C source files
##     produced by Bison, tags tables, info files, and so on.
## make extraclean
##     Still more severe - delete backup and autosave files, too.

This is from the XEmacs Makefile.in.in, so it's not authoritative. Still, it seems pretty intuitive and presumably is in wide use, not to forget matching Victor's preferred usage for 'distclean'.

On Mon, 01 Jul 2013 08:33:38 +0200, Georg Brandl <g.brandl@gmx.net> wrote:
That's a good point. If the find were dropped, the target would have to be renamed. "make configureclean", maybe. But I think it is easier and less confusing just to leave things as they will be after Eric applies the fix proposed in http://bugs.python.org/issue18312. --David

If we disallowed builds *from within the source tree*, requiring all output to go into a separate build output directory instead (like any sane person does*), we wouldn't need a crazy find in the source tree to mess things up. ;)

This can be done today:

  $ mkdir foo && cd foo && ../my-hg/2.7/configure --srcdir=../my-hg/2.7 && make -j12

I think all we'd need to do is disallow the cwd when configuring or building from being within srcdir.

-gps

* note: the author is normally too lazy to be sane because it involves slightly more typing.

On Mon, Jul 1, 2013 at 4:19 AM, R. David Murray <rdmurray@bitdance.com> wrote:
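A guard of the kind suggested above ("disallow the cwd ... from being within srcdir") could be sketched in shell as follows. This is an illustration only: is_within and refuse_in_tree_build are hypothetical helper names, and nothing like them is claimed to exist in CPython's configure script.

```shell
#!/bin/sh
# Hypothetical helper, not part of CPython's build system.
# Succeeds when directory $2 is directory $1 or lies below it.
# Both paths are resolved physically so symlinks do not fool the test.
is_within() {
    iw_parent=$(cd "$1" && pwd -P) || return 2
    iw_child=$(cd "$2" && pwd -P) || return 2
    case "$iw_child/" in
        "${iw_parent%/}"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical guard: $1 is the source tree, $2 the directory the
# build would run in. Fails with a message for in-tree builds.
refuse_in_tree_build() {
    if is_within "$1" "$2"; then
        echo "error: build directory $2 is inside source tree $1" >&2
        return 1
    fi
}
```

A configure wrapper could call something like `refuse_in_tree_build "$srcdir" "$PWD" || exit 1` before doing any work. The trailing slash in the comparison keeps a sibling such as src-build from being mistaken for a subdirectory of src.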

Hi Georg, 2013/7/1 Georg Brandl <g.brandl@gmx.net>:
I don't understand why you are suggesting that I use "make clean". I would like to start a fresh build, so remove configure and Makefile, but I also want to keep my local changes and local files not tracked by Mercurial. I need this when the build does not work because a new file was added or a build script was modified. "make clean" does not change anything for this use case.

For example, I don't understand why "make distclean" removes the "tags" file. Generating this file takes 20 to 30 seconds on my slow laptop, and it is not generated by the Python build system, but by the external ctags program.

Don't you think that we need two different "distclean" commands? One GNU-style "distclean" which only removes configure and Makefile, and another "distclean" which is the GNU "distclean" + the extra find removing temporary files.

Victor

On Mon, 01 Jul 2013 23:05:56 +0200, Victor Stinner <victor.stinner@gmail.com> wrote:
The command that does not remove the extra files is *not* a 'distclean' command. 'buildclean' or 'configclean', but not 'distclean'. distclean still needs to be fixed, so please open a new issue for adding buildclean or whatever you want to call it, as Eric requested in the existing issue. --David

Am 01.07.2013 23:05, schrieb Victor Stinner:
Right, I had wrongly remembered that "clean" also removed the Makefile. Note that, to add to the confusion, there's an additional target named "clobber" (which is called by distclean). Both "clobber" and "distclean" remove Python-generated and potentially user-generated files. IMO things could be rearranged without too much effort so that "distclean" only removes Python-build-generated files, whereas "clobber" also removes user-generated files. But please don't introduce yet another target. Georg

It's like this. Whenever you use special characters in a file name, you're asking for trouble. The shell and the OS have to negotiate how to interpret it. It's bigger than git, and not a bug. The issue is between the file system, the kernel, and the shell.
Try it on different OSes on different machines (MacOS, Linux, Windows). If my theory is right, it should be inconsistent across machines, but consistent within the same machine. -- MarkJ Tacoma, Washington

participants (16)
- a.cavallo@cavallinux.eu
- Antoine Pitrou
- Barry Warsaw
- Benjamin Peterson
- Eric V. Smith
- Georg Brandl
- Gregory P. Smith
- Mark Janssen
- Oleg Broytman
- R. David Murray
- Ronald Oussoren
- Senthil Kumaran
- Stephen J. Turnbull
- Steven D'Aprano
- Victor Stinner
- Zachary Ware