Problems with revision 4077 of new SVN repository

I'm trying to mirror the brand-new Python SVN repository with SVK, to better be able to track both the trunk and the various branches. Since I'm not a Python developer and don't have svn+ssh access, I'm doing so over http. The process fails when trying to fetch revision 4077, with the following error message:

    RA layer request failed: REPORT request failed on 'projects/!svn/bc/41373/python': The REPORT request returned invalid XML in the response: XML parse error at line 7: not well-formed (invalid token) (/projects/!svn/bc/41373/python)

The thread at http://svn.haxx.se/dev/archive-2004-07/0793.shtml suggests that the problem may lie in the commit message for revision 4077: if it has a character in the 0x01-0x1f range (most of which are invalid in XML), then Subversion access methods like http: will fail to retrieve it, while methods like file: will succeed. I haven't tried svn+ssh: since I don't have an SSH key on the server.

Trying "svn log -r 4077 http://svn.python.org/projects/python/" also fails:

    subversion/libsvn_ra_dav/util.c:780: (apr_err=175002)
    svn: REPORT request failed on '/projects/!svn/bc/4077/python'
    subversion/libsvn_ra_dav/util.c:760: (apr_err=175002)
    svn: The REPORT request returned invalid XML in the response: XML parse error at line 7: not well-formed (invalid token) (/projects/!svn/bc/4077/python)

When I visit http://svn.python.org/view/python/?rev=4077, I can see the offending log message. Sure enough, there's a 0x1b character in it, between the space after "Added" and the "h" immediately before the word "Moved".

This problem can be fixed by someone with root permissions on the SVN server logging in and running the following:

    echo "New commit message goes here" > new-message.txt
    svnadmin setlog --bypass-hooks -r 4077 /path/to/repos new-message.txt

If there are other, similar problems later in the SVN repository, I was unable to find them because the SVK mirror process consistently halts at revision 4077. If revision 4077 is fixed and I turn up other log problems, I'll report them as well.

--
Robin Munn
rmunn@pobox.com
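To illustrate the failure mode described in this message, here is a minimal Python sketch. It is not taken from the thread: the helper names and the sample string are hypothetical, and it uses Python's bundled expat-based parser rather than the one Subversion's DAV layer uses. It shows that a log message containing a control character such as 0x1b cannot be embedded in an XML 1.0 document, and how such characters might be located.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_invalid_xml_chars(message):
    """Return (offset, code point) pairs for each forbidden control character."""
    return [(m.start(), ord(m.group())) for m in INVALID_XML_CHARS.finditer(message)]


def parses_as_xml(message):
    """Report whether the message can be carried as text in an XML 1.0 element."""
    # escape() handles &, < and >, so only forbidden characters can cause failure.
    document = "<msg>" + escape(message) + "</msg>"
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical stand-in for the r4077 log text; the real message is not quoted here.
sample = "Added \x1bhMoved"
print(find_invalid_xml_chars(sample))  # [(6, 27)]
print(parses_as_xml(sample))           # False
```

Access methods such as file: never wrap the log message in XML, which would explain why they are unaffected.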

Martin v. Löwis wrote:
Robin Munn wrote:
    echo "New commit message goes here" > new-message.txt
    svnadmin setlog --bypass-hooks -r 4077 /path/to/repos new-message.txt
Thanks for pointing that out, and for giving those instructions. I have now corrected the log message.
Revision 4077 is fine now. However, the same problem exists in revision 4284, which has a 0x01 character before the word "add". Same solution:

    echo "New commit message goes here" > new-message.txt
    svnadmin setlog --bypass-hooks -r 4284 /path/to/repos new-message.txt

If there are two errors of the same type within about 200 revisions, there may be more. I'm currently running "svn log" on every revision in the Python SVN repository to see if I find any more errors of this type, so that I don't have to hunt them down one-by-one by rerunning SVK. I'll post my findings when I'm done.

--
Robin Munn
rmunn@pobox.com
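The scan described in this message might look roughly like the following Python sketch. The script actually used was not posted, so everything here is an assumption: the function names are hypothetical, and the check simply treats any non-zero exit status from "svn log" as a failure, which means network errors would be reported alongside malformed log messages.

```python
import subprocess

# Repository URL as quoted earlier in the thread.
REPO_URL = "http://svn.python.org/projects/python/"


def svn_log_succeeds(revision, url=REPO_URL):
    """Run `svn log -r REV URL` and report whether it exited cleanly."""
    result = subprocess.run(
        ["svn", "log", "-r", str(revision), url],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_bad_revisions(first, last, check=svn_log_succeeds):
    """Return every revision in [first, last] for which `check` fails."""
    return [rev for rev in range(first, last + 1) if not check(rev)]
```

Calling find_bad_revisions(1, 17500) would issue one request per revision, so a full pass takes a while. The check parameter exists so that the loop can be exercised without a network connection.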

Robin Munn wrote:
Revision 4077 is fine now. However, the same problem exists in revision 4284, which has a 0x01 character before the word "add". Same solution:
    echo "New commit message goes here" > new-message.txt
    svnadmin setlog --bypass-hooks -r 4284 /path/to/repos new-message.txt
If there are two errors of the same type within about 200 revisions, there may be more. I'm currently running "svn log" on every revision in the Python SVN repository to see if I find any more errors of this type, so that I don't have to hunt them down one-by-one by rerunning SVK. I'll post my findings when I'm done.
My script is up to revision 17500 with no further problems found; I now believe that 4077 and 4284 were isolated cases. Once 4284 is fixed, it should now be possible to SVK-mirror the entire repository.

--
Robin Munn
rmunn@pobox.com

Martin v. Löwis wrote:
Robin Munn wrote:
Revision 4077 is fine now. However, the same problem exists in revision 4284, which has a 0x01 character before the word "add". Same solution:
I have now fixed that as well.
Regards, Martin
And my script just finished running, with no further errors of this type found. So doing an SVK mirror of the repository should work now, barring any further surprises. I'm starting the SVK sync now; we'll see what happens.

Thanks for fixing these!

--
Robin Munn
rmunn@pobox.com

Robin Munn wrote:
So doing an SVK mirror of the repository should work now, barring any further surprises. I'm starting the SVK sync now; we'll see what happens.
Confirmed; the SVK mirror took about 18 hours, but it completed successfully with no further problems. Again, thanks for fixing the issues so quickly.

--
Robin Munn
rmunn@pobox.com

I have a question after this exhilarating exchange.

Is there a way to prevent this kind of thing in the future, e.g. by removing or rejecting change log messages with characters that are considered invalid in XML? (Or should perhaps the fix be to suppress or quote these characters somehow in XML?)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
I have a question after this exhilarating exchange.
Is there a way to prevent this kind of thing in the future, e.g. by removing or rejecting change log messages with characters that are considered invalid in XML?
I don't think it can happen again. Without testing, I would hope Subversion rejects log messages that contain "random" control characters (if it doesn't, I should report that as a bug).

The characters are in there because of the CVS conversion (that might be a bug in cvs2svn, which perhaps should have replaced them). It only happened in very old log messages, so perhaps even CVS doesn't allow them anymore.

In XML 1.0, there is a lot of confusion about including control characters in text. XML 1.1 clarified that you can include them, but only through character references. So in the future, Subversion might be able to transmit such log messages in well-formed WebDAV.

Regards,
Martin

[Guido van Rossum]
Is there a way to prevent this kind of thing in the future, e.g. by removing or rejecting change log messages with characters that are considered invalid in XML?
Suppose TOP is the top of the Subversion repository. The easiest way is to provide a TOP/hooks/pre-commit script. If the script exits with non-zero status, usually with some clear diagnostic on stderr, the whole commit aborts, and the diagnostic is shown to the committing user.

The tricky part is getting the tentative log message from within the script. This is done by popen-ing "svnlook log -t ARG2 ARG1", where ARG1 and ARG2 are the arguments given to the pre-commit script (the repository path and the transaction name, respectively).

--
François Pinard http://pinard.progiciels-bpi.ca
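The pre-commit approach outlined above could be sketched in Python as follows. This is an illustration only, not a hook that was actually installed on svn.python.org. The rejection rule (control characters below 0x20 other than tab, line feed and carriage return), the function names and the wording of the diagnostic are all assumptions.

```python
#!/usr/bin/env python3
import re
import subprocess
import sys

# Bytes that XML 1.0 cannot carry: control characters below 0x20,
# excluding tab (0x09), line feed (0x0A) and carriage return (0x0D).
INVALID_XML_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def log_is_clean(log_bytes):
    """Return True if the log message contains no forbidden control bytes."""
    return INVALID_XML_BYTES.search(log_bytes) is None


def main(repos, txn):
    """Fetch the pending log message via svnlook and decide the exit status."""
    log_bytes = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    if not log_is_clean(log_bytes):
        # Anything written to stderr is shown to the committing user.
        sys.stderr.write(
            "Commit rejected: log message contains control characters "
            "that are invalid in XML.\n"
        )
        return 1
    return 0


# Subversion calls the hook with two arguments (repository path, transaction
# name); with any other argument count this sketch does nothing.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Saved as TOP/hooks/pre-commit and made executable, a script like this would abort any commit whose log message fails the check. It would not touch log messages already in the repository, such as those produced by the CVS conversion.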