[Python-Dev] Mercurial migration: progress report (PEP 385)

Sat Jul 4 04:30:22 CEST 2009

Mark Hammond wrote:
> This *appears* to be correct at first glance, but in practice it doesn't
> interact well with wildcards specifications - particularly
> '**=cleverencode'.  I started a thread on the ML and submitted a patch a
> few months ago to fix this and it was accepted, but sadly it seemed to
> have dropped off the queue somewhere.  The patch now conflicts, and I've
> promised to resubmit it when I find time.  But even with that in place
> it doesn't address the more general problem that when only *some*
> developers use the extension - mixed mode files are still very possible,
> at which point win32text starts reporting enough spurious conflicts to
> become unusable for me.  Eg, doing a clean checkout of mozilla with
> win32text enabled results in a working tree with hundreds of files
> reporting every line in the file has changed.

Ah, the scope of the issues begins to become clear...

While a server side hook should be able to deal with the mixed mode file
problem, I'm not sure what can be done about the problems with properly
configuring win32text.
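
To make the mixed mode check concrete, here is a rough sketch of the
kind of test a server side hook could apply to each file's contents. The
function name and return values are made up for illustration, the NUL
test is borrowed from the win32text heuristic, and bare \r endings are
ignored:

```python
def classify_line_endings(data: bytes) -> str:
    """Classify raw file contents by the line endings they contain.

    Hypothetical helper: returns 'binary', 'none', 'lf', 'crlf' or 'mixed'.
    """
    # Treat anything containing a NUL byte as binary and leave it alone.
    if b"\0" in data:
        return "binary"
    crlf = data.count(b"\r\n")
    # Every \r\n also contains a \n, so subtract to get the bare \n count.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "crlf"
    if lf:
        return "lf"
    return "none"
```

A hook built on this would reject any changeset containing a file
classified as 'mixed'.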

If wildcard specifications don't interact properly with filtered
negative specifications then that would appear to rule out that
approach. The presence of text files without extensions like NEWS and
ACKS in the Python repository appears to rule out the use of positive
filters to select only the files that are stored in the repository with
\n line endings.

I spent some more time exploring the approach recommended on the
win32text documentation page as a possible way of handling the situation:

1. Store all text files (even Windows specific ones) in \n format in the
   repository
2. Apply the win32text precommit hook to forbid the introduction of \r\n
   line endings
3. Use the recommended settings from the win32text documentation:
===============================
[extensions]
hgext.win32text=

[encode]
# Encode files that don't contain NUL characters.
** = cleverencode:

[decode]
# Decode files that don't contain NUL characters.
** = cleverdecode:
===============================

This would be the equivalent of setting "svn:eol-style native" on every
file in the repository.
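
As I understand those settings, the effect is roughly the following.
This is only a sketch of the behaviour described in the config comments
(convert anything without a NUL character), not the actual win32text
code, and the function names are my own:

```python
def looks_binary(data: bytes) -> bool:
    # Same heuristic as the config comments: a NUL byte means "binary".
    return b"\0" in data


def clever_encode(data: bytes) -> bytes:
    """Working copy -> repository: store text with \\n line endings."""
    if looks_binary(data):
        return data
    return data.replace(b"\r\n", b"\n")


def clever_decode(data: bytes) -> bytes:
    """Repository -> working copy: give text \\r\\n line endings."""
    if looks_binary(data):
        return data
    # Normalise first so an existing \r\n is not turned into \r\r\n.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
```

Note that clever_encode(b"a\r\n") gives b"a\n", so with a blanket **
pattern there is no way to keep a text file stored with \r\n.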

However, in running Martin's query (svn pg -R svn:eol-style .|grep CRLF)
over 2.x and 3.x checkouts, I found that \r\n line endings are currently
enforced for:
 - .bat files under Tools/buildbot
 - .dsp and .dsw files under PC/VC6
 - Lib/email/test/data/msg_26.txt

I believe the rationale for the first two is to allow a source tarball
to be prepared on Linux but still be usable on Windows (e.g. see [1]).

I'm not clear on the rationale for the explicit CRLF line ending on the
email test message, but I would guess it is to ensure that the email
module can handle CRLF line endings correctly regardless of platform.

Only VC6 files appear on the list because later versions of Visual
Studio actually tolerate \n line endings in their project files.

Mercurial's heuristic handling of text vs binary and expected line
endings fails completely for these use cases.

I think there needs to be a solid answer in place for these use cases
before the actual migration to Mercurial takes place. A hand-waving "use
win32text" isn't enough - it needs to be "use win32text with these exact
settings" (with server side hook support to enforce the rules).
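
For the sake of discussion, the rules such a hook would need to enforce
could look something like the sketch below. The patterns come straight
from the svn:eol-style query results above; the function names are
hypothetical, and wiring this into an actual Mercurial hook is left out:

```python
from fnmatch import fnmatchcase

# Paths that must keep \r\n in the repository (from the svn query).
CRLF_PATTERNS = [
    "Tools/buildbot/*.bat",
    "PC/VC6/*.dsp",
    "PC/VC6/*.dsw",
    "Lib/email/test/data/msg_26.txt",
]


def expected_eol(path: str) -> bytes:
    """Return the line ending a repository path is required to use."""
    if any(fnmatchcase(path, pattern) for pattern in CRLF_PATTERNS):
        return b"\r\n"
    return b"\n"


def violates_policy(path: str, data: bytes) -> bool:
    """True if the stored contents break the line ending rule for path."""
    if b"\0" in data:
        return False  # assume binary, nothing to check
    if expected_eol(path) == b"\r\n":
        # Any \n left after removing every \r\n is a bare \n.
        return b"\n" in data.replace(b"\r\n", b"")
    return b"\r\n" in data
```

Everything not matched by one of the patterns is treated as \n only,
which covers extensionless files like NEWS and ACKS without needing to
list them.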

And since Mercurial doesn't even allow us to say "this is a binary file"
the way CVS used to, I'm currently not seeing any way for that to happen
except for win32text to be updated to correctly handle wildcards in
combination with negative filters.

Cheers,
Nick.

[1] http://mail.python.org/pipermail/python-dev/2006-March/062225.html

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------