On approximately 8/5/2009 4:28 AM, came the following characters from the keyboard of Dirkjan Ochtman:
On Wed, Aug 5, 2009 at 13:19, Mark Hammond <mhammond@skippinet.com.au> wrote:
Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo.
Even if we do that, enabling hg extensions will still need to be done locally -- although it can be done per-user/box instead of per-clone.
On approximately 8/5/2009 9:24 AM, came the following characters from the keyboard of Paul Moore:
- This behaviour is something needed for Python only. I've no issue with enabling win32text globally, but I'd want to be clear that it is a no-op unless specifically requested (i.e., something like **=cleverencode is *not* used in the absence of an explicit set of rules). That may well be the case, but I had the impression that win32text tried to be "automatic", so I'd like to verify it.
Depending on [Windows] users to configure their installation of Mercurial to work with the Python repository is lame; it will lead to new Windows contributors getting beat-up at check-in time, and make them less likely to want to contribute even the work they have already done (with wrong EOL), and much less to want to start future contributions, because some Unix Python hacker will be nasty about "Didn't you RTFM?" (Maybe not at first, but eventually).
If the configuration settings have to be different per project for Windows developers using Mercurial for multiple projects, then that is also lame... Windows developers would have to keep changing their configurations, or (implied in above discussion) remember to recreate settings for each new clone or branch or whatever of the Python project. This is also error-prone, and leads to the above problem a different way.
I have read this whole discussion, but want to step back and look at it from a theoretical viewpoint. A good solution would have the following characteristics:
INSTALLATION) The developer should install the [D]VCS (for this discussion, Mercurial, present or future version), and attempt to access a repository (for this discussion, the Python repository, converted and configured for the chosen [D]VCS). The resultant environment should automatically be configured to work properly. If any [D]VCS extensions are required for the project, they should be automatically installed and configured, or the user given explicit instructions on how to do so, as a one-time installation step that adversely affects no other projects for which the [D]VCS is used by that or other users of the present installation. See below for what "properly" means.
EOL CONFIGURATION) Each file, when added to the repository, should have a repository setting that indicates what the appropriate EOL type is for that file. The values I have heard are \n only, \r\n, platform-native, and binary. I haven't heard \r only in this discussion, but have heard it in other similar discussions, and it may be a useful setting for Mercurial to have, if the feature must be newly implemented there. I believe there are also systems that use RS to separate lines, and perhaps other things (and are there new Unicode control characters that could be used for line endings?), so it might be good to leave a few unassigned values in such a setting. I don't think any setting should be created to allow mixed line ending usage within a file, except binary. A per-repository default for this setting should be available to avoid burdening the user when creating the typical type of file.
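To make the EOL setting concrete, here is a rough sketch in Python of what I have in mind. It is only an illustration: the names EolStyle, RULES, DEFAULT_STYLE and eol_for are hypothetical, the patterns are made up, and none of this is meant to describe how Mercurial (or any other tool) actually stores such a setting.

```python
import enum
import fnmatch


class EolStyle(enum.Enum):
    """The four values mentioned above; more could be added later."""
    LF = "lf"          # \n only
    CRLF = "crlf"      # \r\n
    NATIVE = "native"  # whatever the checking-out platform uses
    BINARY = "binary"  # never touch the bytes


# Hypothetical versioned rules: the first matching pattern wins.
RULES = [
    ("*.bat", EolStyle.CRLF),
    ("*.sh", EolStyle.LF),
    ("*.png", EolStyle.BINARY),
]

# Per-repository default, so typical files need no explicit entry.
DEFAULT_STYLE = EolStyle.NATIVE


def eol_for(path, rules=RULES, default=DEFAULT_STYLE):
    """Return the EOL style recorded for path, or the repository default."""
    for pattern, style in rules:
        # fnmatchcase keeps the match case-sensitive on every platform.
        if fnmatch.fnmatchcase(path, pattern):
            return style
    return default
```

With these made-up rules, eol_for("PCbuild/build.bat") gives EolStyle.CRLF, and a file with no matching rule falls back to the per-repository default.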
ENCODING CONFIGURATION) Each file, when created, should have a repository setting that declares its character repertoire and encoding, and if it is a Unicode UTF encoding, whether or not it should have a leading BOM. In my opinion, all source code files should use a Unicode encoding, the exception being for test files that help test encoding support in internationalized environments. But the feature supports other people's opinions too. A per-repository default for this setting should be available to avoid burdening the user when creating the typical type of file.
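The encoding setting could be sketched along the same lines. Again this is only an illustration under my own assumptions: EncodingSetting and to_repo_bytes are hypothetical names, and for brevity the BOM flag is only honoured for UTF-8 here.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingSetting:
    """Hypothetical per-file setting; the defaults act as the repository default."""
    encoding: str = "utf-8"
    bom: bool = False


def to_repo_bytes(text, setting):
    """Encode text as the setting declares, raising if it cannot be represented."""
    # Strict encoding: a character outside the declared repertoire raises
    # UnicodeEncodeError instead of being silently replaced.
    data = text.encode(setting.encoding)
    # Only the UTF-8 BOM is handled in this sketch.
    if setting.bom and codecs.lookup(setting.encoding).name == "utf-8":
        data = codecs.BOM_UTF8 + data
    return data
```

A file declared as ASCII then simply cannot acquire a non-ASCII character, because the strict encode raises rather than substituting something else.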
CHECKOUT) Check-outs should be sensitive to the user's local environment (platform and locale settings), and non-binary files should be converted from the repository format to the local encoding and platform-specific line endings. Settings to override the line endings should be optionally available for users whose tools understand other line endings, and prefer them over the native line endings. If the characters used within a file cannot be converted losslessly to the encoding specified by the locale settings, then the file should not be checked out. A special override might be useful for using a lossy transformation for a read-only view of the file, at user request.
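As a sketch of the check-out behaviour for a single non-binary file (hypothetical names throughout: checkout_bytes, LossyConversion and the allow_lossy override are mine, not features of any existing tool):

```python
import os


class LossyConversion(Exception):
    """Raised when a file cannot be represented in the local encoding."""


def checkout_bytes(repo_data, repo_encoding, local_encoding,
                   local_eol=os.linesep, allow_lossy=False):
    """Convert repository bytes to the local encoding and line endings."""
    text = repo_data.decode(repo_encoding)
    # Normalize whatever the repository stores to "\n" first, then apply
    # the local convention (or the user's override).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", local_eol)
    try:
        return text.encode(local_encoding)
    except UnicodeEncodeError as exc:
        if allow_lossy:
            # Read-only view at the user's request: unrepresentable
            # characters become "?".
            return text.encode(local_encoding, errors="replace")
        raise LossyConversion(str(exc)) from exc
```

The local_eol parameter is the line-ending override for users whose tools prefer something other than the native convention. A lossy result would have to be marked read-only by the caller, since it cannot be checked back in safely.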
CHECKIN) Check-ins, even local check-ins to local clones or branches, should automatically convert encodings and line endings from the platform and locale setting to the encoding and line ending specified by the repository for that file. If the characters in the modified file cannot be transformed losslessly to the repository repertoire and encoding, the check-in should be prevented.
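The check-in direction might look like the following sketch. The names checkin_bytes and CheckinRefused are hypothetical, and in a real tool the encodings and the EOL would come from the per-file repository settings described above rather than from function arguments.

```python
class CheckinRefused(Exception):
    """Raised when a modified file cannot be stored losslessly."""


def checkin_bytes(local_data, local_encoding, repo_encoding, repo_eol="\n"):
    """Convert working-copy bytes to the repository encoding and EOL."""
    try:
        text = local_data.decode(local_encoding)
    except UnicodeDecodeError as exc:
        raise CheckinRefused("not valid %s: %s" % (local_encoding, exc)) from exc
    # Accept any of the common line endings from the working copy and
    # store exactly one convention in the repository.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n", repo_eol)
    try:
        # Strict encoding: refuse rather than silently lose characters.
        return text.encode(repo_encoding)
    except UnicodeEncodeError as exc:
        raise CheckinRefused("not representable in %s: %s"
                             % (repo_encoding, exc)) from exc
```

The point of raising instead of replacing characters is that the refusal happens on the contributor's own machine, at the first local commit, rather than later at push or review time.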
The CHECKIN characteristic should be a requirement of a useful [D]VCS, regardless of whether any other capabilities are present.
Even if none of the existing tools can reach the above flexibility, the problems that result from using tools that do not have such flexibility should be understood in terms of their specific deficiencies compared to the theoretical model.
I can think of only one other solution that properly handles the problems (which is punting, really): to require the development environment to support the repertoire, encoding, and line endings of the repository. Doing this in a cross-platform manner is hard, because the tool sets (editors, compilers, databases, etc.) tend to support the platform-native convention better than the non-native conventions. It sounds like Mercurial's win32text extension is one form of this sort of requirement. CHECKIN should be a requirement even in this case, to validate the incoming data file. Basic software design requires validation of incoming data.
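For this "punting" case no conversion happens at all, so the validation reduces to checking that the incoming bytes already follow the repository convention. A minimal sketch, assuming the convention is a single line terminator per file (find_eol_violations is a hypothetical helper, not part of win32text or Mercurial):

```python
import re

# A line is any run of non-terminator bytes followed by one terminator.
# "\r\n" is listed first so a CRLF pair is seen as a single terminator.
_LINE = re.compile(rb"[^\r\n]*(\r\n|\r|\n)")


def find_eol_violations(data, expected=b"\n"):
    """Return the 1-based numbers of lines whose terminator is not expected."""
    bad = []
    for number, match in enumerate(_LINE.finditer(data), start=1):
        if match.group(1) != expected:
            bad.append(number)
    return bad
```

A commit hook could refuse the check-in whenever this returns a non-empty list, and report the offending line numbers to the contributor.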
I have no clue how many of these characteristics are implemented by Mercurial, or by any other VCS or DVCS. I've been 7 years away from using SCCS, CVS, and ClearCase, none of which had such features then, and I've used the modern crop of VCSes (git, svn, hg, bazaar) only a little in passing, without reading any documentation or attempting to set up a project myself in any of them.
If none of the existing tools can reach the above flexibility, then there will be problems that result, and understanding what the problems are, and coming up with documented workarounds, processes, and auxiliary tools on each platform/environment to cure or prevent them, would seem to be necessary to support the use of such tools.
Since Mercurial is the presently chosen DVCS for Python to migrate to, I'd be delighted to learn how close it comes to the theoretical model, and I'm sure someone out there knows. When I have some time, I'll attempt to figure that out by reading the Mercurial documentation... I have a personal (Python, cross-platform) project that is in need of a DVCS soon, and so I'm watching this discussion with much interest, to know whether I should also choose Mercurial, or should choose something that is closer to the theoretical solution outlined above (if there is something that is, or appears to be more likely to reach it sooner).