[Python-Dev] PEP 385: the eol-type issue
Glenn Linderman
v+python at g.nevcal.com
Wed Aug 5 19:43:57 CEST 2009
On approximately 8/5/2009 4:28 AM, came the following characters from
the keyboard of Dirkjan Ochtman:
> On Wed, Aug 5, 2009 at 13:19, Mark Hammond<mhammond at skippinet.com.au> wrote:
>> Configuring on each clone would certainly be sub-optimal, so the proposal is
>> this configuration be stored in a versioned file in the repo.
>
> Even if we do that, enabling hg extensions will still need to be done
> locally -- although it can be done per-user/box instead of per-clone.
On approximately 8/5/2009 9:24 AM, came the following characters from
the keyboard of Paul Moore:
> 2) This behaviour is something needed for Python only. I've no issue
> with enabling win32text globally, but I'd want to be clear that it is
> a no-op unless specifically requested (ie, something like
> **=cleverencode is *not* used in the absence of an explicit set of
> rules). That may well be the case, but I had the impression that
> win32text tried to be "automatic", so I'd like to verify it.
Depending on [Windows] users to configure their installation of
Mercurial to work with the Python repository is lame; it will lead to
new Windows contributors getting beaten up at check-in time, and make them
less likely to want to contribute even the work they have already done
(with wrong EOL), and much less to want to start future contributions,
because some Unix Python hacker will be nasty about "Didn't you RTFM?"
(Maybe not at first, but eventually).
If the configuration settings have to be different per project for
Windows developers using Mercurial for multiple projects, then that is
also lame... Windows developers would have to keep changing their
configurations, or (implied in above discussion) remember to recreate
settings for each new clone or branch or whatever of the Python project.
This is also error-prone, and leads to the above problem a different way.
I have read this whole discussion, but want to step back and look at it
from a theoretical viewpoint. A good solution would have the following
characteristics:
INSTALLATION) The developer should install the [D]VCS (for this
discussion, Mercurial, present or future version), and attempt to access
a repository (for this discussion, the Python repository, converted and
configured for the chosen [D]VCS). The resultant environment should
automatically be configured to work properly. If any [D]VCS extensions
are required for the project, they should be automatically installed and
configured, or the user given explicit instructions on how to do so, as
a one-time installation step, that adversely affects no other projects
for which the [D]VCS is used by that or other users of the present
installation. See below for what "properly" means.
EOL CONFIGURATION) Each file, when added to the repository, should have
a repository setting that indicates what the appropriate EOL type is for
that file. The values I have heard are \n only, \r\n, platform-native,
and binary. I haven't heard \r only in this discussion, but have heard
it in other similar discussions, and it may be a useful setting for
Mercurial to have, if the feature must be newly implemented there. I
believe there are also systems that use RS to separate lines, and
perhaps other things (and are there new Unicode control characters that
could be used for line endings?), so it might be good to leave a few
unassigned values in such a setting. I don't think any setting should
be created to allow mixed line ending usage within a file, except
binary. A per-repository default for this setting should be available to
avoid burdening the user when creating the typical type of file.
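As a rough illustration of the EOL CONFIGURATION idea, here is a minimal Python sketch of a per-file EOL setting with a per-repository default. The rule table, the pattern syntax, and the names `Eol`, `RULES` and `eol_for` are all hypothetical; nothing here is Mercurial's actual configuration format.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Eol(Enum):
    """The four EOL settings named above; more values could be added later."""
    LF = "lf"
    CRLF = "crlf"
    NATIVE = "native"
    BINARY = "binary"


# Hypothetical rule table: the first matching pattern wins, and the final
# catch-all entry plays the role of the per-repository default.
RULES = [
    ("*.bat", Eol.CRLF),
    ("*.sh", Eol.LF),
    ("*.png", Eol.BINARY),
    ("*", Eol.NATIVE),
]


def eol_for(path, rules=RULES):
    """Return the EOL setting for a repository path such as 'Lib/os.py'."""
    name = path.rsplit("/", 1)[-1]
    for pattern, eol in rules:
        if fnmatchcase(name, pattern):
            return eol
    raise LookupError("no EOL rule matches %r" % path)
```

With this table, `eol_for("PCbuild/build.bat")` gives `Eol.CRLF`, while an unlisted file such as `Lib/os.py` falls through to the default, `Eol.NATIVE`.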
ENCODING CONFIGURATION) Each file, when created, should have a
repository setting that declares its character repertoire and encoding,
and if it is a Unicode UTF encoding, whether or not it should have a
leading BOM. In my opinion, all source code files should use a Unicode
encoding, the exception being for test files that help test encoding
support in internationalized environments. But the feature supports
other people's opinions too. A per-repository default for this setting
should be available to avoid burdening the user when creating the
typical type of file.
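To make the ENCODING CONFIGURATION setting concrete, here is a small Python sketch of writing text in a declared encoding with an optional leading BOM. The function name `encode_for_repo` and its parameters are assumptions for illustration, and the sketch deliberately handles the BOM for UTF-8 only.

```python
import codecs


def encode_for_repo(text, encoding="utf-8", bom=False):
    """Encode text using the file's declared encoding and BOM setting."""
    if bom and codecs.lookup(encoding).name != "utf-8":
        # Assumption of this sketch: only the UTF-8 BOM is supported.
        raise ValueError("this sketch only handles a BOM for UTF-8")
    # Strict encoding raises UnicodeEncodeError when a character falls
    # outside the declared repertoire.
    data = text.encode(encoding)
    return codecs.BOM_UTF8 + data if bom else data
```

A file declared as ASCII would then reject any text containing non-ASCII characters, which is the kind of repertoire check the setting is meant to enable.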
CHECKOUT) Check-outs should be sensitive to the user's local environment
(platform and locale settings), and non-binary files should be converted
from the repository format to the local encoding and platform-specific
line endings. Settings to override the line endings should be
optionally available for users whose tools understand other line
endings, and prefer them over the native line endings. If the
characters used within a file cannot be converted losslessly to the
encoding specified by the locale settings, then it should not be able to
be checked out. A special override might be useful for using a lossy
transformation for a read-only view of the file, at user request.
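The CHECKOUT behaviour could be sketched in Python roughly as follows. This assumes, for illustration only, that the repository stores text files with "\n" line endings; the name `checkout_bytes` and its parameters are hypothetical and do not describe how Mercurial or any other tool works.

```python
import os


def checkout_bytes(repo_data, repo_encoding, local_encoding,
                   local_eol=os.linesep, lossy=False):
    """Convert a non-binary file from repository form to local form."""
    text = repo_data.decode(repo_encoding)
    # Assumption: the repository form uses "\n"; map it to the local EOL.
    text = text.replace("\n", local_eol)
    # Strict mode refuses the checkout when a character cannot be
    # represented locally; lossy mode substitutes "?" and is meant only
    # for a read-only view requested by the user.
    errors = "replace" if lossy else "strict"
    return text.encode(local_encoding, errors)
```

The `local_eol` parameter is the override for users whose tools prefer a non-native line ending, and `lossy=True` corresponds to the special read-only override.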
CHECKIN) Check-ins, even local check-ins to local clones or branches,
should automatically convert encodings and line endings from the
platform and locale setting to the encoding and line ending specified by
the repository for that file. If the characters in the modified file
cannot be transformed losslessly to the repository repertoire and
encoding, the check-in should be prevented.
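A matching Python sketch for CHECKIN, again under stated assumptions: `checkin_bytes` is a hypothetical name, and rejecting files with mixed line endings is this sketch's reading of the "no mixed line endings except binary" point made under EOL CONFIGURATION.

```python
import re

_EOL = re.compile(r"\r\n|\r|\n")


def checkin_bytes(local_data, local_encoding="utf-8",
                  repo_encoding="utf-8", repo_eol="\n"):
    """Validate a non-binary working file and convert it to repository form."""
    # Raises UnicodeDecodeError if the file is not valid in the local encoding.
    text = local_data.decode(local_encoding)
    endings = set(_EOL.findall(text))
    if len(endings) > 1:
        # Assumption: mixed endings in a text file prevent the check-in.
        raise ValueError("mixed line endings: %r" % sorted(endings))
    text = _EOL.sub(lambda match: repo_eol, text)
    # Strict encoding raises UnicodeEncodeError, preventing the check-in when
    # the text does not fit the repository repertoire and encoding.
    return text.encode(repo_encoding)
```

Either exception would be the point at which the tool stops the check-in and tells the user what to fix.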
CHECKIN should be a requirement of a useful [D]VCS, regardless of whether
any other capabilities are present.
Even if none of the existing tools can reach the above flexibility, the
problems that result from using tools that do not have such flexibility
should be understood in terms of their specific deficiencies compared to
the theoretical model.
I can think of only one other solution that properly handles the
problems (which is punting, really): to require the development
environment to support the repertoire, encoding, and line endings of the
repository. Doing this in a cross-platform manner is hard, because the
tool sets (editors, compilers, databases, etc.) tend to support the
platform-native convention better than the non-native conventions. It
sounds like Mercurial's win32text extension is one form of this sort of
requirement. CHECKIN should be a requirement even in this case, to
validate the incoming data file. Basic software design requires
validation of incoming data.
I have no clue how many of these characteristics are implemented by
Mercurial (or any other VCS or DVCS; I've been 7 years away from using
SCCS, CVS, and ClearCase, but none of them had such features then, and
I've not used the modern crop of VCSes much: git, svn, hg, bazaar,
except a little in passing, and I haven't read any documentation, nor
attempted to set up a project myself in any of them).
If none of the existing tools can reach the above flexibility, then
there will be problems that result, and understanding what the problems
are, and coming up with documented workarounds, processes, and auxiliary
tools on each platform/environment to cure or prevent them, would seem
to be necessary to support the use of such tools.
Since Mercurial is the presently chosen DVCS for Python to migrate to,
I'd be delighted to learn how close it comes to the theoretical model,
and I'm sure someone out there knows. When I have some time, I'll
attempt to figure that out by reading the Mercurial documentation... I
have a personal (Python, cross-platform) project that is in need of a
DVCS soon, and so I'm watching this discussion with much interest, to
know whether I should also choose Mercurial, or should choose something
that is closer to the theoretical solution outlined above (if there is
something that is, or appears to be more likely to reach it sooner).
--
Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking