So, I've been not-working on this, which I feel bad about. Suffice it to say the day job has required more of my time than usual for the past few weeks. I want to get back into it, so let's start by re-raising this issue, which Mark Hammond conveniently summarized below.
On 4/07/2009 2:03 PM, Mark Hammond wrote:
On 4/07/2009 12:30 PM, Nick Coghlan wrote:
And since Mercurial doesn't even allow us to say "this is a binary file" the way CVS used to I'm currently not seeing any way for that to happen except for win32text to be updated to correctly handle wild cards in combination with negative filters.
I agree with your conclusion. My ruminating on this over the last few months leaves me thinking this would involve:
* my older 'accepted but then lost' hg patch to allow an explicit 'none' rule for a single file to override wildcards.
This was and still is a good idea. It would be very nice if you could un-bitrot it and submit it for inclusion into crew-stable (so that it may land in the next release, which would hopefully be a somewhat near 1.3.2).
* win32text be enhanced to use a normal versioned file in the root of the repo, much like .hgignore, where a project can maintain project-wide rules.
I'm thinking that it should take stuff from .hgeols or whatever and apply rules from .hg/hgrc after that, so both may be used (and for backwards compatibility), but it sounds like a good idea in principle.
* win32text be enhanced such that all python developers, regardless of platform, are willing to use this extension, even if the majority of files happen to use their native line ending (sauce for the goose is sauce for the gander, and all that...)
I don't think that is necessary, I will elaborate below.
* commit hooks be implemented to enforce this - but this should not be necessary if the above was implemented and socially enforced.
You seem to advocate a two-step approach: enforce line endings through win32text, catch any errors that slipped through in a hook (commit hook is an optional first line of defense, changegroup hooks on the server to protect the rest of the world). I think inverting that approach would be better: have strict hooks on the server to prevent people from pushing inappropriate EOLs, and provide help on configuring win32text as an extra help for developers on Windows who use editors that work better with \r\n. That leaves people to pick their own weapon of choice against propagation of \r\n (e.g. better editor, commit hooks, whatever) while still making sure no inappropriate line endings land in the python.org repositories. It also seems to fit well with the whole consenting adults thing (but that might just be me).
On Sun, Jul 19, 2009 at 15:27, Mark Hammond <skippy.hammond@gmail.com> wrote:
Sorry Dirkjan - I just noticed I didn't CC you on this mail originally. I'm wondering if you have any more thoughts on these EOL issues and if there is anything I can do to help?
Taking up the 'none' filter, first, and .hgeols, secondly, in the win32text extension would be wonderfully helpful, since I don't do much development on Windows and am therefore not that familiar with the extension in the first place. Cheers, Dirkjan
Dirkjan Ochtman wrote:
* commit hooks be implemented to enforce this - but this should not be necessary if the above was implemented and socially enforced.
You seem to advocate a two-step approach: enforce line endings through win32text, catch any errors that slipped through in a hook (commit hook is an optional first line of defense, changegroup hooks on the server to protect the rest of the world).
I think inverting that approach would be better: have strict hooks on the server to prevent people from pushing inappropriate EOLs, and provide help on configuring win32text as an extra help for developers on Windows who use editors that work better with \r\n. That leaves people to pick their own weapon of choice against propagation of \r\n (e.g. better editor, commit hooks, whatever) while still making sure no inappropriate line endings land in the python.org repositories. It also seems to fit well with the whole consenting adults thing (but that might just be me).
It's about not treating Windows developers as second-class citizens. Their platform uses \r\n as its native line ending format, so they should be able to work in that format without any hassles by following some simple instructions (such as "ensure you have version X of the Windows hg client, enable the win32text extension and configure it in such-and-such a way"). Not "oh, yeah, that's an issue but if you search the Intarwebs there are a few different things you can do that kinda sorta work but are a bit fragile and klunky".
The precise order the two issues (server side enforcement and client side assistance) are dealt with doesn't really matter because *both* issues need to be addressed before we migrate.
win32text needs to be usable on non-Windows clients so that tarballs generated on a *nix machine get the line endings right in the Windows-only files.
Cheers, Nick.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
On 4/08/2009 7:20 PM, Nick Coghlan wrote:
Dirkjan Ochtman wrote:
* commit hooks be implemented to enforce this - but this should not be necessary if the above was implemented and socially enforced.
You seem to advocate a two-step approach: enforce line endings through win32text, catch any errors that slipped through in a hook (commit hook is an optional first line of defense, changegroup hooks on the server to protect the rest of the world).
I think inverting that approach would be better: have strict hooks on the server to prevent people from pushing inappropriate EOLs, and provide help on configuring win32text as an extra help for developers on Windows who use editors that work better with \r\n. That leaves people to pick their own weapon of choice against propagation of \r\n (e.g. better editor, commit hooks, whatever) while still making sure no inappropriate line endings land in the python.org repositories. It also seems to fit well with the whole consenting adults thing (but that might just be me).
It's about not treating Windows developers as second class citizens. Their platform uses \r\n as its native line ending format, so they
Thanks Nick; I didn't want to be the only one saying that. There is a fine line between asserting reasonable requirements for Windows users and being obstructionist and unhelpful, and I'm trying to stay on the former side :)
should be able to work in that format without any hassles by following some simple instructions (such as "ensure you have version X of the Windows hg client, enable the win32text extension and configure it in such-and-such a way"). Not "oh, yeah, that's an issue but if you search the Intarwebs there are a few different things you can do that kinda sorta work but are a bit fragile and klunky".
The precise order the two issues (server side enforcement and client side assistance) are dealt with doesn't really matter because *both* issues need to be addressed before we migrate.
I'm not that happy with the server being the primary line of defense. Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever. Or I could attempt to recreate the clone by somehow "replaying" the commits with line endings corrected. Either way, the situation doesn't seem good.
win32text needs to be usable on non-Windows clients so that tarballs generated on a *nix machine get the line endings right in the Windows-only files.
I agree. It isn't fair to make this the Windows users' problem. It would be like me proposing the repo get imported with \r\n line endings, enforce that with server side hooks, and let non-Windows users worry about the ramifications of that - somehow I doubt that would fly - so neither should it fly for Windows users... I'm more than willing to help on this; I haven't resurrected my stale patch because I find win32text only 1/2 a solution that doesn't work in practice. Therefore that patch is as stale for me as it is for anyone. However, if a plan is put in place which offers a full solution and the hg developers are committed to it, I promise I'll put my hand up to help with implementation in a fairly timely manner... Cheers, Mark
Mark Hammond:
Thanks Nick; I didn't want to be the only one saying that. There is a fine line between asserting reasonable requirements for Windows users and being obstructionist and unhelpful, and I'm trying to stay on the former side :)
I haven't commented on this issue before because I can't really be helpful. I just don't understand why hg is being considered before its Windows support is roughly equivalent to svn and cvs. There has been some similar experience with the main repository for the Cocoa port of Scintilla which is in bzr on launchpad. Several times in that repository, files were checked in with wrong line ends, making every line appear changed when looking through history. There are several causes for this including user error, but bzr (and hg) should default to more helpful behaviour on text files. Neil
I haven't commented on this issue before because I can't really be helpful. I just don't understand why hg is being considered before its Windows support is roughly equivalent to svn and cvs.
Is it really that you don't *understand*? It's fairly easy: there was a PEP which offered a number of options, and there was BDFL pronouncement. This (BDFL pronouncement) is how Python has always worked, and, as a principle, it is a good and useful process. Now, the specific outcome of the process means that more work needs to be done. So we have a *second* PEP, and we have a lack of volunteers that help implementing it. The second PEP hasn't been approved yet (as it isn't complete, yet), so migration to hg is stalled. The primary volunteer (Dirkjan) has indicated that he can't help with that specific issue, so other volunteers need to step forward, or we cannot move to hg. Regards, Martin
On 5/08/2009 5:35 PM, "Martin v. Löwis" wrote:
Now, the specific outcome of the process means that more work needs to be done. So we have a *second* PEP, and we have a lack of volunteers that help implementing it. The second PEP hasn't been approved yet (as it isn't complete, yet), so migration to hg is stalled. The primary volunteer (Dirkjan) has indicated that he can't help with that specific issue, so other volunteers need to step forward, or we cannot move to hg.
I don't recall Dirkjan saying he can't help with that issue - was it a lack of time, or a lack of understanding the problem/lack of a Windows environment? The problem I see is a lack of agreement about exactly what the solution entails. I believe there is general agreement win32text needs to be enhanced to support versioned 'rules'. But even with that, the only option I see is a truly cross-platform extension to implement these rules which every Python committer, regardless of operating-system, is expected to use - but that doesn't seem the consensus. As mentioned, I'm willing to lend manpower for this once there is agreement on something workable... Cheers, Mark
Now, the specific outcome of the process means that more work needs to be done. So we have a *second* PEP, and we have a lack of volunteers that help implementing it. The second PEP hasn't been approved yet (as it isn't complete, yet), so migration to hg is stalled. The primary volunteer (Dirkjan) has indicated that he can't help with that specific issue, so other volunteers need to step forward, or we cannot move to hg.
I don't recall Dirkjan saying he can't help with that issue - was it a lack of time, or a lack of understanding the problem/lack of a Windows environment?
I think he said (at some point) that he is not a Windows user, and thus can't really help. Of course, he also indicated that, as a Mercurial contributor, he is willing to help as much as he can.
The problem I see is a lack of agreement about exactly what the solution entails. I believe there is general agreement win32text needs to be enhanced to support versioned 'rules'. But even with that, the only option I see is a truly cross-platform extension to implement these rules which every Python committer, regardless of operating-system, is expected to use - but that doesn't seem the consensus.
As mentioned, I'm willing to lend manpower for this once there is agreement on something workable...
I think it needs to work the other way 'round. Somebody (perhaps you) needs to propose a hook and configuration settings, and propose that this hook is used on every system, and that refusal to use these hooks could lead to changes not being integratable (is that a word?). There can't be consensus to use a solution that doesn't exist.
My personal favorite outcome would be this:
- most files have svn's "native" eol style; they get stored in LF in the repository; the hook will convert them on Windows, and check on Unix.
- some files have "windows" eol style; they get stored in CRLF. The hook will not convert, but only check.
- not sure whether some files need to be declared as "unix" eol style.
- some files are "binary"; they get stored as-is - the hook will do nothing.
With such a setup, using the hook would be truly optional on Unix, as it only ever checks and never converts. So if you manage to mess up, and don't have the hook installed on Unix, you lose when trying to push. That will teach you to be more careful in the future, or to install the hook (which hopefully becomes built into Mercurial at some point). Whether it is actually possible to implement all that, I don't know.
Regards, Martin
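The rules Martin lists can be read as a small decision table. The following is only a rough sketch of that table in Python, not win32text or any existing Mercurial hook: the rule list, the file patterns, and the function names `eol_style` and `to_repository` are all hypothetical and chosen for illustration.

```python
import fnmatch

# Hypothetical rule table; the first matching glob pattern wins.
# The patterns are illustrative and not taken from any real repository.
RULES = [
    ("*.png", "binary"),
    ("PCbuild/*.vcproj", "windows"),
    ("*", "native"),
]

def eol_style(path, rules=RULES):
    """Return the declared style for path: 'native', 'windows' or 'binary'."""
    for pattern, style in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return style
    return "binary"  # nothing matched: leave the file alone

def to_repository(path, data, on_windows, rules=RULES):
    """Return the bytes to store for path, or raise ValueError.

    native  -> stored with LF; converted on Windows, only checked elsewhere
    windows -> stored with CRLF; never converted, only checked
    binary  -> stored as-is
    """
    style = eol_style(path, rules)
    if style == "binary":
        return data
    if style == "native":
        if on_windows:
            return data.replace(b"\r\n", b"\n")
        if b"\r\n" in data:
            raise ValueError("%s: CRLF in a native-EOL file" % path)
        return data
    # style == "windows": any LF left after removing CRLF pairs is a bare LF
    if b"\n" in data.replace(b"\r\n", b""):
        raise ValueError("%s: bare LF in a CRLF file" % path)
    return data
```

Under these assumptions the Unix side never rewrites a file; it either passes the data through or refuses it, which is what makes the hook optional there.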
2009/8/5 "Martin v. Löwis" <martin@v.loewis.de>:
My personal favorite outcome would be this:
- most files have svn's "native" eol style; they get stored in LF in the repository; the hook will convert them on Windows, and check on Unix.
- some files have "windows" eol style; they get stored in CRLF. The hook will not convert, but only check.
- not sure whether some files need to be declared as "unix" eol style.
- some files are "binary"; they get stored as-is - the hook will do nothing.
With such a setup, using the hook would be truly optional on Unix, as it only ever checks and never converts. So if you manage to mess up, and don't have the hook installed on Unix, you lose when trying to push. That will teach you to be more careful in the future, or to install the hook (which hopefully becomes built into Mercurial at some point).
Given that my preference is to use Unix-style EOL for "text" files on Windows, as every text editor I use (barring notepad!) understands LF format, it seems to me that this proposal also means that the hook would be optional for me. That suits me fine - I'd prefer to avoid having hooks that are required for Python checkouts, as that means I have to remember to configure them on each clone (IIUC). Of course, this implies that your proposal only requires any action by the user in the case of Windows users whose text editing tools insist on CRLF format text files (sources, etc). Is that really a large group of developers? (I honestly don't know). I suspect that there is something missing from your proposal, as if this were the case, then the problem appears to be limited to a very small group of developers. Maybe it's Visual Studio that insists on CRLF for source files? (I don't know, as I don't use the VS editor). If that's the case, then maybe a VS hook would be an alternative approach? (I can't imagine such a hook would be an *easier* approach, I only mention it because it makes it clearer where the issue lies). Paul.
On Wed, Aug 5, 2009 at 12:04, Paul Moore<p.f.moore@gmail.com> wrote:
Given that my preference is to use Unix-style EOL for "text" files on Windows, as every text editor I use (barring notepad!) understands LF format, it seems to me that this proposal also means that the hook would be optional for me. That suits me fine - I'd prefer to avoid having hooks that are required for Python checkouts, as that means I have to remember to configure them on each clone (IIUC).
Yeah, this may also be what's making it harder for me to understand the issues. I am actually a Windows user, although I do most of my development on Linux servers through PuTTY. I just always make sure I use editors that respect the file's line endings, and so for those things where I've used hg to version code on Windows (for example, when testing a Firefox extension), and when working with my colleague who does edit his code inside Windows, I've just used editors that deal with line endings. Typically, in my case, that was either Notepad2 (an awesomely light-weight Notepad replacement) or Komodo (Edit). That solved all of my issues, so I haven't had a need for win32text so far. Cheers, Dirkjan
On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote:
endings. Typically, in my case, that was either Notepad2 (an awesomely light-weight Notepad replacement) or Komodo (Edit). That solved all of my issues, so I haven't had a need for win32text so far.
FWIW, I use komodo and scite as my primary editors, and as mentioned, am personally responsible for accidentally checking in \r\n files into what should be a \n repo. I am slowly and painfully learning to be more careful - IMO, I shouldn't need to... Cheers, Mark
Mark Hammond wrote:
On 5/08/2009 8:14 PM, Dirkjan Ochtman wrote:
endings. Typically, in my case, that was either Notepad2 (an awesomely light-weight Notepad replacement) or Komodo (Edit). That solved all of my issues, so I haven't had a need for win32text so far.
FWIW, I use komodo and scite as my primary editors, and as mentioned, am personally responsible for accidentally checking in \r\n files into what should be a \n repo. I am slowly and painfully learning to be more careful - IMO, I shouldn't need to...
Cheers,
Mark
IIRC one of the main problems is Copy & Paste. I believe both Scite and Visual Studio have had issues where they "preserve" the line endings of files, but if you paste from another source, they will also "preserve" the line endings of the pasted content, leaving the file with mixed endings. That said, you also have the "create a new file defaults to CRLF" behaviour, which causes similar problems. John =:->
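A file that has picked up a second line-ending style through pasting can be detected mechanically. The following is a minimal sketch of such a check in Python; the helper names `eol_counts` and `is_mixed` are made up for this illustration and are not part of Mercurial or win32text.

```python
import re

# Match CRLF first so that a "\r\n" pair is counted once, not as CR plus LF.
_EOL = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}

def eol_counts(data):
    """Count each kind of line ending found in a bytes object."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    for match in _EOL.finditer(data):
        counts[_NAMES[match.group()]] += 1
    return counts

def is_mixed(data):
    """Return True if more than one line-ending style occurs in data."""
    return sum(1 for n in eol_counts(data).values() if n) > 1
```

A check like this only reports the problem; deciding which style is the intended one still needs a per-file rule of the kind discussed in this thread.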
Given that my preference is to use Unix-style EOL for "text" files on Windows, as every text editor I use (barring notepad!) understands LF format, it seems to me that this proposal also means that the hook would be optional for me. That suits me fine - I'd prefer to avoid having hooks that are required for Python checkouts, as that means I have to remember to configure them on each clone (IIUC).
Yeah, this may also be what's making it harder for me to understand the issues.
Please trust that there are plenty of editors that get the line ending implementation wrong. I'm fairly certain that some Visual Studio versions are among them. They will recognize LF as a line ending, but add CRLF line breaks when the user presses enter. In addition, some editors (in particular notepad) choke when confronted with LF-only files. It is very annoying if you have to look at source code at somebody else's machine which doesn't have any programmer editor installed (except for Visual Studio). Regards, Martin
On 5/08/2009 8:04 PM, Paul Moore wrote:
2009/8/5 "Martin v. Löwis"<martin@v.loewis.de>:
With such a setup, using the hook would be truly optional on Unix, as it only ever checks and never converts. So if you manage to mess up, and don't have the hook installed on Unix, you lose when trying to push. That will teach you to be more careful in the future, or to install the hook (which hopefully becomes built into Mercurial at some point).
Given that my preference is to use Unix-style EOL for "text" files on Windows, as every text editor I use (barring notepad!) understands LF format,
Most tools that I use will tend to not mix EOL styles in a single file, but will tend to create \r\n line endings for new files I create. Most hg repos I come across don't have mixed line endings within individual files, so I can only guess these files were accidentally introduced in the same way (and indeed I have personally done this.) I'm hoping to be part of the solution instead of part of the problem :)
it seems to me that this proposal also means that the hook would be optional for me.
Technically it would be optional for everyone, of course. However, the solution should be such that everyone, regardless of personal preference, is willing to take the hit. For example, if the repo is converted using \r\n line endings natively, then Windows users would need to take no action either, and that would put the onus back on you (given your stated preferences) to configure the tool appropriately. I assume you would have no objection to that and would be happy to make that tool optional for me?
That suits me fine - I'd prefer to avoid having hooks that are required for Python checkouts, as that means I have to remember to configure them on each clone (IIUC).
Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo.
Of course, this implies that your proposal only requires any action by the user in the case of Windows users whose text editing tools insist on CRLF format text files (sources, etc). Is that really a large group of developers? (I honestly don't know).
It applies to all files that aren't "native" EOL style - there are just fewer of them regularly modified than those that are so marked.
I suspect that there is something missing from your proposal, as if this were the case, then the problem appears to be limited to a very small group of developers. Maybe it's Visual Studio that insists on CRLF for source files? (I don't know, as I don't use the VS editor). If that's the case, then maybe a VS hook would be an alternative approach? (I can't imagine such a hook would be an *easier* approach, I only mention it because it makes it clearer where the issue lies).
I must concede that Windows developers are the minority here - but assuming we want a level playing field, I don't see how that changes the underlying issue... Cheers, Mark
On Wed, Aug 5, 2009 at 13:19, Mark Hammond<mhammond@skippinet.com.au> wrote:
Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo.
Even if we do that, enabling hg extensions will still need to be done locally -- although it can be done per-user/box instead of per-clone. Cheers, Dirkjan
On 5/08/2009 9:28 PM, Dirkjan Ochtman wrote:
On Wed, Aug 5, 2009 at 13:19, Mark Hammond<mhammond@skippinet.com.au> wrote:
Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo.
Even if we do that, enabling hg extensions will still need to be done locally -- although it can be done per-user/box instead of per-clone.
That is completely fine, and not unlike SVN where a per-user/box setting generally needs to be set once - but after that everything "just works". Windows developers don't mind taking a hit once ;) The dev guide can make it clear what the expectations are... Cheers, Mark
Hi,
The expy project provides an express way to extend Python. After some careful consideration, I came up with some reasons for using expy (this is not an exhaustive list):
(I). WYSIWYG. The expy project enables you to write your module in Python the way your extension would be (WYSIWYG), and meanwhile write your implementation in pure C. You specify your modules, functions, methods, classes, and even their documentation the usual way, as you would write their Python counterparts. Then you provide your implementation of the functions/methods by returning a multi-line string. With this arrangement, everything falls into its right place, and your extension code becomes easy to read and maintain. Also, the generated code is very human-friendly.
(II). You only provide minimal information to indicate your intention of how your module/class would function in Python. So your extension is largely independent from the Python extension API. As your interaction with the Python extension API is reduced to a minimum (you only care about the functionality and logic), it is then possible that your module written in expy can be independent of changes in the extension API.
(III). The building and setup of your project can be done automatically with the distutils tool. In the tutorial, there are ample examples of how easily this is achieved.
(IV). Very lightweight. The expy tool is surprisingly lightweight despite its powerful abilities, as it is written in pure Python. There is no parser or compiler for code generation; rather, the powerful reflection mechanism of Python is exploited in a clever way to generate human-friendly code. Currently, generating code in C is supported; however, the implementation is well modularized and code generation in other languages such as Java and C++ should be easy.
While there are already a couple of other projects trying to simplify this task with different strategies, such as Cython, Pyrex and modulator, this project is unique and charming in its own way. All you need is the WYSIWYG Python file for your module extension, then expy takes care of everything else. What follows in this documentation is on how to extend Python in C using expy-cxpy: the module expy helps define your module, while module cxpy helps generate C code for your defined module.
For more information about expy, please visit its homepage at: http://expy.sf.net/
Cheers, Yingjie
Yingjie Lan wrote:
Hi,
The expy project provides an express way to extend Python. After some careful considerations, I came up with some reasons for expy (this is not an exhaustive list):
This kind of advocacy for external projects belongs on python-list, not python-dev (or, if you're proposing something for use in the standard library, on python-ideas).
Cheers, Nick.
P.S. The message to capi-sig was probably on topic - certainly closer to being so than the inclusion of python-dev.
--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
---------------------------------------------------------------
From: Nick Coghlan <ncoghlan@gmail.com>
Subject: Re: [Python-Dev] Reasons for using expy
To: "Yingjie Lan" <lanyjie@yahoo.com>
Cc: python-dev@python.org
Date: Thursday, August 6, 2009, 1:44 AM
This kind of advocacy for external projects belongs on python-list, not python-dev (or, if you're proposing something for use in the standard library, on python-ideas).
Thanks Nick. Cheers, Yingjie
On approximately 8/5/2009 4:28 AM, came the following characters from the keyboard of Dirkjan Ochtman:
On Wed, Aug 5, 2009 at 13:19, Mark Hammond<mhammond@skippinet.com.au> wrote:
Configuring on each clone would certainly be sub-optimal, so the proposal is this configuration be stored in a versioned file in the repo.
Even if we do that, enabling hg extensions will still need to be done locally -- although it can be done per-user/box instead of per-clone.
On approximately 8/5/2009 9:24 AM, came the following characters from the keyboard of Paul Moore:
2) This behaviour is something needed for Python only. I've no issue with enabling win32text globally, but I'd want to be clear that it is a no-op unless specifically requested (ie, something like **=cleverencode is *not* used in the absence of an explicit set of rules). That may well be the case, but I had the impression that win32text tried to be "automatic", so I'd like to verify it.
Depending on [Windows] users to configure their installation of Mercurial to work with the Python repository is lame; it will lead to new Windows contributors getting beat-up at check-in time, and make them less likely to want to contribute even the work they have already done (with wrong EOL), and much less to want to start future contributions, because some Unix Python hacker will be nasty about "Didn't you RTFM?" (Maybe not at first, but eventually).
If the configuration settings have to be different per project for Windows developers using Mercurial for multiple projects, then that is also lame... Windows developers would have to keep changing their configurations, or (implied in above discussion) remember to recreate settings for each new clone or branch or whatever of the Python project. This is also error-prone, and leads to the above problem a different way.
I have read this whole discussion, but want to step back and look at it from a theoretical viewpoint. A good solution would have the following characteristics:
INSTALLATION) The developer should install the [D]VCS (for this discussion, Mercurial, present or future version), and attempt to access a repository (for this discussion, the Python repository, converted and configured for the chosen [D]VCS). The resultant environment should automatically be configured to work properly. If any [D]VCS extensions are required for the project, they should be automatically installed and configured, or the user given explicit instructions on how to do so, as a one-time installation step, that adversely affects no other projects for which the [D]VCS is used by that or other users of the present installation. See below for what properly means.
EOL CONFIGURATION) Each file, when added to the repository, should have a repository setting that indicates what the appropriate EOL type is for that file. The values I have heard are \n only, \r\n, platform-native, and binary. I haven't heard \r only in this discussion, but have heard it in other similar discussions, and it may be a useful setting for Mercurial to have, if the feature must be newly implemented there. I believe there are also systems that use RS to separate lines, and perhaps other things (and are there new Unicode control characters that could be used for line endings?), so it might be good to leave a few unassigned values in such a setting. I don't think any setting should be created to allow mixed line ending usage within a file, except binary. Per repository default for this setting should be available to avoid burdening the user when creating the typical type of file.
ENCODING CONFIGURATION) Each file, when created, should have a repository setting that declares its character repertoire and encoding, and if it is a Unicode UTF encoding, whether or not it should have a leading BOM. In my opinion, all source code files should use a Unicode encoding, the exception being for test files that help test encoding support in internationalized environments. But the feature supports other people's opinions too. Per repository default for this setting should be available to avoid burdening the user when creating the typical type of file.
CHECKOUT) Check-outs should be sensitive to the user's local environment (platform and locale settings), and non-binary files should be converted from the repository format to the local encoding and platform-specific line endings. Settings to override the line endings should be optionally available for users whose tools understand other line endings, and prefer them over the native line endings. If the characters used within a file cannot be converted losslessly to the encoding specified by the locale settings, then it should not be able to be checked out. A special override might be useful for using a lossy transformation for a read-only view of the file, at user request.
CHECKIN) Check-ins, even local check-ins to local clones or branches, should automatically convert encodings and line endings from the platform and locale setting to the encoding and line ending specified by the repository for that file. If the characters in the modified file cannot be transformed losslessly to the repository repertoire and encoding, the check-in should be prevented.
The CHECKIN should be a requirement of a useful [D]VCS, regardless of whether any other capabilities are present. Even if none of the existing tools can reach the above flexibility, the problems that result from using tools that do not have such flexibility should be understood in terms of their specific deficiencies compared to the theoretical model.
I can think of only one other solution that properly handles the problems (which is punting, really): to require the development environment to support the repertoire, encoding, and line endings of the repository. Doing this in a cross-platform manner is hard, because the tool sets (editors, compilers, databases, etc.) tend to support the platform-native convention better than the non-native conventions. It sounds like Mercurial's win32text extension is one form of this sort of requirement. CHECKIN should be a requirement even in this case, to validate the incoming data file. Basic software design requires validation of incoming data.
I have no clue how many of these characteristics are implemented by Mercurial (or any other VCS or DVCS; I've been 7 years away from using SCCS, CVS, and Clearcase, but none of them had such features then, and I've not used the modern crop of VCSes much: git, svn, hg, bazaar, except a little in passing, but haven't read any documentation, nor attempted to set up a project myself in any of them).
If none of the existing tools can reach the above flexibility, then there will be problems that result, and understanding what the problems are, and coming up with documented workarounds, processes, and auxiliary tools on each platform/environment to cure or prevent them, would seem to be necessary to support the use of such tools.

Since Mercurial is the presently chosen DVCS for Python to migrate to, I'd be delighted to learn how close it comes to the theoretical model, and I'm sure someone out there knows. When I have some time, I'll attempt to figure that out by reading the Mercurial documentation... I have a personal (Python, cross-platform) project that is in need of a DVCS soon, and so I'm watching this discussion with much interest, to know whether I should also choose Mercurial, or should choose something that is closer to the theoretical solution outlined above (if there is something that is, or appears to be more likely to reach it sooner).

-- Glenn -- http://nevcal.com/
===========================
A protocol is complete when there is nothing left to remove. -- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
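As an illustration of the CHECKIN rule described above, here is a minimal sketch in Python. It is not how Mercurial or any other tool works; the function name `convert_for_checkin` and its parameters are hypothetical, and BOM handling is left out. It only shows the two properties the rule asks for: line endings are normalized to the repository's choice, and a conversion that cannot be done losslessly fails instead of silently substituting characters.

```python
def convert_for_checkin(data, local_encoding, repo_encoding, repo_eol="\n"):
    """Convert working-copy bytes to the repository's encoding and line ending.

    Hypothetical helper: raises UnicodeDecodeError or UnicodeEncodeError
    when the conversion would lose information.
    """
    # Strict decoding: bytes that are invalid in the local encoding raise.
    text = data.decode(local_encoding)
    # Normalize CRLF and lone CR to LF, then apply the repository's choice.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if repo_eol != "\n":
        text = text.replace("\n", repo_eol)
    # Strict encoding: characters outside the repository repertoire raise.
    return text.encode(repo_encoding)


# A cp1252 working file with CRLF endings, stored as UTF-8 with LF.
stored = convert_for_checkin("naïve\r\n".encode("cp1252"), "cp1252", "utf-8")

# A character that the repository encoding cannot represent blocks the check-in.
try:
    convert_for_checkin("π\n".encode("utf-8"), "utf-8", "ascii")
    checkin_blocked = False
except UnicodeEncodeError:
    checkin_blocked = True
```

The same function with the encodings swapped and the platform's line ending passed as `repo_eol` would approximate the CHECKOUT direction.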
Glenn Linderman:
and perhaps other things (and are there new Unicode control characters that could be used for line endings?),
Unicode includes Line Separator U+2028 and Paragraph Separator U+2029 but they are rarely supported and very rarely used. They are a pain to work with since they are 3 byte sequences in UTF-8. Visual Studio does support them.

Python's built-in open() does not currently treat these characters as line separators. For instance, this example reads only 2 lines rather than 3:

    with open("x.txt", "wb") as f:
        f.write("a\nb\u2029c\n".encode('utf-8'))

    # Read back as UTF-8 explicitly so the result does not depend on the locale.
    with open("x.txt", "r", encoding='utf-8') as f:
        n = 1
        for l in f.readlines():
            print(n, repr(l))
            n += 1

Neil
Neil Hodgson wrote:
Glenn Linderman:
and perhaps other things (and are there new Unicode control characters that could be used for line endings?),
Unicode includes Line Separator U+2028 and Paragraph Separator U+2029 but they are rarely supported and very rarely used. They are a pain to work with since they are 3 byte sequences in UTF-8. Visual Studio does support them.
Python's built-in open() does not currently treat these characters as line separators. For instance, this example reads only 2 lines rather than 3:

    with open("x.txt", "wb") as f:
        f.write("a\nb\u2029c\n".encode('utf-8'))

    # Read back as UTF-8 explicitly so the result does not depend on the locale.
    with open("x.txt", "r", encoding='utf-8') as f:
        n = 1
        for l in f.readlines():
            print(n, repr(l))
            n += 1
Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009)
Python/Zope Consulting and Support ... http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/
M.-A. Lemburg <mal <at> egenix.com> writes:
Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters.
Actually, no. It has been designed from the start to only recognize the "standard" line break representations found in common formats/protocols (CR, LF and CR+LF). People wanting to split on arbitrary unicode line breaks should use str.splitlines(). Regards Antoine.
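The difference between the two behaviours can be seen without touching the filesystem. This is only a small demonstration using `io.StringIO` as a stand-in for a text file; the variable names are illustrative.

```python
import io

text = "a\nb\u2029c\n"

# StringIO (like the io layer in general) does not treat U+2029 as a
# line terminator; with its default settings it splits on "\n" only.
io_lines = io.StringIO(text).readlines()

# str.splitlines() also splits on U+2028, U+2029 and a few other
# characters; keepends=True keeps the terminators for comparison.
str_lines = text.splitlines(keepends=True)

print(io_lines)   # ['a\n', 'b\u2029c\n']
print(str_lines)  # ['a\n', 'b\u2029', 'c\n']
```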
Antoine Pitrou wrote:
M.-A. Lemburg <mal <at> egenix.com> writes:
Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters.
Actually, no. It has been designed from the start to only recognize the "standard" line break representations found in common formats/protocols (CR, LF and CR+LF). People wanting to split on arbitrary unicode line breaks should use str.splitlines().
The fairly long-standing RFE relating to an arbitrarily selectable newline separator seems relevant here: http://bugs.python.org/issue1152248

As with the discussion there, the problem with using str.splitlines is that it prevents pipelining approaches that avoid reading a whole file into memory.

While removing the validity check from readlines() completely is questionable (the readrecords() approach mentioned in the tracker issue would still be better there), loosening the validity check to be based on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it a feature request rather than a bug though).

Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
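For what it's worth, the pipelining concern can be worked around in pure Python by splitting chunk by chunk. The following is only a sketch of that idea, not a proposal for the io layer: `iter_unicode_lines` and `chunk_size` are hypothetical names, and the stream is assumed to return line endings untranslated (for a real file, open it with newline="").

```python
import io


def iter_unicode_lines(stream, chunk_size=4096):
    """Yield lines split on every terminator str.splitlines() recognizes,
    reading the stream in chunks instead of loading it whole."""
    pending = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        pieces = pending.splitlines(keepends=True)
        # Hold back the last piece: it may be an unfinished line, or end
        # in "\r" whose matching "\n" only arrives with the next chunk.
        pending = pieces.pop() if pieces else ""
        yield from pieces
    if pending:
        yield pending


# A tiny chunk size forces "\r\n" to straddle a chunk boundary.
sample = io.StringIO("a\nb\u2029c\r\nd")
lines = list(iter_unicode_lines(sample, chunk_size=3))
print(lines)  # ['a\n', 'b\u2029', 'c\r\n', 'd']
```

Memory use is bounded by the longest line plus one chunk, which is the property the str.splitlines() approach on a whole file lacks.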
Nick Coghlan wrote:
Antoine Pitrou wrote:
M.-A. Lemburg <mal <at> egenix.com> writes:
Please file a bug report for this. f.readlines() (or rather the io layer) should be using Py_UNICODE_ISLINEBREAK(ch) for detecting line break characters.
Actually, no. It has been designed from the start to only recognize the "standard" line break representations found in common formats/protocols (CR, LF and CR+LF). People wanting to split on arbitrary unicode line breaks should use str.splitlines().
The fairly long-standing RFE relating to an arbitrarily selectable newline separator seems relevant here: http://bugs.python.org/issue1152248
As with the discussion there, the problem with using str.splitlines is that it prevents pipelining approaches that avoid reading a whole file into memory.
While removing the validity check from readlines() completely is questionable (the readrecords() approach mentioned in the tracker issue would still be better there), loosening the validity check to be based on Py_UNICODE_ISLINEBREAK seems a bit more feasible. (I'd still call it a feature request rather than a bug though).
I've had a look at the io implementation: this appears to be based on the universal newline support idea which addresses only a fixed set of "new line" character combinations and is not as straightforward to extend to support all Unicode line break characters as I thought.

What I don't understand is why the io layer tries to reinvent the wheel here instead of just using the codec's .readline() method - which *does* use .splitlines() and has full support for all Unicode line break characters (including the CRLF combination).

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009)
M.-A. Lemburg wrote:
I've had a look at the io implementation: this appears to be based on the universal newline support idea which addresses only a fixed set of "new line" character combinations and is not as straightforward to extend to support all Unicode line break characters as I thought.
What I don't understand is why the io layer tries to reinvent the wheel here instead of just using the codec's .readline() method - which *does* use .splitlines() and has full support for all Unicode line break characters (including the CRLF combination).
... and because of this, the feature is already available if you use codecs.open() instead of the built-in open():

    import codecs

    with codecs.open("x.txt", "w", encoding='utf-8') as f:
        f.write("a\nb\u2029c\n")

    with codecs.open("x.txt", "r", encoding='utf-8') as f:
        n = 1
        for l in f.readlines():
            print(n, repr(l))
            n += 1

This prints:

    1 'a\n'
    2 'b\u2029'
    3 'c\n'

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009)
Neil Hodgson wrote:
M.-A. Lemburg:
... and because of this, the feature is already available if you use codecs.open() instead of the built-in open():
So should I not add an issue for the basic open because codecs.open should be used for this case?
Like Antoine mentioned: Using codecs.open() and .readline() is about 20-30 times slower than open(). This is mainly due to the fact that the codec's .readline() method is implemented in pure Python and does its own buffering.

IMHO, it would be a lot better to add full Unicode support for line breaks to the io layer. Given that the code for the complicated handling of the CRLF combination is already there, it's not difficult to add support for the remaining line break characters. The implementation could reuse the Bloom filter approach used in unicodeobject.c to make this very fast.

BTW: I'm not sure why the io layer records the line endings it has seen. This makes processing more complicated for no apparent reason. In the few cases where you might need this (I don't see any), you could just as well scan the lines in a quick loop using Python.

-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2009)
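The "quick loop" mentioned above could look roughly like this. It is a sketch only: `line_ending` is a hypothetical helper, and it relies on opening the stream with newline="" so that line endings reach the caller untranslated.

```python
import io


def line_ending(line):
    """Return the terminator at the end of line, or "" if there is none."""
    # "\r\n" must be tested first so it is not mistaken for a bare "\n".
    for ending in ("\r\n", "\n", "\r"):
        if line.endswith(ending):
            return ending
    return ""


raw = io.BytesIO(b"a\nb\r\nc\rd")
# newline="" keeps universal-newline splitting but does not translate.
with io.TextIOWrapper(raw, encoding="ascii", newline="") as stream:
    seen = {line_ending(line) for line in stream} - {""}

print(sorted(seen))  # ['\n', '\r', '\r\n']
```

The io layer exposes its own record through the `newlines` attribute of text streams; the loop above gathers the same information without relying on it.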
M.-A. Lemburg <mal <at> egenix.com> writes:
IMHO, it would be a lot better to add full Unicode support for line breaks to the io layer. Given that the code for the complicated handling of the CRLF combination is already there, it's not difficult to add support for the remaining line break characters.
I'm not against anything in principle here, but I'd just like to point out two things:

1. Changing line break semantics would break compatibility with the current behaviour, and it would also diverge from what the `newline` parameter specifies; this may be annoying if, for example, the TextIOWrapper class is used to parse some network protocols with a rigorous line ending definition

2. It would be useful to have some input by the original designers of the IO library (the PEP lists Guido, Daniel Stutzbach and Mike Verdone, but I suppose other people were involved)

Regards

Antoine.
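To make the first point concrete, here is how the `newline` parameter already lets a caller insist on one exact terminator. The payload is made up for illustration; only `io.TextIOWrapper` and its documented `newline` argument are used.

```python
import io

payload = b"key: a\nstill the same line\r\nnext\r\n"

# newline="\r\n": only CR+LF ends a line, and nothing is translated.
strict = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii", newline="\r\n")
strict_lines = strict.readlines()

# newline=None (the default): "\n", "\r" and "\r\n" all end a line and
# are translated to "\n" before being returned.
universal = io.TextIOWrapper(io.BytesIO(payload), encoding="ascii")
universal_lines = universal.readlines()

print(strict_lines)     # ['key: a\nstill the same line\r\n', 'next\r\n']
print(universal_lines)  # ['key: a\n', 'still the same line\n', 'next\n']
```

A parser for a protocol that defines CR+LF as its only line terminator depends on the first behaviour, which is why widening the default set of line breaks would need an explicit opt-in.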
Antoine Pitrou wrote:
M.-A. Lemburg <mal <at> egenix.com> writes:
IMHO, it would be a lot better to add full Unicode support for line breaks to the io layer. Given that the code for the complicated handling of the CRLF combination is already there, it's not difficult to add support for the remaining line break characters.
I'm not against anything in principle here, but I'd just like to point out two things:
1. Changing line break semantics would break compatibility with the current behaviour, and it would also diverge from what the `newline` parameter specifies; this may be annoying if, for example, the TextIOWrapper class is used to parse some network protocols with a rigorous line ending definition
Sure, but that would still be possible using the newline parameter. We'd only have to find a way to tell the io layer "accept all Unicode line break characters".
2. It would be useful to have some input by the original designers of the IO library (the PEP lists Guido, Daniel Stutzbach and Mike Verdone, but I suppose other people were involved)
Fair enough. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 07 2009)
M.-A. Lemburg <mal <at> egenix.com> writes:
What I don't understand is why the io layer tries to reinvent the wheel here instead of just using the codec's .readline() method - which *does* use .splitlines() and has full support for all Unicode line break characters (including the CRLF combination).
As for the original Python implementation, the goal was probably to start from a clean sheet. Besides, the new API has seek() and tell() as well. But I'm not really qualified to say more -- I didn't participate in its design. As for the C implementation, it had to be written from scratch anyway -- codecs.open() is pure Python and too slow. Deferring to str.splitlines() would still have been possible but a bit wasteful since in C you can use buffers directly. (and, besides, when writing the C implementation we were concerned with exact compatibility with the Python version -- including line break semantics) Regards Antoine.
Antoine Pitrou wrote:
M.-A. Lemburg <mal <at> egenix.com> writes:
What I don't understand is why the io layer tries to reinvent the wheel here instead of just using the codec's .readline() method - which *does* use .splitlines() and has full support for all Unicode line break characters (including the CRLF combination).
As for the original Python implementation, the goal was probably to start from a clean sheet. Besides, the new API has seek() and tell() as well. But I'm not really qualified to say more -- I didn't participate in its design.
As for the C implementation, it had to be written from scratch anyway -- codecs.open() is pure Python and too slow. Deferring to str.splitlines() would still have been possible but a bit wasteful since in C you can use buffers directly.
Sure, but the code for line splitting is not really all that complicated (see PyUnicode_Splitlines()), so could easily be adapted to work on buffers directly. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Aug 06 2009)
M.-A. Lemburg <mal <at> egenix.com> writes:
Sure, but the code for line splitting is not really all that complicated (see PyUnicode_Splitlines()), so could easily be adapted to work on buffers directly.
Certainly. It all comes down to compatibility with the original implementation. (PEP 3116 itself is vague on the subject, but it didn't occur to me to question the validity of the Python implementation, I admit) Regards Antoine.
2009/8/5 Mark Hammond <mhammond@skippinet.com.au>:
Most tools that I use will tend to not mix EOL styles in a single file, but will tend to create \r\n line endings for new files I create. Most hg repos I come across don't have mixed line endings within individual files, so I can only guess these files were accidentally introduced in the same way (and indeed I have personally done this.) I'm hoping to be part of the solution instead of part of the problem :)
Interesting. I don't recall *ever* having generated CRLF line endings in a LF-delimited file (I use Vim) although I may have created CRLF in new files (and then not noticed, as Vim handles it transparently enough that I missed it). There are no significant projects where I'm a committer, though, so I interact via patches, which means I don't get the opportunity to break the repository :-)
Technically it would be optional for everyone, of course. However, the solution should be such that everyone, regardless of personal preference, is willing to take the hit.
For example, if the repo is converted using \r\n line endings natively, then Windows users would need to take no action either and puts the onus back on you (given your stated preferences) to configure the tool appropriately. I assume you would have no objection to that and would be happy to make that tool optional for me?
Absolutely. My issue is with 2 points:

1) I'm an infrequent contributor, so I don't keep a checkout around. I make a new clone "on demand", so I would be likely to forget to enable the hook on at least a proportion of my clones. The versioned .hgeols proposal seems to cover this.

2) This behaviour is something needed for Python only. I've no issue with enabling win32text globally, but I'd want to be clear that it is a no-op unless specifically requested (ie, something like **=cleverencode is *not* used in the absence of an explicit set of rules). That may well be the case, but I had the impression that win32text tried to be "automatic", so I'd like to verify it.
I must concede that Windows developers are the minority here - but assuming we want a level playing field, I don't see how that changes the underlying issue...
Again, agreed entirely. As a Windows developer who doesn't (knowingly) encounter the issue, I'm not in a good position to help, but I'm happy to contribute comments and test things. I'll be offline for a couple of weeks, though, so you may well have solved it before I can do anything :-) Paul
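One way to picture the behaviour being asked for (project-wide rules, an explicit override for a single file, and a no-op when nothing matches) is the sketch below. Everything in it is hypothetical: the rule format, the action names ("native", "crlf", "none"), the sample paths, and the last-match-wins policy are assumptions made for illustration, not the format of .hgeols or of win32text.

```python
from fnmatch import fnmatchcase


def resolve_eol_rule(path, rules):
    """Return the action of the last rule whose pattern matches path,
    or None when no rule applies (so the tool would do nothing)."""
    chosen = None
    for pattern, action in rules:
        # fnmatchcase keeps matching identical on every platform.
        if fnmatchcase(path, pattern):
            chosen = action
    return chosen


# Hypothetical project-wide rules; later entries override earlier ones.
rules = [
    ("*.py", "native"),                   # wildcard rule
    ("*.bat", "crlf"),
    ("Lib/test/crlf_sample.py", "none"),  # explicit single-file override
]

print(resolve_eol_rule("Lib/os.py", rules))                # native
print(resolve_eol_rule("Lib/test/crlf_sample.py", rules))  # none
print(resolve_eol_rule("README", rules))                   # None
```

With an empty rule list every path resolves to None, which corresponds to the "no-op unless specifically requested" requirement in point 2.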
Martin v. Löwis:
Is it really that you don't *understand*? It's fairly easy: there was a PEP ...
The PEP process is straightforward. However, a PEP may produce an outcome that proves after more experience to be wrong. ISTM a prerequisite to choosing a DVCS is that it should support the full range of development platforms and thus the PEP was accepted prematurely. At some point the PEP should be reexamined and, if necessary, rescinded. What I don't understand is why the plan is still to move to hg despite, after several months, there not being a known good way to include Windows eol support. Neil
The PEP process is straightforward. However, a PEP may produce an outcome that proves after more experience to be wrong. ISTM a prerequisite to choosing a DVCS is that it should support the full range of development platforms and thus the PEP was accepted prematurely.
To be as blunt as possible: the PEP was accepted because Guido really, Really, REALLY wanted to switch to Mercurial. So you would have to convince Guido to revert his decision. You may not like the decision (I did not like using a DVCS in the first place), but following such decisions has served us well, and will serve us well this time.
At some point the PEP should be reexamined and, if necessary, rescinded. What I don't understand is why the plan is still to move to hg despite, after several months, there not being a known good way to include Windows eol support.
You don't understand why it takes many months? That's also easy: because there is a single volunteer, and because there is a lot of work. I think it took me a year to migrate to subversion back then, and I wouldn't be surprised if the Mercurial migration takes even longer. Or don't you understand why that single unresolved item didn't manage to revert the decision? Well, there are many unresolved items in the Mercurial conversion, some much more stressful than the eol issue (e.g. the branching discussion). None of them is unsolvable (AFAICT); you can either contribute to the solution, or sit back and wait for solutions to emerge. Then you can still vote PEP 385 up or down. Regards, Martin
Martin v. Löwis:
Or don't you understand why that single unresolved item didn't manage to revert the decision? Well, there are many unresolved items in the Mercurial conversion, some much more stressful than the eol issue (e.g. the branching discussion).
Then these issues should have been included in the initial PEP for choosing a DVCS since the issues could have driven the choice. PEP 374 implies that win32text effectively solves the Windows eol issue which no longer appears to be correct. Neil
Neil Hodgson schrieb:
Martin v. Löwis:
Or don't you understand why that single unresolved item didn't manage to revert the decision? Well, there are many unresolved items in the Mercurial conversion, some much more stressful than the eol issue (e.g. the branching discussion).
Then these issues should have been included in the initial PEP for choosing a DVCS since the issues could have driven the choice. PEP 374 implies that win32text effectively solves the Windows eol issue which no longer appears to be correct.
Apparently, it was the author's understanding at that time that win32text would be sufficient. Also, PEP 374 has not been written in isolation; at any time during the process people could have notified Dirkjan that this is not the case. The branching issue *has* been included in PEP 374; it is not a blocker for migration, but rather a decision has to be made between two similar, but in other ways quite different styles for converting SVN branches. I'm not aware of any other unresolved items; they may exist, but the fact that they're not discussed on this list in detail means that they are largely unimportant. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
I'm not aware of any other unresolved items; they may exist, but the fact that they're not discussed on this list in detail means that they are largely unimportant.
There is a long list of things that still need to be done; each one potentially creating new problems. In particular:

- the .hgeols plugin needs to be written
- the hooks need to be written, or at least deployed, for code style checks, for email notification, and for buildbot triggering
- the build identification patch needs to be written (I do expect many problems out of that one, some possibly small - I'm not a Mercurial user, so I can't estimate how difficult that will be)
- buildbot configuration needs to be adjusted
- the roundup regex needs to be configured to refer to hgweb links
- access control needs to be set up
- stackless needs to be converted
- a decision on the location of the PEPs must be made and implemented
- developer documentation needs to be written
- a decision must be made what to do with the migrated parts of subversion, in the subversion repository

I may have missed some things. I would like to see a test period (say, two weeks) where we can find further issues.

Regards, Martin
Martin v. Löwis schrieb:
I'm not aware of any other unresolved items; they may exist, but the fact that they're not discussed on this list in detail means that they are largely unimportant.
There is a long list of things that still need to be done; each one potentially creating new problems. In particular:

- the .hgeols plugin needs to be written
- the hooks need to be written, or at least deployed, for code style checks, for email notification, and for buildbot triggering
- the build identification patch needs to be written (I do expect many problems out of that one, some possibly small - I'm not a Mercurial user, so I can't estimate how difficult that will be)
- buildbot configuration needs to be adjusted
- the roundup regex needs to be configured to refer to hgweb links
- access control needs to be set up
- stackless needs to be converted
- a decision on the location of the PEPs must be made and implemented
- developer documentation needs to be written
- a decision must be made what to do with the migrated parts of subversion, in the subversion repository

I may have missed some things. I would like to see a test period (say, two weeks) where we can find further issues.
Sure there are many things to do; I was speaking of issues where the way to go is not decided, and needs to be before the switch can happen. Maybe build identification is one of them; but I think everything has been said in the one thread we had about this. Georg
Mark Hammond <mhammond@skippinet.com.au> writes:
Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever.
What is wrong with that? I mean, if that is the actual sequence of events, why should the history not reflect that?
Either way, the situation doesn't seem good.
I see this assertion made often, so I'm not saying you are necessarily wrong to make it. I just don't see a justification for making it (and, without justification, I would say it *is* wrong to make it). -- \ “Our products just aren't engineered for security.” —Brian | `\ Valentine, senior vice-president of Microsoft Windows | _o__) development | Ben Finney
On 5/08/2009 3:56 PM, Ben Finney wrote:
Mark Hammond<mhammond@skippinet.com.au> writes:
Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever.
What is wrong with that? I mean, if that is the actual sequence of events, why should the history not reflect that?
The problem is the sequence of events happened in the first place. An extra burden is placed on the developer that will quickly get tiresome. I wouldn't personally be happy if that workflow became the norm.
Either way, the situation doesn't seem good.
I see this assertion made often, so I'm not saying you are necessarily wrong to make it. I just don't see a justification for making it (and, without justification, I would say it *is* wrong to make it).
*shrug* - in my opinion, the fact the developer is faced with that hurdle in their workflow is justification enough to say that developer's situation "doesn't seem good" and should have been prevented from happening by the tool much earlier than proposed. Mark
Mark Hammond <skippy.hammond@gmail.com> writes:
On 5/08/2009 3:56 PM, Ben Finney wrote:
Mark Hammond<mhammond@skippinet.com.au> writes:
Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. […]
The problem is the sequence of events happened in the first place. An extra burden is placed on the developer that will quickly get tiresome. I wouldn't personally be happy if that workflow became the norm.
Ah, okay. In that case, the ultimate “problem” is that OS vendors entrenched their incompatible line-ending conventions instead of choosing a single standard. Any line-ending burden borne by developers is a result of that. If things were different, they'd be different. However, we live with the legacy of that stupid set of decisions and have no real option to resolve it permanently short of deprecating entire vistas of tools (or even entire operating systems).
*shrug* - in my opinion, the fact the developer is faced with that hurdle in their workflow is justification enough to say that developer's situation "doesn't seem good" and should have been prevented from happening by the tool much earlier than proposed.
AIUI, this is a combination of several things:

* different OSen have incompatible, entrenched conventions for line-ending that are embodied in the default output of their text processing tools.
* these differences matter in many concrete ways to the tools that process text, so the differences need to be preserved, or explicitly transformed.
* distributed VCS has the job of preserving data as present on the filesystem, including whatever line-ending convention is present in a file.
* distributed VCS has the job of managing data exchange between users, presenting differences in a way that allows easy inspection and merging.
* humans want to pretend that these incompatibilities don't exist, and want “end of line” to be an automatically-handled abstraction.

It's not a simple thing to solve, and many clever people have tried over the decades. The fact that a centralised VCS can put the problem aside by requiring an explicit, single decision in the repository, is no help when addressing the constraints of a distributed VCS.

At some point, the decision about how to handle line endings in cross-platform data needs to be punted to a human for a context-sensitive assessment, since (as can be seen) the above list of requirements is internally inconsistent and can't be relegated to a one-size-fits-all algorithm.

-- \ “All progress has resulted from people who took unpopular | `\ positions.” —Adlai Stevenson | _o__) | Ben Finney
On 5/08/2009 4:50 PM, Ben Finney wrote:
Mark Hammond<skippy.hammond@gmail.com> writes:
On 5/08/2009 3:56 PM, Ben Finney wrote:
Mark Hammond<mhammond@skippinet.com.au> writes:
Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. […]
The problem is the sequence of events happened in the first place. An extra burden is placed on the developer that will quickly get tiresome. I wouldn't personally be happy if that workflow became the norm.
Ah, okay. In that case, the ultimate “problem” is that OS vendors entrenched their incompatible line-ending conventions instead of choosing a single standard. Any line-ending burden borne by developers is a result of that.
Yeah - this happened around 1964 if Wikipedia is any guide.
If things were different, they'd be different. However, we live with the legacy of that stupid set of decisions and have no real option to resolve it permanently short of deprecating entire vistas of tools (or even entire operating systems).
Agreed - so let's not solve it permanently. ...
It's not a simple thing to solve, and many clever people have tried over the decades.
As already mentioned in this thread, a capability similar to what svn or cvs offers would be sufficient. While a DVCS does offer unique challenges, it seems to me that doing something at commit time without requiring magic hooks be configured would go a long way to addressing the problem. Magic hooks on the official repo would then be considered the final fallback defense, but should rarely be invoked.
At some point, the decision about how to handle line endings in cross-platform data needs to be punted to a human for a context-sensitive assessment, since (as can be seen) the above list of requirements is internally inconsistent and can't be relegated to a one-size-fits-all algorithm.
I'm not sure what point you are trying to make, but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do. Cheers, Mark.
Mark Hammond <skippy.hammond@gmail.com> writes:
As already mentioned in this thread, a capability similar to what svn or cvs offers would be sufficient.
That capability presented by centralised VCSen is entirely dependent on the fact that they *are* centralised. Using a distributed VCS means the same capability doesn't apply.
While a DVCS does offer unique challenges, it seems to me that doing something at commit time without requiring magic hooks be configured would go a long way to addressing the problem.
The hand-waving “doing something” is exactly what needs to be solved.
Magic hooks on the official repo would then be considered the final fallback defense, but should rarely be invoked.
Right, so that's “capability similar to centralised VCS” out of consideration; I'm glad we agree in the end.
I'm not sure what point you are trying to make
That I disagree with your position. You seem to think that the problem has an obvious solution, which is not true; and that choice of a distributed VCS should be delayed until the problem is solved, which I don't agree with.
but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do.
Your guess is incorrect; I've been bitten time and again by this problem in many different contexts, enough to know that it's not obvious what the “right” solution is.
--
“Not to perambulate the corridors in the hours of repose in the boots of ascension.” (ski hotel, Austria)
Ben Finney
On 5/08/2009 6:00 PM, Ben Finney wrote:
Mark Hammond<skippy.hammond@gmail.com> writes:
As already mentioned in this thread, a capability similar to what svn or cvs offers would be sufficient.
That capability presented by centralised VCSen is entirely dependent on the fact that they *are* centralised. Using a distributed VCS means the same capability doesn't apply.
Why do you say that (without justification I might add <wink>) about this issue?
While a DVCS does offer unique challenges, it seems to me that doing something at commit time without requiring magic hooks be configured would go a long way to addressing the problem.
The hand-waving “doing something” is exactly what needs to be solved.
I think you have been mis-reading this thread. It is quite clear what 'doing something' means in this context - it means implement the human-defined rules for the line-ending policy for the repository.
Magic hooks on the official repo would then be considered the final fallback defense, but should rarely be invoked.
Right, so that's “capability similar to centralised VCS” out of consideration; I'm glad we agree in the end.
I'm afraid you have lost me again, as clearly we don't agree on what useful things can be done at local commit time.
I'm not sure what point you are trying to make
That I disagree with your position. You seem to think that the problem has an obvious solution, which is not true; and that choice of a distributed VCS should be delayed until the problem is solved, which I don't agree with.
Fair enough - but it seems clear to enough of us that we can make progress and meet the requirements of the people actually impacted.
but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do.
Your guess is incorrect; I've been bitten time and again by this problem in many different contexts, enough to know that it's not obvious what the “right” solution is.
Sorry about that - but that was the only way I could explain you not seeing how such a solution can work. Cheers, Mark
"Martin v. Löwis" <martin@v.loewis.de> writes:
You seem to think that the problem has an obvious solution, which is not true;
But it *has* an obvious solution. See the implementation from Dj Gilcrease, or the spec that I just posted.
Two different solutions are both obvious? There are other solutions proposed elsewhere too; are they also obvious?
Mark Hammond <skippy.hammond@gmail.com> writes:
I think you have been mis-reading this thread.
Quite possibly; I'm not intending to impose my position on anyone. I'll go back to lurking on the thread for a while and see if it becomes any clearer.
--
“First things first, but not necessarily in that order.” (The Doctor, _Doctor Who_)
Ben Finney
As already mentioned in this thread, a capability similar to what svn or cvs offers would be sufficient.
That capability presented by centralised VCSen is entirely dependent on the fact that they *are* centralised. Using a distributed VCS means the same capability doesn't apply.
Why do you say that? People have demonstrated the contrary already.
I'm not sure what point you are trying to make
That I disagree with your position. You seem to think that the problem has an obvious solution, which is not true; and that choice of a distributed VCS should be delayed until the problem is solved, which I don't agree with.
But it *has* an obvious solution. See the implementation from Dj Gilcrease, or the spec that I just posted.
Your guess is incorrect; I've been bitten time and again by this problem in many different contexts, enough to know that it's not obvious what the “right” solution is.
The configuration options of svn have served us well enough. Regards, Martin
Mark Hammond writes:
I'm not sure what point you are trying to make, but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do.
Mercurial is not only open source, it's written in Python. The problem is known to be hard in a practical sense, the existing solutions (written by non-Windows developers, of course) are judged to be insufficient by Windows users, and the non-Windows developers "probably don't see this as clearly as Windows users do." I think the implication is obvious. There will be no good solution until Windows users develop it. I don't see a good reason to wait for that. I do see good reason for non-Windows users to put up with some inconvenience during the "beta" phase of implementing that solution; it's important enough to be fast-tracked, and doesn't need to be perfect for everybody to be tried (though it should not be allowed to endanger repo content, which seems unlikely but needs care since it's a potential disaster).
Stephen J. Turnbull schrieb:
Mark Hammond writes:
I'm not sure what point you are trying to make, but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do.
Mercurial is not only open source, it's written in Python. The problem is known to be hard in a practical sense, the existing solutions (written by non-Windows developers, of course) are judged to be insufficient by Windows users, and the non-Windows developers "probably don't see this as clearly as Windows users do."
I think the implication is obvious. There will be no good solution until Windows users develop it. I don't see a good reason to wait for that. I do see good reason for non-Windows users to put up with some inconvenience during the "beta" phase of implementing that solution;
It's not that obvious -- we at least need the server-side check that doesn't allow "wrong" line endings as the "last" line of defense, and this check already needs a way to know which files are supposed to have which line endings -- deciding how to specify that is already half of the needed solution. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.
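A minimal sketch of the kind of "last line of defense" check described above, written as plain Python rather than against any real Mercurial hook API. The `POLICY` table, the `"LF"`/`"CRLF"` style names and the function names are all illustrative assumptions; how the policy is actually specified is exactly the open question in this thread.

```python
import fnmatch

# Hypothetical policy: glob pattern -> line ending required in the repository.
# The patterns and the "LF"/"CRLF" names are assumptions for illustration.
POLICY = [
    ("*.py", "LF"),
    ("*.bat", "CRLF"),
]


def required_eol(filename, policy=POLICY):
    """Return the EOL style demanded by the first matching pattern, or None."""
    for pattern, eol in policy:
        if fnmatch.fnmatch(filename, pattern):
            return eol
    return None


def violates(data, eol):
    """Report whether the bytes in `data` break the required EOL style."""
    if eol == "LF":
        # Any CRLF pair is a wrong ending for an LF-only file.
        return b"\r\n" in data
    if eol == "CRLF":
        # Remove every CRLF pair; a newline that survives was a bare LF.
        return b"\n" in data.replace(b"\r\n", b"")
    return False  # no rule for this file: accept it as-is


def check_files(files, policy=POLICY):
    """Return the sorted names of files whose line endings break the policy.

    `files` maps filename -> file content as bytes.
    """
    return sorted(
        name for name, data in files.items()
        if violates(data, required_eol(name, policy))
    )
```

A server-side hook could reject an incoming changegroup whenever `check_files` returns a non-empty list, and a client-side hook could run the same function before commit so that the server check rarely fires.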
On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote:
Mark Hammond writes:
I'm not sure what point you are trying to make, but I believe it *is* possible for a solution to be found here which will keep Windows users happy. I'm guessing you haven't had much practical experience with this problem, so probably don't see this as clearly as Windows users do.
Mercurial is not only open source, it's written in Python. The problem is known to be hard in a practical sense, the existing solutions (written by non-Windows developers, of course) are judged to be insufficient by Windows users, and the non-Windows developers "probably don't see this as clearly as Windows users do."
I think the implication is obvious. There will be no good solution until Windows users develop it. I don't see a good reason to wait for that.
My conclusion is different. I'm not sure of the history of win32text, but it most certainly is now squarely in the hands of Windows users. Patches to win32text, or even general discussion, are usually met with silence, and when prodded, the response is "sorry - we don't use that - it is a Windows problem." As a result, we end up in the position we are in now - win32text is great in theory but doesn't work in practice, attempts to make it work are met with indifference, and the "problem" stays squarely with Windows users. Non-Windows users remain oblivious to the pain, Windows users stop bothering with the extension, and the repository post-commit hooks then cause different pain. Hence my conclusion that the answer is for any such support to be developed in conjunction with Windows users, but also in such a way that the solution works, almost identically, for non-Windows users. By insisting all platforms eat the same dog-food, there is much more chance the glaringly obvious (to Windows users) issues are addressed.
I do see good reason for non-Windows users to put up with some inconvenience during the "beta" phase of implementing that solution; it's important enough to be fast-tracked, and doesn't need to be perfect for everybody to be tried (though it should not be allowed to endanger repo content, which seems unlikely but needs care since it's a potential disaster).
And on the flip-side, I accept we may migrate without the agreed solution fully implemented - I'm happy to accept commitments about what *will* be done even if it isn't a reality for a short while... Cheers, Mark
Mark Hammond writes:
On 6/08/2009 12:28 AM, Stephen J. Turnbull wrote:
I think the implication is obvious. There will be no good solution until Windows users develop it. I don't see a good reason to wait for that.
My conclusion is different. I'm not sure of the history of win32text, but it most certainly is now squarely in the hands of Windows users. Patches to win32text, or even general discussion, are usually met with silence, and when prodded, the response is "sorry - we don't use that - it is a Windows problem."
Well, yes, it is a Windows problem. And it will probably always be that way, because for practical purposes, Windows users cannot advocate their platform's infrastructure solutions for open source projects: those solutions are proprietary. On the flip side, in my experience at least Windows users do not contribute much to this kind of infrastructure initiative, undoubtedly due to the high cost of acquiring familiarity with the usable options[1], and so have less input into the process. But that's a matter of certain costs that are built in to the nature of a proprietary platform. Somebody has to pay them, and I think it should be the users of that platform. Why should the rest of the community subsidize that platform?
As a result, we end up in the position we are in now - win32text is great in theory but doesn't work in practice, attempts to make it work are met with indifference, and the "problem" stays squarely with Windows users.
This is simply false AFAICS. There was little participation on this particular issue during PEP 374 that I can recall. Now that it is clearly an issue after all, it's still early in the PEP 385 process. Martin has already picked up the ball on EOL support, and has carried informal design pretty much to the goal line already ... all that's left is the detailed design and the implementation, and there are several people involved who will help develop the patch, all very capable. (Of course it's going to be easier said than done and there are probably bumps in the road to a smooth workflow, but I do claim that the process is working as well as you could expect.)
Hence my conclusion that the answer is for any such support to be developed in conjunction with Windows users, [...]
Ahem. Why not "(primarily) by Windows users"?
And on the flip-side, I accept we may migrate without the agreed solution fully implemented - I'm happy to accept commitments about what *will* be done even if it isn't a reality for a short while...
Make no mistake about it, EOL support is a tempest in a teapot compared to the benefits to a large number of core developers in their *personal* workspaces -- even if the project workflow doesn't change at all. That's what is driving this change. Unless Windows users do it themselves, they are dependent on the good will of the PEP 385 proponent and other volunteer contributors. I don't think "accepting commitments" is part of the game plan.
Footnotes:
[1] Eg, I was willing to participate in PEP 374 because I already have a great interest in version control and use git daily. Lots of Unix users don't, and they didn't participate any more than most Windows users did.
This is simply false AFAICS. There was little participation on this particular issue during PEP 374 that I can recall. Now that it is clearly an issue after all, it's still early in the PEP 385 process. Martin has already picked up the ball on EOL support, and has carried informal design pretty much to the goal line already ... all that's left is the detailed design and the implementation, and there are several people involved who will help develop the patch, all very capable.
I'm not so optimistic. To me, it looks like either Dirkjan or Mark will implement a hg hook, or else it won't happen (for me, I certainly know that I will not write Mercurial hooks anytime soon). Regards, Martin
"Martin v. Löwis" writes:
This is simply false AFAICS. There was little participation on this particular issue during PEP 374 that I can recall. Now that it is clearly an issue after all, it's still early in the PEP 385 process. Martin has already picked up the ball on EOL support, and has carried informal design pretty much to the goal line already ... all that's left is the detailed design and the implementation, and there are several people involved who will help develop the patch, all very capable.
I'm not so optimistic. To me, it looks like either Dirkjan or Mark will implement a hg hook, or else it won't happen (for me, I certainly know that I will not write Mercurial hooks anytime soon).
Ouch. Still, I think the informal discussion so far is pretty close to a usable solution at that level.
If things were different, they'd be different. However, we live with the legacy of that stupid set of decisions and have no real option to resolve it permanently short of deprecating entire vistas of tools (or even entire operating systems).
I think you missed the solution to the problem that Mark proposed (IIUC): a local commit to a hg repository should already get the line endings right, by automatically converting the file-to-be-committed into the repository line endings. This is what CVS has supported for more than ten years, and what svn supports for close-to ten years.
* distributed VCS has the job of preserving data as present on the filesystem, including whatever line-ending convention is present in a file.
No, that's not true. Distributed VCS has the job to help the developer. That may mean to preserve the file as-is, or it may mean to convert the file on checkout and checkin. Which of these would be needed depends on the file, of course.
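The "convert on checkin and checkout" behaviour can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the `is_text` flag and the choice of LF as the repository form are hypothetical, and nothing here is the actual CVS, svn or win32text implementation.

```python
def checkin(data, is_text=True):
    """Convert working-copy bytes to the assumed repository form (LF endings)."""
    if not is_text:
        return data  # files marked "leave alone" are preserved byte for byte
    return data.replace(b"\r\n", b"\n")


def checkout(data, is_text=True, native=b"\r\n"):
    """Convert repository bytes to the working-copy form.

    `native` is the platform line ending; CRLF is used here as the
    Windows example.
    """
    if not is_text:
        return data
    # Normalize to LF first so existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", native)
```

The `is_text` flag stands in for the per-file decision: a file that must be preserved as-is passes through unchanged in both directions.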
It's not a simple thing to solve, and many clever people have tried over the decades. The fact that a centralised VCS can put the problem aside by requiring an explicit, single decision in the repository, is no help when addressing the constraints of a distributed VCS.
Why do you say that? It's not true. The approach that has worked for the central repository can work just as well for a distributed repository.
At some point, the decision about how to handle line endings in cross-platform data needs to be punted to a human for a context-sensitive assessment, since (as can be seen) the above list of requirements is internally inconsistent and can't be relegated to a one-size-fits-all algorithm.
Right - there needs to be a way for the user to specify what line endings to use. That's why both CVS and subversion have supported such configuration, on a per file basis, for many years. I can't see why hg couldn't, in principle, support the same configuration. Being a DVCS, such configuration would have to be part of the clone, of course, being versioned, and all that. I think hg is well capable of keeping versioned configuration information in the clone, as demonstrated by the .hgignore files. Regards, Martin
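To make the idea of versioned, per-file configuration concrete, here is a small sketch of how such a file might be parsed and applied. The `pattern = style` syntax, the style names and the sample file name are assumptions made up for illustration; they are not the spec posted in this thread nor any existing Mercurial format. Later rules override earlier ones, so an explicit `none` rule for a single file can override a wildcard.

```python
import fnmatch

# Hypothetical content of a versioned rules file such as .hgeols.
SAMPLE = """\
# pattern = style   (style names are illustrative assumptions)
*.py = native
*.bat = CRLF
Lib/test/crlf_sample.py = none
"""


def parse_rules(text):
    """Parse 'pattern = style' lines, skipping blanks and # comments."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, _, style = line.partition("=")
        rules.append((pattern.strip(), style.strip()))
    return rules


def eol_style(filename, rules):
    """Return the style of the last rule matching `filename`.

    Files that match no rule get "none", meaning "do not convert".
    """
    style = "none"
    for pattern, candidate in rules:
        if fnmatch.fnmatch(filename, pattern):
            style = candidate
    return style
```

With the sample above, `Lib/os.py` resolves to `native`, while `Lib/test/crlf_sample.py` matches the wildcard first and is then overridden by its own `none` line.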
On Tue, Aug 4, 2009 at 5:43 PM, Mark Hammond<mhammond@skippinet.com.au> wrote:
I'm more than willing to help on this; I haven't resurrected my stale patch because I find win32text only 1/2 a solution that doesn't work in practice. Therefore that patch is as stale for me as it is anyone. However, if a plan is put in place which offers a full solution and the hg developers are committed to it, I promise I'll put my hand up to help with implementation in a fairly timely manner...
Not sure what your patch was as I cannot find it, but I did up a quick change to win32text that uses a versioned .win32text file to maintain encoders, decoders and an ignore list:

http://media.digitalxero.net/win32text.py
http://media.digitalxero.net/.win32text

and add to your hgrc file:

[hooks]
precommit.eol_encode = python:hgext.win32text.versioned_encode

It needs to be precommit since it needs to run before the changeset has been created so it can modify the data. Honestly I think this solution is kind of a hack; a much better solution would be to modify the encode/decode hooks to accept a filename so you can at least do ignore pattern matching, but that still ignores versioned encodes / decodes.
On Wed, Aug 5, 2009 at 01:43, Mark Hammond<mhammond@skippinet.com.au> wrote:
Thanks Nick; I didn't want to be the only one saying that. There is a fine line between asserting reasonable requirements for Windows users and being obstructionist and unhelpful, and I'm trying to stay on the former side :)
I'm not trying to be obstructionist and unhelpful (I hope that should be obvious). On the other hand, I'm working from the point of view of hg, which has two assumptions:
- we're a distributed system, there's fairly little we can assume about clients
- we exchange checksummed byte streams (even if we have some tools that assume those streams are code)
- because of the previous point, there's one native (and therefore better, in a sense) serialization of what you consider "structured" data
The first point means, for example, there will always be some clients who don't have win32text enabled, no matter what, so you can't rely on it, which is why I want to make the server hooks the primary line of defense, and view the client-side tools as helper tools (to make it easy not to trigger the server-side hooks). That doesn't mean I think Windows users are second-rate, or anything like that!
I'm not that happy with the server being the primary line of defense. Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever. Or I could attempt to recreate the clone by somehow "replaying" the commits with line endings corrected. Either way, the situation doesn't seem good.
I don't think either is bad. In the first case, you have one or maybe two extra changesets. As we like to advocate small changesets that fix one thing, a changeset fixing up whitespace is par for the course. ;) The other solution would be to employ mq, for example, to fix up the commits, which mq excels at (although admittedly it has a learning curve).
I agree. It isn't fair to make this a Windows users' problem. It would be like me proposing the repo get imported with \r\n line endings, enforce that with server side hooks, and let non-Windows users worry about the ramifications of that - somehow I doubt that would fly - so neither should it fly for Windows users...
I'm more than willing to help on this; I haven't resurrected my stale patch because I find win32text only 1/2 a solution that doesn't work in practice. Therefore that patch is as stale for me as it is anyone. However, if a plan is put in place which offers a full solution and the hg developers are committed to it, I promise I'll put my hand up to help with implementation in a fairly timely manner...
Well, I'd be happy to help convince the hg crew to accept whatever we come up with, but I'm not sure I'm the best person to come up with it. It sounds like a versioned .hgeols would help a bunch of issues, but I have the feeling you know that better than me, so I'm hoping you can come up with a concrete proposal on what should change in win32text to fix all the problems you see. Cheers, Dirkjan
- we're a distributed system, there's fairly little we can assume about clients
Not as Mercurial, no. As Python, we can certainly expect that all of our contributors have read the developer FAQ, and set up their systems accordingly. If all else fails, we can revoke commit access (or is it "push access"?) if some committer doesn't get the configuration right. We would, of course, prefer if it was very easy to get the configuration right, so that problems don't occur in the first place.
The first point means, for example, there will always be some clients who don't have win32text enabled, no matter what, so you can't rely on it, which is why I want to make the server hooks the primary line of defense
I think it's a terminology issue only: don't say "primary", say "last". Can we agree that the "last" line of defense will be the server hooks, and the "primary" line of defense will be the client commits? "primary" would mean that this is where most errors are detected and fixed; Mark would really object to a flow where most errors are detected only at the server.
That doesn't mean I think Windows users are second-rate, or anything like that!
If the server hooks were the primary line of defense, it would effectively make Windows users second-rate: they would have to redo all their changes over and over again, whereas the Unix users can push the changes without any obstacles (just because they are less likely to make mistakes). If the client machines were the primary line of defense, Windows users would be treated equally: they would make as few mistakes as Unix users, because the hooks do what they want correctly.
I don't think either is bad. In the first case, you have one or maybe two extra changesets. As we like to advocate small changesets that fix one thing, a changeset fixing up whitespace is par for the course. ;)
Whitespace-only changes hurt the "annotate" feature, so we dislike them very much in Python.
Well, I'd be happy to help convince the hg crew to accept whatever we come up with, but I'm not sure I'm the best person to come up with it.
That is all very well. See my other message (asking for volunteers) as well. If you have more work you would prefer to delegate, please let us know. Regards, Martin
On Wed, Aug 5, 2009 at 10:51, "Martin v. Löwis"<martin@v.loewis.de> wrote:
Not as Mercurial, no. As Python, we can certainly expect that all of our contributors have read the developer FAQ, and set up their systems accordingly. If all else fails, we can revoke commit access (or is it "push access"?) if some committer doesn't get the configuration right. We would, of course, prefer if it was very easy to get the configuration right, so that problems don't occur in the first place.
There will also be non-committers who forge changesets that you want to be able to push directly to the Python repositories.
If the client machines were the primary line of defense, Windows users would be treated equally: they would make as few mistakes as Unix users, because the hooks do what they want correctly.
Similarly, if Python kept its .py files in \r\n line endings by default instead of \n endings, Unix-like users would be more prone to mistakes; so by keeping the .py files in \n format, Python is making Windows users second-rate. To cope with that, hg needs to do extra work on the client side. Cheers, Dirkjan
Not as Mercurial, no. As Python, we can certainly expect that all of our contributors have read the developer FAQ, and set up their systems accordingly. If all else fails, we can revoke commit access (or is it "push access"?) if some committer doesn't get the configuration right. We would, of course, prefer if it was very easy to get the configuration right, so that problems don't occur in the first place.
There will also be non-committers who forge changesets that you want to be able to push directly to the Python repositories.
They will also have to follow the policies we set up. If they refuse to do that, we refuse to accept their changes. It's very simple, and contributors have learned very quickly what the policies were (after they were explained to them). Whether that means that they have to fix their changesets, or that they have to redo them, practice will show.
If the client machines were the primary line of defense, Windows users would be treated equally: they would make as few mistakes as Unix users, because the hooks do what they want correctly.
Similarly, if Python kept its .py files in \r\n line endings by default instead of \n endings, Unix-like users would be more prone to mistakes; so by keeping the .py files in \n format, Python is making Windows users second-rate. To cope with that, hg needs to do extra work on the client side.
I think you still miss the point. *If* hg does the extra work, *then* Windows users are *not* second-class citizens anymore. They *only* consider themselves second-class if they have to do additional *manual* work (*). Regards, Martin
(*) They may also consider themselves second-class if they have to install additional software, so hopefully, the necessary extra code for hg will become part of the regular Mercurial distribution at some point.
On 5/08/2009 6:25 PM, Dirkjan Ochtman wrote:
On Wed, Aug 5, 2009 at 01:43, Mark Hammond<mhammond@skippinet.com.au> wrote:
Thanks Nick; I didn't want to be the only one saying that. There is a fine line between asserting reasonable requirements for Windows users and being obstructionist and unhelpful, and I'm trying to stay on the former side :)
I'm not trying to be obstructionist and unhelpful (I hope that should be obvious).
It is, and I hope I didn't imply otherwise.
On the other hand, I'm working from the point of view of hg, which has two assumptions:
- we're a distributed system, there's fairly little we can assume about clients
- we exchange checksummed byte streams (even if we have some tools that assume those streams are code)
- because of the previous point, there's one native (and therefore better, in a sense) serialization of what you consider "structured" data
The first point means, for example, there will always be some clients who don't have win32text enabled, no matter what, so you can't rely on it, which is why I want to make the server hooks the primary line of defense, and view the client-side tools as helper tools (to make it easy not to trigger the server-side hooks). That doesn't mean I think Windows users are second-rate, or anything like that!
In general I agree - although I think we can enforce a "social contract" which puts requirements on people who commit to the Python repository - and therefore we can consider the server-side hooks a "secondary" defense. IOW, the system (including the social aspects of the system) is set up such that the server-side hooks are very rarely called upon.
I'm not that happy with the server being the primary line of defense. Let's say I make a branch of the hg repo, myself and a few others work on it committing as we go, then attempt to merge back upstream. Let's say some of the early commits on that clone introduced "bad" line endings. I'm guessing I would be forced to make a number of whitespace-only checkins to normalize the line-endings before it could merge - and these checkins would then be in the history forever. Or I could attempt to recreate the clone by somehow "replaying" the commits with line endings corrected. Either way, the situation doesn't seem good.
I don't think either is bad.
With all due respect, I suspect that is because you don't expect to see the issue regularly. This proposal still leaves the problem squarely in the lap of Windows users and imposes a burden on them that would probably be considered unreasonable if the situation was reversed. I'm yet to work on a hg repository without mixed line endings. If I understand correctly, every such repository would have involved a developer checking in locally, then at some point in the future pushing these changes upstream. I really really don't want hg to tell me at this final step that I need to perform whitespace only fixes purely because I am running Windows. I understand we are discussing how win32text can offer that - but I must object to your assertion that the situation I described isn't bad when you hit it.
Well, I'd be happy to help convince the hg crew to accept whatever we come up with, but I'm not sure I'm the best person to come up with it. It sounds like a versioned .hgeols would help a bunch of issues, but I have the feeling you know that better than me, so I'm hoping you can come up with a concrete proposal on what should change in win32text to fix all the problems you see.
Actually, I think it is easy to make this problem much easier to understand; mandate every platform should use win32text, then start collating the issues people, including yourself, will no doubt face. I'm happy to get this ball rolling, but again, don't want this left purely in the domain of "it is a windows problem" - it isn't. Cheers, Mark
On Wed, Aug 5, 2009 at 11:02, Mark Hammond<skippy.hammond@gmail.com> wrote:
In general I agree - although I think we can enforce a "social contract" which puts requirements on people who commit to the Python repository - and therefore we can consider the server-side hooks a "secondary" defense. IOW, the system (including the social aspects of the system) is set up such that the server-side hooks are very rarely called upon.
Agreed.
With all due respect, I suspect that is because you don't expect to see the issue regularly.
I suspect so, too!
I'm yet to work on a hg repository without mixed line endings. If I understand correctly, every such repository would have involved a developer checking in locally, then at some point in the future pushing these changes upstream. I really really don't want hg to tell me at this final step that I need to perform whitespace-only fixes purely because I am running Windows.
I understand we are discussing how win32text can offer that - but I must object to your assertion that the situation I described isn't bad when you hit it.
I agree it is to be avoided, I'm just saying that I think it will be exceptional and therefore not a large burden, given other kinds of defenses we can put in place.
Actually, I think it is easy to make this problem much easier to understand; mandate every platform should use win32text, then start collating the issues people, including yourself, will no doubt face. I'm happy to get this ball rolling, but again, don't want this left purely in the domain of "it is a windows problem" - it isn't.
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform. Cheers, Dirkjan
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform.
This is all fairly hypothetical - if hg grew a .hgeols file, it would be good if it supported that cross-platform. It then may make win32text obsolete (in particular if it provided some useful defaults). On Unix, the functionality might be as simple as checking conformance with the eol-style at pre-commit time. Regards, Martin
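A rough sketch of what "checking conformance with the eol-style" could look like, independent of any Mercurial API. The style names, the `conforms` function and the NUL-byte binary heuristic are assumptions made for illustration; they are not taken from win32text or from any existing .hgeols implementation.

```python
import re

# Patterns that match a line ending which is NOT allowed for a given style.
# The style names are hypothetical and only mirror the ones used in this thread.
_BAD_EOL = {
    "LF": re.compile(rb"\r"),                      # any CR is wrong
    "native": re.compile(rb"\r"),                  # stored as LF in the repository
    "CRLF": re.compile(rb"\r(?!\n)|(?<!\r)\n"),    # bare CR or bare LF
    "CR": re.compile(rb"\n"),                      # any LF is wrong
}


def looks_binary(data):
    """Assumed heuristic: data containing a NUL byte is treated as binary."""
    return b"\0" in data


def conforms(data, style):
    """Return True if the repository bytes `data` respect `style`."""
    if style == "binary" or looks_binary(data):
        return True  # binary content is never checked
    return _BAD_EOL[style].search(data) is None
```

A pre-commit check would then call `conforms()` for each changed file and refuse the commit if any file fails, for example `conforms(b"a\r\nb\n", "CRLF")` is false because of the bare `\n`.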
On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform.
I see two practical outcomes of such a mandate:
* line-ending rules are enforced for local checkins, even for linux users, even though such 'accidental' inappropriate line-ending checkins should be much rarer than for windows.
* practical problems faced by Windows users, including any performance considerations, are shared by the community and therefore addressed as a community, thereby ensuring all platforms are considered as important as any other.
Cheers, Mark
Mark Hammond wrote:
On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform.
I see two practical outcomes of such a mandate:
* line-ending rules are enforced for local checkins, even for linux users, even though such 'accidental' inappropriate line-ending checkins should be much rarer than for windows.
* practical problems faced by Windows users, including any performance considerations, are shared by the community and therefore addressed as a community, thereby ensuring all platforms are considered as important as any other.
The main error that enabling win32text everywhere can catch is the use of a *nix client to accidentally corrupt one of the files that is supposed to have \r\n line endings.
It also simplifies the configuration rules in the Python hg FAQ - we would be able to just tell all developers wanting to contribute patches to Python to enable the win32text extension when working with the Python repositories (or clones thereof) without having to worry about what platform they were on.
So it seems to me that the main client-side feature we want is a versioned .hgeols file in the repository that allows files to be explicitly nominated as one of:
- eol=CRLF (i.e. have \r\n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CRLF)
- eol=LF (i.e. have \n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:LF)
- eol=CR (i.e. have \r line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CR)
- native text (i.e. always stored in the repository with \n line endings, but uses native line endings on the local disk - equivalent to SVN eol-style:native)
- binary (i.e. always reproduced on disk exactly as they are in the repository - equivalent to SVN files without eol-style set at all)
The .hgeols file should also allow the repository to define which of the above should be used as the default handling mechanism for text files that are not named in the file (native text, in the specific case of the Python repositories).
Files which look like binary files (according to the existing win32text heuristics) would be left alone regardless of what the default handling was set to in .hgeols.
win32text would then be enhanced to check for a .hgeols file before falling back to its existing configuration mechanisms.
The above basically provides the SVN eol-style feature in a more hg-friendly way. Allowing wildcards in the .hgeols files might be nice, but I don't think it is actually required. We really don't have that many files that are affected by this problem (it's just the fact that it is a number greater than zero that is causing the problem).
The server side pre-push hooks for the main Python repositories would be set to reject change sets which didn't meet the above rules. If a patch fails those checks, either the committer can fix it themselves and resubmit, or else send it back to the originator along with a pointer to the section in the dev FAQ that describes the expected client-side configuration.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
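To make the outline above concrete, here is a minimal sketch of how such rules might be read and applied. Everything in it is an assumption for illustration: the `path = style` syntax, the `default` key, the example paths and the function names are invented here, and no such .hgeols format exists in Mercurial at the time of this thread.

```python
import os

# Hypothetical .hgeols contents; the syntax and the paths are made up.
EXAMPLE_HGEOLS = """\
# per-file rules, no wildcards
default = native
example/build.sln = CRLF
example/old_mac.txt = CR
example/logo.png = binary
"""


def parse_rules(text):
    """Parse 'path = style' lines into a dict, skipping blanks and comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, style = line.partition("=")
        rules[name.strip()] = style.strip()
    return rules


def style_for(path, rules):
    """Explicit rule for the file, else the repository-wide default."""
    return rules.get(path, rules.get("default", "binary"))


def to_working_copy(data, style, native_eol=os.linesep.encode("ascii")):
    """Repository bytes -> bytes written to the local disk."""
    if style != "native" or b"\0" in data:
        return data  # fixed-EOL and binary files are reproduced exactly
    return data.replace(b"\n", native_eol)


def to_repository(data, style, native_eol=os.linesep.encode("ascii")):
    """Local disk bytes -> bytes stored in the repository."""
    if style != "native" or b"\0" in data:
        return data
    return data.replace(native_eol, b"\n")
```

Only "native" files are ever rewritten; the CRLF, LF and CR styles are stored and checked out byte for byte, which is why a separate conformance check is still needed for them.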
Nick Coghlan wrote:
Mark Hammond wrote:
On 5/08/2009 7:09 PM, Dirkjan Ochtman wrote:
I'm not sure how win32text will provide anything other than performance degradation for non-Windows developers, but if there's functionality to be had, I'm happy to mandate its use on every platform. I see two practical outcomes of such a mandate:
* line-ending rules are enforced for local checkins, even for linux users, even though such 'accidental' inappropriate line-ending checkins should be much rarer than for windows.
* practical problems faced by Windows users, including any performance considerations, are shared by the community and therefore addressed as a community, thereby ensuring all platforms are considered as important as any other.
The main error that enabling win32text everywhere can catch is the use of a *nix client to accidentally corrupt one of the files that is supposed to have \r\n line endings.
It also simplifies the configuration rules in the Python hg FAQ - we would be able to just tell all developers wanting to contribute patches to Python to enable the win32text extension when working with the Python repositories (or clones thereof) without having to worry about what platform they were on.
So it seems to me that the main client-side feature we want is a versioned .hgeols file in the repository that allows files to be explicitly nominated as one of:
- eol=CRLF (i.e. have \r\n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CRLF)
- eol=LF (i.e. have \n line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:LF)
- eol=CR (i.e. have \r line endings in the repository and should be left that way on the local disk as well - equivalent to SVN eol-style:CR)
- native text (i.e. always stored in the repository with \n line endings, but uses native line endings on the local disk - equivalent to SVN eol-style:native)
- binary (i.e. always reproduced on disk exactly as they are in the repository - equivalent to SVN files without eol-style set at all)
The .hgeols file should also allow the repository to define which of the above should be used as the default handling mechanism for text files that are not named in the file (native text, in the specific case of the Python repositories).
Files which look like binary files (according to the existing win32text heuristics) would be left alone regardless of what the default handling was set to in .hgeols.
win32text would then be enhanced to check for a .hgeols file before falling back to its existing configuration mechanisms.
The above basically provides the SVN eol-style feature in a more hg-friendly way. Allowing wildcards in the .hgeols files might be nice, but I don't think it is actually required. We really don't have that many files that are affected by this problem (it's just the fact that it is a number greater than zero that is causing the problem).
The server side pre-push hooks for the main Python repositories would be set to reject change sets which didn't meet the above rules. If a patch fails those checks, either the committer can fix it themselves and resubmit, or else send it back to the originator along with a pointer to the section in the dev FAQ that describes the expected client-side configuration.
Instead of just talking about line endings, could each file have a specific 'filetype'? This would define what kind of data it contains, how it's stored in the repository, and what actions to perform for fetching and committing, including any checks:
c_header: C header file; LF in repository; native outside
c_source: C source file; LF in repository; native outside
text: plain text; LF in repository; native outside
crlf_text: plain text; CRLF in repository; CRLF outside
cr_text: plain text; CR in repository; CR outside
lf_text: plain text; LF in repository; LF outside
binary: arbitrary binary data; as-is in repository
This could be expanded in the future to include filetypes for JPEG, etc.
On Wed, Aug 5, 2009 at 15:35, MRAB<python@mrabarnett.plus.com> wrote:
Instead of just talking about line endings, could each file have a specific 'filetype'? This would define what kind of data it contains, how it's stored in the repository, and what actions to perform for fetching and committing, including any checks:
Sounds like YAGNI to me. The outline Nick provided seems to me to be quite close to the current win32text settings in syntax and purpose and staying close to that would help make adoption easier. Cheers, Dirkjan
Dirkjan Ochtman wrote:
On Wed, Aug 5, 2009 at 15:35, MRAB<python@mrabarnett.plus.com> wrote:
Instead of just talking about line endings, could each file have a specific 'filetype'? This would define what kind of data it contains, how it's stored in the repository, and what actions to perform for fetching and committing, including any checks:
Sounds like YAGNI to me.
Yep - while SVN does support full mime_type specification for files, I don't think we have ever used it. The SVN eol-style property is all we're trying to replicate, since that has served us well in the few cases where it has mattered.
The outline Nick provided seems to me to be quite close to the current win32text settings in syntax and purpose and staying close to that would help make adoption easier.
Yeah, win32text is already tantalisingly close to what we would like so I deliberately tried to stay close to its existing approach. We're just being a bit fussier than most about the repository being able to tell the clients which files should be given special treatment. That way individual users can just set it up once on their development machine and then no longer have to worry about it (if more files that need special treatment are added to the repository, then the same checkin that adds them should also update .hgeols). Cheers, Nick.
On Thu, Aug 06, 2009 at 12:12:08AM +1000, Nick Coghlan wrote:
Yep - while SVN does support full mime_type specification for files, I don't think we have ever used it.
These files are in 8859-1 encoding (names in comments, at least):
http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py
http://svn.python.org/view/python/trunk/Lib/test/test_csv.py
http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py
http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py
If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly. Probably there are other files.
Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann <phd <at> phd.pp.ru> writes:
These files are in 8859-1 encoding (names in comments, at least): http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py http://svn.python.org/view/python/trunk/Lib/test/test_csv.py http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly.
It's certainly ok to convert them to utf-8 (and add the marker anyway). There's no point in having different charsets used throughout the code base, except for testing purposes (just as there's no point in having different indentation rules used for the same file type throughout the code base ;-)). Regards Antoine.
These files are in 8859-1 encoding (names in comments, at least): http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py http://svn.python.org/view/python/trunk/Lib/test/test_csv.py http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly.
It's certainly ok to convert them to utf-8 (and add the marker anyway).
No, it's not. PEP 8 mandates that non-ASCII code in the Python source code is in Latin-1. Regards, Martin
Martin v. Löwis <martin <at> v.loewis.de> writes:
No, it's not. PEP 8 mandates that non-ASCII code in the Python source code is in Latin-1.
Ok, point taken. Having several encodings (and several indentation rules) certainly makes things more annoying for contributors than they should be, however. Regards Antoine.
"Martin v. Löwis" wrote:
These files are in 8859-1 encoding (names in comments, at least): http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py http://svn.python.org/view/python/trunk/Lib/test/test_csv.py http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly.
It's certainly ok to convert them to utf-8 (and add the marker anyway).
No, it's not. PEP 8 mandates that non-ASCII code in the Python source code is in Latin-1.
Then I guess it's time to change PEP 8 for Python 2.7 ...
"""
Code in the core Python distribution should always use the ASCII or UTF-8 encoding together with a PEP 263 encoding comment header.
"""
Since UTF-8 is ASCII compatible, the whole source code will effectively be UTF-8 encoded.
-- Marc-Andre Lemburg
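For reference, converting a Latin-1 source file "in a proper way" means re-encoding the bytes and rewriting the PEP 263 declaration together. The sketch below shows one way that could be done; the function name `recode_source` and the Latin-1 fallback for files without a declaration are assumptions for illustration, and the regular expression follows the one given in PEP 263.

```python
import re

# Encoding declaration as described in PEP 263; only valid on line 1 or 2.
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def recode_source(data, new_encoding="utf-8", fallback="latin-1"):
    """Re-encode source bytes and keep the coding declaration in sync.

    `fallback` is the encoding assumed when the file declares none
    (an assumption made for this sketch).
    """
    lines = data.splitlines(keepends=True)
    old_encoding = fallback
    declared = False
    for index, line in enumerate(lines[:2]):
        match = CODING_RE.match(line)
        if match:
            old_encoding = match.group(1).decode("ascii")
            # Replace only the encoding name, leave the rest of the line alone.
            lines[index] = (line[:match.start(1)]
                            + new_encoding.encode("ascii")
                            + line[match.end(1):])
            declared = True
            break
    if not declared:
        # Add a declaration, keeping a shebang line first if there is one.
        at = 1 if lines and lines[0].startswith(b"#!") else 0
        header = b"# -*- coding: " + new_encoding.encode("ascii") + b" -*-\n"
        lines.insert(at, header)
    return b"".join(lines).decode(old_encoding).encode(new_encoding)
```

This only handles ASCII-compatible encodings and leaves line endings untouched, so it can be combined with whatever EOL policy the repository ends up using.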
These files are in 8859-1 encoding (names in comments, at least): http://svn.python.org/view/python/trunk/Lib/encodings/punycode.py http://svn.python.org/view/python/trunk/Lib/test/test_csv.py http://svn.python.org/view/python/trunk/Tools/i18n/msgfmt.py http://svn.python.org/view/python/trunk/Tools/i18n/pygettext.py If they are not marked as "text/plain; charset=iso-8859-1" I think it's a bug. Either they should be marked, or converted to ascii or utf-8; the coding pseudocomment (directive) should be changed accordingly.
It's certainly a bug of the web page. I'm not so sure it's a bug in the files: I would claim that it's a bug in ViewCVS. Regards, Martin
On Wed, Aug 05, 2009 at 02:35:02PM +0100, MRAB wrote:
Instead of just talking about line endings, could each file have a specific 'filetype'?
EOL-conversion, MIME type and encoding (charset) are three different concepts. Yes, all of them must be supported, but not necessarily in one configuration mechanism. Subversion handles these issues by providing svn:eol-style and svn:mime-type (handles both MIME type and charset) properties on a file-by-file basis. Oleg.
On Wed, Aug 05, 2009 at 05:50:03PM +0400, Oleg Broytmann wrote:
Subversion handles these issues by providing ... svn:mime-type (handles both MIME type and charset) file-by-file basis.
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header. Oleg.
On Wed, Aug 5, 2009 at 15:57, Oleg Broytmann<phd@phd.pp.ru> wrote:
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header.
It doesn't currently have any way to provide out-of-band charset info. Cheers, Dirkjan
On Wed, Aug 05, 2009 at 04:04:24PM +0200, Dirkjan Ochtman wrote:
On Wed, Aug 5, 2009 at 15:57, Oleg Broytmann<phd@phd.pp.ru> wrote:
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header.
It doesn't currently have any way to provide out-of-band charset info.
Perhaps that's not a big issue for Python, but it's certainly a big issue for me. Oleg.
On Wed, Aug 5, 2009 at 16:35, Oleg Broytmann<phd@phd.pp.ru> wrote:
Perhaps that's not a big issue for Python, but it's certainly a big issue for me.
I think there are extensions that try to deal with it. Have a look: http://mercurial.selenic.com/wiki/UsingExtensions If not, it should be easy to come up with something and write an extension for it. Cheers, Dirkjan
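As an illustration of the kind of out-of-band charset information being asked for, here is a small sketch that builds a Content-Type value from a per-file charset table. The table, its paths and the `content_type` function are hypothetical; this is not how hgweb or any existing extension works, only a picture of the idea.

```python
import mimetypes

# Hypothetical out-of-band table mapping repository paths to charsets.
EXAMPLE_CHARSETS = {
    "docs/readme.txt": "utf-8",
    "docs/readme.koi8.txt": "koi8-r",
    "docs/readme.win.txt": "cp1251",
}


def content_type(path, charsets, default_type="application/octet-stream"):
    """Return a Content-Type value, adding the charset for known text files."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or default_type
    charset = charsets.get(path)
    if charset and mime.startswith("text/"):
        return "%s; charset=%s" % (mime, charset)
    return mime
```

The file contents are never converted; only the header served alongside them changes, which matches the requirement of not converting files back and forth.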
Oleg Broytmann writes:
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header.
How is this relevant to PEP 385? I hope the answer is "not at all". I've been there, done that, and my answer is "never again". (I'm not telling you what to do with *your* repository, just that I don't see any good reason for having any encodings but UTF-8 in Python's.)
On Thu, Aug 06, 2009 at 12:34:39AM +0900, Stephen J. Turnbull wrote:
Oleg Broytmann writes:
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header.
How is this relevant to PEP 385? I hope the answer is "not at all".
There are non-utf8 non-ascii files in the Python source tree. Either there should be a way to handle them in Mercurial or they have to be converted to UTF-8 in a proper way (i.e., don't forget to rewrite charset directives). Other than that - I am pondering a switch from SVN to hg in other projects using the Python process as an example and asking questions that are slightly off-topic (but only slightly).
I've been there, done that, and my answer is "never again". (I'm not telling you what to do with *your* repository, just that I don't see any good reason for having any encodings but UTF-8 in Python's.)
We have files in at least two different encodings - utf-8 and cp1251 for user-visible text-files on w32. Oleg.
Dirkjan, how does Mercurial handle charsets? If I have three files in my repository - one in utf-8, another in koi8-r, and the third in cp1251 encoding - I certainly don't want to convert them back and forth, but I want the hg web interface to provide charset in the Content-Type header.
How is this relevant to PEP 385? I hope the answer is "not at all". I've been there, done that, and my answer is "never again". (I'm not telling you what to do with *your* repository, just that I don't see any good reason for having any encodings but UTF-8 in Python's.)
Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 for Python 2.x source code (except for files that test PEP 263). Regards, Martin
"Martin v. Löwis" writes:
I don't see any good reason for having any encodings but UTF-8 in Python's.
Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 for Python 2.x source code (except for files that test PEP 263).
You're right, sorry for the misinformation. An exception should be made for gettext message files, too?
Just in case my previous message gets overlooked: PEP 8 mandates Latin-1 for Python 2.x source code (except for files that test PEP 263).
You're right, sorry for the misinformation.
An exception should be made for gettext message files, too?
In principle, perhaps. However, Python doesn't have any .po files, AFAIK. Regards, Martin
participants (18)
- "Martin v. Löwis"
- Antoine Pitrou
- Ben Finney
- Dirkjan Ochtman
- Dj Gilcrease
- Georg Brandl
- Glenn Linderman
- John Arbash Meinel
- M.-A. Lemburg
- Mark Hammond
- Mark Hammond
- MRAB
- Neil Hodgson
- Nick Coghlan
- Oleg Broytmann
- Paul Moore
- Stephen J. Turnbull
- Yingjie Lan