Re: [Python-Dev] Path object design
On 08:14 pm, sluggoster@gmail.com wrote:
Argh, it's difficult to respond to one topic that's now spiraling into two conversations on two lists.
glyph@divmod.com wrote:
(...) people have had to spend five years putting hard-to-read os.path functions in the code, or reinventing the wheel with their own libraries that they're not sure they can trust. I started to use path.py last year when it looked like it was emerging as the basis of a new standard, but yanked it out again when it was clear the API would be different by the time it's accepted. I've gone back to os.path for now until something stable emerges but I really wish I didn't have to.
You *don't* have to. This is a weird attitude I've encountered over and over again in the Python community, although sometimes it masquerades as resistance to Twisted or Zope or whatever. It's OK to use libraries. It's OK even to use libraries that Guido doesn't like! I'm pretty sure the first person to tell you that would be Guido himself. (Well, second, since I just told you.) If you like path.py and it solves your problems, use path.py. You don't have to cram it into the standard library to do that. It won't be any harder to migrate from an old path object to a new path object than from os.path to a new path object, and in fact it would likely be considerably easier.
*It is already used in a large body of real, working code, and therefore its limitations are known.*
This is an important consideration. However, to me a clean API is more important.
It's not that I don't think a "clean" API is important. It's that I think that "clean" is a subjective assessment that is hard to back up, and it helps to have some data saying "we think this is clean because there are very few bugs in this 100,000 line program written using it". Any code that is really easy to use right will tend to have *some* aesthetic appeal.
I took a quick look at filepath. It looks similar in concept to PEP 355. Four concerns:
- unfamiliar method names (createDirectory vs mkdir, child vs join)
Fair enough, but "child" really means child, not join. It is explicitly for joining one additional segment, with no slashes in it.
- basename/dirname/parent are methods rather than properties: leads to () overproliferation in user code.
The () is there because every invocation returns a _new_ object. I think that this is correct behavior but I also would prefer that it remain explicit.
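A minimal sketch of the point about explicit calls, assuming a hypothetical `SketchPath` class (this is not Twisted's FilePath, and the names are invented for illustration). Each call to `parent()` builds a fresh object, which the parentheses make visible at the call site:

```python
import posixpath


class SketchPath:
    """Hypothetical immutable-style path wrapper, for illustration only."""

    def __init__(self, path):
        self.path = path

    def parent(self):
        # A new SketchPath is constructed on every call; nothing is cached.
        return SketchPath(posixpath.dirname(self.path))

    def basename(self):
        # Returns a plain string, but is still spelled as a call.
        return posixpath.basename(self.path)


p = SketchPath("/usr/lib/python")
a = p.parent()
b = p.parent()
print(a.path)   # the parent directory as a string
print(a is b)   # two calls, two distinct objects
```

Had `parent` been a property, `p.parent` would read like a stored attribute even though it allocates a new object each time.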
- the "secure" features may not be necessary. If they are, this should be a separate discussion, and perhaps implemented as a subclass.
The main "secure" feature is "child" and it is, in my opinion, the best part about the whole class. Some of the other stuff (rummaging around for siblings with extensions, for example) is probably extraneous. child, however, lets you take a string from arbitrary user input and map it into a path segment, both securely and quietly. Here's a good example (and this actually happened, this is how I know about that crazy windows 'special files' thing I wrote in my other recent message): you have a decision-making program that makes two files to store information about a process: "pro" and "con". It turns out that "con" is shorthand for "fall in a well and die" in win32-ese. A "secure" path manipulation library would alert you to this problem with a traceback rather than having it inexplicably freeze. Obscure, sure, but less obscure would be getting deterministic errors from a user entering slashes into a text field that shouldn't accept them.
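A rough sketch of what a "child"-style join might look like, under stated assumptions: `child` and `UnsafeSegment` here are hypothetical names written for this example, not Twisted's implementation, and the check covers only separators and dot segments (the Windows device-name case is left out). It uses `posixpath` so the result is the same on every platform:

```python
import posixpath


class UnsafeSegment(ValueError):
    """Raised when a name is not a single, plain path segment."""


def child(base, name):
    """Join exactly one segment onto base, or raise UnsafeSegment.

    Hypothetical helper for illustration only.
    """
    # Empty and dot segments would resolve to base itself or its parent.
    if name in ("", ".", ".."):
        raise UnsafeSegment(f"{name!r} is not a usable segment")
    # Any separator (or NUL byte) means the input is more than one segment.
    if "/" in name or "\\" in name or "\0" in name:
        raise UnsafeSegment(f"{name!r} contains a separator")
    return posixpath.join(base, name)


print(child("/srv/data", "report.txt"))

try:
    child("/srv/data", "../../etc/passwd")
except UnsafeSegment as exc:
    print("rejected:", exc)
```

The point of the design is that bad input from a text field produces the same, immediate error every time, rather than a path that silently escapes the base directory.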
- stylistic objection to verbose camelCase names like createDirectory
There is no accounting for taste, I suppose. Obviously if it violates the stdlib's naming conventions it would have to be adjusted.
Path representation is a bike shed. Nobody would have proposed writing an entirely new embedded database engine for Python: python 2.5 simply included SQLite because its utility was already proven.
There's a quantum level of difference between path/file manipulation -- which has long been considered a requirement for any full-featured programming language -- and a database engine which is much more complex.
"quantum" means "the smallest possible amount", although I don't think you're using it like that, so I think I agree with you. No, it's not as hard as writing a database engine. Nevertheless it is a non-trivial problem, one worthy of having its own library and clearly capable of generating a fair amount of its own discussion.
Fredrik has convinced me that it's more urgent to OOize the pathname conversions than the filesystem operations.
I agree on the relative values. I am still unconvinced that either is "urgent" in the sense that it needs to be in the standard library.
Where have all the proponents of non-OO or limited-OO strategies been?
This continuum doesn't make any sense to me. Where would you place Twisted's solution on it?
On 11/1/06, glyph@divmod.com <glyph@divmod.com> wrote:
On 08:14 pm, sluggoster@gmail.com wrote:
(...) people have had to spend five years putting hard-to-read os.path functions in the code, or reinventing the wheel with their own libraries that they're not sure they can trust. I started to use path.py last year when it looked like it was emerging as the basis of a new standard, but yanked it out again when it was clear the API would be different by the time it's accepted. I've gone back to os.path for now until something stable emerges but I really wish I didn't have to.
You *don't* have to. This is a weird attitude I've encountered over and over again in the Python community, although sometimes it masquerades as resistance to Twisted or Zope or whatever. It's OK to use libraries. It's OK even to use libraries that Guido doesn't like! I'm pretty sure the first person to tell you that would be Guido himself. (Well, second, since I just told you.) If you like path.py and it solves your problems, use path.py. You don't have to cram it into the standard library to do that. It won't be any harder to migrate from an old path object to a new path object than from os.path to a new path object, and in fact it would likely be considerably easier.
Oh, I understand it's OK to use libraries. It's just that a path library needs to be widely tested and well supported so you know it won't scramble your files. A bug in a date library affects only datetimes. A bug in a database library affects only that database. A bug in a template library affects only the page being output. But a bug in a path library could ruin your whole day. "Um, remember those important files in that other project directory you weren't working in? They were just overwritten." Also, I train several programmers new to Python at work. I want to make them learn *one* path library that we'll be sure to stick with for several years. Every path library has subtle quirks, and switching from one to another may not be just a matter of renaming methods.
- the "secure" features may not be necessary. If they are, this should be a separate discussion, and perhaps implemented as a subclass.
The main "secure" feature is "child" and it is, in my opinion, the best part about the whole class. Some of the other stuff (rummaging around for siblings with extensions, for example) is probably extraneous. child, however, lets you take a string from arbitrary user input and map it into a path segment, both securely and quietly. Here's a good example (and this actually happened, this is how I know about that crazy windows 'special files' thing I wrote in my other recent message): you have a decision-making program that makes two files to store information about a process: "pro" and "con". It turns out that "con" is shorthand for "fall in a well and die" in win32-ese. A "secure" path manipulation library would alert you to this problem with a traceback rather than having it inexplicably freeze. Obscure, sure, but less obscure would be getting deterministic errors from a user entering slashes into a text field that shouldn't accept them.
Perhaps you're right. I'm not saying it *should not* be a basic feature, just that unless the Python community as a whole is ready for this, users should have a choice to use it or not. I learned about DOS device files from the manuals back in the 80s. But I had completely forgotten them when I made several "aux" directories in a Subversion repository on Linux. People tried to check it out on Windows and... got some kind of error.

"CON" means console: its input comes from the keyboard and its output goes to the screen. Since this is a device file, I'm not sure a path library has any responsibility to treat it specially. We don't treat "/dev/stdout" specially unless the user specifically calls a device function.

I have no idea why Microsoft thought it was a good idea to put the seven-odd device files in every directory. Why not force people to type the colon ("CON:"). If they've memorized what CON means they should have no trouble with the colon, especially since it's required with "A:" and "C:" anyway.

For trivia, these are the ones I remember:

CON            Console (keyboard input, screen output)
KBRD           Keyboard input.
???            screen output
LPT1/2/3       parallel ports
COM 1/2/3/4    serial ports
PRN            alias for default printer port (normally LPT1)
NUL            bit bucket
AUX            game port?

COPY CON FILENAME.TXT     # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN     # Unix: "lp filename.txt" or "cat filename.txt | lp".
TYPE FILENAME.TXT         # Unix: "cat filename.txt".
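For anyone who does want an opt-in check, here is a minimal sketch of one. It rests on assumptions: the helper name `is_dos_device_name` is invented for this example, and the name set is the one commonly given in Microsoft's file-naming guidance (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9), which is not quite the same as the list recalled from memory above. Windows matches these names case-insensitively and also when an extension follows, so the sketch compares only the part before the first dot:

```python
# Assumption: the reserved names as commonly documented for Win32.
RESERVED_DEVICE_NAMES = frozenset(
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{n}" for n in range(1, 10)]
    + [f"LPT{n}" for n in range(1, 10)]
)


def is_dos_device_name(segment):
    """Return True if a single path segment names a reserved device.

    Hypothetical helper for illustration; it looks only at the text
    before the first dot, with trailing spaces removed, ignoring case.
    """
    stem = segment.split(".", 1)[0].rstrip(" ").upper()
    return stem in RESERVED_DEVICE_NAMES


for name in ("pro", "con", "aux.txt", "console"):
    print(name, is_dos_device_name(name))
```

A caller could run this before creating a file and raise an error on a match, or skip the check entirely on platforms where the names are ordinary.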
Where have all the proponents of non-OO or limited-OO strategies been?
This continuum doesn't make any sense to me. Where would you place Twisted's solution on it?
In the "let's create a brilliant library and put a dark box around it so nobody knows it's there" position. Although you say you've been trying to spread the word about it. For whatever reason, I haven't heard about it till now. Not sure what this means.

But what I meant is, we OO proponents have been trying to promote path.py and/or get a similar module into the stdlib for years, and all we got was... not even hostility... just indifference and silence. People like to complain about os.path but not do anything about fixing it, or even to say which approach they *would* support.

Talin started a great thread on the python-3000 list, going back to the beginning and saying "What is wrong with os.path, how much does it need fixing, and is consensus on an API possible?" Maybe he did what the rest of us (including me) should have done long ago.

--
Mike Orr <sluggoster@gmail.com>
Mike Orr wrote:
I have no idea why Microsoft thought it was a good idea to put the seven-odd device files in every directory. Why not force people to type the colon ("CON:").
Yes, this is a particularly stupid piece of braindamage on the part of the designers of MS-DOS. As far as I remember, even CP/M (which was itself a severely warped and twisted version of RT11) had the good sense to put colons on the end of such things. But maybe "design" is too strong a word to apply to MS-DOS...

Anyhow, I think I agree that there's really nothing a path library can do about this. Whatever it tries to do, the fact will remain that it's impossible to have a regular file called "con", and users will have to live with that.

--
Greg
participants (3): glyph@divmod.com, Greg Ewing, Mike Orr