
Another topic: what are the chances of adding the mmap module to the core distribution? It's restricted to a smallish set of platforms (modern Unices and Win32, I think), but it's quite small, and would be a nice thing to have available in the core, IMHO. (btw, the buffer object needs more documentation) --david

Another topic: what are the chances of adding the mmap module to the core distribution? It's restricted to a smallish set of platforms (modern Unices and Win32, I think), but it's quite small, and would be a nice thing to have available in the core, IMHO.
If it works on Linux, Solaris, Irix and Windows, and is reasonably clean, I'll take it. Please send it.
(btw, the buffer object needs more documentation)
That's for Jack & Greg... --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, 15 Jun 1999, Guido van Rossum wrote:
Another topic: what are the chances of adding the mmap module to the core distribution? It's restricted to a smallish set of platforms (modern Unices and Win32, I think), but it's quite small, and would be a nice thing to have available in the core, IMHO.
If it works on Linux, Solaris, Irix and Windows, and is reasonably clean, I'll take it. Please send it.
Actually, my preference is to see a change to open() rather than a whole new module. For example, let's say that you open a file, specifying memory-mapping. Then you create a buffer against that file:

    f = open('foo', 'rm')  # 'm' means mem-map
    b = buffer(f)
    print b[100:200]

Disclaimer: I haven't looked at the mmap modules (AMK's and Mark's) to see what capabilities are in there. They may not be expressible solely as open() changes. (Adding additional parameters for mmap flags might be another way to handle this.)

I'd like to see mmap native in Python. I won't push, though, until I can run a test to see what kind of savings will occur when you mmap a .pyc file and open PyBuffer objects against the thing for the code bytes. My hypothesis is that you can reduce the working set of Python (i.e. amortize the cost of a .pyc's code over several processes by mmap'ing it); this depends on the proportion of code in the .pyc relative to "other" stuff.
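A small illustration of the working-set idea (a sketch, not from the thread; the calls follow the API the stdlib mmap module eventually settled on, and the file name is an assumption). A read-only mapping of a .pyc is backed by the OS page cache, so several interpreters mapping the same file share those pages:

    import mmap, os

    fd = os.open('mymodule.pyc', os.O_RDONLY)      # hypothetical compiled module
    m = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)  # length 0 maps the whole file
    print(repr(m[:4]))                             # peek at the magic number; no private
                                                   # copy of the file's data is made
    m.close()
    os.close(fd)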
(btw, the buffer object needs more documentation)
That's for Jack & Greg...
Quite true. My bad :-( ... That would go into the API doc, I guess... I'll put this on a todo list, but it could be a little while.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/

Greg wrote:
Actually, my preference is to see a change to open() rather than a whole new module. For example, let's say that you open a file, specifying memory-mapping. Then you create a buffer against that file:
    f = open('foo', 'rm')  # 'm' means mem-map
    b = buffer(f)
    print b[100:200]
Disclaimer: I haven't looked at the mmap modules (AMK's and Mark's) to see what capabilities are in there. They may not be expressible solely as open() changes. (Adding additional parameters for mmap flags might be another way to handle this.)
I'd like to see mmap native in Python. I won't push, though, until I can run a test to see what kind of savings will occur when you mmap a .pyc file and open PyBuffer objects against the thing for the code bytes. My hypothesis is that you can reduce the working set of Python (i.e. amortize the cost of a .pyc's code over several processes by mmap'ing it); this depends on the proportion of code in the pyc relative to "other" stuff.
yes, yes, yes!

my good friend the mad scientist (the guy who writes code, not the flaming cult-ridden brainwashed script kiddie) has considered writing a whole new "abstract file" backend, to entirely get rid of stdio in the Python core. some potential advantages:

-- performance (some stdio implementations are slow)
-- portability (stdio doesn't exist on some platforms!)
-- opens up for cool extensions (memory mapping, pluggable file handlers, etc).

should I tell him to start hacking? or is this the same thing as PyBuffer/buffer (I've implemented PyBuffer support for the unicode class, but that doesn't mean that I understand how it works...)

</F>

PS. someone once told me that Perl goes "below" the standard file I/O system. does anyone here know if that's true, and perhaps even explain how they're doing that...
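To make the "abstract file" idea concrete, here is a rough sketch (an illustration under assumed names, not Fredrik's actual design): the backend is anything implementing a small read/seek/close protocol, and this one talks straight to OS-level handles without ever touching stdio:

    import os

    class RawFileBackend:
        """Hypothetical non-stdio backend built on raw descriptors."""
        def __init__(self, path):
            self.fd = os.open(path, os.O_RDONLY)   # OS handle, no FILE* involved
        def read(self, n):
            return os.read(self.fd, n)
        def seek(self, pos):
            os.lseek(self.fd, pos, 0)              # 0 = SEEK_SET
        def close(self):
            os.close(self.fd)

A memory-mapping backend, a socket backend, or a pure-Python one could implement the same protocol, and the core would only ever see the protocol.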

my good friend the mad scientist (the guy who writes code, not the flaming cult-ridden brainwashed script kiddie) has considered writing a whole new "abstract file" backend, to entirely get rid of stdio in the Python core. some potential advantages:
-- performance (some stdio implementations are slow)
-- portability (stdio doesn't exist on some platforms!)
You have this backwards -- you'd have to port the abstract backend first! Also don't forget that a *good* stdio might be using all sorts of platform-specific tricks that you'd have to copy to match its performance.
-- opens up for cool extensions (memory mapping, pluggable file handlers, etc).
should I tell him to start hacking?
Tcl/Tk does this. I see some advantages (e.g. you have more control over and knowledge of how much data is buffered) but also some disadvantages (more work to port, harder to use from C), plus tons of changes needed in the rest of Python. I'd say wait until Python 2.0 and let's keep stdio for 1.6.
PS. someone once told me that Perl goes "below" the standard file I/O system. does anyone here know if that's true, and perhaps even explain how they're doing that...
Probably just means that they use the C equivalent of os.open() and friends. --Guido van Rossum (home page: http://www.python.org/~guido/)

-- performance (some stdio implementations are slow)
-- portability (stdio doesn't exist on some platforms!)
You have this backwards -- you'd have to port the abstract backend first! Also don't forget that a *good* stdio might be using all sorts of platform-specific tricks that you'd have to copy to match its performance.
well, if the backend layer is good enough, I don't think a stdio-based standard version will be much slower than today's stdio-only implementation.
PS. someone once told me that Perl goes "below" the standard file I/O system. does anyone here know if that's true, and perhaps even explain how they're doing that...
Probably just means that they use the C equivalent of os.open() and friends.
hopefully. my original source described this as "digging around in the innards of the stdio package" (and so did greg). and the same source claimed it wasn't yet ported to Linux. sounds weird, to say the least, but maybe he referred to that sfio package greg mentioned. I'll do some digging, but not today. </F>

On 16 June 1999, Fredrik Lundh said:
my good friend the mad scientist (the guy who writes code, not the flaming cult-ridden brainwashed script kiddie) has considered writing a whole new "abstract file" backend, to entirely get rid of stdio in the Python core. some potential advantages: [...]

PS. someone once told me that Perl goes "below" the standard file I/O system. does anyone here know if that's true, and perhaps even explain how they're doing that...
My understanding (mainly from folklore -- peeking into the Perl source has been known to turn otherwise staid, solid programmers into raving lunatics) is that yes, Perl does grovel around in the internals of stdio implementations to wring a few extra cycles out.

However, what's probably of more interest to you -- I mean your mad scientist alter ego -- is Perl's I/O abstraction layer: a couple of years ago, somebody hacked up Perl's guts to do basically what you're proposing for Python. The main result was a half-baked, unfinished (at least as of last summer, when I actually asked an expert in person at the Perl Conference) way of building Perl with AT&T's sfio library instead of stdio. I think the other things you mentioned, e.g. more natural support for memory-mapped files, have also been bandied about as advantages of this scheme.

The main problem with Perl's I/O abstraction layer is that extension modules now have to call e.g. PerlIO_open(), PerlIO_printf(), etc. in place of their stdio counterparts. Surprise surprise, many extension modules have not adapted to the new way of doing things, even though it's been in Perl since version 5.003 (I think). Even more surprisingly, the fourth-party C libraries that those extension modules often interface to haven't switched to using Perl's I/O abstraction layer either. This doesn't make a whit of difference if Perl is built in the "standard way" (no abstraction layer, just direct stdio) or with the abstraction layer on top of stdio. But as soon as some poor fool decides Perl on top of sfio would be neat, lots of extension modules break -- their I/O calls go nowhere.

I'm sure there is some sneaky way to make it all work using sfio's binary compatibility layer and some clever macros. This might even have been done. However, AFAIK it's not been documented anywhere.

This is not merely to bitch about unfinished business in the Perl core; it's to warn you that others have walked down the road you propose to tread, and there may be potholes. Now if the Python source really does get even more modularized for 1.6, you might have a much easier job of it. ("Modular" is not the word that jumps to mind when one looks at the Perl source code.)

Greg

/*
 * "Far below them they saw the white waters pour into a foaming bowl, and
 * then swirl darkly about a deep oval basin in the rocks, until they found
 * their way out again through a narrow gate, and flowed away, fuming and
 * chattering, into calmer and more level reaches."
 */ -- Tolkien, by way of perl/doio.c

--
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    1895 Preston White Drive
voice: +1-703-620-8990                  Reston, Virginia, USA 20191-5434
fax: +1-703-620-0913

[Greg writes]
The main problem with Perl's I/O abstraction layer is that extension modules now have to call e.g. PerlIO_open(), PerlIO_printf(), etc. in place of their stdio counterparts. Surprise surprise, many extension [...]
Interestingly, Python _nearly_ suffers this problem now. Although Python does use native FILE pointers, this scheme still assumes that Python and the extensions all use the same stdio. I understand that on most Unix systems this can be taken for granted. However, to be truly cross-platform, this assumption may not be valid.

A case in point is (surprise surprise :-) Windows. Windows has a number of C RTL options, and Python and its extensions must be careful to select the one that shares FILE * and the heap across separately compiled and linked modules. In fact, Windows comes with an excellent debug version of the C RTL, but this gets in Python's way - if even one (but not all) Python extension attempts to use these debugging features, we die in a big way.

and-dont-even-talk-to-me-about-Windows-CE ly, Mark.

Greg Ward wrote:
This is not merely to bitch about unfinished business in the Perl core; it's to warn you that others have walked down the road you propose to tread, and there may be potholes.
oh, the mad scientist has rushed down that road a few times before. we'll see if he's prepared to do that again; it sure won't happen before the unicode stuff is in place... </F>

Fredrik Lundh writes:
my good friend the mad scientist (the guy who writes code, not the flaming cult-ridden brainwashed script kiddie) has considered writing a whole new "abstract file" backend, to entirely get rid of stdio in the Python core. some potential advantages:
-- performance (some stdio implementations are slow)
-- portability (stdio doesn't exist on some platforms!)
-- opens up for cool extensions (memory mapping, pluggable file handlers, etc).
should I tell him to start hacking?
I am not in favor of obscuring Python's I/O model too much. When working with C extensions, it is critical to have access to normal I/O mechanisms such as 'FILE *' or integer file descriptors. If you hide all of this behind some sort of abstract I/O layer, it's going to make life hell for extension writers unless you also provide a way to get access to the raw underlying data structures. This is a major gripe I have with the Tcl channel model--namely, there seems to be no easy way to unravel a Tcl channel into a raw file descriptor for use in C (unless I'm being dense and have missed some simple way to do it).

Also, what platforms are we talking about here? I've never come across any normal machine that had a C compiler, but did not have stdio. Is this really a serious problem?

Cheers,

Dave
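The escape hatch David is asking for already has a Python-level counterpart (a sketch, not from the thread; the path is only an example): a real file object exposes its descriptor via fileno(), and os.fdopen() goes the other way:

    import os

    f = open('/etc/hosts')           # example path
    fd = f.fileno()                  # the raw integer descriptor behind the file
    print(os.read(fd, 16))           # hand it to OS-level code, bypassing stdio buffering
    g = os.fdopen(os.dup(fd), 'r')   # and back: wrap a (duplicated) descriptor in a file object

At the C level, the analogous route to a 'FILE *' is the PyFile_AsFile() function David mentions later in the thread.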

David Beazley wrote:
I am not in favor of obscuring Python's I/O model too much. When working with C extensions, it is critical to have access to normal I/O mechanisms such as 'FILE *' or integer file descriptors. If you hide all of this behind some sort of abstract I/O layer, it's going to make life hell for extension writers unless you also provide a way to get access to the raw underlying data structures. This is a major gripe I have with the Tcl channel model--namely, there seems to be no easy way to unravel a Tcl channel into a raw file-descriptor for use in C (unless I'm being dense and have missed some simple way to do it).
Also, what platforms are we talking about here? I've never come across any normal machine that had a C compiler, but did not have stdio. Is this really a serious problem?
in a way, it is a problem today under Windows (in other words, on most of the machines where Python is used today). it's very easy to end up with different DLL's using different stdio implementations, resulting in all kinds of strange errors. a rewrite could use OS-level handles instead, and get rid of that problem.

not to mention Windows CE (iirc, Mark had to write his own stdio-ish package for the CE port), maybe PalmOS, BeOS's BFile's, and all the other upcoming platforms which will make Windows look like a fairly decent Unix clone ;-) ...

and in Python, any decent extension writer should write code that works with arbitrary file objects, right? "if it cannot deal with StringIO objects, it's broken"...

</F>
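Fredrik's litmus test, made concrete (a sketch; count_lines is a made-up helper): code written against the file protocol rather than a concrete file type handles StringIO objects for free:

    from StringIO import StringIO   # the module of the era; io.StringIO today

    def count_lines(fp):
        """Assumes only that fp has a readline() method."""
        n = 0
        while fp.readline():
            n = n + 1
        return n

    print(count_lines(open('setup.py')))               # a real file (example path)
    print(count_lines(StringIO('one\ntwo\nthree\n')))  # an in-memory "file"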

Fredrik Lundh writes:
and in Python, any decent extension writer should write code that works with arbitrary file objects, right? "if it cannot deal with StringIO objects, it's broken"...
I disagree. Given that a lot of people use Python as a glue language for interfacing with legacy codes, it is unacceptable for extensions to be forced to use some sort of funky non-standard I/O abstraction. Unless you are volunteering to rewrite all of these codes to use the new I/O model, you are always going to need access (in one way or another) to plain old 'FILE *' and integer file descriptors. Of course, one can always just provide a function like

    FILE *PyFile_AsFile(PyObject *o)

that takes an I/O object and returns a 'FILE *' where supported. (Of course, if it's not supported, then it doesn't matter if this function is missing, since any extension that needs a 'FILE *' wouldn't work anyway.)

Cheers,

Dave

and in Python, any decent extension writer should write code that works with arbitrary file objects, right? "if it cannot deal with StringIO objects, it's broken"...
I disagree. Given that a lot of people use Python as a glue language for interfacing with legacy codes, it is unacceptable for extensions to be forced to use some sort of funky non-standard I/O abstraction.
oh, you're right, of course. should have added that extra smiley to that last line. cut and paste from this mail if necessary: ;-)
Unless you are volunteering to rewrite all of these codes to use the new I/O model, you are always going to need access (in one way or another) to plain old 'FILE *' and integer file descriptors. Of course, one can always just provide a function like
FILE *PyFile_AsFile(PyObject *o)
That takes an I/O object and returns a 'FILE *' where supported.
exactly my idea. when scanning the code, PyFile_AsFile immediately popped up as a potential pothole (if you need the fileno, there's already a method for that in the "standard file object interface").

btw, an "abstract file object" could actually make it much easier to support arbitrary file objects from C/C++ extensions. just map the calls back to Python. or add a tp_file slot, and things get really interesting...
(Of course, if it's not supported, then it doesn't matter if this function is missing since any extension that needs a 'FILE *' wouldn't work anyways).
yup. I suspect some legacy code may have a hard time running under CE et al. but of course, with a little macro trickery, nothing stops you from recompiling such code so it uses Python's new "abstract file... okay, okay, I'll stop now ;-)

</F>

Fredrik Lundh writes:
and in Python, any decent extension writer should write code that works with arbitrary file objects, right? "if it cannot deal with StringIO objects, it's broken"...
I disagree. Given that a lot of people use Python as a glue language for interfacing with legacy codes, it is unacceptable for extensions to be forced to use some sort of funky non-standard I/O abstraction.
oh, you're right, of course. should have added that extra smiley to that last line. cut and paste from this mail if necessary: ;-)
Good. You had me worried there for a second :-).
yup. I suspect some legacy code may have a hard time running under CE et al. but of course, with a little macro trickery, nothing stops you from recompiling such code so it uses Python's new "abstract file... okay, okay, I'll stop now ;-)
Macro trickery? Oh yes, we could use that too... (one can never have too much macro trickery if you ask me :-)

Cheers,

Dave

[me]
If it works on Linux, Solaris, Irix and Windows, and is reasonably clean, I'll take it. Please send it.
[Greg]
Actually, my preference is to see a change to open() rather than a whole new module. For example, let's say that you open a file, specifying memory-mapping. Then you create a buffer against that file:
    f = open('foo', 'rm')  # 'm' means mem-map
    b = buffer(f)
    print b[100:200]
Buh. Changes of this kind to builtins are painful, especially since we expect that this feature may or may not be supported. And imagine the poor reader who comes across this for the first time... What's wrong with

    import mmap
    f = mmap.open('foo', 'r')

???
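For reference, the existing mmap modules start from a file descriptor rather than a file name; a sketch in that style (the exact signature is an assumption -- the Unix and Win32 modules of the time differed):

    import mmap, os

    fd = os.open('foo', os.O_RDONLY)
    size = os.fstat(fd)[6]                            # stat tuple index 6 is st_size
    m = mmap.mmap(fd, size, access=mmap.ACCESS_READ)  # read-only map of the whole file
    print(m[100:200])                                 # the same slice as Greg's buffer example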
I'd like to see mmap native in Python. I won't push, though, until I can run a test to see what kind of savings will occur when you mmap a .pyc file and open PyBuffer objects against the thing for the code bytes. My hypothesis is that you can reduce the working set of Python (i.e. amortize the cost of a .pyc's code over several processes by mmap'ing it); this depends on the proportion of code in the pyc relative to "other" stuff.
We've been through this before. I still doubt it will help much. Anyway, it's a completely independent feature from making the mmap module (any mmap module) available to users. --Guido van Rossum (home page: http://www.python.org/~guido/)
participants (7)
- David Ascher
- David Beazley
- Fredrik Lundh
- Greg Stein
- Greg Ward
- Guido van Rossum
- Mark Hammond