[Tutor] Is this possible and should it be done?

Dave Angel d at davea.name
Mon May 21 15:44:50 CEST 2012

On 05/21/2012 06:38 AM, wolfrage8765 at gmail.com wrote:
> All, I have had a curious idea for a while, and was wondering about the
> best way to implement it in Python, and whether it is even possible. The
> concept is this: a file that is actually a folder containing multiple
> files (like an archive format). The actual files are really unimportant.
> What I want is for the folder to be represented as a single file by
> any normal file browser, but to be able to access the files within
> via Python. I will use the word "archive" to represent my mystical
> folder-as-a-file concept for the rest of this message. Some additional
> things I would like to be possible: for multiple copies of the program
> to write to the same archive at the same time, as long as they write to
> different files within it (reading and writing should not lock the
> archive as long as different files are involved); and for just the
> desired files within the archive to be loaded into memory, without
> having to hold the entire archive in memory.
> Use case for these additional capabilities: I was reading about how
> some advanced word processing programs (MS Word) actually save
> multiple working copies of the file within a single file
> representation and then, just prior to combining the working copies,
> lock the original file and save the working changes. That is what I
> would like to do. I want the single file because it is easy for a user
> to grasp that they need to copy a single file or that they are working
> on a single file, but it is not so easy for them to grasp the multiple
> file concepts.
> MS Word uses binary streams, as shown here:
> http://download.microsoft.com/download/5/0/1/501ED102-E53F-4CE0-AA6B-B0F93629DDC6/WindowsCompoundBinaryFileFormatSpecification.pdf
> Is this easy to do with Python? Does it prevent file locking if you
> use streams? Is this worth the trouble, or should I just use a
> directory and forget this magical idea?
> A reference for my archive idea: ISO/IEC 26300:2006, chapter 17.2

When I first read your description, I assumed you were talking about the
streams supported by NTFS, where it's possible to store multiple
independent, named streams of data within a single file.  That
particular feature isn't portable to other operating systems, nor even
to non-NTFS filesystems.  And the user could get tripped up by copying
what he thought was the whole file, but in fact was only the unnamed
stream.

However, thanks for the link.  That specification describes building an
actual filesystem inside the file, which is a lot of work.  It does not
mention anything about stream locking, and my experience with MS Word
(which was several years ago now) indicates that the entire archive is
locked whenever one instance of MS Word is working on it.
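
Locking the entire archive, as described above, is at least easy to
approximate from Python.  The following is only a minimal sketch, assuming
Python 3 and a sidecar lock file; the name archive_lock and the ".lock"
suffix are made up for illustration, and this is not a claim about how
MS Word itself does its locking.

```python
import os
from contextlib import contextmanager


@contextmanager
def archive_lock(archive_path):
    """Hold an exclusive, whole-archive lock via a sidecar lock file."""
    lock_path = archive_path + ".lock"  # hypothetical naming convention
    # O_CREAT | O_EXCL makes os.open raise FileExistsError when the lock
    # file already exists, so only one process can create it at a time.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        # Record the owner's PID to help diagnose stale locks by hand.
        os.write(fd, str(os.getpid()).encode("ascii"))
        yield lock_path
    finally:
        os.close(fd)
        os.remove(lock_path)
```

A second process that tries to enter archive_lock() on the same path gets
FileExistsError until the first one leaves the with-block.  A crashed
process leaves a stale lock file behind, which is one of the subtle
problems any real scheme has to deal with.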

I think if you're trying to do something as flexible as MS Word
apparently does (or even worse - adding locking to it) you're asking for
subtle bugs and race conditions.  If the data you are storing is
structured such that you can simplify the MS approach, then it may be
worthwhile: for example, if each stream is always exactly 4k.
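
To illustrate why fixed-size streams simplify things: if each stream is
always 4k, finding a stream is just a multiplication, and one stream can
be read or rewritten without touching the rest of the file.  This is a
rough sketch under that assumption only; SLOT_SIZE, write_slot and
read_slot are hypothetical names, and it does no locking at all.

```python
import os

SLOT_SIZE = 4096  # assumption: every stream occupies exactly 4k


def write_slot(path, index, data):
    """Store data (at most SLOT_SIZE bytes) in slot number `index`."""
    if len(data) > SLOT_SIZE:
        raise ValueError("data does not fit in one slot")
    # Pad with NUL bytes so every slot starts at index * SLOT_SIZE.
    padded = data.ljust(SLOT_SIZE, b"\0")
    # Open for update without truncating; create the file if it is missing.
    mode = "r+b" if os.path.exists(path) else "w+b"
    with open(path, mode) as f:
        f.seek(index * SLOT_SIZE)
        f.write(padded)


def read_slot(path, index):
    """Read a single slot without loading the rest of the container."""
    with open(path, "rb") as f:
        f.seek(index * SLOT_SIZE)
        return f.read(SLOT_SIZE)
```

Because the padding is NUL bytes, data that itself ends in NULs cannot be
told apart from padding; a real format would store a length as well.
Two processes writing different slots never overlap on disk, but nothing
here stops them from writing the same slot at once.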


