[Patches] [ python-Patches-1587674 ] Patch for #1586414 to avoid fragmentation on Windows

SourceForge.net noreply at sourceforge.net
Sat Dec 23 20:03:05 CET 2006


Patches item #1587674, was opened at 2006-10-31 06:05
Message generated for change (Comment added) made by gustaebel
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1587674&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Enoch Julias (enochjul)
>Assigned to: Lars Gustäbel (gustaebel)
Summary: Patch for #1586414 to avoid fragmentation on Windows

Initial Comment:
Add a call to file.truncate() to inform Windows of the
size of the target file in makefile(). This helps
guide cluster allocation in NTFS to avoid fragmentation.
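
As a standalone illustration of the idea (the function name is hypothetical, not taken from the patch): extend the file to its final size first so the filesystem can try to allocate contiguous clusters, then seek back and write the actual data.

```python
def write_with_size_hint(path, data):
    # Sketch of the patch's idea: truncate() up to the final size acts
    # as a size hint for the filesystem's cluster allocator, then the
    # real data is written from the start as usual.
    with open(path, "wb") as f:
        f.truncate(len(data))  # size hint; effect is platform-dependent
        f.seek(0)
        f.write(data)
```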

----------------------------------------------------------------------

>Comment By: Lars Gustäbel (gustaebel)
Date: 2006-12-23 20:03

Message:
Logged In: YES 
user_id=642936
Originator: NO

Any progress on this one?

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-11-08 22:30

Message:
Logged In: YES 
user_id=642936

You both still fail to convince me, and I still don't see
any need for action. The only case at the moment where this
addition makes sense (in your opinion) is Windows with the
NTFS filesystem, and only when certain conditions are met.
NTFS has a preallocation algorithm to deal with this. We
don't know if there is any advantage on FAT filesystems.

On Linux for example there is a plethora of supported
filesystems. Some of them may take advantage, others may
not. Who knows? We can't even detect which filesystem type
we are currently writing to. Apart from that, the behaviour
of truncate(arg) with arg > filesize seems to be
system-dependent.

So, IMO this is a very special optimization targeted at a
single platform. The TarFile class is easily subclassable:
just override the makefile() method and add the two lines of
code. I think that's what ActiveState's Python Cookbook is for.
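
For illustration, such a subclass might look like the following sketch. Note that makefile() is an internal, undocumented TarFile method, so its signature and the extraction machinery around it may differ between Python versions; the class name is hypothetical.

```python
import shutil
import tarfile

class PreallocatingTarFile(tarfile.TarFile):
    """Sketch of the suggested subclass: override makefile() and add
    the truncate() size hint before copying the member's data."""

    def makefile(self, tarinfo, targetpath):
        # extractfile() gives a read-only file object over the member's
        # data; requires a seekable archive (not streaming "r|" mode).
        source = self.extractfile(tarinfo)
        with open(targetpath, "wb") as target:
            target.truncate(tarinfo.size)  # size hint for the filesystem
            target.seek(0)
            shutil.copyfileobj(source, target)
```

Whether the hint actually reduces fragmentation depends on the filesystem; on platforms where an over-length truncate() is unsupported, the open() in "wb" mode has already created an empty file, so correctness is unaffected.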

BTW, I like my files to grow bit by bit. In case of an
error, I can detect whether a file was extracted completely
by comparing the file sizes. Furthermore, a file that grows
is more common and closer to what a programmer who uses this
module might expect.


----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2006-11-08 17:33

Message:
Logged In: YES 
user_id=341410

I disagree with user gustaebel.  We should be adding
automatic truncate calls for all possible supported
platforms, in all places where it could make sense, be it
in tarfile, zipfile, or wherever we can.  It would make sense
to write a function that can be called by all of those
modules, so that there is only one place to update if/when
changes occur.  If the function were not part of the public
Python API, then it wouldn't need to wait until 2.6, unless
it were considered a feature addition rather than a bugfix. 
One would have to wait on a response from Martin or Anthony
to know which it was, though I couldn't say for sure whether
operations that are generally performance-enhancing count as
bugfixes or feature additions.

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-11-06 22:57

Message:
Logged In: YES 
user_id=642936

Personally, I think disk defragmenters are evil ;-) They
create the very need they are supposed to satisfy. On Linux
we have no defragmenters, so we don't worry about it.

I think your proposal is a performance hack for a particular
filesystem. In principle, this problem exists for all
filesystems on all platforms. Fragmentation is IMO a
filesystem's problem and is not so much a state as a
process. Filesystems fragment over time, and you can't do
anything about it. For those people who care, disk
defragmenters were invented. It is not tarfile.py's job to
care about a fragmented filesystem; that's simply too low level.

I admit that it is a small patch, but I'm -1 on having this
applied.

----------------------------------------------------------------------

Comment By: Enoch Julias (enochjul)
Date: 2006-11-06 18:19

Message:
Logged In: YES 
user_id=6071

I have not really tested FAT/FAT32 yet as I don't use these 
filesystems now.

The Disk Defragmenter tool in Windows 2000/XP shows the number of 
files/directories fragmented in its report.

NTFS does handle growing files, but the operating system can only do 
so much without knowing the final size of the file. Extracting from 
archives consisting of only a few files does not cause 
fragmentation. However, if the archive has many files, it is much 
more likely that the default algorithm will fail to allocate 
contiguous clusters for some of them. It may also depend on the amount 
of free-space fragmentation on a particular partition and whether 
other processes are writing to other files on the same partition.

Some details of the cluster allocation algorithm used in Windows can 
be found at http://support.microsoft.com/kb/841551.

----------------------------------------------------------------------

Comment By: Lars Gustäbel (gustaebel)
Date: 2006-11-01 16:27

Message:
Logged In: YES 
user_id=642936

Is this merely an NTFS problem or is it the same with FAT fs?
How do you detect file fragmentation?
Doesn't this problem apply to all other modules or scripts
that write to file objects as well?
Shouldn't a decent filesystem be able to handle growing
files in a correct manner?

----------------------------------------------------------------------

More information about the Patches mailing list