[Tutor] fake defrag revisited

Wed Oct 5 08:15:49 CEST 2011

On 10/01/2011 03:53 PM, R. Alan Monroe wrote:
>
>> You missed the rest of the paragraph.  Don't wait for the sort to
>> finish, you do the swapping in the compare function.
>
> Either I'm too dumb to understand your answer, or explained my
> original problem too poorly.
>
> Have you ever played the Rush Hour game where you're moving little
> cars around a parking lot to let the goal car escape?
>
> http://www.thinkfun.com/sites/default/files/images/product_library/rushhour/new/RushH-5000-LoResSpill.jpg?1294204060
>
> http://www.thinkfun.com/sites/default/files/images/product_howtoplay/rush-hour.jpg?1290652077
>
> i.e. you can't move the goal car until another car is out of the way,
> but you can't move THAT car until a third car is out of the way, etc.
>
> That's the problem I'm facing, and I don't think a sort will solve it.
> I'm wondering now if it will have to be more of a recursive
> backtracking thing, that will reject a sequence of moves that would
> have overwritten an as-yet-unmoved piece.
>
> I'm using 2.x, and yeah, I have used cmp lambda functions before. It's
> how I got the sorted-by-second-element list in my followup email.
>
> (requires pygame)
> http://javajack.dynalias.net/defrag/defrag008.py
>
> Alan
>

I never saw this message, since you didn't CC it to the python-tutor group.

Problem with recursive backtracking is that you have to be sure the 
maximum recursive depth is manageable.

Don't use a lambda function, use a real def function.  (Lambdas are just 
expressions, so they can't do real if statements). Your function will be 
called by the sort function to compare two items.  And if they should be 
swapped, it'll swap them after you return. So you do the swap on your 
display to match what you know the sort is going to do.

Since all the moves are swaps, it'll be guaranteed to be in a sequence 
that converges on the correct final order.  Will it be the minimum 
number of moves? Definitely not.  But that wasn't a requirement, and if 
it were, you wouldn't start by building that list of tuples.

I believe producing a fake cmp-function callback is the minimum code to 
solve the problem, but not the minimum thinking.  But if you have 
different constraints, tell us about them.  If you want speed, just 
update the display to its final state.

If you're trying to realistically simulate an actual defrag program, 
then design that program.  But such a program might not have enough ram 
to build the table you're assuming, where you know before starting what 
the final ordering will be.

If I were writing a real defragger, I'd have other costraints.  It'd 
have to work with huge disks, and relatively small memory.   And it 
would have the constraint that if it would crash part way through, at 
most one file or metafile would be in an invalid state, and that file 
would be readily recovered by a utility that you run afterwards.  If you 
think that's too restrictive, because you should do a backup first, then 
my defragger is easy.  Backup, erase the entire drive, and restore. 
Fast and easy.  But notice it required 50% free space, since the backup 
media had to be big enough to hold it all.

I can come up with several other algorithms, depending on assumptions. 
For example, you examine a pair of blocks, and if the abs-sum of the 
dstances they are from their targets will decrease by swapping, then do 
so.  Otherwise, leave them alone, and pick a different pair. 
Independently keep track of the sum of all those distances.  When that 
sum reaches zero, you're done.  Now the only trick is systematically 
picking pairs to process.

Another one, start by moving all files upwards, till all the free space 
is at the beginning.  Then sort the files by size, reversed, and move 
them towards the beginning.  As long as there is enough free space to 
store the biggest file, this will converge in one double pass. 
Otherwise, it must be repeated.  Trick is that you define the disk as 
just a little smaller next pass, starting wherever the first file didn't 
fit.  So after this next pass, about twice as much of that first file 
will be contiguous.  So the number of passes is no more than max file 
size divided by free space.

-- 

DaveA