[Tutor] discrepancy when using os.path.getsize(line)

Rick Pasotto rick@niof.net
Sat Aug 2 19:27:02 2003


On Sat, Aug 02, 2003 at 07:11:19PM -0400, Paul Tremblay wrote:
> When I use os.path.getsize, I come up with a different value than when
> I use the linux command "du -sb". 
> 
> I am using these two commands in a script to backup. 
> 
> At the beginning of my script, I use:
> 
> du -sb /
> 
> in order to get the size of all the files on my hard drive. I get a
> size that converts to 4.0 gigabytes.
> 
> I then use 
> 
> find / > list
> 
> To get all the files on my hard drive. 
> 
> I then run this list through a module prun.py, in order to prune out
> junk files. The paths that are pruned out go into their separate file,
> called "exclude".
> 
> So now I have two files of path names. If I add up the sizes of the
> files named in the two lists, the total should equal the size from the
> "du -sb" command.
> 
> However, it does not. In fact, I get a difference that is as great as
> +60 percent when I test my script on a small directory. If I test the
> script on my whole hard drive, I get a difference of -9 percent.
> 
> If I do a test run on a small directory, and check a CD to see that
> all the files have made it to the CD, they have. So it seems that all
> the files that are supposed to be backed up are getting backed up. 
> 
> It seems that the problem is that adding up the size of each individual
> file with os.path.getsize yields a total very different from the one
> reported by "du". 
> 
> Anybody have an idea of why this discrepancy occurs? 

Have you taken into account links -- both hard and soft?
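
To make the point about links concrete: os.path.getsize follows symbolic
links, so a link is counted at the size of its target, and every extra
hard link to the same file is counted again. The sketch below is only
one possible way to sum a list of paths so that each inode is counted
once. The function name total_size_once is made up for illustration, and
it assumes a Unix-like system where st_dev and st_ino identify a file.

```python
import os


def total_size_once(paths):
    """Sum st_size over paths, counting each (device, inode) pair once.

    os.lstat is used instead of os.stat so that a symbolic link is
    measured as the link itself and not as the file it points to.
    """
    seen = set()
    total = 0
    for path in paths:
        try:
            st = os.lstat(path)
        except OSError:
            # The path vanished or is unreadable; skip it.
            continue
        key = (st.st_dev, st.st_ino)
        if key in seen:
            # Another hard link to a file that was already counted.
            continue
        seen.add(key)
        total += st.st_size
    return total
```

Feeding it the stripped lines of the "list" file would give a total in
which hard-linked files are not double counted. Whether that matches
what 'du' prints still depends on the block-size issue.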

Another thing to consider is that 'du' reports actual disk usage, so
if your disk blocks are 1k, eight little 128-byte files total only
1024 bytes of data but actually occupy 8 * 1024 = 8192 bytes on disk.
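
The arithmetic above can be checked with a few lines of Python. This is
a rough sketch, not a model of any particular filesystem: allocated_bytes
and size_and_disk_usage are hypothetical helper names, the 1k block size
is the one assumed in the example, and st_blocks is taken to be in
512-byte units, which holds on Linux but is not guaranteed everywhere.

```python
import os


def allocated_bytes(size, block_size=1024):
    """Round a file size up to a whole number of blocks (simplified)."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def size_and_disk_usage(path):
    """Return (apparent size, bytes allocated on disk) for one path.

    Assumes a Unix-like system where st_blocks counts 512-byte units.
    """
    st = os.lstat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    sizes = [128] * 8
    print(sum(sizes))                              # 1024
    print(sum(allocated_bytes(s) for s in sizes))  # 8192
```

Comparing the two numbers returned by size_and_disk_usage for a handful
of real files is a quick way to see how far apart the two measures are
on a given disk.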

-- 
"Moderation in temper is always a virtue; but moderation in principle
 is always a vice." -- Thomas Paine, _The Rights of Man_ (1791)
    Rick Pasotto    rick@niof.net    http://www.niof.net