Puzzled: Y am I ending up with extra bytes?

Bengt Richter bokr at oz.net
Sat Feb 23 15:50:19 EST 2002


On 23 Feb 2002 19:23:44 GMT, "A.Newby" <deathtospam43423 at altavista.com> wrote:

>Why is this happening? I read large chunks of data from a text file, 
>according to byte locations specified on another file, and for some reason, 
>this function (below), spits out a few extra bytes.
>
>Here's the code, as entered into the Python shell...... 
>
>
>    	index = map(string.rstrip, open('D:\cgi-bin\indx.txt').readlines())
>    	#this opens the index file, which has precise byte locations of each 
>    	#chunk of data I want to extract from the log.txt file, and turns it 
>    	#into a list.
>
>
>    	def fish(end, start, deduct):
>	    	chat = open('D:\cgi-bin\log.txt', 'r')
>	    	g = int(index[end]) - int(index[start])
>	    	chat.seek(int(index[start]))
>	    	print chat.read(g - deduct)
>
>
>
>Now, if I enter the following into the command line ...
>
>fish(205, 204, 0)
>
>... I get about four extra characters. God knows why! So that's why I try 
>...
>
>fish(205, 204, 4) 
>
>... And it seems to work perfectly. I can even "fish" up to about ten 
>"index" lines with it. But as soon as I try and fish out any more than 
>that, I get the dreaded extra 4 bytes of code again. Why?
>
>Now I know what you're thinking. You're thinking that perhaps my index file 
>is corrupted, and hasn't got an accurate account of precisely what's in the 
>log file. I suspected that might be the case myself, but ... when I try 
>fishing out each chunk of data individually, it works fine! I can even do 
>this ...
>
>for x in range(1, 90):
>	fish(211+x, 210 + x, 4)
>
>...... without ending up with that extra data I don't want. However, this 
>method is too slow for my purposes. Plus, I really wanna know what it is 
>that's going wrong.
>
>Can anyone spot it?
>
If your index was created by finding something prefixed to the substring
you want, that gives a start position, but what about the end? If there is
a prefix for the next item, the difference between two indices will include
a prefix and possibly extra stuff, like CR/LF etc:
...<prefix>what you want<extra><prefix>what you want next<extra2><prefix ...
           ^-index[n]                  ^-index[n+1]
           |<----- wrong length ------>|
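To make the picture concrete, here is a minimal sketch in current Python 3 (not the 2.x used in the thread). The record layout, the "<msg>" prefix and the payloads are all made up for illustration; I don't know what the real log.txt looks like. It only shows that subtracting two start offsets also counts whatever sits between the end of one item and the start of the next.

```python
# Hypothetical layout: every record is PREFIX + payload + CR/LF.
PREFIX = b"<msg>"
payloads = [b"hello", b"what you want", b"what you want next"]
log = b"".join(PREFIX + p + b"\r\n" for p in payloads)

# Build an index of payload start offsets by searching for the prefix,
# which is one plausible way such an index file gets made.
index = []
pos = log.find(PREFIX)
while pos != -1:
    index.append(pos + len(PREFIX))
    pos = log.find(PREFIX, pos + 1)

n = 1
# Naive slice: from this start offset to the next start offset.
naive = log[index[n]:index[n + 1]]
# Number of unwanted bytes: the CR/LF plus the next record's prefix.
extra = len(naive) - len(payloads[n])

print(repr(naive), extra)
```

In this made-up layout the surplus is the same for every record, so a fixed deduct would hide it; if the surplus varies from record to record, no single deduct value can work.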

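Just a guess. Try printing enough raw data to see what's there. Since fish
prints its result and returns None, first change it to return the string,
then e.g., print `fish(206,204,0)` and some other examples (note backquotes,
which give the repr, so CR/LF and other invisible characters show as escapes).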
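A sketch of that kind of inspection in current Python 3, where repr() takes the place of backquotes. This fish variant is my own: it takes the path and the index as arguments and returns the bytes instead of printing them. Opening the file in binary mode is also my assumption, so that the offsets count raw bytes and CR/LF pairs are not translated. The throwaway file and its "<p>" prefix are invented for the demo.

```python
import os
import tempfile


def fish(path, index, start, end):
    """Return the raw bytes between index[start] and index[end]."""
    with open(path, "rb") as chat:  # binary mode: no newline translation
        chat.seek(index[start])
        return chat.read(index[end] - index[start])


# Demo on a throwaway file standing in for log.txt.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"<p>one\r\n<p>two\r\n<p>three\r\n")

demo_index = [3, 11, 19]  # start offset of each payload in the demo file
chunk = fish(tmp.name, demo_index, 0, 1)
print(repr(chunk))  # the repr shows the CR/LF and prefix that came along

os.remove(tmp.name)
```

Looking at the repr of a few chunks should show directly what the extra characters are, rather than guessing at a deduct value.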

BTW, you might be able to map int instead of string.rstrip and get a list
of integers that you can use directly instead of converting them later.
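For instance, a sketch in current Python 3, with an in-memory file standing in for indx.txt (the offsets are invented). int() ignores surrounding whitespace, including the trailing newline, so no rstrip is needed. Skipping blank lines is my addition, since int() raises ValueError on an empty string.

```python
import io

# Stand-in for open('indx.txt'); one byte offset per line.
index_file = io.StringIO("0\n118\n240 \n\n")

# Convert each non-blank line straight to an integer.
index = [int(line) for line in index_file if line.strip()]

# The offsets can now be used directly, with no int() calls later on.
length = index[2] - index[1]
print(index, length)
```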

HTH
Regards,
Bengt Richter



