Puzzled: Y am I ending up with extra bytes?
bokr at oz.net
Sat Feb 23 15:50:19 EST 2002
On 23 Feb 2002 19:23:44 GMT, "A.Newby" <deathtospam43423 at altavista.com> wrote:
>Why is this happening? I read large chunks of data from a text file,
>according to byte locations specified on another file, and for some reason,
>this function (below), spits out a few extra bytes.
>Here's the code, as entered into the Python shell......
> index = map(string.rstrip, open('D:\cgi-bin\indx.txt').readlines())
> #this opens the index file, which has precise byte locations of each
> #chunk of data I want to extract from the log.txt file, and turns it
> #into a list.
> def fish(end, start, deduct):
>     chat = open('D:\cgi-bin\log.txt', 'r')
>     g = int(index[end]) - int(index[start])
>     print chat.read(g - deduct)
>Now, if I enter the following into the command line ...
>fish(205, 204, 0)
>... I get about four extra characters. God knows why! So that's why I try
>fish(205, 204, 4)
>... And it seems to work perfectly. I can even "fish" up to about ten
>"index" lines with it. But as soon as I try and fish out any more than
>that, I get the dreaded extra 4 bytes of code again. Why?
>Now I know what you're thinking. You're thinking that perhaps my index file
>is corrupted, and hasn't got an accurate account of precisely what's in the
>log file. I suspected that might be the case myself, but ... when I try
>fishing out each chunk of data individually, it works fine! I can even do
>for x in range(1, 90):
> fish(211+x, 210 + x, 4)
>...... without ending up with that extra data I don't want. However, this
>method is too slow for my purposes. Plus, I really wanna know what it is
>that's going wrong.
>Can anyone spot it?
If your index was created by finding something prefixed to the substring
you want, that gives a start position, but what about the end? If there is
a prefix for the next item, the difference between two indices will include
a prefix and possibly extra stuff, like CR/LF etc:
...<prefix>what you want<extra><prefix>what you want next<extra2><prefix ...
           |<----- wrong length ------>|
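To make the guess concrete, here is a minimal sketch with made-up data. The `<prefix>` marker, the record text, and the assumption that the index stores the position just after each prefix are all hypothetical, not taken from the actual log. It is written for current Python rather than the 2002-era syntax quoted above, and it only illustrates how the difference between two start offsets can cover more than the wanted text:

```python
# Hypothetical log contents: each record is "<prefix>" + text + CR/LF.
PREFIX = b"<prefix>"
log = b"<prefix>what you want\r\n<prefix>what you want next\r\n"

# Build an index of the positions just after each prefix
# (an assumed indexing scheme, for illustration only).
starts = []
pos = log.find(PREFIX)
while pos != -1:
    starts.append(pos + len(PREFIX))
    pos = log.find(PREFIX, pos + 1)

# Naive length: the difference between two consecutive start offsets.
naive = log[starts[0]:starts[1]]

# The wanted text stops before the line ending and the next prefix.
wanted = log[starts[0]:starts[1] - len(PREFIX) - len(b"\r\n")]

print(repr(naive))
print(repr(wanted))
```

If the real index was built some other way, the extra bytes would differ, but the same kind of check should show where they come from.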
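Just a guess. Try printing enough raw data to see what's there,
e.g., print `fish(206,204,0)` and some other examples (note the backquotes,
which show the repr; fish would need to return the string rather than
print it for this to work).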
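A rough sketch of that kind of inspection in current Python, where `repr()` replaces the backquotes. The name `fish_raw`, the explicit start/end offsets, and the throwaway file are assumptions for illustration. Binary mode is used here only so that any CR/LF bytes show up untranslated:

```python
import os
import tempfile

def fish_raw(path, start, end):
    """Return (rather than print) the raw bytes between two byte offsets."""
    with open(path, "rb") as chat:  # binary mode: no newline translation
        chat.seek(start)
        return chat.read(end - start)

# Throwaway file standing in for log.txt (made-up contents).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"first chunk\r\nsecond chunk\r\n")

chunk = fish_raw(path, 0, 13)
shown = repr(chunk)  # makes \r, \n and other invisible bytes visible
print(shown)

os.remove(path)
```

Looking at the repr of a few chunks that come out too long should make it clear whether the extra bytes are line endings, a prefix, or something else.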
BTW, you might be able to map int instead of string.rstrip and get a list
of integers that you can use directly instead of converting them later.
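For instance, a small sketch in current Python, with an in-memory stand-in for indx.txt and made-up offsets. `int()` ignores surrounding whitespace, so the trailing newline on each line does not need stripping first. A blank line in the file would raise ValueError, though:

```python
import io

# Stand-in for indx.txt: one byte offset per line (made-up values).
index_file = io.StringIO("0\n13\n27\n")

# Convert every line straight to an integer.
index = list(map(int, index_file))

# The offsets can now be used directly in arithmetic.
length = index[2] - index[1]
print(index, length)
```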