[python-win32] Fwd: First line is a mix of text characters and binary, after the first line all is binary.

Tue May 5 18:29:10 CEST 2009

Khalid Moulfi wrote:
>
> Hi all,
> I am coming back to my issue:
> The work I have to do is to process 35000 files.
> In each of these files, I have to update only the first line but keep
> the rest (the rest is nothing but binary characters).
> The issue is that the file after the second line is full of strange
> characters.
> I need to keep the same file but update only the first line.
> What is also strange is that when I read the file line by line, I do
> not get the same lines back when writing them to a new file. (I used
> open(file,'rb') and wrote the data to a new file, but neither the size
> nor the number of lines is the same.)

You MUST STOP thinking of this file as having "lines".  It does NOT have
"lines".  It is a binary file, plain and simple.  You cannot read it
with line-oriented functions.  The first part of it happens to have
ASCII characters, but it is not a text file by any means.  And like any
binary file, you need to have a document that tells you about the
format.  I can certainly make an educated guess about the format, but
you have to have a spec somewhere.

Further, I already told you exactly how to handle these files.  The
advice I gave you earlier would work perfectly well with this larger
file.  The file is divided into sections by a 0x1C byte.  The first two
sections have those readable strings, the third (and beyond) is all
binary.  The first two sections are divided into records by a 0x1D byte.
There are no carriage returns and no line feeds -- hence, no lines.
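
To make that layout concrete, here is a small sketch on made-up bytes.
The sample contents and the names FS and GS are invented for
illustration; they are not taken from your file.  It shows that
splitting on the separators and joining again gives back the original
data byte for byte, which line-oriented reading does not guarantee.

```python
# Invented sample: two readable sections, then a binary tail that
# happens to contain 0x1C, 0x1D and a line feed (0x0A) of its own.
FS = b'\x1c'   # section separator
GS = b'\x1d'   # record separator
sample = (b'HDR' + GS + b'rec2' + FS +
          b'A' + GS + b'B' + FS +
          b'\x00\x1c\x0a\x1d\xff')

# maxsplit=2 stops after the second separator, so the binary tail
# stays in one piece even though it contains 0x1C itself.
sections = sample.split(FS, 2)
records = sections[0].split(GS)

# Joining with the same separators reverses the split exactly.
rebuilt = FS.join([GS.join(records), sections[1], sections[2]])
```

Because nothing here interprets line endings, rebuilt has the same
length and contents as sample.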

So, you process it exactly like I told you before:

    # Read the whole file as bytes; 'rb' prevents newline translation.
    with open('sample_1005000190.nst', 'rb') as f:
        data = f.read()
    # Divide into sections.  We only care about the first two, so we
    # leave the rest unparsed.
    sections = data.split(b'\x1C', 2)
    # sections is now a list with three elements: the first section, the
    # second section, and the binary section.
    # Divide the first two sections into records.
    sect1 = sections[0].split(b'\x1D')
    sect2 = sections[1].split(b'\x1D')
    # Now I can add or remove parts in either of the first two sections.
    # ...
    # Now I'm ready to write my changed parts back to a new file.
    part1 = b'\x1D'.join(sect1)
    part2 = b'\x1D'.join(sect2)
    recreate = b'\x1C'.join([part1, part2, sections[-1]])
    with open('newfile.nst', 'wb') as out:
        out.write(recreate)
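
Since you have 35000 of these to do, the same idea could be wrapped in
a function.  This is only a sketch: replace_first_record is a
hypothetical name, and it assumes that what you call the "first line"
is the first 0x1D-delimited record of the first section.  Check that
against your format spec before trusting it.

```python
def replace_first_record(src_path, dst_path, new_record):
    """Copy src_path to dst_path, replacing only the first record.

    Hypothetical helper.  Assumes the part to change is the first
    0x1D-delimited record of the first 0x1C-delimited section.
    new_record must be a bytes object.
    """
    with open(src_path, 'rb') as f:
        data = f.read()

    # Split off the first section only; everything after the first
    # 0x1C is carried through untouched.
    head, sep, rest = data.partition(b'\x1c')

    records = head.split(b'\x1d')
    records[0] = new_record

    with open(dst_path, 'wb') as f:
        f.write(b'\x1d'.join(records) + sep + rest)
```

You would then call it once per file, for instance from a loop over
glob.glob('*.nst'), writing each result to a separate output directory
so the originals stay intact until you have verified the output.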

-- 
Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.