[Tutor] help converting a Sequential file to an indexed file (LONGISH)

Thu, 20 Apr 2000 11:30:28 -0500

Dan -

It depends a bit on the size of your serial file(s).  You can either
read the complete file into a list via the "readlines()" method, or just
read it line-by-line using the "readline()" method.  Although it's quite
a bit slower, I personally prefer the latter on files whose size I don't
know in advance.
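
Here's a minimal sketch of the two approaches side by side.  It is
written in current Python 3 syntax, and the file name "logons.txt" and
its three records are made up purely for illustration.

```python
import os
import tempfile

# Build a small sample file so the sketch is self-contained (made-up data).
path = os.path.join(tempfile.mkdtemp(), "logons.txt")
with open(path, "w") as f:
    f.write("rec1\nrec2\nrec3\n")

# Approach 1: readlines() pulls the whole file into a list at once.
with open(path) as f:
    all_lines = f.readlines()

# Approach 2: readline() fetches one line per call; "" signals end of file.
one_at_a_time = []
with open(path) as f:
    while True:
        line = f.readline()
        if line == "":
            break
        one_at_a_time.append(line)

print(len(all_lines), len(one_at_a_time))
```

Both approaches end up with the same lines; the difference is only in
how much of the file sits in memory at any one time.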

Once you have (or are reading) your serial data, you can populate a
dictionary with the values, providing that field you spoke of is a
unique key -- dictionaries are awesome for this.  If it's not a unique
key, then you'll need to find a way to keep the duplicate entries.
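
One possible way to keep duplicates (an assumption on my part, not the
only option) is to store a list of values under each key.  A small
Python 3 sketch with made-up records:

```python
# Hypothetical (key, value) records; the key "A1" appears twice.
records = [("A1", "first"), ("B2", "second"), ("A1", "third")]

by_key = {}
for key, value in records:
    # setdefault() creates an empty list the first time a key is seen,
    # so every value for a repeated key is kept, in arrival order.
    by_key.setdefault(key, []).append(value)

print(by_key)
```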

You don't HAVE to read stuff into a list, of course -- you could read
the file and write the new record directly -- but it helps not only for
speed, but for doing "stuff" with the info you read.  (Totalling items,
counting records, etc.)

Once you have (even part of) your list populated, you can begin writing
it to a new file using the Key of Your Choice.

Here's a script that illustrates reading from a serial file, then using
a dictionary to sort/total some stuff.  I used methods from the "string"
module to separate the fields: columns 1-n = variable #1, cols n+1-y =
variable #2, etc.  If my values had been space-delimited, then I could
have used the "split" method to excellent effect, but <sigh> they
weren't, so I had to slice the string "r" (record) up.

import fileinput, string

hist   = {}                                      # last record seen per device
totals = {}                                      # logon count per device
x      = ""                                      # last date printed

for line in fileinput.input():
    r = string.strip(line)
    if r[0:5] == "Logon":                        # Detail record?
        rdate  = "0" + string.strip(r[7:14])     # left-pad the date ...
        rdate  = rdate[-7:]                      # ... and keep 7 characters
        rdev   = int(string.strip(r[23:27]))
        rlogon = string.strip(r[27:62])
        rname  = string.strip(r[63:94])
        hist[rdev] = (rdate, rlogon, rname)      # Load the dictionary
        if rdate != x:                           # Where are we at?
            x = rdate
            print rdate
        if not totals.has_key(rdev): totals[rdev] = 0
        totals[rdev] = totals[rdev] + 1

keylist = hist.keys()
keylist.sort()

for ldev in keylist:
    rdate, rlogon, rname = hist[ldev]
    print "%04u %5u %7s %s" % (ldev, totals[ldev], rdate, rlogon)

The file was a list of logons I wanted to summarize by device.  I read
each record from the file (specified on the command line), and if it's
one of the detail records I'm looking for (it has the word "Logon:" at
the beginning) then I strip values out of it and assign them to
variables (so I can format them a tad bit more -- stripping space-padded
ends, changing a string to an integer, etc.)
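
To make the slice-versus-split point concrete, here is a small Python 3
sketch.  The record layout (a 6-column key followed by a 20-column
name) is invented for the example and is not the layout of my logon
file.

```python
# A made-up fixed-width record: key in columns 0-5, name in columns 6-25.
record = "000042" + "John Q Public".ljust(20)

key = int(record[0:6])        # slice by column, then convert to an integer
name = record[6:26].strip()   # slice by column, then drop the space padding

# split() is only safe when no field contains embedded spaces;
# it would have broken the name above into three pieces.
fields = "42 jsmith LDEV7".split()

print(key, repr(name), fields)
```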

Once I have my variables, I populate the "hist" dictionary using one of
the variables (the device used in logging on) as the unique key, with a
tuple of the variables as the information I want to record.  The device
key won't truly be unique, since many people log on/off using the same
device(s), so I end up overwriting the previous data using that key --
but since the data file is already sorted by date, I'll always have the
info on the *last* time that device was used.  (Automatic duplication
elimination -- How I *LOVE* dictionaries!)

The "if" just watches to see when I change days in my reading so it can
print the new date to the screen, which in turn keeps me from thinking
the script is hung.  (Fun With Auditing.)

Finally, I sum all the times I've seen that device used, and put the
count into another dictionary named "totals", using the device as a key
once more.  (Yep, coulda' put it into a tuple entry in the "hist"
dictionary, but I was thinking of adding more to "totals" later on, so I
kept "totals" separate.)

Once out of the loop, I create a List of the keys I used, then sort
that List.

Lastly, I read through that sorted List and pull the data out of the
dictionary as though the dictionary were a random file.  That info gets
prettily printed for the waiting human (that'd be me) and there ya go.
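
The same load-count-sort pattern, boiled down to a few lines of Python 3
with made-up (device, date) pairs that are assumed to be in date order
already:

```python
hist = {}
totals = {}

# Hypothetical (device, date) pairs; device 12 shows up twice.
for dev, date in [(12, "0401000"), (3, "0401000"), (12, "0402000")]:
    hist[dev] = date                       # later records overwrite earlier ones
    totals[dev] = totals.get(dev, 0) + 1   # count every time the device is seen

# Walk the keys in sorted order and format one report line per device.
report = ["%04d %5d %7s" % (dev, totals[dev], hist[dev]) for dev in sorted(hist)]
for row in report:
    print(row)
```

Device 12 ends up with the later date and a count of two, which is the
"last one wins" behavior described above.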

To write to a random-access file, just use the "seek" and "write"
methods for your file.  The "seek" method positions for the next read or
write.  Its usage is "seek(offset,where)" where "offset" is, well, an
offset, and "where" says what the offset is measured from.  If "where"
is omitted or zero, then the offset is relative to the beginning of the
file.  If "where" is 1, then the offset is relative to the current
position, and if "where" is 2, then the offset is relative to the end of
the file.  IMPORTANT: the offset is in BYTES (*not* records).
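
Since the offset is in bytes, the usual trick is to multiply a record
number by a fixed record length.  Here's a minimal Python 3 sketch of
that idea; the 32-byte record length, the helper names write_record and
read_record, and the file name "indexed.dat" are all assumptions made
for the example.

```python
import os
import tempfile

RECLEN = 32  # assumed fixed record length, in bytes

def write_record(f, recno, text):
    """Write text as record number recno (0-based), space-padded to RECLEN."""
    data = text.encode("ascii").ljust(RECLEN)[:RECLEN]
    f.seek(recno * RECLEN)    # byte offset, measured from the start of the file
    f.write(data)

def read_record(f, recno):
    """Read record number recno back and strip the space padding."""
    f.seek(recno * RECLEN)
    return f.read(RECLEN).decode("ascii").rstrip()

path = os.path.join(tempfile.mkdtemp(), "indexed.dat")
with open(path, "w+b") as f:              # binary mode, read and write
    write_record(f, 2, "device 0002")     # records can be written out of order
    write_record(f, 0, "device 0000")
    third = read_record(f, 2)
    first = read_record(f, 0)
    f.seek(0, 2)                          # "where" of 2: relative to end of file
    size = f.tell()

print(first, third, size)
```

Record 1 was never written, so its 32 bytes are just filler; the file
is still three records long.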

Hope that helps.  As always, there's more than one way to do it, and
the above illustrates only one of those ways.  If anyone can offer more,
or ways to improve upon the above, I'm open to it.

Thanks!
Curtis

>>> "Dan Howard" <howardinc@home.com> 04/03/00 07:58PM >>>
Hi all...
I'm a newbie to Python and would appreciate some advice on the best way
to read a sequential file of data and map the contents to a direct
access file.

The sequential file has fixed size fields and each field is separated
by spaces - some fields are filled with spaces as filler.

The first field in each record is the key that I'd like to use to
access the records.

Any comments on best practices here would be most welcome
Thanks
Dan
