Problem with tuple - hopefully clear this time

Steven Citron-Pousty steven.citron-pousty at yale.edu
Mon Jan 15 15:27:49 EST 2001


Sorry, all, about my previous incomprehensible message. Honestly, I tried
reading the woodrat and the alligator, but I can't get this to work.
I didn't want to send code before because I thought everyone would get
POed at me for sending code. Obviously my pseudocode was not helpful AT ALL.
Here is my code. Remember to think newbie and don't slam me too hard; it's
one of those days. Thanks again for any help.
Steve

import os
import sys
import string
import re

#numfields is the number of fields to potentially parse
numfields = 41

"""  THIS LIST IS INCOMPLETE - get complete list from the spreadsheet
   ifields[0] is the name of the field
    ifields[1] is the content of that field
   so if we add a new item to the fields we have to add '' to the other
   2 lists
   """
ifields = [['ID =', 'T = ', 'AU =', 'DIST =', 'DNUM =', 'ABS =',
'ARCH.FILTER = </B>I', 'ARCH.FILTER = </B>C', 'ARCH.FILTER = </B>N',
'ARCH.FILTER = </B>S', 'ARCH.FILTER = </B>E', 'CLASSIF',
'ICPSR.CLASSIF1', 'NACJD.CLASS', 'NACDA.CLASS' ,'SAMHDA.CLASS',
'IAED.CLASS', 'EXTENT.COLLECT', 'CLASSNO', 'SERIES.NAME', 'SERIES.INFO',

'RESTRICTIONS =', 'DATA.TYPE', 'TIME.PERIOD', 'DATE.OF.COLLECT',
'FUNDING.AGENCY', 'GRANT.NUMBER', 'DATA.SOURCE', 'EXTENT.PROCESS',
'DATA.FORMAT', 'COLLECT.NOTE', 'SAMPLING =', 'UNIVERSE =',
'RELATED.PUBS', 'CITATION =', 'KEYWORDS =', 'DIR =', 'CHAPTER =',
'SECTION =', 'SUBSECTION =','SUBSUB'],

['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']]

#read the files and set up file for writing
dir = os.listdir('D:\\statlab\\ssda\\data') #read in the list of files in this directory


"""open the file"""
for f in dir:
    try:
        fileproc = open('D:\\statlab\\ssda\\data\\'+f, 'r')
    except IOError:
        print 'Can\'t open file for reading.'
        sys.exit(0)

    #create a new ifields to store the data. More important for looping
    #through the directory


    result = ifields


    #read the file into a list
    text = fileproc.readlines()



    #loop through and find an occurrence of a tag
    #if you find a tag write it to the field
    #if you don't find a tag write it to the previous found field
    for i in text:
        jindex = 0
        found = 0
        for j in result[0]:
            if (i.rfind(j)==0) or (i.rfind(j)==1) or (i.rfind(j)==3):
               if i.rfind("=")+2 == ' ':
                   where = i.rfind("=")+3
               else:
                   where = i.rfind("=")+2
               result[1][jindex] = i[where:-1]
               oldindex = jindex
               found = 1
               # need to write a test for ; at the end
               if result[1][jindex][-1] == ";":
                   result[1][jindex] = i[where:-2]
               break
            elif (i.rfind(j)==6) or (i.rfind(j)==7):
               if i.rfind("=")+6 == ' ':
                   where = i.rfind("=")+7
               else:
                   where = i.rfind("=")+6
               result[1][jindex] = i[where:-1]
               found = 1
               oldindex = jindex
               if result[1][jindex][-1] == ";":
                   result[1][jindex] = i[where:-2]
               break
            jindex += 1
            if i.rfind("BV =") != -1:
               break
        if (found != 1) and (i.rfind("BV =") != -1):
            result[1][oldindex] += i[0:-1]
            found = 0




    crap  = 0
    while crap < numfields :
        if result[1][crap]:
            print "field # ", result[0][crap], " ", result[1][crap]
        crap += 1
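
One thing worth checking in the loop above: `result = ifields` does not
create a new list. It binds a second name to the same nested list, so values
stored while processing one file are still there when the next file starts.
A minimal sketch of the difference, using a shortened two-field stand-in for
`ifields` (illustrative only) and `copy.deepcopy` from the standard library:

```python
import copy

# Shortened, illustrative stand-in for the ifields list in the script.
ifields = [['ID =', 'T = '], ['', '']]

# Plain assignment: both names refer to the same nested list.
alias = ifields
alias[1][0] = '1588'
print(ifields[1][0])          # the original list was modified as well

# Reset, then take an independent copy of the outer and inner lists.
ifields[1][0] = ''
fresh = copy.deepcopy(ifields)
fresh[1][0] = '1588'
print(ifields[1][0] == '')    # the original list is left untouched
```

Whether this is the cause of the trouble depends on what the script is
expected to print, but a fresh copy per file would at least keep one file's
fields from leaking into the next.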

___________________________________________________________________________________________________

###Sample data file

ID = 1588;
T = Census Of Population And Housing, 1980 [United States]: Summary Tape

File 1A;
AU = United States Department of Commerce. Bureau of the Census.;
DIST = ICPSR;
DNUM = 07941;
ABS = Summary Tape File 1 consists of four sets of computer-readable
data files containing detailed tabulations of the nation's population
and housing characteristics produced from the 1980 Census. This series
is comprised of Summary Tape File 1A (STF1A), Summary Tape File 1B
(STF1B), Summary Tape File 1C (STF1C), and Summary Tape File 1D (STF1D).

STF1A, STF1B, and STF1D have 52 separate files, one for each state,
Puerto Rico, and the District of Columbia. STF1C consists of one
nation-wide datafile containing information about all
states. All files in the STF1 series are identical, containing 321
substantive data variables organized in the form of 59 ''tables,'' as
well as standard geographic identification variables. All of the data
items contained in all the STF 1 files were tabulated from the
''complete count'' or ''100%'' questions included on the 1980 Census
questionnaire. All four groups of files within the STF1 series have
identical record formats and technical characteristics and differ only
in the types of geographical areas for which the summarized data items
are presented. STF1A provides summaries for state or state equivalent,
county or county equivalent, minor civil division/census county division

(MCD/CCD), place or place segment within MCD/CCD or remainder of
MCD/CCD, census tract or block numbering area (BNA) or untracted segment

within place, place segment or remainder or MCD/CCD, block group (BG) or

BG segment or enumeration district (ED). An additional STF 1A file for
Outlying Areas is also available from ICPSR. This file contains data
specifically for the United States possessions: American Samoa, Guam,
Northern Mariana Islands, Trust Territory of the Pacific Islands, and
the Virgin Islands. The information contained in this file is similar to

but not identical with the data for the 50 states and is
documented in a separate codebook. All STF 1 files are being released on

a state-by-state ''flow'' basis, with the less populous states generally

being prepared and released before the most populous states. Each
''record'' in these files comprises 3,276 characters with two record
segments (physical records) of 1,638 characters each, the number of data

records in each file varies by state.
<P><B>CITATION = </B>U.S. Dept. of Commerce, Bureau of the Census.
CENSUS OF POPULATION AND HOUSING, 1980 [UNITED STATES]: SUMMARY TAPE
FILE 1A [Computer file]. Washington, DC: U.S. Dept. of Commerce, Bureau
of the Census [producer], 1982. Ann Arbor, MI: Inter-university
Consortium for Political and Social Research [distributor], 1983.;
DIR;
CHAPTER = Census Enumerations;
SECTION = Contemporary;
SUBSECTION = United States;
BV;
BV.TAPE;

FILE.NUMBER = 0;
NOVELL.LOC = h:\ssda\7941\da7941ct.dat;
NRECS = 8772;
LRECL = 1638;
DS.COMMENTS = ASCII data file: Connecticut;

;
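
The approach described in the script's comments (start a field when a line
begins with a known tag, otherwise append the line to the field seen last)
can be sketched against the first few lines of the sample above. This is only
a rough sketch: `parse_record` and the short `tags` list are hypothetical
names, not part of the script, and it assumes every tagged line looks like
`TAG = value`, so bare tags such as `DIR;` and lines wrapped in HTML such as
`<P><B>CITATION = </B>` are not handled.

```python
def parse_record(lines, tags):
    """Collect 'TAG = value' fields; untagged lines extend the last field."""
    fields = {}        # built fresh on every call, so nothing carries over
    current = None     # tag of the field seen most recently
    for line in lines:
        text = line.strip()
        if not text:
            continue   # skip blank lines
        for tag in tags:
            if text.startswith(tag + ' ='):
                current = tag
                fields[tag] = text.split('=', 1)[1].strip()
                break
        else:
            # No tag matched: treat the line as a continuation.
            if current is not None:
                fields[current] += ' ' + text
    # Drop the ';' that terminates each field in the sample data.
    for tag in fields:
        fields[tag] = fields[tag].rstrip(';')
    return fields


# A few lines taken from the sample data file.
sample = [
    'ID = 1588;\n',
    'T = Census Of Population And Housing, 1980 [United States]: Summary Tape\n',
    '\n',
    'File 1A;\n',
    'DIST = ICPSR;\n',
]
record = parse_record(sample, ['ID', 'T', 'DIST'])
print(record['T'])
```

Using a dictionary keyed by tag instead of two parallel lists means there is
no index to keep in step, and a field that never appears in a file is simply
absent from the result.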







