Problem with tuple - hopefully clear this time
Steven Citron-Pousty
steven.citron-pousty at yale.edu
Mon Jan 15 15:27:49 EST 2001
Sorry all about my previous incomprehensible message. Honestly, I tried
reading
the woodrat and the alligator but I can't get this too work.
I didn't want to send code before because I thought everyone would get
POed at
me sending code. Obviously my psuedocode was not helpful AT ALL.
Here is my code - remember think newbie and don't slam me too hard, its
one of those days. Thanks again for any help
Steve
import os
import sys
import string
import re
#numfields is the number of fields to potentially parse
numfields = 41
""" THIS LIST IS INCOMPETE - get complete list from the spreadsheet
ifields[0] is the name of the field
ifields[1] is the content of that field
so if we add a new item to the fields we have to add '' to the other
2 lists
"""
ifields = [['ID =', 'T = ', 'AU =', 'DIST =', 'DNUM =', 'ABS =',
'ARCH.FILTER = </B>I', 'ARCH.FILTER = </B>C', 'ARCH.FILTER = </B>N',
'ARCH.FILTER = </B>S', 'ARCH.FILTER = </B>E', 'CLASSIF',
'ICPSR.CLASSIF1', 'NACJD.CLASS', 'NACDA.CLASS' ,'SAMHDA.CLASS',
'IAED.CLASS', 'EXTENT.COLLECT', 'CLASSNO', 'SERIES.NAME', 'SERIES.INFO',
'RESTRICTIONS =', 'DATA.TYPE', 'TIME.PERIOD', 'DATE.OF.COLLECT',
'FUNDING.AGENCY', 'GRANT.NUMBER', 'DATA.SOURCE', 'EXTENT.PROCESS',
'DATA.FORMAT', 'COLLECT.NOTE', 'SAMPLING =', 'UNIVERSE =',
'RELATED.PUBS', 'CITATION =', 'KEYWORDS =', 'DIR =', 'CHAPTER =',
'SECTION =', 'SUBSECTION =','SUBSUB'],
['','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','','']]
#read the files and set up file for writing
dir = os.listdir('D:\\statlab\\ssda\\data') #read in the list of files
in this directory
"""open the file"""
for f in dir:
try:
fileproc = open('D:\\statlab\\ssda\\data\\'+f, 'r')
except IOError:
print 'Can\'t open file for reading.'
sys.exit(0)
#create a new iflieds to store the data. More important for looping
#through the directory
result = ifields
#read the file into a list
text = fileproc.readlines()
#loop through and find an occurnce of a tag
#if you find a tag write it to the field
#if you don't find a tag write it to the previous found field
for i in text:
jindex = 0
found = 0
for j in result[0]:
if (i.rfind(j)==0) or (i.rfind(j)==1) or (i.rfind(j)==3):
if i.rfind("=")+2 == ' ':
where = i.rfind("=")+3
else:
where = i.rfind("=")+2
result[1][jindex] = i[where:-1]
oldindex = jindex
found = 1
# need to write a test for ; at the end
if result[1][jindex][-1] == ";":
result[1][jindex] = i[where:-2]
break
elif (i.rfind(j)==6) or (i.rfind(j)==7):
if i.rfind("=")+6 == ' ':
where = i.rfind("=")+7
else:
where = i.rfind("=")+6
result[1][jindex] = i[where:-1]
found = 1
oldindex = jindex
if result[1][jindex][-1] == ";":
result[1][jindex] = i[where:-2]
break
jindex += 1
if i.rfind("BV =") != -1:
break
if (found != 1) and (i.rfind("BV =") != -1):
result[1][oldindex] += i[0:-1]
found = 0
crap = 0
while crap < numfields :
if result[1][crap]:
print "field # ", result[0][crap], " ", result[1][crap]
crap += 1
___________________________________________________________________________________________________
###Sample data file
ID = 1588;
T = Census Of Population And Housing, 1980 [United States]: Summary Tape
File 1A;
AU = United States Department of Commerce. Bureau of the Census.;
DIST = ICPSR;
DNUM = 07941;
ABS = Summary Tape File 1 consists of four sets of computer-readable
data files containing detailed tabulations of the nation's population
and housing characteristics produced from the 1980 Census. This series
is comprised of Summary Tape File 1A (STF1A), Summary Tape File 1B
(STF1B), Summary Tape File 1C (STF1C), and Summary Tape File 1D (STF1D).
STF1A, STF1B, and STF1D have 52 separate files, one for each state,
Puerto Rico, and the District of Columbia. STF1C consists of one
nation-wide datafile containing information about all
states. All files in the STF1 series are identical, containing 321
substantive data variables organized in the form of 59 ''tables,'' as
well as standard geographic identification variables. All of the data
items contained in all the STF 1 files were tabulated from the
''complete count'' or ''100%'' questions included on the 1980 Census
questionnaire. All four groups of files within the STF1 series have
identical record formats and technical characteristics and differ only
in the types of geographical areas for which the summarized data items
are presented. STF1A provides summaries for state or state equivalent,
county or county equivalent, minor civil division/census county division
(MCD/CCD), place or place segment within MCD/CCD or remainder of
MCD/CCD, census tract or block numbering area (BNA) or untracted segment
within place, place segment or remainder or MCD/CCD, block group (BG) or
BG segment or enumeration district (ED). An additional STF 1A file for
Outlying Areas is also available from ICPSR. This file contains data
specifically for the United States possessions: American Samoa, Guam,
Northern Mariana Islands, Trust Territory of the Pacific Islands, and
the Virgin Islands. The information contained in this file is similar to
but not identical with the data for the 50 states and is
documented in a separate codebook. All STF 1 files are being released on
a state-by-state ''flow'' basis, with the less populous states generally
being prepared and released before the most populous states. Each
''record'' in these files comprises 3,276 characters with two record
segments (physical records) of 1,638 characters each, the number of data
records in each file varies by state.
<P><B>CITATION = </B>U.S. Dept. of Commerce, Bureau of the Census.
CENSUS OF POPULATION AND HOUSING, 1980 [UNITED STATES]: SUMMARY TAPE
FILE 1A [Computer file]. Washington, DC: U.S. Dept. of Commerce, Bureau
of the Census [producer], 1982. Ann Arbor, MI: Inter-university
Consortium for Political and Social Research [distributor], 1983.;
DIR;
CHAPTER = Census Enumerations;
SECTION = Contemporary;
SUBSECTION = United States;
BV;
BV.TAPE;
FILE.NUMBER = 0;
NOVELL.LOC = h:\ssda\7941\da7941ct.dat;
NRECS = 8772;
LRECL = 1638;
DS.COMMENTS = ASCII data file: Connecticut;
;
More information about the Python-list
mailing list