[omaha] Fwd: Omaha post from academicedgar at gmail.com requires approval
Jeff Hinrichs - DM&T
jeffh at dundeemt.com
Sat Dec 1 05:50:12 CET 2007
Burch,
It seems that you discarded your message: when I went to approve it, it
was no longer there.
Say that your data is in a file 'data.txt'. To read that file into
memory, you could do something like:
thedata = []  # a list to hold the records
for line in file('data.txt'):
    line = line.strip()    # remove the newline char
    rec = line.split('|')  # split at '|' chars, returning a list such as
                           # ['1000045', 'NICHOLAS FINANCIAL INC', '10-Q',
                           #  '2007-02-14', 'edgar...'] as a record
    thedata.append(rec)

# at this point thedata is a list of records (lists); let's iterate through it
for rec in thedata:
    print rec
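To end up with an in-memory list of just one item per record, for example the first field of each line, one option is to create the list before the loop and append to it. An assignment such as cik=[thing[0]] inside a loop rebinds the name on every pass, so only the last value survives. The following is only a sketch: the name first_fields and the sample lines are made up for illustration, and it is written for Python 3, so it takes any iterable of lines (an open file works) instead of calling file(), which Python 3 no longer has.

```python
def first_fields(lines, delimiter='|'):
    """Return the first field of every non-blank delimited line.

    first_fields is a hypothetical helper name used for illustration.
    """
    values = []                # create the list once, before the loop
    for line in lines:
        line = line.strip()    # remove the newline char
        if not line:
            continue           # ignore blank lines
        values.append(line.split(delimiter)[0])  # keep only field 0
    return values


# Two sample records in the same pipe-delimited layout.
sample = [
    '1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt\n',
    '1000069|EMPIRIC FUNDS, INC|497|2007-01-31|edgar/data/1000069/0000894189-07-000334.txt\n',
]

ciks = first_fields(sample)
print(ciks)  # ['1000045', '1000069']
```

With a real file, the same call would be wrapped as "with open('data.txt') as handle: ciks = first_fields(handle)". A list comprehension such as [rec[0] for rec in thedata] would give the same result from the records already held in memory.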
If you have a giant file to process, you don't have to read it all into
memory first; you could process one record at a time. The snippet you
have does the same job, but doesn't exploit Python's strengths. The
'for line in file():' idiom is very Pythonic because it is very
readable. I could combine some steps to make the code shorter, but
that would make it less readable.
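Processing one record at a time could look something like the sketch below. It is an illustration under assumptions, not the only way to do it: the helper names iter_records, count_by_form and count_by_form_in_file are invented here, the third field is assumed to be the value worth tallying, and the code targets Python 3, where open() takes the place of file(). A generator yields each record as it is read, so the whole file never has to sit in memory.

```python
def iter_records(lines, delimiter='|'):
    """Yield one record (a list of fields) per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(delimiter)


def count_by_form(lines):
    """Tally records by their third field without storing the records."""
    counts = {}
    for rec in iter_records(lines):
        if len(rec) < 3:
            continue  # skip lines that do not have a third field
        counts[rec[2]] = counts.get(rec[2], 0) + 1
    return counts


def count_by_form_in_file(path):
    """Same tally, reading the file lazily, one line at a time."""
    with open(path) as handle:
        return count_by_form(handle)


# In-memory sample lines in the same pipe-delimited layout.
sample = [
    '1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt\n',
    '1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-14|edgar/data/1000045/0000315066-07-002043.txt\n',
    '1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-21|edgar/data/1000045/0000950135-07-000971.txt\n',
]

print(count_by_form(sample))
```

Only the running tally is kept, so memory use depends on the number of distinct values in the third field, not on the size of the file.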
I hope this answers your question and please feel free to join the email list.
-Jeff
---------- Forwarded message ----------
From: "Burch Kealey" <academicedgar at gmail.com>
To: omaha at python.org
Date: Fri, 30 Nov 2007 16:41:42 -0600
Subject: Help
I am a long-time SAS programmer. One of the reasons I have joined
this group is a desire to branch out to a new, hopefully more
versatile language. I am struggling to get my head around how the
language is structured. After trying to plow through a number of
books I have finally taken the plunge and am starting to try to find
ways to do things in PYTHON that I have been doing in SAS. I am still
trying to get over the basics and need some help. However, if you
read this question and think that this is not the forum for these
types of questions then I would appreciate any indication of the
appropriate forum.
Thanks
Burch
Here is an example of some raw data I am trying to input
1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt
1000045|NICHOLAS FINANCIAL INC|4|2007-01-18|edgar/data/1000045/0001144204-07-002274.txt
1000045|NICHOLAS FINANCIAL INC|8-K|2007-01-29|edgar/data/1000045/0001193125-07-015318.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2007-02-14|edgar/data/1000045/0000950134-07-003261.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-14|edgar/data/1000045/0000315066-07-002043.txt
1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-21|edgar/data/1000045/0000950135-07-000971.txt
1000069|EMPIRIC FUNDS, INC|497|2007-01-31|edgar/data/1000069/0000894189-07-000334.txt
1000069|EMPIRIC FUNDS, INC|N-Q|2007-02-22|edgar/data/1000069/0000894189-07-000461.txt
1000069|TEXAS CAPITAL VALUE FUNDS INC|485BPOS|2007-01-26|edgar/data/1000069/0000894189-07-000234.txt
I need to read the file and use the values as inputs to some later
tasks. The | is a delimiter. One of the things I think I want to do
is pull specific items from the list. I can write them to an output
file, but after two days of trying (say ten hours total) I can't create
a list in memory of just one of the items.
Here is the code that will create what I need in a flat file:
import os
y=file('c:/newout4.txt','w')
x=file('C:/2007/QTR1/master.idx',"r")
obsname=[line.strip() for line in x.readlines()]
x3=[i.split('|',5) for i in obsname]
for thing in x3:
    y.write('%s%s' % (thing[0],os.linesep))
y.close()
Here is one of my many attempts to create a list in memory
import os
x=file('C:/2007/QTR1/master.idx',"r")
obsname=[line.strip() for line in x.readlines()]
x3=[i.split('|',5) for i in obsname]
for thing in x3:
    cik=[thing[0]]
Any suggestions?
Cheers
Burch
--
Jeff Hinrichs
Dundee Media & Technology, Inc
jeffh at dundeemt.com
402.218.1473