[omaha] This is the message I tried to send/post this morning
Jeff Hinrichs - DM&T
jeffh at dundeemt.com
Sat Dec 1 16:52:11 CET 2007
Burch,
First off, I'm not sure what is happening to your messages. I've told
mailman to forward me any email that it would have discarded so I can
figure out what is going on. I'll get this resolved as quickly as
possible.
Now back to your original question. If I'm understanding you correctly,
you want to create a list of the fld2 entries. The first thing to clarify
is that lists are 0-based, so (referring to the first rec in your
example) if you want the "NICHOLAS FINANCIAL INC" that would be rec[1],
and if you wanted the "10-Q" it would be rec[2].
So continuing on with my example code,
sublist = []
for rec in thedata:
    sublist.append(rec[1])
# now sublist is a list of values and thedata is still the original list of lists
# ['NICHOLAS FINANCIAL INC', 'NICHOLAS FINANCIAL INC', ...]
A list comprehension could be used as well, it would look like:
sublist = [rec[1] for rec in thedata]
I think the first example is clearer when you are new to Python.
Both produce the same sublist. For more on lists and list
comprehensions, see section 5.1.4, List Comprehensions, of the Python
tutorial at http://docs.python.org/tut/node7.html. The tutorial is
pretty good, but I normally recommend Dive Into Python,
http://www.diveintopython.org/toc/index.html, as one of the best
on-line resources for Python.
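To make that concrete, here is a small self-contained sketch. The
thedata literal is only a stand-in for what the file-reading loop would
build (three records copied from your sample data), and the names
names_loop and names_comp are just ones I made up for the comparison.
It is written for Python 3, where print is a function.

```python
# Stand-in for the list of lists built by the file-reading loop;
# the records are copied from the sample data in this thread.
thedata = [
    ['1000045', 'NICHOLAS FINANCIAL INC', '10-Q', '2007-02-14',
     'edgar/data/1000045/0001193125-07-031642.txt'],
    ['1000045', 'NICHOLAS FINANCIAL INC', '8-K', '2007-01-29',
     'edgar/data/1000045/0001193125-07-015318.txt'],
    ['1000069', 'EMPIRIC FUNDS, INC', '497', '2007-01-31',
     'edgar/data/1000069/0000894189-07-000334.txt'],
]

# Explicit loop: pull field 1 (the company name) from every record.
names_loop = []
for rec in thedata:
    names_loop.append(rec[1])

# List comprehension: builds the same list in one expression.
names_comp = [rec[1] for rec in thedata]

print(names_loop)
print(names_loop == names_comp)
```

Changing the index (rec[0], rec[2], ...) picks a different field from
each record.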
Now if you want a sublist of complete records where the record is a
'10-Q' record, you would use filtering:
sublist = [rec for rec in thedata if rec[2] == '10-Q']
Now sublist is a list of 10-Q records, which from your example data
would have 1 element. You can replace the rec[2] == '10-Q' test with
any test or user defined function that returns a True/False value.
sublist = [rec for rec in thedata if rec[2] == '10-Q']
is equivalent to
sublist = []
for rec in thedata:
    if rec[2] == '10-Q':
        sublist.append(rec)
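As a sketch of swapping in a user defined function for the test: the
helper names is_quarterly and filed_on_or_after below are hypothetical
(I made them up for illustration), and thedata is again a stand-in built
from a few of your sample records. Written for Python 3.

```python
# Stand-in records copied from the sample data in this thread.
thedata = [
    ['1000045', 'NICHOLAS FINANCIAL INC', '10-Q', '2007-02-14',
     'edgar/data/1000045/0001193125-07-031642.txt'],
    ['1000045', 'NICHOLAS FINANCIAL INC', '8-K', '2007-01-29',
     'edgar/data/1000045/0001193125-07-015318.txt'],
    ['1000045', 'NICHOLAS FINANCIAL INC', 'SC 13G', '2007-02-21',
     'edgar/data/1000045/0000950135-07-000971.txt'],
    ['1000069', 'EMPIRIC FUNDS, INC', '497', '2007-01-31',
     'edgar/data/1000069/0000894189-07-000334.txt'],
]


def is_quarterly(rec):
    """Hypothetical test: True when the form-type field (index 2) is '10-Q'."""
    return rec[2] == '10-Q'


def filed_on_or_after(rec, date):
    """Hypothetical test: compare the filing date (index 3) as a string.

    This relies on the dates being in YYYY-MM-DD form, which sorts
    correctly as plain text.
    """
    return rec[3] >= date


# Whole records that pass each test.
quarterly = [rec for rec in thedata if is_quarterly(rec)]
february = [rec for rec in thedata if filed_on_or_after(rec, '2007-02-01')]

print(len(quarterly), len(february))
```

Each result is still a list of complete records, so you can pull a
single field out of it afterwards with rec[1], rec[2], and so on.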
Because I'm simple minded, I would initially write my code in the
looping structure I just showed and then refactor it into the list
comprehension later. I consider that a speed optimization. Without going
into detail, the comprehension is more efficient internally, which is
something I don't worry about except in extreme cases. (I believe you
should always optimize for readability first.)
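If you ever want to check the speed difference yourself, here is a rough
sketch using the standard library's timeit module. The records are
synthetic (made-up values shaped like yours), the function names are my
own, and the numbers you get will depend on your machine and Python
version, so treat it as a way to measure rather than as a result.
Written for Python 3.

```python
import timeit

# Synthetic records shaped like the ones in this thread (made-up values):
# every fourth record is a '10-Q', the rest are '8-K'.
thedata = [[str(i), 'COMPANY %d' % i, '10-Q' if i % 4 == 0 else '8-K']
           for i in range(10000)]


def with_loop():
    """Filter with an explicit loop and append."""
    sublist = []
    for rec in thedata:
        if rec[2] == '10-Q':
            sublist.append(rec)
    return sublist


def with_comprehension():
    """Filter with a list comprehension."""
    return [rec for rec in thedata if rec[2] == '10-Q']


# Both approaches must select exactly the same records.
assert with_loop() == with_comprehension()

# Time 200 runs of each; smaller is faster.
print('loop:         ', timeit.timeit(with_loop, number=200))
print('comprehension:', timeit.timeit(with_comprehension, number=200))
```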
On Dec 1, 2007 8:45 AM, Burch Kealey <bkealey at mail.unomaha.edu> wrote:
>
> Burch T. Kealey, PhD.
> RH-CBA 408-N
> University of Nebraska at Omaha
> 6000 Dodge Street
> Omaha Nebraska 68104
>
> 402-554-3571
>
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law. If
> you are not the intended recipient, you should delete this message. Any
> disclosure, copying, or distribution of this message, or the taking of any
> action based on it, is strictly prohibited.
>
> -----Forwarded by Burch Kealey/FACSTAFF/UNO/UNEBR on 12/01/2007 08:45AM
> -----
>
> To: Omaha Python Users Group <omaha at python.org>
> From: Burch Kealey/FACSTAFF/UNO/UNEBR
> Date: 12/01/2007 07:57AM
> Subject: Re: [omaha] Fwd: Omaha post from academicedgar at gmail.com requires
> approval
>
>
> Hi Jeff
>
> Thanks for your reply. After I saw the bounce message I realized I had sent
> it from the wrong account but for some reason-I am not sure messages from
> this account are getting through as I sent it again but it has never shown
> back up in my in box.
>
> I think from reading your code I am just getting back my original input file
> as a list of lists-the input file is now a list and each line has been
> transformed into a list. My inelegant solution gets me there (x3 is a list
> of lists). My ultimate goal though is to pull fields from the records. One
> of the things I see myself doing is creating logic based on values of
> particular fields in each record. In the simple case I would want to create
> a new list with just say the first (or second . . .) field from each record.
> So how do I selectively subset items in the x3 list say I want all i in
> x3[i][2]-I think that is all rows in x3, field 2?
>
> Cheers
>
> Burch
>
>
>
> -----omaha-bounces at python.org wrote: -----
>
> To: "Omaha Python Users Group" <omaha at python.org>
> From: "Jeff Hinrichs - DM&T" <jeffh at dundeemt.com>
> Sent by: omaha-bounces at python.org
> Date: 11/30/2007 10:50PM
> cc: academicedgar at gmail.com
> Subject: [omaha] Fwd: Omaha post from academicedgar at gmail.com requires
> approval
>
> Burch,
> seems that you discarded your message as I went to approve it but it
> wasn't there anymore.
>
> Say that your data is in a file 'data.txt' , then to process that file
> in to memory you could do something like:
>
> thedata = []  # a list to hold the records
> for line in file('data.txt'):
>     line = line.strip()    # remove the newline char
>     rec = line.split('|')  # split at '|' chars, returning a list such as
>                            # ['1000045','NICHOLAS FINANCIAL INC','10-Q','2007-02-14','edgar...']
>     thedata.append(rec)
>
> # at this point thedata is a list of records (lists), let's iterate through it
>
> for rec in thedata:
>     print rec
>
>
> If you have a giant file to process you don't have to read it all into
> memory to begin with, you could just process a record at a time. The
> snippet you have does the same, but doesn't exploit python's
> strengths. The 'for line in file():' idiom is very pythonic, as it
> is very readable. I could combine some things to make it smaller, but
> it would make it less readable.
>
>
> I hope this answers your question and please feel free to join the email
> list.
>
> -Jeff
>
>
> ---------- Forwarded message ----------
> From: "Burch Kealey" <academicedgar at gmail.com>
> To: omaha at python.org
> Date: Fri, 30 Nov 2007 16:41:42 -0600
> Subject: Help
> I am a long-time SAS programmer. One of the reasons I have joined
> this group is a desire to branch out to a new, hopefully more
> versatile language. I am struggling to get my head around how the
> language is structured. After trying to plow through a number of
> books I have finally taken the plunge and am starting to try to find
> ways to do things in PYTHON that I have been doing in SAS. I am still
> trying to get over the basics and need some help. However, if you
> read this question and think that this is not the forum for these
> types of questions then I would appreciate any indication of the
> appropriate forum.
>
> Thanks
>
> Burch
>
> Here is an example of some raw data I am trying to input
> 1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt
> 1000045|NICHOLAS FINANCIAL INC|4|2007-01-18|edgar/data/1000045/0001144204-07-002274.txt
> 1000045|NICHOLAS FINANCIAL INC|8-K|2007-01-29|edgar/data/1000045/0001193125-07-015318.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2007-02-14|edgar/data/1000045/0000950134-07-003261.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-14|edgar/data/1000045/0000315066-07-002043.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-21|edgar/data/1000045/0000950135-07-000971.txt
> 1000069|EMPIRIC FUNDS, INC|497|2007-01-31|edgar/data/1000069/0000894189-07-000334.txt
> 1000069|EMPIRIC FUNDS, INC|N-Q|2007-02-22|edgar/data/1000069/0000894189-07-000461.txt
> 1000069|TEXAS CAPITAL VALUE FUNDS INC|485BPOS|2007-01-26|edgar/data/1000069/0000894189-07-000234.txt
>
>
> I need to read the file and use the values as inputs to some later
> tasks. The | is a delimiter. One of the things I think I want to do
> is pull specific items from the list. I can make it happen to an output
> file but after two days of trying (say ten hours total) I can't create
> a list in memory of just one of the items.
>
> Here is the code that will create what I need in a flat file:
> import os
> y=file('c:/newout4.txt','w')
> x=file('C:/2007/QTR1/master.idx',"r")
> obsname=[line.strip() for line in x.readlines()]
> x3=[i.split('|',5) for i in obsname]
> for thing in x3:
>     y.write('%s%s' % (thing[0],os.linesep))
> y.close()
>
> Here is one of my many attempts to create a list in memory
> import os
> x=file('C:/2007/QTR1/master.idx',"r")
> obsname=[line.strip() for line in x.readlines()]
> x3=[i.split('|',5) for i in obsname]
> for thing in x3:
>     cik=[thing[0]]
>
> Any suggestions?
>
> Cheers
>
> Burch
>
>
>
> ---------- Forwarded message ----------
> From: omaha-request at python.org
> To:
> Date:
> Subject: confirm 6f1d1eccfaea12b908759381f0510a7e2756cc45
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message. Do this if the message is
> spam. If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list. The Approved: header can also appear in the first line
> of the body of the reply.
>
>
>
> --
> Jeff Hinrichs
> Dundee Media & Technology, Inc
> jeffh at dundeemt.com
> 402.218.1473
> _______________________________________________
> Omaha Python Users Group mailing list
> Omaha at python.org
> http://mail.python.org/mailman/listinfo/omaha
> http://www.OmahaPython.org
>
>
>
--
Jeff Hinrichs
Dundee Media & Technology, Inc
jeffh at dundeemt.com
402.218.1473