[omaha] This is the message I tried to send/post this morning

Jeff Hinrichs - DM&T jeffh at dundeemt.com
Sat Dec 1 16:52:11 CET 2007


Burch,

First off, I'm not sure what is happening to your messages.  I've told
mailman to forward me any email that it would have discarded so I can
figure out what is going on.  I'll get this resolved as quickly as
possible.

Now back to your original question.  If I understand correctly, you want
to create a list of the fld2 entries.  The first thing to clarify is that
lists are 0-based, so if you want (referring to the first rec in your
example) the "NICHOLAS FINANCIAL INC", that would be rec[1]; if you
wanted the "10-Q", it would be rec[2].
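
To make the 0-based indexing concrete, here is a small sketch using the
first line of your example data.  The variable names (line, first, name,
form) are only for illustration.

```python
# First record from the example data, as one '|'-delimited line
line = '1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt'
rec = line.split('|')   # a list of five fields

first = rec[0]   # '1000045' (the first field is index 0)
name = rec[1]    # 'NICHOLAS FINANCIAL INC'
form = rec[2]    # '10-Q'
```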

So, continuing on with my example code:

sublist = []
for rec in thedata:
    sublist.append(rec[1])

#now sublist is a list of values and thedata is still the original list of lists
#['NICHOLAS FINANCIAL INC','NICHOLAS FINANCIAL INC',...]

A list comprehension could be used as well; it would look like:
sublist = [rec[1] for rec in thedata]

I think the first example is clearer when you are new to Python.
Both produce the same sublist.  For more on lists and list
comprehensions see http://docs.python.org/tut/node7.html, section 5.1.4
List Comprehensions.  The Python tutorial is pretty good, but I
normally recommend Dive Into Python
http://www.diveintopython.org/toc/index.html as one of the best
on-line resources for Python.
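
If you want to convince yourself that the loop and the comprehension
give the same result, a quick check along these lines should do it.
This is only a sketch: thedata here is a hand-typed stand-in built from
three of your example records (with the file path field left off), and
names_loop / names_comp are names I made up.

```python
# A small stand-in for thedata, built from three of the example records
thedata = [
    ['1000045', 'NICHOLAS FINANCIAL INC', '10-Q', '2007-02-14'],
    ['1000045', 'NICHOLAS FINANCIAL INC', '4', '2007-01-18'],
    ['1000069', 'EMPIRIC FUNDS, INC', '497', '2007-01-31'],
]

# Explicit loop: append field 1 of every record
names_loop = []
for rec in thedata:
    names_loop.append(rec[1])

# List comprehension: same result in one expression
names_comp = [rec[1] for rec in thedata]

# Raises AssertionError if the two approaches ever disagree
assert names_loop == names_comp
```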

Now if you want a sublist of complete records, keeping only the '10-Q'
records, you would use filtering:

sublist = [rec for rec in thedata if rec[2] == '10-Q']

Now sublist is a list of 10-Q records, which from your example data
would have 1 element.  You can replace the rec[2] == '10-Q' test with
any test or user-defined function that returns a True/False value.
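
As a rough illustration of filtering with a user-defined function, the
sketch below keeps the Schedule 13G filings.  The helper name
is_schedule_13g is hypothetical, and thedata is again a hand-typed
stand-in using a few of your example records (first three fields only).

```python
def is_schedule_13g(rec):
    """Return True for 'SC 13G' and 'SC 13G/A' records (hypothetical helper)."""
    return rec[2].startswith('SC 13G')

# Stand-in data: a few records from the example, first three fields only
thedata = [
    ['1000045', 'NICHOLAS FINANCIAL INC', '10-Q'],
    ['1000045', 'NICHOLAS FINANCIAL INC', 'SC 13G/A'],
    ['1000045', 'NICHOLAS FINANCIAL INC', 'SC 13G'],
    ['1000069', 'EMPIRIC FUNDS, INC', '497'],
]

# Keep the whole record whenever the test function says True
sublist = [rec for rec in thedata if is_schedule_13g(rec)]
```

The test function can be as complicated as you like; the comprehension
only cares that it returns something true or false for each rec.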

sublist = [rec for rec in thedata if rec[2] == '10-Q']

is equivalent to

sublist = []
for rec in thedata:
    if rec[2] == '10-Q':
        sublist.append(rec)

Because I'm simple minded, I would initially write my code in the
looping structure I just showed and then refactor it into a list
comprehension later.  I consider it a speed optimization.  Without going
into detail, the comprehension is more efficient internally, which is
something I don't worry about except in extreme cases.  (I believe you
should always optimize for readability first.)
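
If you are curious about the speed difference, the standard timeit
module can measure it.  This is only a sketch: the made-up data, the
function names, and the repeat count are my assumptions, and the
numbers you get will depend on your machine and Python version.

```python
import timeit

# 1000 copies of one made-up record, just to have something to loop over
thedata = [['1000045', 'NICHOLAS FINANCIAL INC', '10-Q']] * 1000

def with_loop():
    sublist = []
    for rec in thedata:
        sublist.append(rec[1])
    return sublist

def with_comprehension():
    return [rec[1] for rec in thedata]

# Time 1000 calls of each version; timeit returns total seconds
loop_secs = timeit.timeit(with_loop, number=1000)
comp_secs = timeit.timeit(with_comprehension, number=1000)
print('loop: %.4f s, comprehension: %.4f s' % (loop_secs, comp_secs))
```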





On Dec 1, 2007 8:45 AM, Burch Kealey <bkealey at mail.unomaha.edu> wrote:
>
> Burch T. Kealey, PhD.
> RH-CBA 408-N
> University of Nebraska at Omaha
> 6000 Dodge Street
> Omaha Nebraska  68104
>
> 402-554-3571
>
> This message (including any attachments) contains confidential information
> intended for a specific individual and purpose, and is protected by law.  If
> you are not the intended recipient, you should delete this message.  Any
> disclosure, copying, or distribution of this message, or the taking of any
> action based on it, is strictly prohibited.
>
> -----Forwarded by Burch Kealey/FACSTAFF/UNO/UNEBR on 12/01/2007 08:45AM
> -----
>
> To: Omaha Python Users Group <omaha at python.org>
> From: Burch Kealey/FACSTAFF/UNO/UNEBR
> Date: 12/01/2007 07:57AM
> Subject: Re: [omaha] Fwd: Omaha post from academicedgar at gmail.com requires
> approval
>
>
> Hi Jeff
>
> Thanks for your reply.  After I saw the bounce message I realized I had sent
> it from the wrong account but for some reason-I am not sure messages from
> this account are getting through as I sent it again but it has never shown
> back up in my in box.
>
> I think from reading your code I am just getting back my original input file
> as a list of lists-the input file is now a list and each line has been
> transformed into a list.  My inelegant solution gets me there (x3 is a list
> of lists).  My ultimate goal though is to pull fields from the records.  One
> of the things I see myself doing is creating logic based on values of
> particular fields in each record.  In the simple case I would want to create
> a new list with just say the first (or second . . .) field from each record.
> So how do I selectively subset items in the x3 list say I want all i in
> x3[i][2]-I think that is all rows in x3, field 2?
>
> Cheers
>
> Burch
>
>
>
> -----omaha-bounces at python.org wrote: -----
>
> To: "Omaha Python Users Group" <omaha at python.org>
> From: "Jeff Hinrichs - DM&T" <jeffh at dundeemt.com>
> Sent by: omaha-bounces at python.org
> Date: 11/30/2007 10:50PM
> cc: academicedgar at gmail.com
> Subject: [omaha] Fwd: Omaha post from academicedgar at gmail.com requires
> approval
>
> Burch,
> seems that you discarded your message as I went to approve it but it
> wasn't there anymore.
>
> Say that your data is in a file 'data.txt' , then to process that file
> in to memory you could do something like:
>
> thedata = [] # a list to hold the records
> for line in file('data.txt'):
>     line = line.strip()  #remove the newline char
>     rec = line.split('|')   #split at '|' chars, returning a list such as
>                             #['1000045','NICHOLAS FINANCIAL INC','10-Q','2007-02-14','edgar...']
>     thedata.append(rec)
>
> # at this point thedata is a list of records(lists), let's iterate through it
>
> for rec in thedata:
>     print rec
>
>
> If you have a giant file to process you don't have to read it all into
> memory to begin with, you could just process a record at a time.  The
> snippet you have does the same, but doesn't exploit python's
> strengths.  The 'for line in file():' idiom is very pythonic, as it
> is very readable.  I could combine some things to make it smaller, but
> it would make it less readable.
>
>
> I hope this answers your question and please feel free to join the email
> list.
>
> -Jeff
>
>
> ---------- Forwarded message ----------
> From: "Burch Kealey" <academicedgar at gmail.com>
> To: omaha at python.org
> Date: Fri, 30 Nov 2007 16:41:42 -0600
> Subject: Help
> I am a long-time SAS programmer.  One of the reasons I have joined
> this group is a desire to branch out to a new, hopefully more
> versatile language.  I am struggling to get my head around how the
> language is structured.  After trying to plow through a number of
> books I have finally taken the plunge and am starting to try to find
> ways to do things in PYTHON that I have been doing in SAS.  I am still
> trying to get over the basics and need some help.  However, if you
> read this question and think that this is not the forum for these
> types of questions then I would appreciate any indication of the
> appropriate forum.
>
> Thanks
>
> Burch
>
> Here is an example of some raw data I am trying to input
> 1000045|NICHOLAS FINANCIAL INC|10-Q|2007-02-14|edgar/data/1000045/0001193125-07-031642.txt
> 1000045|NICHOLAS FINANCIAL INC|4|2007-01-18|edgar/data/1000045/0001144204-07-002274.txt
> 1000045|NICHOLAS FINANCIAL INC|8-K|2007-01-29|edgar/data/1000045/0001193125-07-015318.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G/A|2007-02-14|edgar/data/1000045/0000950134-07-003261.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-14|edgar/data/1000045/0000315066-07-002043.txt
> 1000045|NICHOLAS FINANCIAL INC|SC 13G|2007-02-21|edgar/data/1000045/0000950135-07-000971.txt
> 1000069|EMPIRIC FUNDS, INC|497|2007-01-31|edgar/data/1000069/0000894189-07-000334.txt
> 1000069|EMPIRIC FUNDS, INC|N-Q|2007-02-22|edgar/data/1000069/0000894189-07-000461.txt
> 1000069|TEXAS CAPITAL VALUE FUNDS INC|485BPOS|2007-01-26|edgar/data/1000069/0000894189-07-000234.txt
>
>
> I need to read the file and use the values as inputs to some later
> tasks. The | is a delimiter. One of the things I think I want to do
> is specific items from the list.  I can make it happen to an output
> file but after two days of trying (say ten hours total) I can't create
> a list in memory of just one of the items.
>
> Here is the code that will create what I need in a flat file:
> import os
> y=file('c:/newout4.txt','w')
> x=file('C:/2007/QTR1/master.idx',"r")
> obsname=[line.strip() for line in x.readlines()]
> x3=[i.split('|',5) for i in obsname]
> for thing in x3:
>     y.write('%s%s' % (thing[0],os.linesep))
> y.close()
>
> Here is one of my many attempts to create a list in memory
> import os
> x=file('C:/2007/QTR1/master.idx',"r")
> obsname=[line.strip() for line in x.readlines()]
> x3=[i.split('|',5) for i in obsname]
> for thing in x3:
>     cik=[thing[0]]
>
> Any suggestions?
>
> Cheers
>
> Burch
>
>
>
> ---------- Forwarded message ----------
> From: omaha-request at python.org
> To:
> Date:
> Subject: confirm 6f1d1eccfaea12b908759381f0510a7e2756cc45
> If you reply to this message, keeping the Subject: header intact,
> Mailman will discard the held message.  Do this if the message is
> spam.  If you reply to this message and include an Approved: header
> with the list password in it, the message will be approved for posting
> to the list.  The Approved: header can also appear in the first line
> of the body of the reply.
>
>
>
> --
> Jeff Hinrichs
> Dundee Media & Technology, Inc
> jeffh at dundeemt.com
> 402.218.1473
> _______________________________________________
> Omaha Python Users Group mailing list
> Omaha at python.org
> http://mail.python.org/mailman/listinfo/omaha
> http://www.OmahaPython.org
>
>
>



-- 
Jeff Hinrichs
Dundee Media & Technology, Inc
jeffh at dundeemt.com
402.218.1473

