[Tutor] search/match file position q

Tue Oct 7 17:47:37 CEST 2014

!-----Original Message-----
!From: Tutor [mailto:tutor-bounces+crk=godblessthe.us at python.org] On
!Behalf Of Peter Otten
!Sent: Tuesday, October 07, 2014 3:50 AM
!To: tutor at python.org
!Subject: Re: [Tutor] search/match file position q
!
!Clayton Kirkwood wrote:
!
!> I was trying to keep it generic.
!> Wrapped data file:
!> <tr data-row-symbol="SWKS"><td class="col-symbol
!> txt"> data-model="name:DatumModel;id:null;" data-tmpl=""><a
!> data-ylk="cat:portfolio;cpos:1"
!> href="http://finance.yahoo.com/q?s=SWKS"
!> data-rapid_p="18">SWKS</a></td><td
!> class="col-fiftytwo_week_low cell-raw:23.270000"> class="wrapper "
!> data-model="name:DatumModel;id:SWKS:qsi:wk52:low;"
!> data-tmpl="change:yfin.datum">23.27</td><td
!> class="col-prev_close cell-raw:58.049999"> class="wrapper " data-model="name:DatumMo
!
!Doesn't Yahoo make the data available as CSV? That would be the way to
!go then.

Yes, Yahoo provides a few columns as CSV, but I need maybe 15 fields that
aren't provided. Besides, what fun would that be? I try to find tasks that
allow me to expand my knowledge"<)))

!
!Anyway, regular expressions are definitely the wrong tool here, and
!reading the file one line at a time only makes it worse.

Why does reading one line at a time make it worse? I don't think going
character by character would help. The line happens to be very long, and I
have no way of peeking around the corner to the next line, so to speak. If I
broke it into shorter strings, it would be much more onerous to jump past the
end of the current string into potentially many following strings.
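
One way to see the difference, as a rough sketch only: the two-row sample below is made up (the AAPL row and its value are not from the real page), and `io.StringIO` stands in for the saved file. A tag that is split across a line break is missed when each line is searched separately, but found when the whole text is read into one string first.

```python
import io
import re

# Made-up two-row sample; the second <tr> tag is split across a line break.
html_text = (
    '<tr data-row-symbol="SWKS"><td>23.27</td></tr>\n'
    '<tr\n'
    'data-row-symbol="AAPL"><td>99.00</td></tr>\n'
)
row_re = re.compile(r'<tr\s+data-row-symbol="([A-Z]+)">')

# Searching line by line: a match can never cross a line boundary.
per_line = [
    m.group(1)
    for line in io.StringIO(html_text)
    for m in row_re.finditer(line)
]

# Searching the whole text at once: \s+ is free to match the newline.
whole_text = io.StringIO(html_text).read()
whole = [m.group(1) for m in row_re.finditer(whole_text)]

print(per_line)
print(whole)
```

With a real file the same idea would be `open(...).read()` followed by `finditer()` on the resulting string, so there is no "next line" to peek into.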

!
!> import re, os
!> line_in = file.readline()    # read in humongous html line
!> stock = re.search('\s*<tr data-row-symbol="([A-Z]+)">', line_in)
!>     # scan to SWKS"> in data line, stock should be SWKS
!> low_52 = re.search('.+wk52:low.+([\d\.]+)<', line_in)
!>     # want to pick up from SWKS">, low_52 should be 23.27
!>
!> I am trying to figure out if each re.match starts scanning at the
!> beginning of the same line over and over or does each scan start at
!> the end of the last match. It appears to start over??
!>
!> This is stock:
!> <_sre.SRE_Match object; span=(0, 47), match=' <tr data-row-symbol="SWKS">'>
!> This is low_52:
!> <_sre.SRE_Match object; span=(0, 502875), match=' <tr data-row-symbol="SWKS"><t>
!> If necessary, how do I pick up and move forward to the point right
!> after the previous match? File.tell() and file.__sizeof__(), don't
!> seem to play a useful role.
!
!You should try BeautifulSoup. Let's play:
!
!>>> from bs4 import BeautifulSoup
!>>> soup = BeautifulSoup("""<tr data-row-symbol="SWKS"><td class="col-symbol
!txt"><a data-ylk="cat:portfolio;cpos:1"
!href="http://finance.yahoo.com/q?s=SWKS"
!data-rapid_p="18">SWKS</a></td><td
!class="col-fiftytwo_week_low cell-raw:23.270000">23.27</td><td
!class="col-prev_close cell-raw:58.049999">""")
!>>> soup.find("tr")
!<tr data-row-symbol="SWKS"><td class="col-symbol txt"><a data-rapid_p="18"
!data-ylk="cat:portfolio;cpos:1"
!href="http://finance.yahoo.com/q?s=SWKS">SWKS</a></td><td
!class="col-fiftytwo_week_low cell-raw:23.270000">23.27</td><td
!class="col-prev_close cell-raw:58.049999"></td></tr>
!>>> tr = soup.find("tr")
!>>> tr["data-row-symbol"]
!'SWKS'
!>>> tr.find_all("span")
![<a data-rapid_p="18" data-ylk="cat:portfolio;cpos:1"
!href="http://finance.yahoo.com/q?s=SWKS">SWKS</a>, 23.27]
!>>> span = tr.find_all("span")[1]
!>>> span["data-model"]
!'name:DatumModel;id:SWKS:qsi:wk52:low;'
!>>> span.text
!'23.27'

So, what makes regex wrong for this job? The question still remains: does the
search start at the beginning of the line each time, or does it step forward
from the end of the last search? I will check out Beautiful Soup, as suggested
in a subsequent mail, but I'd still like to finish this process:<}}
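
As far as the `re` documentation goes, the module-level `re.search()` always scans from index 0 of the string it is handed and keeps no memory of earlier matches. A compiled pattern's `search()` method does take an optional start position, so one search can resume at the previous match's `end()`. A minimal sketch, assuming a shortened, made-up stand-in for the real line (the `<span>` markup and the names `stock_re` and `low_52_re` are illustrative only):

```python
import re

# Shortened, made-up stand-in for the very long HTML line.
line_in = (
    '  <tr data-row-symbol="SWKS"><td class="col-fiftytwo_week_low">'
    '<span data-model="name:DatumModel;id:SWKS:qsi:wk52:low;">23.27</span></td>'
)

stock_re = re.compile(r'\s*<tr data-row-symbol="([A-Z]+)">')
# No leading ".+" here, so the match begins at "wk52:low" rather than at index 0.
low_52_re = re.compile(r'wk52:low[^>]*>([\d.]+)<')

stock = stock_re.search(line_in)

# Resume scanning where the previous match ended by passing a start position.
low_52 = low_52_re.search(line_in, stock.end())

symbol = stock.group(1)
low_value = low_52.group(1)

# A fresh module-level search starts over from the beginning of the string.
restart = re.search(r'<tr', line_in)

print(symbol, low_value)
print(stock.span(), low_52.span(), restart.start())
```

The greedy leading `.+` in the original `low_52` pattern can match from index 0, which would explain the `span=(0, 502875)`. The second greedy `.+` would likewise leave only the final digit for the group to capture.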

Clayton