splitting file/content into lines based on regex termination

bruce badouglas at gmail.com
Thu Nov 7 19:45:36 CET 2013


Hi,

Thanks for the reply.

I tried what you suggested. What I see now is that the lines are
printed, but not the regex data at all. My first attempt gave me the
line, then the matched items, followed by the next line, and so on.

What I then tried was to capture the regex data with findall() and
combine the outputs in separate loops, which will be ugly but will
work.

  import re

  def split_records():  # hypothetical name; the original snippet was a function body
    ff = "byu2.dat"
    #fff = "sdsu2.dat"
    with open(ff, "r") as myfile:
      s = myfile.read()

    s = s.replace("&nbsp", "")

    #with open(fff, "w") as myfile2:
    #  myfile2.write(s)

    # each record ends with a delimiter like: <br>#45 / 58#0#
    delim = r"<br>#(\d+) / (\d+)#(\d+)#"

    # findall() returns only the captured numbers, one tuple per delimiter
    dat1 = re.findall(delim, s)
    # split() on a pattern WITH groups interleaves the groups with the records
    dat = re.split(delim, s)
    # split() on a pattern WITHOUT groups returns only the records
    dat2 = re.split(r"<br>#\d+ / \d+#\d+#", s)

    for m in dat:
      if m:
        print("m = " + m)

    print("dat1")
    print(dat1)
    print(len(dat1))

    print("dat2")
    for m in dat2:
      if m:  # skip the empty string after the last delimiter
        print("m = " + m)
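For what it's worth, the "separate loops" idea can probably be collapsed into a single loop with zip(), since split() and findall() on the same delimiter return their items in the same order. This is a minimal sketch, not the code from the post: the sample string s is two records from the test data quoted below, re-joined onto one line on the assumption that the line breaks there come from e-mail wrapping, and the names pieces, numbers and lines are made up for illustration. The "45 / 58,0" output format is copied from the desired output quoted below.

```python
import re

# Two records from the test data, joined onto one line
# (assumption: the line breaks in the post are e-mail wrapping).
s = ("10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 / "
     "58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 / 58#0#")

# Text between delimiters (pattern without groups) ...
pieces = re.split(r"<br>#\d+ / \d+#\d+#", s)
# ... and the three numbers inside each delimiter (pattern with groups).
numbers = re.findall(r"<br>#(\d+) / (\d+)#(\d+)#", s)

lines = []
# zip() pairs piece i with delimiter i; the leftover piece after the
# last delimiter has no partner, so zip() leaves it out.
for text, (a, b, c) in zip(pieces, numbers):
    lines.append("%s%s / %s,%s" % (text, a, b, c))

for line in lines:
    print("m = " + line)
```

Because zip() stops at the shorter input, the trailing text after the last delimiter (empty here, or a cut-off record at the end of a real file) is dropped without any special-casing.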


The test data is pasted at http://bpaste.net/show/kYzBUIfhc5023phOVmcu/

Thanks!!


On Thu, Nov 7, 2013 at 1:13 PM, MRAB <python at mrabarnett.plus.com> wrote:
> On 07/11/2013 17:45, bruce wrote:
>>
>> Update:
>>
>>    dat=re.compile("<br>#(\d+) / (\d+)#(\d+)#").split(s)
>>
>> almost works,
>>
>> except that I get:
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL
>> m = 45
>> m = 58
>> m = 0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL
>> m = 9
>> m = 58
>> m = 0
>>
>> and what I want is:
>> m = 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL 45 / 58,0
>> m = 10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL 9 / 58,0
>>
>>
>> so I'd like the results of the "compile/regex process" to be added to
>> the split lines.
>>
>> thoughts/comments??
>>
>> thanks
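A single split() can likely produce exactly this, because a pattern with capture groups makes split() return each record's text followed by the three numbers from its delimiter. A minimal sketch, assuming the records sit on one line (the line breaks in the quoted output look like e-mail wrapping) and using the made-up names parts and lines:

```python
import re

# Two records from the test data, joined onto one line
# (assumption: the line breaks in the post are e-mail wrapping).
s = ("10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 / "
     "58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 / 58#0#")

# With capture groups, split() returns
# [text, g1, g2, g3, text, g1, g2, g3, ..., tail]
parts = re.split(r"<br>#(\d+) / (\d+)#(\d+)#", s)

lines = []
# Walk the list four items at a time; the final item is whatever
# follows the last delimiter, so the range stops before it.
for i in range(0, len(parts) - 1, 4):
    text, a, b, c = parts[i:i + 4]
    lines.append("%s%s / %s,%s" % (text, a, b, c))

for line in lines:
    print("m = " + line)
```

If the file ends with a partial record, it ends up in parts[-1] and this loop skips it.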
>>
> The split method also returns what's matched in any capture groups,
> i.e. "(\d+)". Try omitting the parentheses:
>
>     dat = re.compile(r"<br>#\d+ / \d+#\d+#").split(s)
>
> You should also be using raw string literals as above (r"..."). It
> doesn't matter in this instance, but it might in others.
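To illustrate the point about capture groups, here is a small sketch on a made-up string, where AAA and BBB are placeholders standing in for records. A non-capturing group (?:...) is the other option when grouping is needed without the group showing up in the result.

```python
import re

s = "AAA<br>#45 / 58#0#BBB<br>#9 / 58#0#"

# Capturing groups: the matched numbers are inserted into the result.
with_groups = re.split(r"<br>#(\d+) / (\d+)#(\d+)#", s)
# -> ['AAA', '45', '58', '0', 'BBB', '9', '58', '0', '']

# No groups: only the text between delimiters is returned.
without_groups = re.split(r"<br>#\d+ / \d+#\d+#", s)
# -> ['AAA', 'BBB', '']

# A non-capturing group behaves like no group at all for split().
non_capturing = re.split(r"<br>#(?:\d+ / \d+)#\d+#", s)
# -> ['AAA', 'BBB', '']

print(with_groups)
print(without_groups)
print(non_capturing)
```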
>
>>
>>
>> On Thu, Nov 7, 2013 at 12:15 PM, bruce <badouglas at gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I've got a test file with the sample content listed below.
>>>
>>> The content is one long string, and needs to be split into separate lines.
>>>
>>> I'm thinking the pattern to split on should be a regex matching something like:
>>> <br>#45 / 58#0#
>>> or
>>> <br>#9 / 58#0#
>>> but I have no idea how to make this happen!!
>>>
>>> If I read the content into a buffer, s:
>>>
>>> import re
>>> dat = re.compile("what goes here??").split(s)
>>>
>>> I'm not sure what goes in the compile() call to get this to work.
>>>
>>> Thoughts/comments would be helpful.
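One answer to "what goes here??" that avoids split() altogether is to match each record together with the delimiter that ends it, using re.finditer() and a lazy .*? for the record text. A minimal sketch, assuming the delimiter always has the form <br>#N / N#N# and that the sample string is representative: it is assembled from the test data below with the e-mail line wrapping removed, plus an unterminated fragment standing in for the truncated end of the data. RECORD and lines are illustrative names.

```python
import re

# Two complete records plus an unterminated fragment (assumption:
# the line breaks in the post are e-mail wrapping).
s = ("10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 / "
     "58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett, "
     "William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 / "
     "58#0#10178#000#C S")

# One match per record: the shortest possible record text, then the
# three numbers from the delimiter that closes it.
RECORD = re.compile(r"(.*?)<br>#(\d+) / (\d+)#(\d+)#", re.DOTALL)

lines = []
for m in RECORD.finditer(s):
    text, a, b, c = m.groups()
    lines.append("%s%s / %s,%s" % (text, a, b, c))

for line in lines:
    print("m = " + line)
```

re.DOTALL lets a record span newlines if the file contains any. Text after the last delimiter never matches, so an incomplete trailing record is silently skipped rather than reported.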
>>>
>>> thanks
>>>
>>>
>>> Test data:
>>> 10116#000#C S#S#100##001##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#08:00am<br>#08:50am<br>#3718 HBLL <br>#45 /
>>> 58#0#10116#000#C S#S#100##002##DAY#Fund of Computing#Barrett,
>>> William#3#MWF<br>#09:00am<br>#09:50am<br>#3718 HBLL <br>#9 /
>>> 58#0#10178#000#C S#S#124##001##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#11:00am<br>#11:50am<br>#1170 TMCB <br>#41 /
>>> 145#0#10178#000#C S#S#124##002##DAY#Computer Systems#Roper,
>>> Paul#3#MWF<br>#2:00pm<br>#2:50pm<br>#1170 TMCB <br>#40 /
>>> 120#0#01489#002#C S#S#142##001##DAY#Intro to Computer
>>> Programming#Burton, Robert <div class='instructors'>Seppi, Kevin<br
>>> /></div><span
>>
>>
>


