[Tutor] regex help for a noob
David Rock
david at graniteweb.com
Mon Feb 15 17:48:07 EST 2021
* Thomas A. Anderson via Tutor <tutor at python.org> [2021-02-15 21:39]:
>
> import re
>
> def getlist():
>     """ creates a list from file """
>     list = []
>     dataload = open("/Users/drexl/Lyntin/sample.txt", "r")
>     regExp = '\".*?\"'
>     for line in dataload.readlines():
>         x = re.findall(regExp, line)
>         if x:
>             list.append(x)
>
>     print list
>
>
> getlist()
>
> I get the desired result, more or less, slightly more on the less side =(
>
> I am getting this as a list output:
> [['"n"'], ['"n"'], ['"e"'], ['"w"'], ['"n"']]
>
> where I would like a more basic list:
> list = ['n', 'n', 'e', 'w', 'n']
It sounds like you need to use a group in your regex:
instead of: '\".*?\"'
use: '\"(.*?)\"'
Basically, if you put () around the part you want, it gets "grouped" and can be referenced later by index.
re.findall will use groups if they are set:
    re.findall(pattern, string, flags=0)
        Return all non-overlapping matches of pattern in string, as a list of
        strings. The string is scanned left-to-right, and matches are returned
        in the order found. If one or more groups are present in the pattern,
        return a list of groups; this will be a list of tuples if the pattern
        has more than one group. Empty matches are included in the result.
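As a small illustration of the difference, here is the same line run through both patterns. The sample text is made up, since the contents of sample.txt were not posted:

```python
import re

# Hypothetical sample line; the real sample.txt was not shown in the thread.
line = 'go "n" then "e"'

# No group: findall returns each whole match, quotes included.
without_group = re.findall(r'".*?"', line)

# One group: findall returns only the text captured by the parentheses.
with_group = re.findall(r'"(.*?)"', line)

print(without_group)  # ['"n"', '"e"']
print(with_group)     # ['n', 'e']
```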
You may still end up with a list of lists, but I think that will get you closer to what you want.
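The list of lists comes from list.append(x), because x is itself the list that re.findall returns. One possible way to get a flat list is to use extend instead of append. This is only a sketch on in-memory lines rather than the original file; the sample lines and the name extract_quoted are made up for illustration:

```python
import re

def extract_quoted(lines):
    """Return a flat list of the text found between double quotes."""
    pattern = re.compile(r'"(.*?)"')
    result = []
    for line in lines:
        # extend() adds each match individually; append() would nest the list.
        result.extend(pattern.findall(line))
    return result

# Hypothetical stand-in for the lines of sample.txt
sample_lines = ['walk "n"', 'no quotes here', 'walk "e" "w"']
print(extract_quoted(sample_lines))  # ['n', 'e', 'w']
```

Lines without a quoted string contribute nothing, so the "if x:" check is no longer needed.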
--
David Rock
david at graniteweb.com