[Tutor] regex help for a noob

David Rock david at graniteweb.com
Mon Feb 15 17:48:07 EST 2021


* Thomas A. Anderson via Tutor <tutor at python.org> [2021-02-15 21:39]:
> 
> import re
> 
> def getlist():
>     """ creates a list from file """
>     list = []
>     dataload = open("/Users/drexl/Lyntin/sample.txt", "r")
>     regExp = '\".*?\"'
>     for line in dataload.readlines():
>         x = re.findall(regExp, line)
>         if x:
>             list.append(x)
> 
>     print list
> 
> 
> getlist()
> 
> I get the desired result, more or less, slightly more on the less side =(
> 
> I am getting this as a list output:
> [['"n"'], ['"n"'], ['"e"'], ['"w"'], ['"n"']]
> 
> where I would like a more basic list:
> list = ['n', 'n', 'e', 'w', 'n']

It sounds like you need to use a group in your regex:
instead of: '\".*?\"'
use: '\"(.*?)\"'

Basically, if you put () around the part you want, it gets "grouped" and can be referenced later by index.
re.findall will use groups if they are set:


re.findall(pattern, string, flags=0)
    Return all non-overlapping matches of pattern in string, as a list of
    strings. The string is scanned left-to-right, and matches are returned
    in the order found. If one or more groups are present in the pattern,
    return a list of groups; this will be a list of tuples if the pattern
    has more than one group. Empty matches are included in the result.
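
To make the group behaviour concrete, here is a small sketch (Python 3 syntax). The sample line and the variable names are made up for illustration; they are not from your file.

```python
import re

# Hypothetical sample line; the real file contents are not shown in the thread.
line = 'say "n" then "e"'

# No group: findall returns the whole match, quotes included.
no_group = re.findall(r'".*?"', line)

# One group: findall returns only the text captured by the parentheses.
one_group = re.findall(r'"(.*?)"', line)

# Two groups: findall returns one tuple per match, one item per group.
two_groups = re.findall(r'(\w+) "(.*?)"', line)

print(no_group)    # ['"n"', '"e"']
print(one_group)   # ['n', 'e']
print(two_groups)  # [('say', 'n'), ('then', 'e')]
```

Note that the quotes are still part of the match in the second pattern; they are just left out of what findall hands back because they sit outside the parentheses.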

Since findall returns a list for every line and your code appends that list as a single item, you will still end up with a list of lists, but I think that will get you closer to what you want.
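
If a flat list is the goal, one possible approach is to extend the result list with each findall result instead of appending it. This is only a sketch in Python 3 syntax: get_list, found, and the sample lines are names and data I made up, and it takes an iterable of lines rather than opening your file so it can run on its own.

```python
import re

def get_list(lines):
    """Collect the text between double quotes from an iterable of lines."""
    pattern = re.compile(r'"(.*?)"')
    found = []
    for line in lines:
        # extend() adds each matched string individually;
        # append() would add the whole findall list as one item.
        found.extend(pattern.findall(line))
    return found

# Hypothetical stand-in for the lines of sample.txt.
sample = ['go "n"', 'go "n"', 'no quotes here', 'go "e"', 'go "w"', 'go "n"']
result = get_list(sample)
print(result)  # ['n', 'n', 'e', 'w', 'n']
```

Lines with no match contribute nothing, because extending with an empty list is a no-op, so the "if x:" check is not needed here. To use it on a real file, you could pass the open file object in place of sample.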

-- 
David Rock
david at graniteweb.com

