[Tutor] regex help for a noob

Alan Gauld alan.gauld at yahoo.co.uk
Mon Feb 15 17:47:55 EST 2021


There are several things to comment on here...

On 15/02/2021 20:39, Thomas A. Anderson via Tutor wrote:

> The single characters I am looking for are nestled within a ("_"), i.e.
> parenthesis and double quote.
> 
> I have tried the following code:
> 
> 
> import re
> 
> def getlist():
>     """ creates a list from file """ 

>     list = []
>     dataload = open("/Users/drexl/Lyntin/sample.txt", "r")

Best Python practice says use a with statement for this:

      with open("/Users/drexl/Lyntin/sample.txt", "r") as dataload:

That will ensure it gets closed again, even if you hit an exception.
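Here's a minimal sketch of that behaviour. It uses a throwaway temporary file in place of sample.txt (an assumption, just so it runs anywhere) and checks the file's closed attribute after a normal exit and after an exception:

```python
import os
import tempfile

# A throwaway file stands in for sample.txt so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('look("n")\n')
    path = tmp.name

with open(path, "r") as dataload:
    first_line = dataload.readline()
print(dataload.closed)   # True: closed on leaving the block

# The same happens when an exception escapes the block.
try:
    with open(path, "r") as f2:
        raise ValueError("something went wrong mid-read")
except ValueError:
    pass
print(f2.closed)         # True

os.remove(path)
```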

>     regExp = '\".*?\"' 

This regex does not correspond to your specification. Where are the ()?
I'd expect something like:

regExp = r'\("(.)"\)'   # match any single char between (" and ")

You want to extract the bit inside the quotes, and that's what the
group (i.e. the (.) bit) does: findall() then returns just the
group's contents rather than the whole match.
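A quick check of a pattern along those lines, written as a raw string, against a made-up line (the sample text is invented for illustration), alongside your original pattern for comparison:

```python
import re

regExp = r'\("(.)"\)'   # (" then one captured char then ")

line = 'look("n") then get("x") but not ("ab") or "y"'

found = re.findall(regExp, line)     # only the captured single chars
quoted = re.findall(r'".*?"', line)  # the original pattern: any quoted text

print(found)    # ['n', 'x']
print(quoted)   # ['"n"', '"x"', '"ab"', '"y"']
```

The original pattern grabs every quoted string, quotes and all, with or without parentheses; the grouped one only returns the single characters.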

>      for line in dataload.readlines():

You don't need the readlines(); it's better to use the file
object as an iterator:

     for line in dataload:

However, I'm not sure you even need to scan line by line. You
could just read() the whole file and do it as a single search
with findall()... But there may be data complications that
preclude that...
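A sketch comparing the two approaches, with io.StringIO standing in for the open file (an assumption for the demo; the text is made up). Because . does not match a newline by default and the rest of the pattern is literal, this particular pattern can never match across a line break, so both approaches give the same answer here. A pattern that could span lines would be one of those data complications.

```python
import io
import re

regExp = r'\("(.)"\)'

# io.StringIO stands in for the open file object in this demo.
data = io.StringIO('go("n")\nlook("e") and ("s")\nsay "hi"\n')

# One search over the whole text...
whole = re.findall(regExp, data.read())

# ...versus scanning line by line.
data.seek(0)
per_line = []
for line in data:
    per_line += re.findall(regExp, line)

print(whole)      # ['n', 'e', 's']
print(per_line)   # ['n', 'e', 's']
```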

>         x = re.findall(regExp, line)
>         if x:
>             list.append(x)

findall() returns a list of found items. You are appending that whole
list to your list as a single element. You probably want to add the
lists together:

list += x

Also, it's very bad practice to use a type name for a variable. You
have hidden the list() function, so you can't now convert strings,
say, to lists:

Ls = list("abc")   -> TypeError, because list is now an actual list.
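A small sketch of the problem; the two function names are hypothetical, purely for illustration. Inside the first function the local name list hides the built-in, so calling it fails; renaming the variable (to found, say) leaves the built-in alone:

```python
def getlist_shadowed():
    list = []          # local name 'list' now hides the built-in
    list += ["n"]
    try:
        return list("abc")
    except TypeError as err:
        return str(err)

def getlist_fixed():
    found = []         # a descriptive name instead of 'list'
    found += ["n"]
    return list("abc") # the built-in is still available

print(getlist_shadowed())  # 'list' object is not callable
print(getlist_fixed())     # ['a', 'b', 'c']
```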

> I have tried various other regex expressions, but they only give me worse or the same results.
> So, I don't think it is regex related? But somewhere else, I am missing something?

You are mostly missing the fact that appending a list to a
list puts the whole list inside the containing list as a single item:

a = [1]
b = [2]
c = []
c.append(a)  -> [[1]]
c.append(b)  -> [[1],[2]]
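The same thing as runnable code, next to the two ways of getting a flat list instead (+= and the list's extend() method behave the same way here):

```python
a = [1]
b = [2]

nested = []
nested.append(a)
nested.append(b)
print(nested)   # [[1], [2]]

flat = []
flat += a
flat.extend(b)  # same effect as flat += b
print(flat)     # [1, 2]
```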

But there are quite a few other things to tidy up too.
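Pulling those points together, a tidied version might look something like the sketch below. Passing the path in as a parameter is my assumption rather than part of your code, and the demo writes a small made-up temporary file in place of sample.txt:

```python
import os
import re
import tempfile

# Matches (" + one character + ") and captures just the character.
PATTERN = re.compile(r'\("(.)"\)')

def getlist(path):
    """Return every single character found between (" and ") in the file."""
    found = []                      # not 'list', so the built-in stays usable
    with open(path, "r") as dataload:
        for line in dataload:       # iterate the file object directly
            found += PATTERN.findall(line)  # flatten, don't nest
    return found

# Demo on a throwaway file standing in for sample.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('go("n")\nlook("e") and ("s")\nsay "hi"\n')
    demo_path = tmp.name

result = getlist(demo_path)
os.remove(demo_path)
print(result)   # ['n', 'e', 's']
```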

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



