Extract all words between two keywords in .txt file (Python)
Ben Bacarisse
ben.usenet at bsb.me.uk
Wed Dec 11 13:27:42 EST 2019
A S <aishan0403 at gmail.com> writes:
> I would like to extract all words within specific keywords in a .txt
> file. For the keywords, there is a starting keyword of "PROC SQL;" (I
> need this to be case insensitive) and the ending keyword could be
> either "RUN;", "quit;" or "QUIT;". This is my sample .txt file.
>
> Thus far, this is my code:
>
> with open('lan sample text file1.txt') as file:
>     text = file.read()
> regex = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
> k = regex.findall(text)
> print(k)
Try
  re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
Read up on what (?si) means and what (?:...) means. You can get the
same effect as (?si) by passing flags to the compile method.
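
As a rough sketch of the flags approach: re.S corresponds to (?s) and
re.I to (?i). The sample text and variable names below are made up for
illustration and are not taken from the original file. The sketch also
swaps in a non-greedy .*? on the assumption that the file holds several
blocks and each one should stop at its first terminator; with the
greedy .* the same text yields a single match running from the first
PROC SQL; to the last terminator.

```python
import re

# Hypothetical sample text standing in for the real .txt file.
text = """data a; set b; run;
proc sql;
select * from t1;
quit;
data c; set d;
RUN;
PROC SQL;
create table x as select 1;
QUIT;
"""

# re.S (DOTALL) lets "." match newlines; re.I (IGNORECASE) ignores case.
# Passing both as flags has the same effect as the inline (?si) prefix.
lazy = re.compile(r'PROC SQL;.*?(?:QUIT|RUN);', re.S | re.I)
blocks = lazy.findall(text)
for block in blocks:
    print(repr(block))

# For comparison: the greedy form runs on to the last terminator.
greedy = re.compile(r'(?si)(PROC SQL;.*(?:QUIT|RUN);)')
greedy_blocks = greedy.findall(text)
print(len(greedy_blocks))
```

Since (?:...) does not capture, findall on the non-greedy pattern
returns whole matches rather than tuples of groups.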
> Output:
>
> [('quit;', ''), ('quit;', ''), ('PROC SQL;', '')]
Your main issue is that | binds weakly. Your whole pattern tries to
match any one of just four short sub-patterns:
  PROC SQL;
  proc sql;(.*?)RUN;
  quit;
  QUIT;
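
A small sketch of the difference, using made-up sample text (the names
loose and grouped are only for illustration). The first pattern is the
one from the question; the second wraps the start and end alternatives
in their own (?:...) groups so that | no longer splits the whole
pattern, and passes re.S so that . can cross line breaks.

```python
import re

# Hypothetical two-block sample, not the original file.
text = "proc sql;\nselect 1;\nquit;\nPROC SQL;\nselect 2;\nQUIT;\n"

# Pattern from the question: "|" separates four independent alternatives,
# so each keyword is matched on its own and group 2 stays empty.
loose = re.compile(r'(PROC SQL;|proc sql;(.*?)RUN;|quit;|QUIT;)')
loose_result = loose.findall(text)
print(loose_result)

# Grouping the alternatives limits what each "|" applies to.
grouped = re.compile(r'(?:PROC SQL;|proc sql;)(.*?)(?:RUN;|quit;|QUIT;)', re.S)
grouped_result = grouped.findall(text)
print(grouped_result)
```

The first findall returns only the bare keywords, much like the output
quoted above; the second returns the text between each start and end
keyword.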
--
Ben.