Finding lines in .txt file that contain keywords from two different set()
A S
aishan0403 at gmail.com
Sun Sep 8 12:02:55 EDT 2019
My problem is seemingly profound but I hope to make it sound as simplified as possible.....Let me unpack the details..:
1. I have one folder of Excel (.xlsx) files that serve as a data dictionary.
-In Cell A1, the data source name is written in between brackets
-In Cols C:D, it contains the data field names (It could be in either col C or D in my actual Excel sheet. So I had to search both columns
-*Important: I need to know which data source the field names come from
2. I have another folder of Text (.txt) files that I need to parse through to find these keywords.
These are the folders used for a better reference ( https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq ). The files are found in the folder.
This is the code I have thus far...:
import os, sys
from os.path
import join
import re
import xlrd
from xlrd import open_workbook
import openpyxl
from openpyxl.reader.excel import load_workbook
import xlsxwriter
#All the paths
dict_folder = 'C:/Users/xxxx/Documents/xxxx/Test Excel'
text_folder = 'C:/Users/xxxx/Documents/xxxx/Text'
words = set()
fieldset = set()
for file in os.listdir(dict_folder):
if file.endswith(".xlsx"):
wb1 = load_workbook(join(dict_folder, file), data_only = True)
ws = wb1.active
#Here I am reading and printing all the data source names set(words) in the excel dictionaries:
cellvalues = ws["A1"].value
wordsextract = re.findall(r"\((.+?)\)", str(cellvalues))
results = wordsextract[0]
words.add(results)
print(results)
for rowofcellobj in ws["C" : "D"]:
for cellobj in rowofcellobj:
#2. Here I am printing all the field names in col C & D in the excel dictionaries:
data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
if data != []:
fields = data[0]
fieldset.add(fields)
print(fieldset)
#listing = str.remove("")
#print(listing)
#Here I am reading the name of each .txt file to the separate .xlsx file:
for r, name in enumerate(os.listdir(text_folder)):
if name.endswith(".txt"):
print(name)
#Reading .txt file and trying to make the sentence into words instead of lines so that I can compare the individual .txt file words with the .xlsx file
txtfilespath = os.chdir("C:/Users/xxxx/Documents/xxxx/Text")
#Here I am reading and printing all the words in the .txt files and compare with the excel Cell A1:
for name in os.listdir(txtfilespath):
if name.endswith(".txt"):
with open (name, "r") as texts:
# Read each line of the file:
s = texts.read()
print(s)
#if .txt files contain.....() or select or from or words from sets..search that sentence and extract the common fields
result1 = []
parens = 0
buff = ""
for line in s:
if line == "(":
parens += 1
if parens > 0:
buff += line
if line == ")":
parens -= 1
if not parens and buff:
result1.append(buff)
buff = ""
set(result1)
#Here, I include other keywords other than those found in the Excel workbooks
checkhere = set()
checkhere.add("Select")
checkhere.add("From")
checkhere.add("select")
checkhere.add("from")
checkhere.add("SELECT")
checkhere.add("FROM")
# k = list(checkhere)
# print(k)
#I only want to read/ extract the lines containing brackets () as well as the keywords in the checkhere set. So that I can check capture the source and field in each line:
#I tried this but nothing was printed......
for element in checkhere:
if element in result1:
print(result1)
My desired output for the code that could not be printed when I tried is:
(/* 1.select_no., biiiiiyyyy FROM apple_x_Ex_x */
proc sql; "TRUuuuth")
(/* 1.xxxxx FROM xxxxx*/
proc sql; "TRUuuuth")
(SELECT abc AS abc1, ab33_2_ AS mon, a_rr, iirir_vf, jk_ff, sfa_jfkj
FROM &orange..xxx_xxx_xxE
where (asre(kkk_ix as format 'xxxx-xx') gff &bcbcb_hhaha.) and
(axx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
)
(/* 1.select_no. FROM apple_x_Ex_x */
proc sql; "TRUuuuth")
(SELECT abc AS kfcccc, mcfg_2_ AS dokn, b_rr, jjhj_vf, jjjk_hj, fjjh_jhjkj
FROM &bfbd..pear_xxx_xxE
where (afdfe(kkffk_ix as format 'xxxxd-xx') gdaff &bcdadabcb_hdahaha.) and
(axx(xx_ix as format 'xxxx-xx') lec &jgjsdfdf_vnv.)
)
After which, if I'm able to get the desired output above, I will then compare these lines against the word set() and the fieldset set().
Any help would really be appreciated here..thank you
More information about the Python-list
mailing list