[Tutor] Finding lines in .txt file that contain keywords from two different set()

A S aishan0403 at gmail.com
Sun Sep 8 12:05:50 EDT 2019


Hi,

My problem is seemingly profound but I hope to make it sound as simplified
as possible.....Let me unpack the details..:

*1*. I have one folder of Excel (.xlsx) files that serve as a data
dictionary.

-In Cell A1, the data source name is written in between brackets

-In Cols C:D, it contains the data field names (It could be in either col C
or D in my actual Excel sheet. So I had to search both columns

-*Important: I need to know which data source the field names come from

*2*. I have another folder of Text (.txt) files that I need to parse
through to find these keywords.

These are the folders used for a better reference here. The files are found
in the folder
<https://drive.google.com/open?id=1_LcceqcDhHnWW3Nrnwf5RkXPcnDfesq_>

This is the code I have thus far...:

import os, sysfrom os.pathimport joinimport reimport xlrdfrom xlrd
import open_workbookimport openpyxlfrom openpyxl.reader.excel import
load_workbookimport xlsxwriter

#All the paths
dict_folder = 'C:/Users/xxxx/Documents/xxxx/Test Excel'
text_folder = 'C:/Users/xxxx/Documents/xxxx/Text'

words = set()
fieldset = set()for file in os.listdir(dict_folder):if file.endswith(".xlsx"):
    wb1 = load_workbook(join(dict_folder, file), data_only = True)
    ws = wb1.active
   #Here I am reading and printing all the data source names
set(words) in the excel dictionaries:
    cellvalues = ws["A1"].value
    wordsextract = re.findall(r"\((.+?)\)", str(cellvalues))
    results = wordsextract[0]
    words.add(results)
    print(results)

    for rowofcellobj in ws["C" : "D"]:
        for cellobj in rowofcellobj:
           #2. Here I am printing all the field names in col C & D in
the excel dictionaries:
            data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
            if data != []:
                fields = data[0]
                fieldset.add(fields)
                print(fieldset)
                #listing = str.remove("")
                #print(listing)

#Here I am reading the name of each .txt file to the separate .xlsx
file:for r, name in enumerate(os.listdir(text_folder)):
    if name.endswith(".txt"):
        print(name)
#Reading .txt file and trying to make the sentence into words instead
of lines so that I can compare the individual .txt file words with the
.xlsx file
txtfilespath = os.chdir("C:/Users/xxxx/Documents/xxxx/Text")

#Here I am reading and printing all the words in the .txt files and
compare with the excel Cell A1:for name in os.listdir(txtfilespath):
    if name.endswith(".txt"):
        with open (name, "r") as texts:
            # Read each line of the file:
            s = texts.read()
            print(s)


            #if .txt files contain.....() or select or from or words
from sets..search that sentence and extract the common fields

            result1 = []
            parens = 0
            buff = ""
            for line in s:
                if line == "(":
                    parens += 1
                if parens > 0:
                    buff += line
                if line == ")":
                    parens -= 1
               if not parens and buff:
                    result1.append(buff)
                    buff = ""
                    set(result1)
#Here, I include other keywords other than those found in the Excel workbooks
   checkhere = set()
   checkhere.add("Select")
   checkhere.add("From")
   checkhere.add("select")
   checkhere.add("from")
   checkhere.add("SELECT")
   checkhere.add("FROM")
   # k = list(checkhere)
   # print(k)

   #I only want to read/ extract the lines containing brackets () as
well as the keywords in the checkhere set. So that I can check capture
the source and field in each line:
   #I tried this but nothing was printed......
   for element in checkhere:
       if element in result1:
        print(result1)

*My desired output for the code that could not be printed when I tried is:*

(/* 1.select_no., biiiiiyyyy FROM apple_x_Ex_x */
 proc sql; "TRUuuuth")
(/* 1.xxxxx FROM xxxxx*/
proc sql; "TRUuuuth")
(SELECT abc AS abc1, ab33_2_ AS mon, a_rr, iirir_vf, jk_ff, sfa_jfkj
    FROM &orange..xxx_xxx_xxE
 where (asre(kkk_ix as format 'xxxx-xx') gff &bcbcb_hhaha.) and
  (axx(xx_ix as format 'xxxx-xx') lec &jgjsd_vnv.)
 )

 (/* 1.select_no. FROM apple_x_Ex_x */
 proc sql; "TRUuuuth")

 (SELECT abc AS kfcccc, mcfg_2_ AS dokn, b_rr, jjhj_vf, jjjk_hj, fjjh_jhjkj
    FROM &bfbd..pear_xxx_xxE
 where (afdfe(kkffk_ix as format 'xxxxd-xx') gdaff &bcdadabcb_hdahaha.) and
  (axx(xx_ix as format 'xxxx-xx') lec &jgjsdfdf_vnv.)
 )

After which, if I'm able to get the desired output above, I will then
compare these lines against the word set() and the fieldset set().

Any help would really be appreciated here..thank you


More information about the Tutor mailing list