[Tutor] Data pattern query.

mhysnm1964 at gmail.com mhysnm1964 at gmail.com
Sun Jan 6 21:38:35 EST 2019


All,

 

I am currently writing a Python program to identify common text patterns in an Excel spreadsheet which I have imported using openpyxl. All the descriptions of the transactions are in a single column. I am trying to work out the easiest method of identifying the same pattern of text in the fields. At the end of the day, I want a list of all the vendor transaction names. I don’t care if they are a credit or debit at this stage. Then I am going to group these vendors by categories. All this data was originally downloaded from my bank.

 

In the field, there is the vendor name, suburb/town, type of transaction, etc.

 

I am only interested in the vendor name. The vendor could be called “a”, “b”, “c”, etc. As I don’t know all the different vendors in advance, how can I teach the program to learn new vendor names? I was thinking of removing all the duplicate entries as a start, perhaps using dictionaries, but I am not sure if this is the best approach.
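
To illustrate the de-duplication idea, here is a minimal sketch using a dictionary-like Counter from the standard library. It assumes the descriptions have already been read out of the spreadsheet column into a plain list of strings. The function name count_descriptions and the sample descriptions are made up for illustration and are not from a real bank export. It only collapses exact repeats (ignoring case and extra spaces); it does not yet separate the vendor name from the suburb or transaction type.

```python
from collections import Counter


def count_descriptions(descriptions):
    """Return a Counter mapping each normalised description to its number of occurrences."""
    counts = Counter()
    for text in descriptions:
        if not text:
            continue  # skip empty cells (None or empty string)
        # Collapse runs of whitespace and ignore case so near-identical rows match.
        key = " ".join(str(text).split()).upper()
        counts[key] += 1
    return counts


# Hypothetical sample data, standing in for the description column.
sample = [
    "ACME STORE  SYDNEY  EFTPOS",
    "acme store sydney eftpos",
    "CORNER CAFE MELBOURNE CARD",
    None,
]
unique = count_descriptions(sample)

# Print each distinct description with its count, most frequent first.
for description, count in unique.most_common():
    print(count, description)
```

The keys of the resulting Counter are the distinct descriptions, so the list of candidate vendor strings to inspect is much shorter than the original column.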

 

I am really stuck, and if there is a library out there which can do this for me, I would be grateful. 😊

 

Note, I am new to Python and not an advanced programmer.

 

Sean 

 

 


