Regex on a Dictionary
Rhodri James
rhodri at kynesim.co.uk
Tue Feb 13 08:42:08 EST 2018
On 13/02/18 13:11, Stanley Denman wrote:
> I am trying to perform a regex search on a "string" of text that Python's isinstance is telling me is a dictionary. When I run the code I get the following error:
>
> {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
>
> Traceback (most recent call last):
> File "C:\Users\stand\Desktop\PythonSublimeText.py", line 9, in <module>
> x=MyRegex.findall(MyDict)
> TypeError: expected string or bytes-like object
>
> Here is the "string" of code I am working with:
>
> {'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
>
> I want to grab the name "MILANI, JOHN C" and the last date "-mm/dd/yyyy" as a pair, such that if I have N strings like the above I will end up with N pairs of values (name and date). Here is my code:
>
> import PyPDF2,re
> pdfFileObj=open('x.pdf','rb')
> pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
> Result=pdfReader.getOutlines()
> MyDict=(Result[-1][0])
> print(MyDict)
> print(isinstance(MyDict,dict))
> MyRegex=re.compile(r"MILANI,")
> x=MyRegex.findall(MyDict)
> print(x)
As the error message says, findall() expects a string (or a bytes-like
object). A dictionary is in no sense a string, so passing it in whole
like that won't work.
If you know that the name will always show up in the title field, you
can pass just the title:
    x = MyRegex.findall(MyDict['/Title'])
Otherwise you will have to loop through all the entries in the dictionary:
    for entry in MyDict.values():
        # Skip non-string values such as the '/Page' IndirectObject
        if isinstance(entry, str):
            x = MyRegex.findall(entry)
            # ...and do something with x
I rather suspect you are going to find that the titles aren't in a very
systematic format, though.
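If the titles do turn out to follow the layout of the one quoted above, a
pattern with two named groups could pull out the name and the last date as
a pair. This is only a sketch: the pattern is guessed from that single
sample title, and the names TITLE_RE, extract_name_and_date and titles are
made up for illustration rather than taken from PyPDF2 or the original code.

```python
import re

# Pattern guessed from the single sample title; real outlines may differ.
TITLE_RE = re.compile(
    r"Src\.:\s*(?P<name>.+?)\s+Tmt\. Dt\.:\s*"  # name sits between "Src.:" and "Tmt. Dt.:"
    r"(?:\d{2}/\d{2}/\d{4}\s*-\s*)?"             # optional start date and dash
    r"(?P<date>\d{2}/\d{2}/\d{4})"               # last (or only) date
)


def extract_name_and_date(title):
    """Return (name, date) from a title string, or None if it does not match."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return match.group("name"), match.group("date")


# Stand-in for the '/Title' values collected from each outline entry.
titles = [
    "1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - 05/28/2014 (9 pages)",
]

# Keep only the titles that matched, giving one (name, date) pair each.
pairs = [pair for pair in map(extract_name_and_date, titles) if pair is not None]
print(pairs)
```

Returning None for titles that don't match means the odd ones can be
skipped or logged instead of raising an exception halfway through the
outline.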
--
Rhodri James *-* Kynesim Ltd