Can't match str/unicode
cmpython at gmail.com
Sat Jan 7 16:40:42 EST 2017
This is probably very simple, but I get confused when it comes to encoding and am generally rusty. (What follows is in Python 2.7; I know.)
I'm scraping a Word docx using win32com and am just trying to do some matching rules to find certain paragraphs that, for testing purposes, equal the word 'match', which I know exists as its own "paragraph" in the target document. First, this is at the top of the file:
# -*- coding: utf-8 -*-
Then this is the relevant code:
candidate_text = Paragraph.Range.Text.encode('utf-8')
print 'This is candidate_text:', candidate_text
print candidate_text == 'match'
if candidate_text == 'match':
    # do something...
And that section produces this:
This is candidate_text: match
and, of course, doesn't enter that "do something" block, since apparently candidate_text != 'match'... even though it seems like it does.
So what's going on here? Why isn't a string with the content 'match' equal to another string with the content 'match'?
I've also tried removing the .encode call and the coding declaration at the very top, but then candidate_text is a unicode object that I also can't get to match anything.
What am I doing wrong? How should I approach this? Thanks.
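One way to investigate a comparison like this is to print the repr() of the text rather than the text itself, since print hides non-printing characters. The sketch below is only an illustration under stated assumptions: candidate_text is a hypothetical stand-in for Paragraph.Range.Text, and the trailing carriage return is an assumption that Word includes the paragraph mark in that text. The post itself does not confirm this.

```python
# -*- coding: utf-8 -*-
# candidate_text is a hypothetical stand-in for Paragraph.Range.Text.
# The trailing u'\r' is an assumption (Word's paragraph mark), not
# something shown in the original post.
candidate_text = u'match\r'

# print() hides the carriage return; repr() makes it visible.
print(repr(candidate_text))

# Under that assumption the raw comparison fails...
raw_match = (candidate_text == u'match')

# ...while stripping surrounding whitespace before comparing succeeds.
cleaned = candidate_text.strip()
is_match = (cleaned == u'match')
print(is_match)
```

If repr() shows nothing beyond the visible letters, the cause lies elsewhere and this sketch does not apply. Note that in Python 2.7 a unicode object and a str holding the same ASCII text do compare equal, so the encoding step alone would not be expected to break this comparison.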