<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi,<br>
<br>
I am developing code (Python 3.4) that transforms text data from one
format to another.<br>
<br>
As part of the process, I had a set of hard-coded str.replace(...)
calls that I used to clean up the incoming text into the desired
output format, something like this:<br>
<pre>dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
dataIn = dataIn.replace('<','<') # Tidy up < character
dataIn = dataIn.replace('>','>') # Tidy up > character
dataIn = dataIn.replace('o','o') # No idea why but lots of these: convert to 'o' character
dataIn = dataIn.replace('f','f') # .. and these: convert to 'f' character
dataIn = dataIn.replace('e','e') # .. 'e'
dataIn = dataIn.replace('O','O') # .. 'O'
</pre>
These statements transform my data correctly, but the list of
statements grows as I test the data, so I thought it made sense to
store the replacement mappings in a file, read them into a dict and
loop through that to do the cleaning up, like this:<br>
<pre>with open(fileName, 'r', encoding='utf-8') as mapFile:  # read-only is enough here
    for line in mapFile:
        line = line.strip()
        try:
            if line and not line.startswith('#'):
                line = line.split('#', 1)[0].strip()  # trim any trailing comments
                name, value = line.split('=', 1)  # split on the first '=' only
                name = name.strip()
                self.filterMap[name] = value.strip()
        except Exception:
            self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
            raise
</pre>
Elsewhere, I use the following code to do the actual cleaning up:<br>
<pre>def filter(self, dataIn):
    if dataIn:
        for token, replacement in self.filterMap.items():
            dataIn = dataIn.replace(token, replacement)
    return dataIn</pre>
<br>
My mapping file contents look like this:<br>
<pre>\r = \\n
â = "
< = <
> = >
' = '
F = F
o = o
f = f
e = e
O = O</pre>
This all works "as advertised" <b><i>except</i></b> for the '\r'
=> '\\n' replacement. Debugging the code, I see that my '\r'
character is "escaped" to '\\r' and the '\\n' to '\\\\n' when they
are read in from the file.<br>
<br>
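<p>A minimal sketch of what seems to be happening, assuming the
mapping line is read from the file exactly as typed. The raw string
below stands in for that line. The backslashes are ordinary
characters in the file, and the doubling only appears when the value
is displayed with repr(), as a debugger does:</p>

```python
# Stand-in for one line of the mapping file: backslash, 'r', ' = ',
# backslash, backslash, 'n' (a raw string keeps the backslashes as typed).
line = r"\r = \\n"

# Same split-and-strip steps as the loading loop.
name, value = (part.strip() for part in line.split("=", 1))

# The token is two characters (backslash + 'r'), not a carriage return.
print(len(name), repr(name))
print(name == "\r")

# The replacement is three characters (two backslashes + 'n').
print(len(value), repr(value))
```

<p>So the token loaded from the file never matches a real carriage
return in the data, which would explain why only this mapping fails.</p>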
I've been googling hard and reading the Python docs, trying to get
my head around character encoding, but I just can't figure out how
to get these bits of code to do what I want.<br>
<br>
It seems to me that I need to either:<br>
<ul>
<li>change the way I represent '\r' and '\\n' in my mapping file;
or</li>
<li>transform them somehow when I read them in</li>
</ul>
<p>However, I haven't figured out how to do either of these.</p>
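<p>One possible shape for the second option, as a sketch only. The
names ESCAPES and unescape are hypothetical, and the table assumes
the mapping file needs just these four escapes. It converts the
backslash sequences by hand, so non-ASCII tokens in the file are left
untouched:</p>

```python
import re

# Hypothetical table of the escapes the mapping file is allowed to use:
# each key is the text as read from the file, each value the real character.
ESCAPES = {"\\\\": "\\", "\\r": "\r", "\\n": "\n", "\\t": "\t"}

# Matches a backslash followed by one of: backslash, r, n, t.
_ESCAPE_RE = re.compile(r"\\[\\rnt]")


def unescape(text):
    """Turn backslash escapes read from the file into the characters they name."""
    return _ESCAPE_RE.sub(lambda m: ESCAPES[m.group(0)], text)


# The first mapping line, as read from the file, then converted.
token = unescape(r"\r")          # a real carriage return
replacement = unescape(r"\\n")   # backslash followed by 'n'

print(repr("one\rtwo".replace(token, replacement)))
```

<p>In the loading loop this would mean storing
unescape(name) and unescape(value.strip()) in the dict instead of the
raw strings.</p>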
<p>TIA,<br>
</p>
<div class="moz-signature">-- <br>
Rob Hills<br>
Waikiki, Western Australia<br>
</div>
</body>
</html>