<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    Hi,<br>

    <br>

    I am developing code (Python 3.4) that transforms text data from one

    format to another.<br>

    <br>

    As part of the process, I had a set of hard-coded str.replace(...)
    calls that I used to clean up the incoming text into the desired
    output format, something like this:<br>

    <pre>    dataIn = dataIn.replace('\r', '\\n') # Tidy up linefeeds
    dataIn = dataIn.replace('&lt;','<') # Tidy up < character
    dataIn = dataIn.replace('&gt;','>') # Tidy up > character
    dataIn = dataIn.replace('&#111;','o') # No idea why but lots of these: convert to 'o' character
    dataIn = dataIn.replace('&#102;','f') # .. and these: convert to 'f' character
    dataIn = dataIn.replace('&#101;','e') # ..  'e'
    dataIn = dataIn.replace('&#079;','O') # ..  'O'
</pre>

    These statements transform my data correctly, but the list keeps
    growing as I test more data. So I thought it made sense to store the
    replacement mappings in a file, read them into a dict, and loop
    through that to do the cleaning up, like this:<br>

    <pre>        with open(fileName, 'r', encoding='utf-8') as mapFile:
            for line in mapFile:
                line = line.strip()
                try:
                    if line and not line.startswith('#'):
                        line = line.split('#')[0].strip() # trim any trailing comments
                        name, value = line.split('=')
                        name = name.strip()
                        self.filterMap[name] = value.strip()
                except Exception:
                    self.logger.error('exception occurred parsing line [{0}] in file [{1}]'.format(line, fileName))
                    raise
</pre>
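    Pulled out of the class, the parsing step looks roughly like the
    sketch below. The helper name parse_map_lines and the sample lines
    are invented for this example and are not part of my real code:<br>

```python
def parse_map_lines(lines):
    """Build a token/replacement dict from 'name = value' lines."""
    filter_map = {}
    for line in lines:
        line = line.strip()
        if line and not line.startswith('#'):
            line = line.split('#')[0].strip()  # trim any trailing comment
            name, value = line.split('=')
            filter_map[name.strip()] = value.strip()
    return filter_map


# Invented sample input: a comment line, a blank line and two mappings.
sample = [
    '# comment line',
    '',
    'foo = bar   # trailing comment',
    'colour = color',
]
parsed = parse_map_lines(sample)
print(parsed)
```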

    Elsewhere, I use the following code to do the actual cleaning up:<br>

    <pre>    def filter(self, dataIn):
        if dataIn:
            for token, replacement in self.filterMap.items():
                dataIn = dataIn.replace(token, replacement)
        return dataIn</pre>
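    To show the loop in isolation, here is a minimal standalone sketch
    of the same idea. The function name apply_filter_map and the
    hard-coded dict are assumptions made for illustration only. With
    the mappings written as Python literals, the carriage-return
    replacement should behave the way the original hard-coded calls
    did:<br>

```python
def apply_filter_map(data_in, filter_map):
    """Apply every token/replacement pair in filter_map to data_in."""
    if data_in:
        for token, replacement in filter_map.items():
            data_in = data_in.replace(token, replacement)
    return data_in


# Hypothetical map written as Python literals, so '\r' is a real
# carriage return and '\\n' is a backslash followed by the letter n.
literal_map = {'\r': '\\n', 'foo': 'bar'}

cleaned = apply_filter_map('one\rtwo foo', literal_map)
print(repr(cleaned))
```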

    <br>

    My mapping file contents look like this:<br>

    <pre>\r = \\n
â = &quot;
&lt; = <
&gt; = >
&#039; = &apos;
&#070; = F
&#111; = o
&#102; = f
&#101; = e
&#079; = O</pre>

    This all works "as advertised" <b><i>except</i></b> for the '\r'

    => '\\n' replacement. Debugging the code, I see that my '\r'

    character is "escaped" to '\\r' and the '\\n' to '\\\\n' when they

    are read in from the file.<br>
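    Here is a minimal sketch that reproduces what I am seeing. It
    assumes a throwaway temporary file in place of my real mapping
    file, and the variable names are made up for the example. It
    prints the token and replacement as read from the file, along with
    their lengths, and then tries the replacement on a string that
    contains a real carriage return:<br>

```python
import os
import tempfile

# Write a one-line mapping file containing the characters \r = \\n
# exactly as typed in a text editor (a stand-in for the real file).
with tempfile.NamedTemporaryFile('w', suffix='.map', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write('\\r = \\\\n\n')
    map_path = tmp.name

try:
    with open(map_path, 'r', encoding='utf-8') as map_file:
        name, value = map_file.readline().strip().split('=')
        name, value = name.strip(), value.strip()
finally:
    os.remove(map_path)

print(repr(name), len(name))    # token as read from the file
print(repr(value), len(value))  # replacement as read from the file

# Try the replacement on data that contains a real carriage return.
data = 'one\rtwo'
unchanged = data.replace(name, value)
print(unchanged == data)
```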

    <br>

    I've been googling hard and reading the Python docs, trying to get

    my head around character encoding, but I just can't figure out how

    to get these bits of code to do what I want.<br>

    <br>

    It seems to me that I need to either:<br>

    <ul>

      <li>change the way I represent '\r' and '\\n' in my mapping file;

        or</li>

      <li>transform them somehow when I read them in</li>

    </ul>

    <p>However, I haven't figured out how to do either of these.</p>

    <p>TIA,<br>

    </p>

    <div class="moz-signature">-- <br>

      Rob Hills<br>

      Waikiki, Western Australia<br>

    </div>

  </body>

</html>