ascii-unicode replacement

Andrea Valle andrea.valle at unito.it
Thu Apr 5 19:28:20 CEST 2007


Hi to all,

I generated some text files with a script in another language, which
cannot handle Unicode.
As I need special characters in the resulting text files (IPA
extensions), my idea was to define some special ASCII sequences in the
text files, open the files in Python, replace the special sequences
with the corresponding Unicode characters and encode the result as
UTF-8. I made some tests in the console and everything seemed fine.
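
A minimal sketch of that idea on an in-memory string, assuming the text is already a unicode object. The sample string is made up for illustration; the two markers are taken from the table in the script further down.

```python
# Sketch only: `sample` is an invented example string.
sample = u"f[O]n[E]tik"

# Replace each ASCII marker with its Unicode character,
# keeping everything as unicode while doing so.
replaced = sample.replace(u"[O]", u"\u0254").replace(u"[E]", u"\u025b")

# Encode to UTF-8 bytes only once, at the very end.
encoded = replaced.encode("utf-8")
```

Here `replaced` is still unicode and `encoded` is a byte string; the two should not be mixed in later string operations.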

But my script keeps on raising exceptions related to encoding.

Sorry if it's obvious but I really can't figure out what to do.

The script follows.

Thanks a lot

-a-

# a class for replacing ascii with unicode


import codecs
import os

class Unicoder:

         def __init__(self, folder):
             files = os.listdir(folder)
             paths = []
             for x in files:
                 paths.append(folder+"/"+x)
             self.files = paths
             # a list containing all the sc-generated .ly files

         def intoText(self, inFile):
             aFile = codecs.open(inFile, "r")
             text = aFile.read() # read all its content in text
             return text

         def replaceSpecials(self, text):
             replacementDict = (
             {"[O]":u"\u0254",
              "[U]":u"\u0277",
              "[E]":u"\u025b",
              "[o|]":u"\xf8",
              "[oe]":u"\u0153",
              "[e:]":u"\u0259",
              "[I]":u"\u026a",
              "[ae]":u"\xe6",
              "[A]":u"\u0251",
              "[Q]":u"\u0252",
              "[V]":u"\u028c"
              }

             )
             # lookup table: ASCII sequence -> replacement character
             for ascii in replacementDict:
                 print ascii
                 utf = replacementDict[ascii]
                 text = text.replace(ascii, utf.encode("utf-8"))
             return text

         def toFile(self, text, outFileName):
             outFile = codecs.open(outFileName, encoding='utf-8', mode="w")
             outFile.write(text)
             outFile.close()

         def run(self):
             for aFileName in self.files:
                 outFileName = aFileName.split(".")[0]+"UTF.ly"
                 text = self.intoText(aFileName)
                 text = self.replaceSpecials(text)
                 self.toFile(text, outFileName)

if __name__ == "__main__":
     a = Unicoder("/musica/antigone/scores/")
     a.run()  # process every file in the folder

# EOF
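
One plausible cause of the encoding exceptions, offered as a hedged reading of the script above rather than a confirmed diagnosis: `intoText` opens the file without an `encoding`, so `text` is a byte string, `replaceSpecials` inserts UTF-8 bytes into it, and `toFile` then hands those bytes to a writer that expects unicode. Under Python 2 the writer first decodes the byte string with the default ASCII codec, which fails on the first non-ASCII byte. The sketch below reproduces that decode step explicitly and shows one possible alternative: decode on input, replace with unicode, encode once on output. The function name `convert`, the `table` parameter and the temporary file names are hypothetical, and the input files are assumed to be plain ASCII.

```python
import codecs
import os
import tempfile

# What replaceSpecials() inserts into the text: UTF-8 bytes, not unicode.
utf8_bytes = u"\u0254".encode("utf-8")

# Python 2 performs this ASCII decode implicitly when a byte string is
# written through a codecs writer; done explicitly, it fails the same way.
try:
    utf8_bytes.decode("ascii")
    ascii_decode_failed = False
except UnicodeDecodeError:
    ascii_decode_failed = True


def convert(in_path, out_path, table):
    """Hypothetical helper: decode on input, replace, encode once on output."""
    # Assumption: the generated files are plain ASCII.
    with codecs.open(in_path, "r", encoding="ascii") as in_file:
        text = in_file.read()  # unicode, not bytes
    for marker, char in table.items():
        text = text.replace(marker, char)  # unicode on both sides
    with codecs.open(out_path, "w", encoding="utf-8") as out_file:
        out_file.write(text)  # the writer does the UTF-8 encoding


# Usage with throwaway files (names are made up for the example).
folder = tempfile.mkdtemp()
src = os.path.join(folder, "test.ly")
dst = os.path.join(folder, "testUTF.ly")
with open(src, "wb") as f:
    f.write(b"f[O]n[E]tik\n")
convert(src, dst, {u"[O]": u"\u0254", u"[E]": u"\u025b"})
with open(dst, "rb") as f:
    written = f.read()
```

The design choice is to keep the text as unicode from the moment it is read until it is written, so that no byte string containing non-ASCII bytes is ever passed where unicode is expected.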

--------------------------------------------------
Andrea Valle
--------------------------------------------------
CIRMA - DAMS
Università degli Studi di Torino
--> http://www.cirma.unito.it/andrea/
--> andrea.valle at unito.it
--------------------------------------------------


  I did this interview where I just mentioned that I read Foucault.  
Who doesn't in university, right? I was in this strip club giving  
this guy a lap dance and all he wanted to do was to discuss Foucault  
with me. Well, I can stand naked and do my little dance, or I can  
discuss Foucault, but not at the same time; too much information.
(Annabel Chong)




