How to replace characters in a string?
Jon Ribbens
jon+usenet at unequivocal.eu
Wed Jun 8 06:26:59 EDT 2022
On 2022-06-08, Dave <dave at looktowindward.com> wrote:
> I misunderstood how it worked, basically I’ve added this function:
>
> def filterCommonCharacters(theString):
>     myNewString = theString.replace("\u2019", "'")
>     return myNewString
>
> Which returns a new string replacing the common characters.
>
> This can easily be extended to include other characters as and when
> they come up by adding a line as so:
>
> myNewString = myNewString.replace("\u2014", "]")  # just an example
>
> Which is what I was trying to achieve.
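
One thing to watch when extending it that way: each replace() needs to work on the result of the previous one, otherwise the earlier substitutions are thrown away. A minimal sketch, assuming that is the intent (the function name and the choice of "-" for the em dash are only illustrative):

```python
def filter_common_characters(text):
    # Each replace() operates on the result of the previous call,
    # so earlier substitutions are kept.
    text = text.replace("\u2019", "'")  # right single quotation mark
    text = text.replace("\u2014", "-")  # em dash (illustrative choice)
    return text

print(filter_common_characters("It\u2019s here\u2014now"))
```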

Here's a head start on some characters you might want to translate,
mostly spaces, hyphens, quotation marks, and ligatures:

    def unicode_translate(s):
        return s.translate({
            8192: ' ', 8193: ' ', 8194: ' ', 8195: ' ', 8196: ' ',
            8197: ' ', 198: 'AE', 8199: ' ', 8200: ' ', 8201: ' ',
            8202: ' ', 8203: '', 64258: 'fl', 8208: '-', 8209: '-',
            8210: '-', 8211: '-', 8212: '-', 8722: '-', 8216: "'",
            8217: "'", 8220: '"', 8221: '"', 64256: 'ff', 160: ' ',
            64260: 'ffl', 8198: ' ', 230: 'ae', 12288: ' ', 173: '',
            497: 'DZ', 498: 'Dz', 499: 'dz', 64259: 'ffi', 8230: '...',
            64257: 'fi', 64262: 'st'})

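
In case it helps to see how such a table behaves: str.translate() takes a dict keyed by code point, and each value can be a string of any length, including the empty string, which deletes the character. A small sketch using a subset of the mapping above (SMALL_TABLE and tidy are made-up names for illustration):

```python
# A small subset of the mapping above; keys are Unicode code points.
SMALL_TABLE = {
    0x2019: "'",    # right single quotation mark
    0x2014: "-",    # em dash
    0x2026: "...",  # horizontal ellipsis (one character becomes three)
    0x00AD: "",     # soft hyphen (deleted)
    0xFB01: "fi",   # fi ligature
}

def tidy(s):
    """Return s with the characters listed in SMALL_TABLE replaced."""
    return s.translate(SMALL_TABLE)

print(tidy("It\u2019s \ufb01ne\u2026"))
```

Unlike a chain of replace() calls, this makes a single pass over the string, and characters not in the table are left untouched.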
If you want to go further then the Unidecode package might be helpful:
https://pypi.org/project/Unidecode/
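
A rough sketch of what using it could look like. Unidecode is a third-party package, so this assumes it has been installed with pip; the fallback branch is only a crude stand-in built on the standard library's unicodedata module, not Unidecode's own algorithm:

```python
import unicodedata

try:
    # Third-party package: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    def unidecode(s):
        # Crude stand-in (an assumption, not what Unidecode does):
        # decompose compatibility characters, then drop anything
        # that is still not ASCII.
        decomposed = unicodedata.normalize("NFKD", s)
        return decomposed.encode("ascii", "ignore").decode("ascii")

# Transliterate an accented letter and a ligature to plain ASCII.
print(unidecode("caf\u00e9 \ufb01n"))
```

The fallback silently drops characters it cannot decompose, whereas Unidecode tries to transliterate them, so the two will differ on most non-Latin text.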