Is there a function to remove escape characters from a string ?
sjmachin at lexicon.net
Fri Dec 26 02:03:01 CET 2008
On Dec 26, 8:53 am, Stef Mientki <stef.mien... at gmail.com> wrote:
> Steven D'Aprano wrote:
> > On Thu, 25 Dec 2008 11:00:18 +0100, Stef Mientki wrote:
> >> hello,
> >> Is there a function to remove escape characters from a string ?
> >> (preferable all escape characters except "\n").
> > Can you explain what you mean? I can think of at least four alternatives:
> I have the following kind of strings,
> the funny "þ" is ASCII character 254, used as a separator character
ASCII ends at 127. Just refer to it as chr(254).
> Counts = "1þ11þ16" ==> 1,11,16
> Init1 = "1þ\BCtrl" ==> 1,Ctrl
> State5 = "8þ\BJUMP_COMPL\b\n>PCWrite = 1\n>PCSource = 10"
> ==> 8, JUMP_COMPL\n>PCWrite = 1\n>PCSource = 10
After making those substitutions, what are you going to do with it?
Split it up into fields using the csv module or stuff.split(",") or
some other DIY method? Is there a possibility that whoever "designed"
that data format used chr(254) as a separator because the data fields
contained "," sometimes and so "," could not be used as a separator?
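A minimal sketch of splitting directly on the separator instead of converting it to "," first. It assumes Python 3 and that the fields are separated by chr(254) as described above; the name split_fields is hypothetical.

```python
SEP = chr(254)  # the separator character from the data ('þ', code point 254)


def split_fields(line, sep=SEP):
    """Split one record into its fields on the separator character."""
    return line.split(sep)


# Commas inside a field survive, because "," is never treated as a separator.
fields = split_fields("1" + SEP + "11" + SEP + "16")
print(fields)
```

Splitting on the original separator avoids the ambiguity that would arise if a field itself contained a comma.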
> Seeing and testing all your answers, with great solutions that I've
> never seen before,
As far as str methods and built-ins that work on str objects are
concerned, there is no corpus of secret knowledge known only to a
cabal of wizards; it's all in the manual, and you don't need special
magical spectacles to see it :-)
> knowing nothing of escape sequences (I'm a windows guy ;-)
Why do you think that whether or not you are a "windows guy" is
relevant to knowing anything about escape sequences?
> I now see that the characters I need to remove, like \B and \b are
> not "official" escape sequences.
\b *is* an "official" escape sequence, just like \n; see below:
| >>> x = '\b'; print len(x), repr(x)
| 1 '\x08'
| >>> x = r'\b'; print len(x), repr(x)
| 2 '\\b'
| >>> x = '\B'; print len(x), repr(x)
| 2 '\\B'
| >>> x = r'\B'; print len(x), repr(x)
| 2 '\\B'
> So in this case the best (easiest to understand) method is a few replace
> s = s.replace ( '\b', '' ).replace( '\B', '' )
It's probable that \b and \B in your data are both TWO-byte sequences
(a backslash followed by a letter), in which case you should use r'\b'
so that replace() matches them; without the r prefix, '\b' is the
single character chr(8). Use r'\B' for consistency.
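A minimal sketch of that replacement with raw strings, assuming Python 3 and assuming the data really contains a literal backslash followed by "b" or "B"; the name strip_markers is hypothetical.

```python
def strip_markers(s):
    """Remove the two-character sequences backslash+b and backslash+B."""
    # r'\b' and r'\B' are each two characters: a backslash and a letter.
    # A real backspace character (chr(8)) is left untouched.
    return s.replace(r'\b', '').replace(r'\B', '')


# Build a sample record: literal backslash sequences plus a real newline.
raw = r'8þ\BJUMP_COMPL\b' + '\n' + '>PCWrite = 1'
cleaned = strip_markers(raw)
print(repr(cleaned))
```

If the data instead holds real backspace characters, the first replace would need '\b' without the r prefix.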