C's isprint() concept?

Michael P. Reilly arcege at shore.net
Sun Aug 15 21:15:40 EDT 1999


Jeff Pinyan <jeffp at crusoe.net> wrote:
:> If I want to replace all non-printable characters in a string with a single
:> space, what would be the best way?  Do I need to loop over the entire string
:> character by character checking the ord() value of each one?  Anyone have
:> a sane way to do this with regular expressions?

: In Perl, I could do it this way:
:   $string =~ tr/\x00-\x1f\x80-\xff//d;

: What that means is this:
:   remove the following characters from $string:
:     characters whose ASCII value is from \x00 (0) to \x1f (31)
:     characters whose ASCII value is from \x80 (128) to \xff (255)

: This can be done, almost as painlessly, in Python.

Considering this was asked on a Python newsgroup, how about showing
how to do it in Python?

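  # works for ASCII and Latin-1 (0 to 255); anything above chr(255) is kept
  control_chars = ''.join(map(chr, range(ord(' '))))          # 0 to 31
  high_chars = ''.join(map(chr, range(ord('~') + 1, 256)))    # 127 to 255
  to_remove = control_chars + high_chars
  table = str.maketrans(to_remove, ' ' * len(to_remove))

  # replace each unprintable character with a space ...
  midstr = instr.translate(table)
  # ... then collapse whitespace runs to one space (also strips the ends)
  outstr = ' '.join(midstr.split())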
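A minimal Python 3 sketch of the same translate() recipe, wrapped in reusable functions. The names squash_unprintable and delete_unprintable are hypothetical. The set of characters treated as unprintable (0 to 31 and 127 to 255) comes from the code above, and characters above chr(255) pass through unchanged. The second function mirrors the quoted Perl tr///d, which deletes rather than replaces; note that this set also includes \x7f, which the Perl ranges leave out.

```python
# Sketch (Python 3): hypothetical helper names, ASCII/Latin-1 assumption.

# Every character in 0..255 outside ' '..'~', i.e. 0-31 and 127-255.
TO_REMOVE = ''.join(chr(i) for i in range(256) if not ' ' <= chr(i) <= '~')

# Table mapping each unprintable character to a space.
SPACE_TABLE = str.maketrans(TO_REMOVE, ' ' * len(TO_REMOVE))

# The third maketrans argument lists characters to delete outright,
# similar to Perl's tr/\x00-\x1f\x80-\xff//d (this set also drops \x7f).
DELETE_TABLE = str.maketrans('', '', TO_REMOVE)


def squash_unprintable(s):
    """Replace unprintable characters with spaces, then collapse every
    whitespace run to a single space (leading/trailing runs are dropped)."""
    return ' '.join(s.translate(SPACE_TABLE).split())


def delete_unprintable(s):
    """Remove unprintable characters entirely."""
    return s.translate(DELETE_TABLE)


print(squash_unprintable('abc\x00\x01def\tghi'))   # abc def ghi
print(delete_unprintable('abc\x00\x01def\tghi'))   # abcdefghi
```

Building the tables once at module level avoids rebuilding them on every call.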

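or, with a regular expression (each run of unprintable characters
becomes a single space, and ordinary spaces are left alone):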
  import re
  outstr = re.sub(r'[^ -~]+', ' ', instr)
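Since the subject line asks about C's isprint(), here is a sketch of the character-by-character loop the original question mentioned, driven by an isprint-style predicate. It assumes the C/POSIX locale, where only 0x20 (space) through 0x7E ('~') count as printable; isprint and squash_loop are hypothetical names, not library functions. The loop is checked against the regular expression above on a few sample strings.

```python
import re


# C-locale isprint() for a single character: 0x20 through 0x7E only.
def isprint(ch):
    return 0x20 <= ord(ch) <= 0x7E


def squash_loop(s):
    """Loop equivalent of re.sub(r'[^ -~]+', ' ', s): each run of
    unprintable characters becomes a single space."""
    out = []
    in_run = False              # currently inside a run of unprintables?
    for ch in s:
        if isprint(ch):
            out.append(ch)
            in_run = False
        elif not in_run:
            out.append(' ')     # first unprintable character of a run
            in_run = True
    return ''.join(out)


# The loop and the regex agree on these samples.
for sample in ['abc\x00\x01def', '\tx\ny\r\n', 'caf\xe9 ok', 'plain']:
    assert squash_loop(sample) == re.sub(r'[^ -~]+', ' ', sample)

# Python 3's str.isprintable() is Unicode-aware, so it differs:
print('\xe9'.isprintable(), isprint('\xe9'))   # True False
```

Python 3's str.isprintable() follows Unicode character categories, so it treats 'é' as printable, while the regex and the C-locale predicate do not.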

(Indented for "easy cut&paste" ;)

  -Arcege





More information about the Python-list mailing list