non printable (moving away from Perl)
Peter Otten
__peter__ at web.de
Fri Mar 11 10:22:16 EST 2016
Fillmore wrote:
> On 03/11/2016 07:13 AM, Wolfgang Maier wrote:
>> One lesson for Perl regex users is that in Python many things can be
>> solved without regexes. How about defining:
>>
>> printable = {chr(n) for n in range(32, 127)}
>>
>> then using:
>>
>> if (set(my_string) - set(printable)):
>>     break
>
> seems computationally heavy. I have a file with about 70k lines, of which
> only 20 contain "funny" chars.
>
> Any idea on how I can create a script that compares Perl speed vs. Python
> speed in performing the cleaning operation?
Try

for line in ...:
    if has_nonprint(line):
        continue
    ...

with the has_nonprint() function as defined below:
$ cat isprint.py
import sys
import unicodedata

class Lookup(dict):
    def __missing__(self, n):
        c = chr(n)
        cat = unicodedata.category(c)
        if cat in {'Cs', 'Cn', 'Zl', 'Cc', 'Zp'}:
            self[n] = c
            return c
        else:
            self[n] = None
            return None

lookup = Lookup()
lookup[10] = None # allow newline

def has_nonprint(s):
    return bool(s.translate(lookup))

$ python3 -i isprint.py
>>> has_nonprint("foo")
False
>>> has_nonprint("foo\n")
False
>>> has_nonprint("foo\t")
True
>>> has_nonprint("\0foo")
True
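
On the speed question, a rough way to compare the set-based check from the quoted text with the translate()-based one is the standard timeit module. The sketch below is only an illustration under stated assumptions: the names has_nonprint_set, has_nonprint_translate, make_lines, count_flagged and compare are made up for this example, the data is synthetic (70k lines, 20 of them containing a NUL), and newline is added to the printable set so both checks treat "\n" the same way. It times the two Python approaches only; the Perl script would have to be timed separately, for instance with the shell's time command.

```python
import timeit
import unicodedata

# Set-based check from the quoted text: printable ASCII only.
# Assumption: newline is added so that lines read from a file are not flagged.
PRINTABLE = {chr(n) for n in range(32, 127)} | {"\n"}


def has_nonprint_set(s):
    return bool(set(s) - PRINTABLE)


class Lookup(dict):
    """Translation table that keeps non-printable characters and deletes the rest."""

    def __missing__(self, n):
        c = chr(n)
        keep = unicodedata.category(c) in {"Cs", "Cn", "Zl", "Cc", "Zp"}
        self[n] = c if keep else None
        return self[n]


lookup = Lookup()
lookup[10] = None  # allow newline


def has_nonprint_translate(s):
    return bool(s.translate(lookup))


def make_lines(total=70_000, funny=20):
    """Synthetic input: `total` lines, `funny` of them containing a NUL."""
    lines = ["some ordinary line of text\n"] * total
    step = total // funny
    for i in range(0, total, step)[:funny]:
        lines[i] = "bad\x00line\n"
    return lines


def count_flagged(check, lines):
    return sum(1 for line in lines if check(line))


def compare(lines, repeat=3):
    """Return the best wall-clock time in seconds for each check."""
    checks = [("set", has_nonprint_set), ("translate", has_nonprint_translate)]
    results = {}
    for name, check in checks:
        timings = timeit.repeat(
            lambda: count_flagged(check, lines), number=1, repeat=repeat
        )
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for name, seconds in compare(make_lines()).items():
        print(f"{name}: {seconds:.4f} s")
```

Note that the two checks are not equivalent: the set-based one flags every non-ASCII character (an accented letter, say) as "funny", while the translate()-based one only flags the Unicode categories listed in Lookup.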