Finding nonprintable characters?
sdm7g at Virginia.EDU
Tue Feb 19 20:35:40 CET 2002
On Tue, 19 Feb 2002, VanL wrote:
> I have a function
> that I'm not sure how to implement. I've decided to define binary as
> containing characters above \x80. But what is the best way to do this?
> 1. iterate through xreadline, so the whole thing doesn't get loaded into
I would use file.read( bytes ) -- if it's binary, then you probably
don't need to read the whole file in. Most programs I've seen that
try to determine 'binaryness' only check the first N bytes anyway.
( I've seen some that want a certain percentage of non-printing chars
per block -- not just a single out of range char. )
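
A minimal sketch of the "only check the first N bytes" idea, assuming a modern Python 3 where a file opened in binary mode returns bytes and iterating over them yields integers. The name has_high_byte and the default blocksize of 512 are made up for illustration, not taken from any particular program:

```python
def has_high_byte(path, blocksize=512):
    """Return True if the first `blocksize` bytes hold any byte above 0x7f."""
    # Open in binary mode so no decoding or newline translation happens.
    with open(path, "rb") as f:
        block = f.read(blocksize)  # reads at most blocksize bytes
    # Iterating over bytes yields ints in Python 3, so ord() is not needed.
    return any(b > 0x7F for b in block)
```

Anything past the first block is never read, so a large file costs the same as a small one.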
> 2. String searching? If so, for what string? Searching for anything
> greater than \x7f?
> 3. Re searching? for what class?
How about something like:
    filter( lambda c: ord(c) > value, file.read( blocksize ) )
or, as you note, save the ord() call and use an octal or hex string
literal. If you want to use list comprehensions it would be something
like:
    [ c for c in file.read( blocksize ) if c > '\x7f' ]
but list comprehensions give you a list while filter on a string
yields a string. You can divide the (float) length of the filtered value
by the length of the original ( blocksize ) for a ratio if you
want to use that instead of a single out of range char.
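
The ratio test could be sketched roughly like this, again assuming Python 3 bytes rather than the strings used above. The names high_byte_ratio and looks_binary and the 0.30 threshold are arbitrary placeholders for illustration, so pick whatever cutoff suits your data:

```python
def high_byte_ratio(block):
    """Fraction of the bytes in `block` that are above 0x7f."""
    if not block:
        return 0.0  # empty block: avoid dividing by zero
    high = [b for b in block if b > 0x7F]
    # Divide by the actual length read, which may be less than blocksize.
    return len(high) / len(block)

def looks_binary(block, threshold=0.30):
    """Hypothetical cutoff: call it binary if enough bytes are out of range."""
    return high_byte_ratio(block) > threshold
```

In Python 3 the "/" operator already does float division, so the explicit float conversion is not needed there.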
-- Steve Majewski