[Python-Dev] ascii.py

Eric S. Raymond esr@thyrsus.com
Mon, 5 Jun 2000 21:02:31 -0400

Content-Type: text/plain; charset=us-ascii

Latest version.  Adds isctrl() and ismeta() functions (because I use them...)

Um.  What's the checkin procedure for library modules?  And do I have
permissions to do it?
		Eric S. Raymond <http://www.tuxedo.org/~esr>

"Guard with jealous attention the public liberty.  Suspect every one
who approaches that jewel.  Unfortunately, nothing will preserve it
but downright force.  Whenever you give up that force, you are
inevitably ruined."
	-- Patrick Henry, speech of June 5 1788

Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="ascii.py"

# ascii.py -- constants and membership tests for ASCII characters

NUL	= 0x00	# ^@
SOH	= 0x01	# ^A
STX	= 0x02	# ^B
ETX	= 0x03	# ^C
EOT	= 0x04	# ^D
ENQ	= 0x05	# ^E
ACK	= 0x06	# ^F
BEL	= 0x07	# ^G
BS	= 0x08	# ^H
TAB	= 0x09	# ^I
HT	= 0x09	# ^I
LF	= 0x0a	# ^J
NL	= 0x0a	# ^J
VT	= 0x0b	# ^K
FF	= 0x0c	# ^L
CR	= 0x0d	# ^M
SO	= 0x0e	# ^N
SI	= 0x0f	# ^O
DLE	= 0x10	# ^P
DC1	= 0x11	# ^Q
DC2	= 0x12	# ^R
DC3	= 0x13	# ^S
DC4	= 0x14	# ^T
NAK	= 0x15	# ^U
SYN	= 0x16	# ^V
ETB	= 0x17	# ^W
CAN	= 0x18	# ^X
EM	= 0x19	# ^Y
SUB	= 0x1a	# ^Z
ESC	= 0x1b	# ^[
FS	= 0x1c	# ^\
GS	= 0x1d	# ^]
RS	= 0x1e	# ^^
US	= 0x1f	# ^_
SP	= 0x20	# space
DEL	= 0x7f	# delete

controlnames = [
"NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
"BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
"DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
"CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
"SP"
]

def _ctoi(c):
    # Accept either a one-character string or an integer ordinal.
    if type(c) == type(""):
        return ord(c)
    else:
        return c

def isalnum(c): return isalpha(c) or isdigit(c)
def isalpha(c): return isupper(c) or islower(c)
def isascii(c): return _ctoi(c) <= 127		# ?
def isblank(c): return _ctoi(c) in (9, 32)	# tab or space
def iscntrl(c): return _ctoi(c) <= 31
def isdigit(c): return _ctoi(c) >= 48 and _ctoi(c) <= 57
def isgraph(c): return _ctoi(c) >= 33 and _ctoi(c) <= 126
def islower(c): return _ctoi(c) >= 97 and _ctoi(c) <= 122
def isprint(c): return _ctoi(c) >= 32 and _ctoi(c) <= 126
def ispunct(c): return isgraph(c) and not isalnum(c)
def isspace(c): return _ctoi(c) in (9, 10, 11, 12, 13, 32)
def isupper(c): return _ctoi(c) >= 65 and _ctoi(c) <= 90
def isxdigit(c): return isdigit(c) or \
    (_ctoi(c) >= 65 and _ctoi(c) <= 70) or (_ctoi(c) >= 97 and _ctoi(c) <= 102)
def isctrl(c): return _ctoi(c) < 32
def ismeta(c): return _ctoi(c) > 127

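def ascii(c):
    # Keep the low 7 bits; return the same type that was passed in.
    if type(c) == type(""):
        return chr(_ctoi(c) & 0x7f)
    else:
        return _ctoi(c) & 0x7f

def ctrl(c):
    # Keep the low 5 bits, giving the corresponding control character.
    if type(c) == type(""):
        return chr(_ctoi(c) & 0x1f)
    else:
        return _ctoi(c) & 0x1f

def alt(c):
    # Set the eighth (meta) bit.
    if type(c) == type(""):
        return chr(_ctoi(c) | 0x80)
    else:
        return _ctoi(c) | 0x80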

def unctrl(c):
    bits = _ctoi(c)
    low = bits & 0x7f			# strip the meta bit before classifying
    if low == 0x7f:
        rep = "^?"
    elif isprint(low):
        rep = chr(low)
    else:
        rep = "^" + chr(low | 0x40)	# 0x01 -> "^A", 0x1b -> "^["
    if bits & 0x80:
        return "!" + rep
    return rep

Content-Type: application/x-tex
Content-Disposition: attachment; filename="libascii.tex"

\section{\module{ascii} ---
         Constants and set-membership functions for ASCII characters.}

\modulesynopsis{Constants and set-membership functions for ASCII characters.}
\moduleauthor{Eric S. Raymond}{esr@thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@thyrsus.com}


The \module{ascii} module supplies name constants for ASCII characters
and functions to test membership in various ASCII character classes.  
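The constants supplied are names for control characters as follows:

\begin{tableii}{l|l}{constant}{Name}{Value}
  \lineii{NUL}{0x00}
  \lineii{SOH}{0x01}
  \lineii{STX}{0x02}
  \lineii{ETX}{0x03}
  \lineii{EOT}{0x04}
  \lineii{ENQ}{0x05}
  \lineii{ACK}{0x06}
  \lineii{BEL}{0x07}
  \lineii{BS}{0x08}
  \lineii{TAB}{0x09}
  \lineii{HT}{0x09}
  \lineii{LF}{0x0a}
  \lineii{NL}{0x0a}
  \lineii{VT}{0x0b}
  \lineii{FF}{0x0c}
  \lineii{CR}{0x0d}
  \lineii{SO}{0x0e}
  \lineii{SI}{0x0f}
  \lineii{DLE}{0x10}
  \lineii{DC1}{0x11}
  \lineii{DC2}{0x12}
  \lineii{DC3}{0x13}
  \lineii{DC4}{0x14}
  \lineii{NAK}{0x15}
  \lineii{SYN}{0x16}
  \lineii{ETB}{0x17}
  \lineii{CAN}{0x18}
  \lineii{EM}{0x19}
  \lineii{SUB}{0x1a}
  \lineii{ESC}{0x1b}
  \lineii{FS}{0x1c}
  \lineii{GS}{0x1d}
  \lineii{RS}{0x1e}
  \lineii{US}{0x1f}
  \lineii{SP}{0x20}
  \lineii{DEL}{0x7f}
\end{tableii}
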
NL and LF are synonyms; so are HT and TAB.  The module also supplies
the following functions, patterned on those in the standard C library:

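\begin{funcdesc}{isalnum}{c}
Checks for an ASCII alphanumeric character; it is equivalent to
\code{isalpha(c) or isdigit(c)}.
\end{funcdesc}

\begin{funcdesc}{isalpha}{c}
Checks for an ASCII alphabetic character; it is equivalent to
\code{isupper(c) or islower(c)}.
\end{funcdesc}

\begin{funcdesc}{isascii}{c}
Checks for a character value that fits in the 7-bit ASCII set.
\end{funcdesc}

\begin{funcdesc}{isblank}{c}
Checks for an ASCII blank character: space or horizontal tab.
\end{funcdesc}

\begin{funcdesc}{iscntrl}{c}
Checks for an ASCII control character (range 0x00 to 0x1f).
\end{funcdesc}

\begin{funcdesc}{isdigit}{c}
Checks for an ASCII decimal digit, 0 through 9.
\end{funcdesc}

\begin{funcdesc}{isgraph}{c}
Checks for any printable ASCII character except space.
\end{funcdesc}

\begin{funcdesc}{islower}{c}
Checks for an ASCII lower-case letter.
\end{funcdesc}

\begin{funcdesc}{isprint}{c}
Checks for any printable ASCII character, including space.
\end{funcdesc}

\begin{funcdesc}{ispunct}{c}
Checks for any printable ASCII character which is not a space or an
alphanumeric character.
\end{funcdesc}

\begin{funcdesc}{isspace}{c}
Checks for ASCII white-space characters: space, line feed,
carriage return, form feed, horizontal tab, vertical tab.
\end{funcdesc}

\begin{funcdesc}{isupper}{c}
Checks for an ASCII upper-case letter.
\end{funcdesc}

\begin{funcdesc}{isxdigit}{c}
Checks for an ASCII hexadecimal digit, i.e. one of
\code{0123456789abcdefABCDEF}.
\end{funcdesc}

\begin{funcdesc}{isctrl}{c}
Checks for an ASCII control character (ordinal values 0 to 31).
\end{funcdesc}

\begin{funcdesc}{ismeta}{c}
Checks for a non-ASCII character (ordinal values 0x80 and above).
\end{funcdesc}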

These functions accept either integers or strings; when the argument
is a string, it is first converted using the built-in function ord().
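
As a minimal sketch of this convention (an illustration, not the module's
own source), the helper below normalizes its argument to an integer before
testing it.  The helper name ordinal is hypothetical, and the two tests are
local re-definitions written for a current Python 3 interpreter.

```python
def ordinal(c):
    """Return c as an integer; c may be an int or a one-character string."""
    if isinstance(c, str):
        return ord(c)
    return c

def isdigit(c):
    # ASCII '0' (48) through '9' (57)
    return 48 <= ordinal(c) <= 57

def isxdigit(c):
    # Decimal digit, or 'A'-'F' (65-70), or 'a'-'f' (97-102)
    n = ordinal(c)
    return isdigit(n) or 65 <= n <= 70 or 97 <= n <= 102

# A string argument and its integer value give the same answer.
print(isdigit("7"), isdigit(0x37))
print(isxdigit("f"), isxdigit("g"))
```

Because every test goes through the same conversion step, callers can pass
whichever form they already have in hand.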

Note that all these functions check the ordinal value of the single
character you pass in; they do not actually know anything about the
host machine's character encoding.  For functions that know about the
character encoding (and handle internationalization properly) see the
\module{string} module.
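
To illustrate the distinction, the sketch below compares an ASCII-only
range test with the Unicode-aware str.isalpha() method of Python 3, which
stands in here for an encoding-aware function.  The helper name
isasciialpha is hypothetical and is not part of the module.

```python
def isasciialpha(c):
    """ASCII-only alphabetic test on a one-character string or an int."""
    n = ord(c) if isinstance(c, str) else c
    return 65 <= n <= 90 or 97 <= n <= 122

# Plain "e", then "e" with an acute accent (U+00E9).
for ch in ("e", "\u00e9"):
    print(repr(ch), isasciialpha(ch), ch.isalpha())
```

The accented letter fails the ASCII range test but passes the
Unicode-aware one, since the range test looks only at the ordinal value.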

The following three functions take either a single-character string or
integer byte value; they return a value of the same type.

\begin{funcdesc}{ascii}{c}
Return the ASCII value corresponding to the low 7 bits of \var{c}.
\end{funcdesc}

\begin{funcdesc}{ctrl}{c}
Return the control character corresponding to the given character
(the character bit value is bitwise-anded with 0x1f).
\end{funcdesc}

\begin{funcdesc}{alt}{c}
Return the 8-bit character corresponding to the given ASCII character
(the character bit value is bitwise-ored with 0x80).
\end{funcdesc}
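
The type-preserving behaviour described above can be sketched as follows.
This is an illustration rather than the module source: the helper samekind
and the name toascii (used instead of ascii so as not to shadow the
Python 3 built-in) are hypothetical.

```python
def samekind(c, op):
    """Apply op to the ordinal of c; return a str if c was a str, else an int."""
    if isinstance(c, str):
        return chr(op(ord(c)))
    return op(c)

def toascii(c):
    return samekind(c, lambda n: n & 0x7f)   # keep the low 7 bits

def ctrl(c):
    return samekind(c, lambda n: n & 0x1f)   # keep the low 5 bits

def alt(c):
    return samekind(c, lambda n: n | 0x80)   # set the eighth (meta) bit

# String in, string out; integer in, integer out.
print(repr(ctrl("A")), ctrl(0x41))
print(repr(alt("a")), hex(alt(0x61)))
print(repr(toascii(alt("a"))))
```

Routing all three through one helper keeps the string-or-integer decision
in a single place.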

The following function takes either a single-character string or
integer byte value; it returns a string.

\begin{funcdesc}{unctrl}{c}
Return a string representation of the ASCII character \var{c}.  If
\var{c} is printable, this string is the character itself.  If the
character is a control character (0x00-0x1f) the string consists of a
caret (\code{\^{}}) followed by the character whose value is 0x40
higher, which is an uppercase letter for 0x01 through 0x1a.  If the
character is an ASCII delete (0x7f) the string is \code{\^{}?}.  If the
character has its meta bit (0x80) set, the meta bit is stripped, the
preceding rules applied, and \code{!} prepended to the result.
\end{funcdesc}
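
The rules above can be sketched as follows, assuming that a printable
character means one in the range 0x20 to 0x7e.  This is an illustrative
reimplementation under that assumption, not necessarily the code shipped
in the module.

```python
def unctrl(c):
    """Return a printable representation of a one-character string or int."""
    bits = ord(c) if isinstance(c, str) else c
    low = bits & 0x7f                  # strip the meta bit first
    if low == 0x7f:
        rep = "^?"                     # ASCII delete
    elif 0x20 <= low <= 0x7e:
        rep = chr(low)                 # printable: the character itself
    else:
        rep = "^" + chr(low | 0x40)    # control: caret plus value + 0x40
    if bits & 0x80:
        return "!" + rep               # meta bit was set
    return rep

for value in ("A", "\x01", 0x1b, 0x7f, 0x81):
    print(repr(value), "->", unctrl(value))
```

Stripping the meta bit before classifying the character is what lets the
same three rules serve both the 7-bit and the 8-bit case.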

Finally, the module supplies a 33-element list of strings
called controlnames that contains the ASCII mnemonics for the
thirty-two ASCII control characters from 0 (NUL) to 0x1f (US),
in order, plus the mnemonic "SP" for space.
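
Since the list is ordered by character value, a mnemonic can be looked up
by indexing with the ordinal.  The sketch below rebuilds the list locally
so that it is self-contained; the helper name mnemonic is hypothetical and
is not part of the module.

```python
controlnames = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS",  "HT",  "LF",  "VT",  "FF",  "CR",  "SO",  "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM",  "SUB", "ESC", "FS",  "GS",  "RS",  "US",
    "SP",
]

def mnemonic(c):
    """Return the mnemonic for values 0x00-0x20, or None for anything else."""
    n = ord(c) if isinstance(c, str) else c
    if 0 <= n < len(controlnames):
        return controlnames[n]
    return None

print(len(controlnames))
print(mnemonic("\n"), mnemonic(0x1b), mnemonic(" "), mnemonic("A"))
```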