[Python-Dev] ascii.py + documentation

Eric S. Raymond esr@thyrsus.com
Tue, 30 May 2000 14:58:38 -0400



Fred L. Drake, Jr. <fdrake@acm.org>:
>   Apparently the rest of us haven't heard of it.  Since Guido's a
> little distracted right now, perhaps you should send the files to
> python-dev for discussion?

Righty-O.  Here they are enclosed.  I wrote this for use with the
curses module; one reason it's useful is that the curses getch
function returns ordinal values rather than characters.  It should
be more generally useful for any Python program with a raw
character-by-character command interface.
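To make the getch point concrete, here is a minimal sketch of a command loop over integer key codes. It is an illustration under stated assumptions, not code from CML2: the list fake_keys stands in for successive getch() results, dispatch_keys and its action strings are hypothetical, and ctrl() and isprint() are local copies of the ascii.py logic so the snippet runs without curses.

```python
# Self-contained sketch: the helpers mirror ascii.py; dispatch_keys is a
# hypothetical command loop over getch()-style integer key codes.
ESC = 0x1b
DEL = 0x7f

def ctrl(c):
    # Integer in, integer out: keep only the low five bits.
    return c & 0x1f

def isprint(c):
    return 32 <= c <= 126

def dispatch_keys(keys):
    """Turn a sequence of integer key codes into a list of actions."""
    actions = []
    for ch in keys:
        if ch == ctrl(ord("q")):          # ^Q ends the loop
            actions.append("quit")
            break
        elif ch == ESC:
            actions.append("cancel")
        elif ch == DEL:
            actions.append("backspace")
        elif isprint(ch):
            actions.append("insert " + chr(ch))
        else:
            actions.append("ignore %d" % ch)
    return actions

# Simulated getch() results: 'h', 'i', DEL, ESC, ^Q, then 'x' (never read).
fake_keys = [ord("h"), ord("i"), DEL, ESC, ctrl(ord("q")), ord("x")]
print(dispatch_keys(fake_keys))
# -> ['insert h', 'insert i', 'backspace', 'cancel', 'quit']
```

Because getch already hands back integers, comparing against named constants and ctrl() values avoids a chr()/ord() round trip on every keystroke.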

The TeX file may need trivial markup fixes.  You might want to add a
"See also" to the curses documentation.

I'm using this code heavily in my CML2 project, so it has been tested.
For those of you who haven't heard about CML2, I've written a replacement
for the Linux kernel configuration system in Python.  You can find out more
at:

	http://www.tuxedo.org/~esr/kbuild/

The code has some interesting properties, including the ability to
probe its environment and come up in a Tk-based, curses-based, or
line-oriented mode depending on what it sees.

ascii.py will probably not be the last library code this project spawns.
I have another package called menubrowser that is a framework for writing
menu systems. And I have some Python wrapper enhancements for curses in
the works.
-- 
		Eric S. Raymond <http://www.tuxedo.org/~esr>

The two pillars of `political correctness' are, 
  a) willful ignorance, and
  b) a steadfast refusal to face the truth
	-- George MacDonald Fraser

[Attachment: ascii.py]

#
# ascii.py -- constants and membership tests for ASCII characters
#

NUL	= 0x00	# ^@
SOH	= 0x01	# ^A
STX	= 0x02	# ^B
ETX	= 0x03	# ^C
EOT	= 0x04	# ^D
ENQ	= 0x05	# ^E
ACK	= 0x06	# ^F
BEL	= 0x07	# ^G
BS	= 0x08	# ^H
TAB	= 0x09	# ^I
HT	= 0x09	# ^I
LF	= 0x0a	# ^J
NL	= 0x0a	# ^J
VT	= 0x0b	# ^K
FF	= 0x0c	# ^L
CR	= 0x0d	# ^M
SO	= 0x0e	# ^N
SI	= 0x0f	# ^O
DLE	= 0x10	# ^P
DC1	= 0x11	# ^Q
DC2	= 0x12	# ^R
DC3	= 0x13	# ^S
DC4	= 0x14	# ^T
NAK	= 0x15	# ^U
SYN	= 0x16	# ^V
ETB	= 0x17	# ^W
CAN	= 0x18	# ^X
EM	= 0x19	# ^Y
SUB	= 0x1a	# ^Z
ESC	= 0x1b	# ^[
FS	= 0x1c	# ^\
GS	= 0x1d	# ^]
RS	= 0x1e	# ^^
US	= 0x1f	# ^_
SP	= 0x20	# space
DEL	= 0x7f	# delete

def _ctoi(c):
    # Accept either a one-character string or an integer ordinal.
    if isinstance(c, str):
        return ord(c)
    else:
        return c

def isalnum(c): return isalpha(c) or isdigit(c)
def isalpha(c): return isupper(c) or islower(c)
def isascii(c): return 0 <= _ctoi(c) <= 127
def isblank(c): return _ctoi(c) in (9, 32)		# tab, space
def iscntrl(c): return 0 <= _ctoi(c) <= 31
def isdigit(c): return 48 <= _ctoi(c) <= 57
def isgraph(c): return 33 <= _ctoi(c) <= 126
def islower(c): return 97 <= _ctoi(c) <= 122
def isprint(c): return 32 <= _ctoi(c) <= 126
def ispunct(c): return isgraph(c) and not isalnum(c)
def isspace(c): return _ctoi(c) in (9, 10, 11, 12, 13, 32)	# HT LF VT FF CR SP
def isupper(c): return 65 <= _ctoi(c) <= 90
def isxdigit(c): return isdigit(c) or \
    (65 <= _ctoi(c) <= 70) or (97 <= _ctoi(c) <= 102)

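def ctrl(c):
    # Keep only the low five bits; the result has the same type as c.
    if isinstance(c, str):
        return chr(_ctoi(c) & 0x1f)
    else:
        return _ctoi(c) & 0x1f

def alt(c):
    # Set the high (meta) bit; the result has the same type as c.
    if isinstance(c, str):
        return chr(_ctoi(c) | 0x80)
    else:
        return _ctoi(c) | 0x80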





[Attachment: ascii.tex]

\section{\module{ascii} ---
         Constants and set-membership functions for ASCII characters.}

\declaremodule{standard}{ascii}
\modulesynopsis{Constants and set-membership functions for ASCII characters.}
\moduleauthor{Eric S. Raymond}{esr@thyrsus.com}
\sectionauthor{Eric S. Raymond}{esr@thyrsus.com}

\versionadded{1.6}

The \module{ascii} module supplies named constants for ASCII characters
and functions to test membership in various ASCII character classes.
The constants supplied are names for the control characters (and for
space) as follows:

NUL, SOH, STX, ETX, EOT, ENQ, ACK, BEL, BS, TAB, HT, LF, NL, VT, FF, CR,
SO, SI, DLE, DC1, DC2, DC3, DC4, NAK, SYN, ETB, CAN, EM, SUB, ESC, FS, 
GS, RS, US, SP, DEL.

NL and LF are synonyms, as are TAB and HT.  The module also supplies the
following functions, patterned on those in the standard C library:

\begin{funcdesc}{isalnum}{c}
Checks for an ASCII alphanumeric character; it is equivalent to
\samp{isalpha(\var{c}) or isdigit(\var{c})}.
\end{funcdesc}

\begin{funcdesc}{isalpha}{c}
Checks for an ASCII alphabetic character; it is equivalent to
\samp{isupper(\var{c}) or islower(\var{c})}.
\end{funcdesc}

\begin{funcdesc}{isascii}{c}
Checks for a character value that fits in the 7-bit ASCII set.
\end{funcdesc}

\begin{funcdesc}{isblank}{c}
Checks for an ASCII blank character: space or horizontal tab.
\end{funcdesc}

\begin{funcdesc}{iscntrl}{c}
Checks for an ASCII control character (range 0x00 to 0x1f).
\end{funcdesc}

\begin{funcdesc}{isdigit}{c}
Checks for an ASCII decimal digit, 0 through 9.
\end{funcdesc}

\begin{funcdesc}{isgraph}{c}
Checks for any ASCII printable character except space.
\end{funcdesc}

\begin{funcdesc}{islower}{c}
Checks for an ASCII lowercase letter.
\end{funcdesc}

\begin{funcdesc}{isprint}{c}
Checks for any ASCII printable character including space.
\end{funcdesc}

\begin{funcdesc}{ispunct}{c}
Checks for any printable ASCII character which is not a space or an
alphanumeric character.
\end{funcdesc}

\begin{funcdesc}{isspace}{c}
Checks for ASCII white-space characters: space, horizontal tab, line
feed, vertical tab, form feed, and carriage return.
\end{funcdesc}

\begin{funcdesc}{isupper}{c}
Checks for an ASCII uppercase letter.
\end{funcdesc}

\begin{funcdesc}{isxdigit}{c}
Checks for an ASCII hexadecimal digit, i.e. one of 0123456789abcdefABCDEF.
\end{funcdesc}
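As a rough illustration (a self-contained sketch, not part of the module), the following code re-implements a few of the predicates as described above and uses a hypothetical \function{classify()} helper to show how the classes overlap on a sample string; a control character such as SOH belongs to none of the three classes tested.

```python
# Standalone sketch of a few predicates following the descriptions above,
# applied to a sample string to show how the classes overlap.
def _ctoi(c):
    return ord(c) if isinstance(c, str) else c

def isdigit(c): return 48 <= _ctoi(c) <= 57
def isupper(c): return 65 <= _ctoi(c) <= 90
def islower(c): return 97 <= _ctoi(c) <= 122
def isalpha(c): return isupper(c) or islower(c)
def isalnum(c): return isalpha(c) or isdigit(c)
def isgraph(c): return 33 <= _ctoi(c) <= 126
def ispunct(c): return isgraph(c) and not isalnum(c)
def isspace(c): return _ctoi(c) in (9, 10, 11, 12, 13, 32)

def classify(c):
    """Return the names of the (hypothetical) classes that c belongs to."""
    tests = [("alnum", isalnum), ("punct", ispunct), ("space", isspace)]
    return [name for name, test in tests if test(c)]

for ch in "a7 !\t\x01":
    print(repr(ch), classify(ch))
```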

These functions accept either integers or one-character strings; when
the argument is a string, it is first converted using the built-in
function \function{ord()}.
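A minimal sketch of the conversion rule, using standalone copies of \function{isdigit()} and \function{isupper()} plus a private helper that mirrors the module's conversion: the string and integer forms give the same answers, and because \function{ord()} accepts only a single character, a longer string raises \exception{TypeError}.

```python
# Standalone copies of two predicates, showing that a one-character
# string and its integer ordinal are treated identically.
def _ctoi(c):
    # One-character strings go through ord(); integers pass unchanged.
    if isinstance(c, str):
        return ord(c)
    return c

def isdigit(c):
    return 48 <= _ctoi(c) <= 57

def isupper(c):
    return 65 <= _ctoi(c) <= 90

for ch in "7Gq":
    # The string form and the integer form must agree.
    assert isdigit(ch) == isdigit(ord(ch))
    assert isupper(ch) == isupper(ord(ch))
    print(ch, ord(ch), isdigit(ch), isupper(ch))

# ord() only accepts a single character, so longer strings are rejected.
try:
    isdigit("42")
except TypeError:
    print("isdigit('42') raised TypeError")
```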

Note that all these functions check the ordinal value of the character
you pass in; they do not actually know anything about the host
machine's character encoding.  For functions that know about the
character encoding (and handle internationalization properly) see the
\module{string} module.
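For contrast, here is a sketch assuming Python 3, where Unicode-aware classification lives in the built-in \class{str} methods rather than the \module{string} module; \function{isalpha()} below is a standalone copy of the ordinal test, not an import of this module.

```python
# Ordinal-range test (as in ascii.py) versus Python 3's Unicode-aware
# str.isalpha(). isalpha here is a standalone copy, not an import.
def _ctoi(c):
    return ord(c) if isinstance(c, str) else c

def isalpha(c):
    v = _ctoi(c)
    return 65 <= v <= 90 or 97 <= v <= 122

for ch in ["a", "Z", "\u00e9", "\u00df"]:   # a, Z, e-acute, sharp s
    print(repr(ch), ord(ch), isalpha(ch), ch.isalpha())
```

Only the first two characters pass the ordinal test; the built-in method accepts all four.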

The following two functions take either a single-character string or an
integer byte value; they return a value of the same type.

\begin{funcdesc}{ctrl}{c}
Return the control character corresponding to the given character
(the character's bit value is bitwise-anded with 0x1f).
\end{funcdesc}

\begin{funcdesc}{alt}{c}
Return the 8-bit character corresponding to the given ASCII character
(the character's bit value is bitwise-ored with 0x80).
\end{funcdesc}
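A short sketch of both functions, using standalone copies that follow the descriptions above; it shows that each returns the same type it was given, that case does not matter to \function{ctrl()}, and that applying \function{ctrl()} to an opening bracket yields the ESC character (0x1b).

```python
# Standalone copies of ctrl() and alt(): each returns the same type it was
# given (one-character string in, string out; int in, int out).
def _ctoi(c):
    return ord(c) if isinstance(c, str) else c

def ctrl(c):
    # Keep only the low five bits, e.g. 'c' (0x63) -> 0x03 (^C).
    if isinstance(c, str):
        return chr(_ctoi(c) & 0x1f)
    return _ctoi(c) & 0x1f

def alt(c):
    # Set the high bit, e.g. 'a' (0x61) -> 0xe1.
    if isinstance(c, str):
        return chr(_ctoi(c) | 0x80)
    return _ctoi(c) | 0x80

print(repr(ctrl("c")), ctrl(ord("c")))      # '\x03' 3
print(hex(ord(alt("a"))), hex(alt(0x61)))   # 0xe1 0xe1
print(ctrl("C") == ctrl("c"))               # True: the case bit is masked off
print(ctrl("[") == "\x1b")                  # True: ^[ is ESC
```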




