First Python Script: Roman Numbers

Fri Apr 13 08:01:35 EDT 2001

 Here's my first Python script.  Way back when I was taking classes
 at a community college and learning Pascal, I got an assignment to
 convert numbers to roman numeral strings.  After tangling with the
 various rules for a bit, I realized that I could actually eliminate
 most of the conditionals by using a more elegant list of tokens.

 Since then I use the "toRoman" as a stock exercise for teaching
 myself new scripting and programming languages.  I've even used
 version in Bourne syntax for teaching classes in shell scripting
 (I've included that, too).

 Here's the script, I'll let it speak for itself:

#!/usr/bin/env python
# Roman Number Converter 
#	by James T. Dennis (c)2001 <jimd at starshine.org>

#  This code is released and licensed under the terms of the GPL
#  or, at the user's option, the BSD license as specified at the
#  following URLs:
#	http://www.freebsd.org/copyright/freebsd-license.html
#	http://www.gnu.org/copyleft/gpl.html
#
#   In any event it is provided free of charge, "as-is" and wholly
#   without any warranty.   Use it, mangle it, incorporate it into
#   any software that will have it.

# Convert natural numbers to their Roman numeral representations 
# and vice versa.

# First we associate a list of numeric values with
# their Roman numeral (string token) equivalents as follows:

Rom={}
Rom["M"] = 	1000
Rom["CM"] =  	 900
Rom["D"] =  	 500
Rom["CD"] =  	 400
Rom["C"] = 	 100
Rom["XC"] =	  90
Rom["L"] =	  50
Rom["XL"] =	  40
Rom["X"] =	  10
Rom["IX"] =	   9
Rom["V"] =	   5
Rom["IV"] =	   4
Rom["I"] =	   1

RomSeq = ( "M", "CM", "D", "CD", "C", "XC", "L", "XL", 
	   "X", "IX", "V", "IV", "I" )

# In this case we create a Python dictionary (otherwise we'd
# create two arrays, one of integer values and the other of strings
# and use our own functions to search them and translate using a 
# common index).  We also create a sequence tuple in descending
# order.  It's for interating over the value list in a convenient order.

# We include the two letter tokens (IV, CM, CD, XC, etc) because
# it makes our main conversion loop simpler (as we'll see).
# Basically it means we can loop straight through without having
# to encode a bunch of parsing conditionals (the sequence, including
# the two letter tokens already incorporates most the the parsing
# rules for roman numeral strings).  

# This allows us to convert from Arabic to Roman in about 7 lines 
# of code; and from Roman back to Arabic less than 20

# Here's how we use these data structures:

def toRoman (n):
	result=""
	if n < 1 or n > 4000:
		return None
		## raise IndexError?
	for i in RomSeq:
		while Rom[i] <= n:
			result = result + i
			n = n - Rom[i]
	return result

# Our result starts as an empty string.
# We interate over the sequence of roman numeral component strings
# if the corresponding value (the value associated with "M" or "CM"
# etc) is greater than our number, we append the current string to 
# our result and subtract its corresponding value from our copy of n

# Converting from a roman numeral string back to arabic representation
# is a bit trickier.  We can try using the same data structure.  However,
# we could encounter many types of errors.  
# We were able to filter out all invalid integers with a single compound 
# conditional but there are lots of ways to compose strings that are 
# *not* valid roman numerals.  

def fromRoman (s):
	result=0
	s = s.upper()
	# Start by converting to upper case for convenience

	for i in RomSeq:
		# A Roman number, in canonical form, is 
		# sorted from the larger tokens towards the 
		# lower valued ones.  Thus IIMXCC is not 
		# a proper roman number (it would have to be
		# sorted to MCXCII or 1192).  In fact CMM is
		# also not valid --- it would have to be MCM
		# and CMCM is is not valid at all it can't
		# be simply re-arranged, it must be re-written 
		# (MDCCC)

		# So we simply walk through the list of tokens
		# in sequence (just as we did for convert toRoman)
		# and we compare each token to the "head" of the
		# remaining string.

		seen = 0
		limit = 1
		if i in ("M", "C", "X", "I"):
			limit = 3
		# The M, C, X and I tokens make appear in sequences
		# up to three times.  All others may only appear once
		# each

		head = s[:len(i)]
		while i == head:
			# on a match (head of string matches token):
			# track number of times we've see this token
			# vs. the token's sequence limit
			seen = seen + 1
			if seen > limit:
				break

			s = s[len(i):]
			# behead the remaining string
			head = s[:len(i)]
			# and get the new head of the string
			result = result + Rom[i]
			# oh yeah, and add corresponding value to result

	if s == "":
		return result

## The following simply prints a list
## by converting to a roman number *and back*.
longest=""
for i in range(1,4000):
	s=toRoman(i)
	if i != fromRoman(s):
		print "Error?  is %s really %d" % (s, i)
	print "%5d\t%s" % (fromRoman(s), s) 
	if len(s) > len(longest):
		longest = s
print longest

#### End of script

 The last 10 lines are just a test suite.

 I've noticed that this script works fine under Python 2.x
 but it seems that s.upper() isn't in Python 1.5.x (or that
 I have to import a string module; I didn't spend time on that).

 Here's a shell script version of just the "toRoman()"
 function:

#!/bin/bash
valarray="1000 M 900 CM 500 D 400 CD 100 C 90 XC 50 L 40 XL 10"
valarray="$valarray X 9 IX 5 V 4 IV 1 I"

[ "$1" -lt 4000 ] || {
   echo "Can't Represent numbers over 4000 in this character set" >&2
   exit 1
   }

n="$1"
set -- $valarray
while [ "$n" -gt 0 ]; do  # build roman numeral
        while [ "$n" -lt "$1" ]; do  # find scale
                shift 2
                done
        while [ "$n" -ge $1 ]; do  # add values to result
                let n-=$1
                result=$result$2
                done
        done
echo $result