An interesting Parallel Programing Problem: asciify-string
xahlee at gmail.com
Tue Mar 6 14:11:40 CET 2012
Here's an interesting problem that we are discussing at comp.lang.lisp.
〈Parallel Programing Problem: asciify-string〉
Here's the plain text. The code example is in Emacs Lisp, but the problem is
for a bit of Python relevancy… is there any Python compiler that's
Parallel Programing Problem: asciify-string
Here's an interesting parallel programming problem.
The task is to change this function so that it's parallelizable. (The code
example is in Emacs Lisp.)
(defun asciify-string (inputStr)
  "Make Unicode string into equivalent ASCII ones."
  (setq inputStr (replace-regexp-in-string "á\\|à\\|â\\|ä" "a" inputStr))
  (setq inputStr (replace-regexp-in-string "é\\|è\\|ê\\|ë" "e" inputStr))
  (setq inputStr (replace-regexp-in-string "í\\|ì\\|î\\|ï" "i" inputStr))
  (setq inputStr (replace-regexp-in-string "ó\\|ò\\|ô\\|ö" "o" inputStr))
  (setq inputStr (replace-regexp-in-string "ú\\|ù\\|û\\|ü" "u" inputStr))
  inputStr)
Here's a more general description of the problem.
You are given a Unicode text file that's a few petabytes in size. Certain
characters in the file need to be changed to different characters.
(For an example of a practical application, see: IDN homograph attack ◇
Duplicate characters in Unicode.)
One easy solution is to simply use regex, as in the sample code above, to
search through the file sequentially and perform the transform for a
particular set of chars, then repeat for each char that needs to be changed.
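For Python readers, the sequential approach can be sketched roughly as follows. This is only an illustration mirroring the Emacs Lisp sample; the names RULES and asciify_sequential are made up for this sketch and do not come from the original discussion.

```python
import re

# (pattern, replacement) pairs mirroring the Emacs Lisp sample.
# RULES and asciify_sequential are hypothetical names for this sketch.
RULES = [
    ("[áàâä]", "a"),
    ("[éèêë]", "e"),
    ("[íìîï]", "i"),
    ("[óòôö]", "o"),
    ("[úùûü]", "u"),
]


def asciify_sequential(text):
    """Apply each rule in turn; every rule scans the whole text again."""
    for pattern, replacement in RULES:
        # Each pass depends on the result of the previous pass,
        # so the passes cannot simply be run side by side.
        text = re.sub(pattern, replacement, text)
    return text
```

Calling asciify_sequential on a string runs five full passes over it, one per rule, which is the cost the parallel formulation tries to avoid.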
But your task is to use an algorithm that is parallelizable. That is, in a
parallel-algorithm-aware language (e.g. Fortress), the compiler will
automatically spread the computation across multiple processors.
Refer to Guy Steele's video talk if you haven't seen it already. See: Guy
Steele on Parallel Programing.
A better way to write it for parallel programming is to map a
char-transform function over each char in the string. Here's some
pseudo-code in Lisp by Helmut Eller:
(defun asciify-char (c)
  (case c
    ((?á ?à ?â ?ä) ?a)
    ((?é ?è ?ê ?ë) ?e)
    ((?í ?ì ?î ?ï) ?i)
    ((?ó ?ò ?ô ?ö) ?o)
    ((?ú ?ù ?û ?ü) ?u)
    (t c)))
(defun asciify-string (string) (map 'string #'asciify-char string))
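A rough Python rendering of the same idea, for illustration only. Because each char is transformed independently of its neighbours, the text can be cut into chunks at any position and the chunks handled by separate workers. The function names, the worker count and the chunk size below are assumptions of this sketch. Threads are used only to keep it portable; under CPython's global interpreter lock they would not speed up this CPU-bound work, and a process pool or a parallel-aware language would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def asciify_char(c):
    """Per-char transform, written as a chain of tests like the Lisp case form."""
    if c in "áàâä":
        return "a"
    if c in "éèêë":
        return "e"
    if c in "íìîï":
        return "i"
    if c in "óòôö":
        return "o"
    if c in "úùûü":
        return "u"
    return c


def asciify_chunk(chunk):
    """Transform one chunk; no chunk needs to know about any other chunk."""
    return "".join(map(asciify_char, chunk))


def asciify_parallel(text, workers=4, chunk_size=1 << 16):
    """Split the text into chunks, map over them, and join the results.

    workers and chunk_size are arbitrary defaults chosen for this sketch.
    """
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so joining them
        # reproduces the original character order.
        return "".join(pool.map(asciify_chunk, chunks))
```

The contrast with the regex version is that here there is a single pass, and the only coordination between workers is the final join.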
One problem with this is that the function “asciify-char” itself is
sequential, and not 100% parallelizable. (We might assume here that
there are billions of chars in Unicode that need to be transformed.)
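One possible way around a long chain of tests inside the per-char function is a lookup table, so that the cost per char does not grow with the number of mappings. A minimal Python sketch, assuming the mapping fits in memory; GROUPS, TABLE and asciify_with_table are hypothetical names for this sketch:

```python
# Replacement letter -> the chars that should become it.
# GROUPS, TABLE and asciify_with_table are hypothetical names.
GROUPS = {
    "a": "áàâä",
    "e": "éèêë",
    "i": "íìîï",
    "o": "óòôö",
    "u": "úùûü",
}

# str.maketrans accepts a dict from single chars to replacement strings
# and returns a table keyed by code point.
TABLE = str.maketrans(
    {src: dst for dst, chars in GROUPS.items() for src in chars}
)


def asciify_with_table(text):
    """Replace each char by a single table lookup; unmapped chars pass through."""
    return text.translate(TABLE)
```

With a table, adding more mappings enlarges the dictionary but leaves the work per char at one hash lookup on average.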
It would be an interesting small project if someone actually used a
parallel-algorithm-aware language to work on this problem, and reported
on the break-point in file size between the parallel algorithm and the
sequential algorithm.
Would anyone try it? Perhaps in Fortress, Erlang, Ease, Alice, X10, or
another language? Is Clojure parallel-aware?