[CentralOH] Regex Parser in Python, Go Benchmark
James Bonanno
james at atlantixeng.com
Wed Jul 27 01:23:11 EDT 2016
I have a Python 3 program containing a lexer/parser written purely with
regexes (the re module) that runs in roughly 50 milliseconds for a
typical file that it parses. I then converted this program to Go, where
it typically runs in about 600 milliseconds. I had to see what the
fuss is all about.
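For anyone who wants to compare numbers like these, here is a minimal
sketch of how the Python side could be timed. The helper name
time_parse, the sample text and the \w+ pattern are all hypothetical
stand-ins, not the actual parser. It reports the best of several runs
in milliseconds.

```python
import re
import time


def time_parse(parse, text, repeats=5):
    """Return the best wall-clock time, in milliseconds, of parse(text)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


# Hypothetical stand-in for the real lexer/parser and its input file.
word_re = re.compile(r"\w+")
sample = "alpha beta 42 " * 1000
elapsed_ms = time_parse(word_re.findall, sample)
print(f"best of 5 runs: {elapsed_ms:.3f} ms")
```

This only measures the parse call itself, so interpreter startup and
file reading are excluded. A fair comparison with another language
would need to measure the same portion of the work there.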
Since I use regexes quite a bit, this has me very curious. I will say
that the Python re module is quite nice, particularly the
re.scanner('|'.join(multiple_regex)) construct, which is somewhat of a
hidden gem. I couldn't reproduce the same functionality in Go without
hacking, and it became quite frustrating simply trying to get an
iterable that contains the (name, value) pair of each token a regex
obtains from a string (or line of text). In the end, I couldn't get
this working with the native regex library in Go, because the match
names of a named regex come from a different API call than the match
values.
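As an illustration of the kind of iterable described above, here is a
minimal sketch using only documented parts of the re module: named
groups joined with '|', finditer, and Match.lastgroup. The token names
and patterns in TOKEN_SPEC are hypothetical and are not taken from the
actual program.

```python
import re

# Hypothetical token specification; names and patterns are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# Join every pattern into one alternation of named groups.
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(line):
    """Yield a (name, value) pair for each token found in line."""
    for match in MASTER.finditer(line):
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()


tokens = list(tokenize("x = 3.14 + y"))
print(tokens)
```

Note that finditer silently skips any characters that no pattern
matches, so a real lexer would probably want a catch-all group to
report errors.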
The glory of compiled languages seems to lose meaning without real-world
benchmarks, although I like how fast Go compiles in general. The re
module in Python has been rugged and dependable.
James