[CentralOH] Regex Parser in Python, Go Benchmark

James Bonanno james at atlantixeng.com
Wed Jul 27 01:23:11 EDT 2016


I have a Python 3 program containing a lexer/parser written purely with 
regexes (the re module) that typically runs in about 50 milliseconds on a 
typical file that it parses. I then ported this program to Go, where 
it typically runs in about 600 milliseconds. I did the port because I 
had to see what the fuss over Go is all about.
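
For anyone who wants to take a comparable measurement on the Python side, 
here is a minimal sketch. The helper name time_parse and its parameters 
are hypothetical and not from my program; it assumes the parser is a 
callable that takes the file contents as a string, and it reports the 
best of several runs to reduce noise from other processes.

```python
import time


def time_parse(parse, text, repeats=5):
    """Return the best wall-clock time, in milliseconds, over several runs.

    `parse` is any callable taking the input text (an assumption made
    for this sketch).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        parse(text)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best
```

Note that this times only the parsing call, not interpreter startup, so 
it is not directly comparable to timing a whole process from the shell.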

Since I use regexes quite a bit, this has me very curious. I will say 
that the Python re module is quite nice, particularly the 
re.scanner('|'.join(multiple_regex)) idiom, which is somewhat of a hidden 
gem. I couldn't reproduce the same functionality in Go without hacking, and 
it became quite frustrating simply trying to get an iterable that 
yields the (name, value) pair of each token a regex finds in a string 
(or line of text). In the end, I could not do this directly with the 
native regexp package in Go, because the group names of a named regex 
come from a different API call than the match values.
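
To illustrate the Python side, here is a minimal sketch of the 
name/value iteration described above. The token names and patterns 
(NUMBER, IDENT, OP, SKIP) are hypothetical and not taken from my program, 
and the sketch uses the documented finditer() and match.lastgroup rather 
than the scanner itself. Each alternative is wrapped in a named group, 
the groups are joined with '|', and the name of whichever group matched 
is read from match.lastgroup.

```python
import re

# Hypothetical token definitions, for illustration only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# One master pattern: each alternative is a named group, joined with '|'.
MASTER = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in TOKEN_SPEC)
)


def tokenize(line):
    """Yield a (name, value) pair for each token found in line."""
    for match in MASTER.finditer(line):
        # lastgroup is the name of the named group that matched.
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
```

For example, list(tokenize("x = 3.14 * y")) gives the pairs IDENT x, 
OP =, NUMBER 3.14, OP *, IDENT y. One caveat: finditer() silently skips 
characters that match none of the alternatives, so a real lexer would 
want a catch-all group to report them as errors.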

The glory of compiled languages seems to lose meaning without real-world 
benchmarks, although I do like how fast Go compiles in general. The re 
module in Python has been rugged and dependable.

James




