ANN: WordSegment 0.6.1 Released

Grant Jenks grant.jenks at
Tue Sep 15 19:04:28 CEST 2015

Announcing the Release of WordSegment Version 0.6.1

What is WordSegment?

WordSegment is an Apache2 licensed module for English word segmentation,
written in pure-Python, and based on a trillion-word corpus. It is based on
code from the chapter “Natural Language Corpus Data” by Peter Norvig from the
book “Beautiful Data” (Segaran and Hammerbacher, 2009). Data files are derived
from the Google Web Trillion Word Corpus. The module has 100% code coverage
and complete documentation.

What's new in 0.6.1?

- Exposed TOTAL constant representing the count of all unigrams in the corpus.
  Defaults to 1,024,908,267,229.
- Added documentation on how to use a different corpus:
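
To illustrate the two items above, here is a minimal, self-contained sketch of
how a total unigram count can normalize word counts into probabilities, and how
a different corpus would bring its own total. It is not WordSegment's code or
API, and it is not the project's documentation. The names TOY_UNIGRAMS,
TOY_TOTAL, score, and segment are hypothetical, the toy counts are made up, and
the unknown-word penalty follows the style of Norvig's chapter as an
assumption. Only DEFAULT_TOTAL is taken from this announcement.

```python
import math
from functools import lru_cache

# Default total quoted in this announcement (count of all unigrams).
DEFAULT_TOTAL = 1024908267229.0

# Hypothetical toy counts standing in for a different, much smaller corpus.
TOY_UNIGRAMS = {"this": 50, "is": 80, "a": 100, "test": 20, "at": 30, "his": 10}

# With a custom corpus, the total would be the sum of its unigram counts.
TOY_TOTAL = float(sum(TOY_UNIGRAMS.values()))


def score(word, unigrams=TOY_UNIGRAMS, total=TOY_TOTAL):
    """Estimate the probability of `word` from unigram counts."""
    if word in unigrams:
        return unigrams[word] / total
    # Assumed Norvig-style penalty: unknown words get less likely with length.
    return 10.0 / (total * 10 ** len(word))


def segment(text, max_word_len=20):
    """Split `text` into the most probable sequence of words (unigram model)."""

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (log10 probability, words) for the best split of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, min(len(text), start + max_word_len) + 1):
            word = text[start:end]
            rest_logp, rest_words = best(end)
            candidates.append((math.log10(score(word)) + rest_logp,
                               (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


if __name__ == "__main__":
    print(segment("thisisatest"))
    # The same count gives a very different probability under another total.
    print(score("test"), score("test", total=DEFAULT_TOTAL))
```

The point of the sketch is that the counts and the total have to come from the
same corpus: dividing toy counts by the default total makes every known word
look far rarer than it is relative to the unknown-word penalty.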


- Documentation:
- Download:
- Source:
- Issues:

This release is backwards-compatible. Please upgrade.
