N-grams
srinivas devaki
mr.eightnoteight at gmail.com
Thu Nov 10 02:13:47 EST 2016
> def ngrams(iterable, n=2):
> if n < 1:
> raise ValueError
> t = tee(iterable, n)
> for i, x in enumerate(t):
> for j in range(i):
> next(x, None)
> return zip(*t)
def myngrams(iterable, n=2):
t = list(tee(iterable, 1))
for _ in range(n - 1):
t.extend(tee(t.pop()))
next(t[-1], None)
return zip(*t)
complexity wise it's O(N), but space complexity is O(N**2) to execute
this function,
to consume the output i.e completely iterate over zip object, the time
complexity is obviously O(N*M), and but space complexity is just
O(N*N).
---------------------------------------------------------------
In [1]: %paste
def ngrams(iterable, n=2):
if n < 1:
raise ValueError
t = tee(iterable, n)
for i, x in enumerate(t):
for j in range(i):
next(x, None)
return zip(*t)
## -- End pasted text --
In [2]: %paste
def myngrams(iterable, n=2):
t = list(tee(iterable, 1))
for _ in range(n - 1):
t.extend(tee(t.pop()))
next(t[-1], None)
return zip(*t)
## -- End pasted text --
In [3]: from itertools import *
In [4]: %timeit ngrams(range(1000), n=500)
100 loops, best of 3: 17.3 ms per loop
In [5]: %timeit myngrams(range(1000), n=500)
1000 loops, best of 3: 469 µs per loop
In [6]: %timeit list(ngrams(range(1000), n=500))
10 loops, best of 3: 21.2 ms per loop
In [7]: %timeit list(myngrams(range(1000), n=500))
100 loops, best of 3: 4.41 ms per loop
In [8]: %timeit ngrams(range(1000), n=100)
1000 loops, best of 3: 722 µs per loop
In [9]: %timeit myngrams(range(1000), n=100)
10000 loops, best of 3: 94.9 µs per loop
In [10]: %timeit list(ngrams(range(1000), n=100))
100 loops, best of 3: 2.09 ms per loop
In [11]: %timeit list(myngrams(range(1000), n=100))
1000 loops, best of 3: 1.46 ms per loop
In [12]:
-----------------------------------------------------------
Regards
Srinivas Devaki
Senior (4th year) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
phone: +91 9491 383 249
telegram: @eightnoteight
More information about the Python-list
mailing list