Debugging the reason for Python running unreasonably slowly when adding numbers
Alexander Nestorov
alexandernst at gmail.com
Tue Mar 14 03:48:24 EDT 2023
I'm working on an NLP project and I got bitten by unreasonably slow behaviour in Python while operating on small collections of numbers.
I have the following code:
```python
import random, time
from functools import reduce
def trainPerceptron(perceptron, data):
    learningRate = 0.002
    weights = perceptron['weights']
    error = 0
    for chunk in data:
        input = chunk['input']
        output = chunk['output']

        # 12x slower than equivalent JS
        sum_ = 0
        for key in input:
            v = weights[key]
            sum_ += v

        # 20x slower than equivalent JS
        #sum_ = reduce(lambda acc, key: acc + weights[key], input)

        actualOutput = sum_ if sum_ > 0 else 0

        expectedOutput = 1 if output == perceptron['id'] else 0
        currentError = expectedOutput - actualOutput
        if currentError:
            error += currentError ** 2
            change = currentError * learningRate
            for key in input:
                weights[key] += change

    return error


# Build mock data structure
data = [{
    'input': random.sample(range(0, 5146), 10),
    'output': 0
} for _ in range(11514)]

perceptrons = [{
    'id': i,
    'weights': [0.0] * 5147,
} for i in range(60)]  # simulate 60 perceptrons

# Simulate NLU
for i in range(151):  # epochs 0-150
    hrstart = time.perf_counter()
    for perceptron in perceptrons:
        trainPerceptron(perceptron, data)
    hrend = time.perf_counter()
    print(f'Epoch {i} - Time for training: {int((hrend - hrstart) * 1000)}ms')
```
Running it on my M1 MBP, I get the following numbers:
```
Epoch 0 - Time for training: 199ms
Epoch 1 - Time for training: 198ms
Epoch 2 - Time for training: 199ms
Epoch 3 - Time for training: 197ms
Epoch 4 - Time for training: 199ms
...
Epoch 146 - Time for training: 198ms
Epoch 147 - Time for training: 200ms
Epoch 148 - Time for training: 198ms
Epoch 149 - Time for training: 198ms
Epoch 150 - Time for training: 198ms
```
Each epoch takes around 200ms, which seems unreasonably slow given the small quantity of numbers being processed. I profiled the code with `cProfile` to find out what is going on:
```
655306 function calls (655274 primitive calls) in 59.972 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
3/1 0.000 0.000 59.972 59.972 {built-in method builtins.exec}
1 0.005 0.005 59.972 59.972 poc-small.py:1(<module>)
9060 59.850 0.007 59.850 0.007 poc-small.py:4(trainPerceptron)
1 0.006 0.006 0.112 0.112 poc-small.py:34(<listcomp>)
11514 0.039 0.000 0.106 0.000 random.py:382(sample)
115232 0.034 0.000 0.047 0.000 random.py:235(_randbelow_with_getrandbits)
11548 0.002 0.000 0.012 0.000 {built-in method builtins.isinstance}
11514 0.002 0.000 0.010 0.000 <frozen abc>:117(__instancecheck__)
183616 0.010 0.000 0.010 0.000 {method 'getrandbits' of '_random.Random' objects}
11514 0.002 0.000 0.008 0.000 {built-in method _abc._abc_instancecheck}
11514 0.002 0.000 0.006 0.000 <frozen abc>:121(__subclasscheck__)
115140 0.005 0.000 0.005 0.000 {method 'add' of 'set' objects}
11514 0.003 0.000 0.004 0.000 {built-in method _abc._abc_subclasscheck}
115232 0.004 0.000 0.004 0.000 {method 'bit_length' of 'int' objects}
151 0.003 0.000 0.003 0.000 {built-in method builtins.print}
```
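For anyone who wants to reproduce this: cProfile ships with Python, so the script itself needs no changes. The command-line equivalent is `python -m cProfile -s cumulative poc-small.py`; a minimal in-process sketch, capturing just one epoch, would be:

```python
import cProfile

# Profile a single epoch of training; cProfile.Profile has been
# usable as a context manager since Python 3.8
with cProfile.Profile() as pr:
    for perceptron in perceptrons:
        trainPerceptron(perceptron, data)

pr.print_stats('cumulative')  # sort entries by cumulative time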
This wasn't too helpful, so I tried with [line_profiler][1]:
```
Timer unit: 1e-06 s
Total time: 55.2079 s
File: poc-small.py
Function: trainPerceptron at line 4
Line # Hits Time Per Hit % Time Line Contents
==============================================================
4 @profile
5 def trainPerceptron(perceptron, data):
6 1214 301.0 0.2 0.0 learningRate = 0.002
7 1214 255.0 0.2 0.0 weights = perceptron['weights']
8 1214 114.0 0.1 0.0 error = 0
9 13973840 1742427.0 0.1 3.2 for chunk in data:
10 13973840 1655043.0 0.1 3.0 input = chunk['input']
11 13973840 1487543.0 0.1 2.7 output = chunk['output']
12
13 # 12x slower than equivalent JS
14 13973840 1210755.0 0.1 2.2 sum_ = 0
15 139738397 13821056.0 0.1 25.0 for key in input:
16 139738397 13794656.0 0.1 25.0 v = weights[key]
17 139738396 14942692.0 0.1 27.1 sum_ += v
18
19 # 20x slower than equivalent JS
20 #sum_ = reduce(lambda acc, key: acc + weights[key], input)
21
22 13973839 1618273.0 0.1 2.9 actualOutput = sum_ if sum_ > 0 else 0
23
24 13973839 1689194.0 0.1 3.1 expectedOutput = 1 if output == perceptron['id'] else 0
25 13973839 1365346.0 0.1 2.5 currentError = expectedOutput - actualOutput
26 13732045 1211916.0 0.1 2.2 if currentError:
27 241794 38375.0 0.2 0.1 error += currentError ** 2
28 241794 25377.0 0.1 0.0 change = currentError * learningRate
29 2417940 271237.0 0.1 0.5 for key in input:
30 2417940 332890.0 0.1 0.6 weights[key] += change
31
32 1213 405.0 0.3 0.0 return error
```
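For completeness, the `@profile` decorator in that listing comes from line_profiler itself, and the script is run through its `kernprof` wrapper, roughly like this:

```
pip install line_profiler
kernprof -l -v poc-small.py    # -l: profile line by line, -v: print results on exit
```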
This shows that these three lines (the ones adding up the numbers) consume roughly 77% of the total runtime:
```python
for key in input:
    v = weights[key]
    sum_ += v
```
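For what it's worth, the loop can be pushed down into C without numpy. Two pure-Python variants I sketched but have not benchmarked against this workload (so any speedup is an assumption on my part):

```python
from operator import itemgetter

# fetch all ten weights in a single C-level call, then sum the resulting tuple
sum_ = sum(itemgetter(*input)(weights))

# alternative: map() with the list's bound __getitem__, which also avoids
# executing interpreter bytecode for every element
sum_ = sum(map(weights.__getitem__, input))
```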
I also thought about throwing numpy at the problem to speed it up, but `input` is very small (ten indices per chunk), so the overhead of crossing into numpy would likely cost more than the vectorised math would save.
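For the record, the numpy variant I had in mind would look roughly like this; it assumes the weights are kept in an ndarray instead of a list, which would itself ripple through the rest of the code:

```python
import numpy as np

weights = np.zeros(5147)   # weights as an ndarray rather than a Python list
idx = np.asarray(input)    # the ten sampled indices of one chunk
sum_ = weights[idx].sum()  # fancy indexing and the reduction both run in C
```

With only ten indices per chunk, the per-call dispatch overhead would likely outweigh the vectorisation gains, which is why I didn't pursue it.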
Anyhow, adding numbers shouldn't be this slow, which makes me suspect something weird is going on in Python. To test that theory, I ported the code to JavaScript. This is the result:
```js
function trainPerceptron(perceptron, data) {
  const learningRate = 0.002;
  const weights = perceptron['weights'];
  let error = 0;
  for (const chunk of data) {
    const input = chunk['input'];
    const output = chunk['output'];
    const sum = input.reduce((acc, key) => acc + weights[key], 0);
    const actualOutput = sum > 0 ? sum : 0;
    const expectedOutput = output === perceptron['id'] ? 1 : 0;
    const currentError = expectedOutput - actualOutput;
    if (currentError) {
      error += currentError ** 2;
      const change = currentError * learningRate;
      // `of` iterates the sampled indices themselves; `in` would walk
      // the array positions 0-9 and update the wrong weights
      for (const key of input) {
        weights[key] += change;
      }
    }
  }
  return error;
}

// Build mock data structure
const data = new Array(11514);
for (let i = 0; i < data.length; i++) {
  const inputSet = new Set();
  while (inputSet.size < 10) {
    inputSet.add(Math.floor(Math.random() * 5146));
  }
  const input = Array.from(inputSet);
  data[i] = { input: input, output: 0 };
}

const perceptrons = Array.from({ length: 60 }, (_, i) => ({
  id: i,
  weights: Array.from({ length: 5147 }, () => 0.0),
})); // simulate 60 perceptrons

// Simulate NLU
for (let i = 0; i < 151; i++) { // epochs 0-150
  const hrstart = performance.now();
  for (const perceptron of perceptrons) {
    trainPerceptron(perceptron, data);
  }
  const hrend = performance.now();
  console.log(`Epoch ${i} - Time for training: ${Math.floor(hrend - hrstart)}ms`);
}
```
When I run the JS code I get the following numbers:
```
Epoch 0 - Time for training: 30ms
Epoch 1 - Time for training: 18ms
Epoch 2 - Time for training: 17ms
Epoch 3 - Time for training: 17ms
Epoch 4 - Time for training: 17ms
...
Epoch 147 - Time for training: 17ms
Epoch 148 - Time for training: 17ms
Epoch 149 - Time for training: 17ms
Epoch 150 - Time for training: 17ms
```
These numbers support my theory: Python is being unreasonably slow here. Any idea why, or what exactly is making it perform so poorly?
Runtime details:
macOS Ventura 13.2.1 (22D68)
MacBook Pro, M1 Pro, 32 GB RAM
Python 3.11.0 (native Apple Silicon)
Regards!
[1]: https://github.com/pyutils/line_profiler