New GitHub issue #119654 from nedbat:<br>
<hr>
<pre>
# Bug report
### Bug description:
The tokenize module creates TokenInfo objects with a `.line` attribute. In Python 3.11, all tokens on a line shared the same string object for `.line`. In 3.12, each token gets its own copy of that string.
This is part of a memory issue reported against coverage.py: https://github.com/nedbat/coveragepy/issues/1791
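One rough way to put a number on the duplication is to sum the sizes of the distinct `.line` string objects and subtract one object per distinct value. This is a sketch, not coverage.py's measurement: `duplicate_line_bytes` is a hypothetical helper, and `sys.getsizeof` gives only a shallow size estimate.

```python
import io
import sys
import tokenize


def duplicate_line_bytes(tokens):
    """Hypothetical helper: estimate bytes held by extra copies of equal .line strings.

    Each distinct string *object* is counted once; then one object per
    distinct string *value* is subtracted, leaving only the redundant copies.
    """
    # Keyed by id() so that equal-but-separate string objects are all counted.
    # The tokens keep the strings alive, so ids cannot be reused meanwhile.
    objects = {}
    for tok in tokens:
        objects[id(tok.line)] = tok.line
    total = sum(sys.getsizeof(s) for s in objects.values())
    # Keyed by value: equal strings collapse to a single entry.
    per_value = {s: sys.getsizeof(s) for s in objects.values()}
    return total - sum(per_value.values())


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
toks = list(tokenize.generate_tokens(io.StringIO(text).readline))
print(duplicate_line_bytes(toks))
```

The result is zero when every token on a line shares one string object, as in 3.11, and grows with the number of tokens per line when each token carries its own copy.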
```python
# tok.py
import io
import sys
import tokenize
print(f"{sys.version = }")
text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
readline = io.StringIO(text).readline
toks = list(tokenize.generate_tokens(readline))
print(f"{toks[0].line = }")
print(f"{(toks[0].line == toks[1].line) = }")
print(f"{(toks[0].line is toks[1].line) = }")
```
3.11 reuses string objects:
```
% python3.11 /tmp/tok.py
sys.version = '3.11.9 (main, Apr 8 2024, 14:01:56) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = True
```
3.12 (and above) makes new string objects:
```
% python3.12 /tmp/tok.py
sys.version = '3.12.3 (main, Apr 9 2024, 15:45:14) [Clang 15.0.0 (clang-1500.3.9.4)]'
toks[0].line = 'lorem ipsum quia dolor sit amet consectetur adipisci velit'
(toks[0].line == toks[1].line) = True
(toks[0].line is toks[1].line) = False
```
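Until this changes, a caller that keeps many tokens in memory could restore the sharing itself. The sketch below assumes tokens on the same line arrive consecutively, which is how `generate_tokens` yields them, and swaps in the previous token's `.line` object when the text is equal. `generate_tokens_shared_lines` is a hypothetical name, not part of `tokenize`, and this is not presented as how coverage.py or CPython handles the issue.

```python
import io
import tokenize


def generate_tokens_shared_lines(readline):
    """Hypothetical wrapper: yield tokens whose equal .line values share one object.

    Restores 3.11-style sharing at the cost of one string comparison per token.
    """
    prev_line = None
    for tok in tokenize.generate_tokens(readline):
        if tok.line == prev_line:
            # Same text as the previous token: reuse the earlier string object.
            # TokenInfo is a namedtuple subclass, so _replace returns a TokenInfo.
            tok = tok._replace(line=prev_line)
        else:
            prev_line = tok.line
        yield tok


text = "lorem ipsum quia dolor sit amet consectetur adipisci velit"
toks = list(generate_tokens_shared_lines(io.StringIO(text).readline))
print(toks[0].line is toks[1].line)
```

Comparing only against the previous token keeps the wrapper streaming and memory-light; a dict-based cache would also catch non-adjacent duplicates but would hold every line for the lifetime of the cache.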
### CPython versions tested on:
3.11, 3.12, 3.13, CPython main branch
### Operating systems tested on:
macOS
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/119654">View on GitHub</a>
<p>Labels: type-bug</p>
<p>Assignee: </p>