A suggestion for improving the documentation of chapter 10.9. Data Compression
Hello dear Python gods :)

I'm new to Python and was going over the documentation. This useful paragraph:
https://docs.python.org/3.7/tutorial/stdlib.html#data-compression
provides info for compressing text, but IMHO some important additional info is needed, without which I expect nearly all users would get stuck just as I did. Moreover, while looking for a solution I noticed a lot of articles with outdated examples which no longer work. This confused me even more, to the point that I started suspecting something was amiss on my end...

Anyhow, adding this info would do the trick: zlib works only with bytes, and in Python 3.x strings are no longer bytes. Example:

>>> import zlib
>>> zlib.compress('this is a sample string which does not work in zlib because it requires bytes and not unicode')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'

Luckily Python has a very simple solution for this: encode to UTF-8, compress, and after decompressing decode back to unicode, like so:

>>> text = 'this is a sample with unicode chars »שלום '
>>> encoded = text.encode()
>>> compressed = zlib.compress(encoded)
>>> decompressed = zlib.decompress(compressed)
>>> decoded = decompressed.decode()
>>> len(text), len(encoded), len(compressed)  # notice that a short string can come out larger after compression
(42, 47, 54)
>>> print(text,'\n',encoded,'\n',compressed)
this is a sample with unicode chars »שלום  
 b'this is a sample with unicode chars \xc2\xbb\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d ' 
 b'x\x9c+\xc9\xc8,V\x00\xa2D\x85\xe2\xc4\xdc\x82\x9cT\x85\xf2\xcc\x92\x0c\x85\xd2\xbc\xcc\xe4\xfc\x94T\x85\xe4\x8c\xc4\xa2b\x85C\xbb\xaf\xaf\xbc>\xe7\xfa\xd4\xebs\x15\x00\xb2K\x14|'
>>> text == decoded
True

Thank you! You are doing a great thing!

B.R.,
Guy Goldner
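To make the size remark in the session above concrete, here is a minimal sketch comparing a very short string with a long, repetitive one. The helper name `compressed_sizes` and both sample strings are made up for illustration; only `str.encode`, `zlib.compress` and `len` are real library calls.

```python
import zlib


def compressed_sizes(text, encoding="utf-8"):
    """Return (encoded_length, compressed_length) for a str.

    Hypothetical helper, not part of zlib: it encodes first because
    zlib.compress() only accepts bytes-like objects.
    """
    encoded = text.encode(encoding)
    return len(encoded), len(zlib.compress(encoded))


# Illustrative inputs: one tiny string, one long repetitive string.
short = "hi"
repetitive = "witch which has which witches wrist watch " * 50

short_raw, short_packed = compressed_sizes(short)
long_raw, long_packed = compressed_sizes(repetitive)

# The zlib header and checksum alone outweigh a tiny input,
# while repeated text shrinks considerably.
print("short:     ", short_raw, "->", short_packed)
print("repetitive:", long_raw, "->", long_packed)
```

So a growing result for a short string is expected and does not mean anything went wrong; the fixed overhead only pays off on larger or more repetitive input.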
OOPS! I forgot to give credit to this article: https://webkul.com/blog/string-and-bytes-conversion-in-python3-x/

On Sun, Sep 22, 2019 at 5:48 PM Guy Goldner <ggoldner@gmail.com> wrote:
> [...]
Hello Python Documentation Team,

I wanted to extend my gratitude for the excellent Python documentation, especially the section on [data compression](https://docs.python.org/3.7/tutorial/stdlib.html#data-compression). It has been incredibly useful as I learn more about Python.

However, I did encounter a point where I, and I believe many other users, might get stuck. The documentation mentions using `zlib` for compressing text, but it doesn't explicitly state that `zlib` works only with bytes, not strings. In Python 3.x, strings are no longer bytes, and this distinction is crucial for successful compression.

To illustrate, attempting to compress a string directly will result in an error:

```python
>>> import zlib
>>> zlib.compress('this is a sample string which does not work in zlib because it requires bytes and not unicode')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
```

To resolve this, we need to encode the string to bytes before compressing it and decode it back to a string after decompressing. Here’s a practical example:
```python
import zlib

# Sample text with unicode characters
text = 'this is a sample string which does not work in zlib because it requires bytes and not unicode with chars like this »שלום'

# Encode the string to bytes (UTF-8 is the default encoding)
encoded = text.encode()

# Compress the encoded bytes
compressed = zlib.compress(encoded)

# Decompress the compressed bytes
decompressed = zlib.decompress(compressed)

# Decode the bytes back to a string
decoded = decompressed.decode()

# Display the content at each stage
print(f"Original text: {text}")
print(f"Encoded text: {encoded}")
print(f"Compressed data: {compressed}")
print(f"Decompressed and decoded text: {decoded}")

# Check if the decompressed and decoded text matches the original text
print(f"Is decompressed text same as original? {text == decoded}")
```

Output:

```
Original text: this is a sample string which does not work in zlib because it requires bytes and not unicode with chars like this »שלום
Encoded text: b'this is a sample string which does not work in zlib because it requires bytes and not unicode with chars like this \xc2\xbb\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
Compressed data: b'x\x9c...'
Decompressed and decoded text: this is a sample string which does not work in zlib because it requires bytes and not unicode with chars like this »שלום
Is decompressed text same as original? True
```

The `Compressed data` line is abbreviated; the full value is an opaque run of bytes that depends on the input.

This additional information about encoding and decoding could greatly assist users in avoiding common pitfalls.

Thanks again for the great work on the documentation!
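The encode, compress, decompress, decode steps discussed in this thread could also be wrapped in a pair of small helpers. This is only a sketch: `compress_text` and `decompress_text` are hypothetical names, not part of `zlib`, and spelling out `"utf-8"` explicitly is an assumption about what a reader might prefer over relying on the default.

```python
import zlib


def compress_text(text, encoding="utf-8"):
    """Encode a str to bytes, then compress the bytes (hypothetical helper)."""
    return zlib.compress(text.encode(encoding))


def decompress_text(data, encoding="utf-8"):
    """Decompress bytes, then decode them back to a str (hypothetical helper)."""
    return zlib.decompress(data).decode(encoding)


sample = "this is a sample with unicode chars »שלום"

packed = compress_text(sample)        # bytes, safe to write to a file or socket
restored = decompress_text(packed)    # str again

print(type(packed).__name__, type(restored).__name__)
print(restored == sample)

# Passing the str directly still fails, which is the pitfall described above.
try:
    zlib.compress(sample)
except TypeError as exc:
    print("TypeError:", exc)
```

The same encoding has to be used on both sides; decoding with a different one may raise `UnicodeDecodeError` or silently produce the wrong characters.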
participants (2)
- Guy Goldner
- marketplace codilar