[issue19548] 'codecs' module docs improvements

Mon Nov 11 02:29:13 CET 2013

New submission from Jan Kaliszewski:

When learning about the 'codecs' module I encountered several places in the docs of the module that, I believe, could be improved to be clearer and easier for codecs-begginers: 

1. Ad `codecs.encode` and `codecs.decode` descriptions: I believe it would be worth to mention that, unlike str.encode()/bytes.decode(), these functions (and all their counterparts in the classes the module contains) support not only "traditional str/bytes encodings", but also bytes-to-bytes as well as str-to-str encodings. 

2. Ad 'codecs.register': in two places there is such a text: `These have to be factory functions providing the following interface: factory([...] errors='strict')` -- `errors='strict'` may be confusing (at the first sight it may suggest that the only valid value is 'strict'; maybe `factory(errors=<error handler label>)` with an appropriate description below would be better?).

3. Ad `codecs.open`: I believe there should be a reference to the built-in open() as an alternative that is better is most cases.

4. Ad `codecs.BOM*`: `These constants define various encodings of the Unicode byte order mark (BOM).` -- the world `encodings` seems to be confusing here; maybe `These constants define various byte sequences being Unicode byte order marks (BOMs) for several encodings. They are used...` would be better?

5. Ad `7.2.1. Codec Base Classes` + `codecs.IncrementalEncoder`/`codecs/IncrementalDecoder`:
  * `Each codec has to define four interfaces to make it usable as codec in Python: stateless encoder, stateless decoder, stream reader and stream writer` -- only four? Not six? What about incremental encoder/decoder???
  * Comparing the fragments (and tables) about error halding methods (Codecs Base Classes, IncrementalEncoder, IncrementalDecoder) with similar fragment in the `codecs.register` description and with the `codecs.register_error` description I was confused: is it the matter of a particular codec implementation or of a registered error handler to implement a particular way of error handling? I believe it would be worth to describe clearly relations between these elements of the API. Also more detailed description of differences beetween error handling for encoding and decoding, and translation would be a good thing. 

6. Ad `7.2.1.6. StreamReaderWriter Objects` and `7.2.1.7. StreamRecoder Objects`: It would be worth to say explicitly that, contrary to previously described abstract classes (IncrementalEncoder/Decoder, StreamReader/Writer), these classes are *concrete* ones (if I understand it correctly).

7. Ad `7.2.4. Python Specific Encodings`:
  * `raw_unicode_encoding` -- see: ticket #19539.
  * `unicode_encoding` -- `Produce a string that is suitable as Unicode literal in Python source code` but it is *not* a string; it's a *bytes* object (which could be used in source code using an `ascii`-compatibile encoding).
  * `bytes-to-bytes` and `str-to-str` encodings -- maybe it would be nice to mention that these encodings cannot be used with str.encode()/bytes.decode() methods (and to mention again they *can* be used with the functions/method provided by the `codecs` module).

----------
assignee: docs at python
components: Documentation
messages: 202593
nosy: docs at python, zuo
priority: normal
severity: normal
status: open
title: 'codecs' module docs improvements
versions: Python 2.6, Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19548>
_______________________________________