Walter Dörwald wrote:
The stateful decoder has a little problem: At least three bytes have to be available from the stream until the StreamReader decides whether these bytes are a BOM that has to be skipped. This means that if the file only contains "ab", the user will never see these two characters.
This can be improved, of course: If the first byte is "a", it most definitely is *not* an UTF-8 signature.
So we only need a second byte for the characters between U+F000 and U+FFFF, and a third byte only for the characters U+FEC0...U+FEFF. But with the first byte being \xef, we need three bytes *anyway*, so we can always decide with the first byte only whether we need to wait for three bytes.
A solution for this would be to add an argument named final to the decode and read methods that tells the decoder that the stream has ended and the remaining buffered bytes have to be handled now.
Shouldn't an empty read from the underlying stream be taken as an EOF?