[Tutor] dbus.Array to string

Steven D'Aprano steve at pearwood.info
Wed Aug 14 04:14:19 CEST 2013


On 14/08/13 02:49, Marc Tompkins wrote:
> On Tue, Aug 13, 2013 at 8:59 AM, Amit Saha <amitsaha.in at gmail.com> wrote:
>
>
>> What does it mean (and will it always work?) when I don't specify any
>> encoding:
>>
>> >>> bytearray(ssid).decode()
>> u'BigPond679D85'
>>
>
> If you don't specify an encoding, then the default encoding is used; as you
> point out a bit later, your local default is ascii.
>
> Will it always work?  NO.  If there are any non-ASCII characters in the
> input stream (the SSID in this case), .decode will fail (probably with
> UnicodeDecodeError, but I can't test it at the moment.)

Careful -- you are confusing two distinct concepts here. ssid does not contain characters. It contains bytes. There are exactly 256 possible bytes, which are numbers 0, 1, ... 255. They may *represent* characters, or sounds, or images, or motion video, or any other form of data you like, in which case you have to ask (e.g.) "how is the sound encoded into bytes? is it a WAV file, or MP3, or OGG, or something else?"
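To make the distinction concrete, here is a small sketch in Python 3 syntax. The sample value is the SSID quoted earlier in the thread; the variable names are made up for illustration.

```python
# The SSID as raw bytes. Each element is just a number between 0 and 255.
ssid = bytearray(b"BigPond679D85")

# Iterating over a bytearray yields integers, not characters.
first_three = list(ssid)[:3]
print(first_three)  # [66, 105, 103]

# Only after decoding do we get text.
text = ssid.decode("ascii")
print(text)  # BigPond679D85
```

The numbers 66, 105, 103 only mean "B", "i", "g" because we chose to interpret them as ASCII.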

In this case, the ssid represents characters, but it contains bytes, and the same question applies -- how are the characters A, B, C, ... encoded into bytes? Unless you know which encoding is used, you have to guess. If you guess wrong, you'll get errors. If you're lucky you will get an exception, and know that you guessed wrong, but if you're unlucky you'll just get garbage characters.
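Here is a sketch of both outcomes, in Python 3 syntax. The sample text is hypothetical, chosen only because it contains one non-ASCII character.

```python
# The text "Cafe" with an acute accent on the e, encoded as UTF-8.
# The accented letter becomes the two bytes 0xC3 0xA9.
raw = "Caf\u00e9".encode("utf-8")

# Right guess: the original text comes back.
as_utf8 = raw.decode("utf-8")

# Wrong guess, lucky case: ASCII has no meaning for bytes above 127,
# so Python raises an exception and we know the guess was wrong.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Wrong guess, unlucky case: Latin-1 accepts every byte, so there is
# no error, just two garbage characters where one was intended.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8), ascii_failed, repr(as_latin1))
```

The Latin-1 result has five characters instead of four, and nothing in the program tells you that it is wrong.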

Fortunately, there are a few decent guesses you can make which will often be correct, at least in Western European countries, Australia, the USA, and similar:

UTF-8
ASCII
Latin-1

Latin-1 should be considered the "last resort" encoding, since decoding with it will never raise an exception: every one of the 256 possible bytes maps to some character. But if the data was not really Latin-1, it will return garbage. UTF-8 should be considered your first guess, since it is the standard encoding that everyone should use. (Any application that doesn't use UTF-8 by default in the 21st century is, in my opinion, buggy.)
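One way to put that ordering into practice is sketched below, in Python 3 syntax. The function name decode_ssid is made up for this example, and I am assuming the input is something bytearray() accepts, such as a bytes object or a sequence of integers in the range 0-255. ASCII is not tried separately, because any valid ASCII data is also valid UTF-8.

```python
def decode_ssid(raw):
    """Decode raw SSID bytes: UTF-8 first, Latin-1 as the last resort.

    decode_ssid is a hypothetical helper, not part of any library.
    """
    data = bytearray(raw)
    try:
        # First guess: UTF-8 (this also covers plain ASCII).
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Last resort: never raises, but the result may be garbage.
        return data.decode("latin-1")


print(decode_ssid([66, 105, 103]))         # a sequence of ints
print(decode_ssid(b"BigPond679D85"))       # plain ASCII bytes
print(repr(decode_ssid(b"Caf\xe9")))       # not valid UTF-8, falls back
```

Keep in mind that a successful return from the fallback branch only tells you that no exception occurred, not that the text is right.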



> I don't know the WiFi spec well enough to know whether you're ever going to
> run into non-ASCII characters in an SSID;

A little bit of googling shows that it definitely happens, and that UTF-8 is the standard encoding to use.




-- 
Steven

