[Tutor] Unknown encoded file types.

Alan Gauld alan.gauld at yahoo.co.uk
Sun Feb 7 07:15:27 EST 2021


On 07/02/2021 09:55, mhysnm1964 at gmail.com wrote:

> When using binary mode to load a text file. Does all the encoding bytes stay
> present in the file after the content of the file has been loaded? Thus when
> you join the content from two files together. You are getting the encoding
> information half way through the join text?

This is the problem: there are no encoding bytes. The encoding is
implicit. You have to know (or figure out) the encoding based on the
content!
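
A minimal sketch of that point, using in-memory bytes in place of real
files (the sample word and variable names are made up for illustration).
The same text gives different bytes under different encodings, and
nothing in the bytes says which encoding was used, so joining the two
leaves you with data that no single codec decodes correctly:

```python
text = "café"

# The same text encoded two ways; this is what open(path, "rb").read()
# would return for two files saved with different encodings.
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Joining is plain byte concatenation; no encoding marker is carried along.
joined = utf8_bytes + b" " + latin1_bytes

# latin-1 maps every byte to a character, so it never fails,
# but the UTF-8 half comes out as mojibake.
as_latin1 = joined.decode("latin-1")

# UTF-8 is stricter and rejects the latin-1 half outright.
try:
    as_utf8 = joined.decode("utf-8")
except UnicodeDecodeError:
    as_utf8 = None

print(repr(as_latin1), as_utf8)
```

The safer route is to decode each file with its own encoding first and
join the resulting strings, not the raw bytes.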

Modern file formats such as HTML and XML include an encoding
declaration at the start of the file for exactly that reason. But older
(pre-Unicode) files carry no clue as to what the encoding was.
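
For files that do declare their encoding, a rough sketch of pulling the
declaration out of the first bytes might look like this. The helper
name declared_encoding and the regular expression are assumptions for
illustration, not a full HTML/XML parser, and the search only works
when the header itself is in an ASCII-compatible encoding:

```python
import re

# Matches encoding="..." (XML declaration) or charset=... (HTML meta tag).
_DECL = re.compile(
    rb"""(?:encoding|charset)\s*=\s*["']?([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)

def declared_encoding(raw, limit=1024):
    """Return the encoding name declared near the start of raw, or None."""
    match = _DECL.search(raw[:limit])
    return match.group(1).decode("ascii") if match else None

print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(declared_encoding(b'<html><head><meta charset="utf-8"></head>'))
print(declared_encoding(b"plain old text"))
```

A declared encoding is still only a claim made by whoever wrote the
file, so it is worth checking that the decoded text looks right.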

About all you can do is visually inspect them - maybe even using
a hex editor or assembly debugger! A good editor such as vim or
emacs or <whatever the windoze equivalent might be!> will try to
present anything it recognizes and that can help figure out
what's missing and give a clue. Then you can try multiple
different encodings to see if any of them work.
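
The "try multiple encodings" step can be sketched roughly as below. The
function name try_encodings, the candidate list and the sample bytes
are all made up for illustration; in practice the candidates would come
from what you know about where the file originated:

```python
def try_encodings(raw, candidates):
    """Return {encoding: decoded text} for every candidate that decodes."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except (UnicodeDecodeError, LookupError):
            # Either the bytes are invalid in this encoding,
            # or Python does not know a codec by this name.
            continue
    return results

# Stand-in for open(path, "rb").read() on a file of unknown encoding.
sample = "Größe: 5 €".encode("cp1252")

attempts = try_encodings(sample, ["utf-8", "cp1252", "latin-1", "cp437"])
for name, text in attempts.items():
    print(f"{name:8} {text!r}")
```

Note that several candidates can decode without raising an error and
still produce the wrong characters, so a person has to look at the
output and judge which one reads correctly.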

But ultimately it is guesswork and trial and error!

If you know the country/language used, that can often narrow things
down significantly. And if it's Unicode then you really only have 3
options to try, so it's not too bad.
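
A hedged sketch for the Unicode case, assuming the three options meant
here are UTF-8, UTF-16 and UTF-32 (the message does not name them).
Unicode files are the one case where a marker is sometimes present: an
optional byte order mark (BOM) at the very start. The helper name
guess_unicode is hypothetical; it checks for a BOM and otherwise only
tests strict UTF-8:

```python
import codecs

# Longer BOMs first: the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_unicode(raw):
    """Return a codec name if raw looks like Unicode text, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # No BOM: strict UTF-8 rejects most non-UTF-8 data, so a clean
    # decode is a reasonable (not certain) sign the file is UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None

print(guess_unicode("héllo".encode("utf-16")))
print(guess_unicode("héllo".encode("utf-8")))
print(guess_unicode(b"caf\xe9"))
```

UTF-16 or UTF-32 data without a BOM is deliberately left as None here,
because almost any byte string of the right length decodes under those
codecs, so that case still needs checking by eye.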

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos



