[Expat-discuss] Hello and question about unknown encoding handler
Alfonso
alforan at tin.it
Tue Nov 20 09:49:26 CET 2007
Hi expat gurus.
I am Alfonso from Italy and I ported expat to MorphOS/Amiga (see
alfie.altervista.org) while writing an RSS client.
My problem is this:
let's say I am downloading from http://bash.org.ru/rss.
- it sends me an XML file encoded in windows-1251
- expat calls my unknownEncodingHandler
- in the above handler I know how to handle that encoding (I am
using lib codesets for that purpose); so I set the map this way:
---
/* codeset=findcodeset() */
for (i = 0; i < 256; i++)
{
    int l = codeset->table[i].utf8[0];
    if (l == 1) info->map[i] = i;
    else info->map[i] = -l;
}
info->convert = convert;
info->data = codeset;
---
- now for any char to be translated to UTF-8, convert is called:
---
int convert(void *data, const char *s)
{
    struct codeset *codeset = (struct codeset *)data;
    /* cast so that bytes >= 0x80 do not become a negative index */
    return codeset->table[(unsigned char)*s].ucs4;
}
---
I thought that I should return the UTF-8 unsigned int of the char.
But it doesn't work, for the simple reason that every windows-1251 char
is just a single byte and doesn't start any sequence. As a result,
expat treats any "strange" char as the start of a sequence of 2 or 3
chars, while it is always a single one.
Is there a solution for the above problem? What should I do in the
unknownEncoding handler or in convert()? I am sure there is something I
don't get :P
Thanks for your help.
Ciao. Alfonso
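
For reference, a minimal sketch of how the map could be filled for a
single-byte encoding such as windows-1251. As far as Expat's documentation
describes XML_Encoding, a non-negative map[i] is the Unicode scalar value of
byte i, -1 marks a malformed byte, and -n (n = 2, 3, 4) marks the lead byte of
an n-byte sequence in the source encoding, not in UTF-8. Under that reading no
byte starts a sequence here, so convert should never be needed. The struct
encoding_info and the function fill_single_byte_encoding are hypothetical
stand-ins so the sketch compiles without expat.h; the ucs4 array is assumed to
hold the code points that the codeset library already provides.

```c
#include <stddef.h>

/* Hypothetical stand-in with the same fields as Expat's XML_Encoding,
   declared here only so the sketch compiles without expat.h. */
struct encoding_info {
    int map[256];
    void *data;
    int (*convert)(void *data, const char *s);
    void (*release)(void *data);
};

/* Fill the map for a single-byte encoding.
   ucs4[i] is assumed to be the Unicode code point of byte i,
   or a negative value when byte i is undefined in the encoding. */
int fill_single_byte_encoding(struct encoding_info *info, const long ucs4[256])
{
    int i;

    for (i = 0; i < 256; i++) {
        long cp = ucs4[i];
        /* Every byte maps directly to one code point; code points above
           0xFFFF and undefined bytes are reported as malformed (-1). */
        info->map[i] = (cp >= 0 && cp <= 0xFFFF) ? (int)cp : -1;
    }

    /* No multi-byte sequences exist, so no convert callback is set. */
    info->data = NULL;
    info->convert = NULL;
    info->release = NULL;

    return 1; /* same value as Expat's XML_STATUS_OK */
}
```

With the codeset table from the handler above, ucs4[i] would correspond to
codeset->table[i].ucs4, and the loop replaces the one that stores -l.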