[XML-SIG] how to get the 'codepage' from a xml document

Remy C. Cool dev-xml@smartology.nl
Fri, 10 Jan 2003 14:49:32 +0100

On Friday 10 January 2003 14:41, Thomas B. Passin wrote:
> [Remy C. Cool]
> > > > just have to find out how to get this implemented in such a
> > > > way that I can pass the encoding to the parser.
> > >
> > > Why do you think you need to do this? A compliant parser is
> > > going to be autodetecting the encoding if you don't force it to
> > > use something else. Why do you want to do the autodetect
> > > externally?
> >
> > My application appends/inserts data into an existing XML file,
> > somewhat like a print queue. So I need the encoding to be able
> > to create the new XML file in the same encoding as the original,
> > and I don't want to hardcode the encoding in the source. It
> > uses no external entities (except for a DTD in plain ASCII), so
> > that's not a problem.
> I do not think you are looking at things quite the right way here. 
> When you read the source and parse it, the resulting characters
> should no longer be "encoded" - they are in the computer's internal
> format.  You should read the external file into which you want to
> insert the new material.  It will now be in the internal format
> too, and the two can be combined. When you write the combined file,
> you can specify the encoding to use.
> It is true that you still have to decide what encoding to use for
> the output, but you no longer have a mix-and-match problem. 
> Anyway, if you have to figure out the external file's encoding, you
> can always read the first line of the external file, look for
> "encoding = ", look at the byte order mark if necessary, and do a
> simple-minded detection. That is bound to be good enough for
> merging the output in your situation.
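The simple-minded detection described above could be sketched in Python roughly as follows. This is an illustration only: `sniff_encoding` is a hypothetical name, it checks just the UTF-8 and UTF-16 byte order marks (not UTF-32 or BOM-less UTF-16), and it falls back to UTF-8, which is XML's default when there is neither a BOM nor an encoding declaration.

```python
import codecs
import re

# Matches encoding="..." or encoding='...' (whitespace allowed around "=").
_ENCODING_RE = re.compile(rb"""encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes (hypothetical helper)."""
    # A byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # Otherwise look for an encoding pseudo-attribute in the XML declaration.
    first_line = data.split(b"\n", 1)[0]
    if first_line.startswith(b"<?xml"):
        match = _ENCODING_RE.search(first_line)
        if match:
            return match.group(1).decode("ascii").lower()
    # No BOM and no declared encoding: XML says UTF-8.
    return "utf-8"
```

To use it on a file, open the file in binary mode and pass in only the first few hundred bytes, e.g. `with open(path, "rb") as f: enc = sniff_encoding(f.read(200))`.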

Thanks for your comment.
The thing is that I don't want to hardcode the encoding format in the 
source for the output file ... it should be the same as the input 
automatically. I realized that a 'helper' function to read the first 
line would suffice for this purpose, and it does.
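Combining that idea with the approach Thomas describes (parse into Unicode, then pick the output encoding when writing) might look roughly like the sketch below. Rather than scanning the first line by hand, it lets expat report the XML declaration through its `XmlDeclHandler` callback. The `<job>` element and the `declared_encoding`/`append_job` names are hypothetical stand-ins for the print-queue data, and the sketch assumes the input is in an encoding expat can read.

```python
import xml.parsers.expat
from xml.dom import minidom


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, as reported by expat."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once if the document starts with <?xml ...?>; encoding may be None.
    parser.XmlDeclHandler = lambda version, encoding, standalone: found.append(encoding)
    parser.Parse(data, True)
    return (found[0] if found and found[0] else default).lower()


def append_job(data: bytes, text: str) -> bytes:
    """Append a hypothetical <job> element and re-serialize in the original encoding."""
    encoding = declared_encoding(data)
    doc = minidom.parseString(data)  # the characters are now Unicode, not bytes
    job = doc.createElement("job")
    job.appendChild(doc.createTextNode(text))
    doc.documentElement.appendChild(job)
    # With an encoding argument, toxml() returns bytes and names that
    # encoding in the output's XML declaration.
    return doc.toxml(encoding=encoding)
```

Because the document is held as Unicode between parsing and writing, the only place the encoding matters is the final `toxml()` call.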