22 Jan 2016, 3:54 p.m.
In running lxml on some 3,000 TEI files, about 2,000 ended up with a string of null characters (\x00) after the closing </TEI> element. These files generate the error message "Content is not allowed in trailing section."

The problem doesn't seem to be random. I can't see any difference between the input files (all of them validate with Jing), but the behaviour is consistent from run to run: files that don't get the \x00 characters the first time don't get them the second time either, and files that do get them the first time get them again the second time.

I can remove the offending characters manually, but that doesn't scale to 3,000 files. Is there a way of checking for the characters before or after the final serialization and getting rid of them?

Martin Mueller
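One possible way to do the check after serialization is sketched below, in plain Python with no lxml calls. It is a rough sketch, not a diagnosis of why the null characters appear. It assumes the files are in UTF-8 or another ASCII-compatible encoding (in UTF-16, null bytes are a normal part of the text and this would not be safe), and the names `strip_trailing_nulls` and `clean_directory`, as well as the `*.xml` pattern, are made up for illustration. The idea is to look only at whatever follows the last `>` in the file and drop it if it consists of nothing but null bytes and whitespace.

```python
from pathlib import Path


def strip_trailing_nulls(data):
    """Return data without NUL bytes that trail the final '>' (hypothetical helper)."""
    end = data.rfind(b">")
    if end == -1:
        # No markup found at all; leave the content untouched.
        return data
    tail = data[end + 1:]
    # Only rewrite when the tail holds at least one NUL and nothing
    # except NULs and whitespace, so real content is never dropped.
    if b"\x00" in tail and tail.strip(b"\x00 \t\r\n") == b"":
        return data[:end + 1] + b"\n"
    return data


def clean_directory(folder, pattern="*.xml"):
    """Rewrite affected files in place and return the names of files that changed."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_bytes()
        cleaned = strip_trailing_nulls(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed.append(path.name)
    return changed
```

Calling `clean_directory("path/to/tei")` on a copy of the output directory would rewrite only the affected files and report which ones they were. The same `strip_trailing_nulls` function could instead be applied to the serialized bytes just before they are written to disk.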