I have the task of merging two XML documents that differ in their encodings. One is declared as UTF-8 (8-bit UCS Transformation Format), the other as ISO-8859-1 (Latin alphabet No. 1). I wanted to do it without parsing the XML, since parsing is an expensive operation and can be problematic with large documents. Well, I figured, this is easy! I'll do the following:
- Create a StringBuffer to hold the new, larger XML document
- Strip the root nodes and any XML headers (DOCTYPE, <?xml ... ?> declaration, etc.)
- Write each XML document's contents to the StringBuffer
- Wrap the combined contents in the root node again
- Convert the StringBuffer to the final String
- Have a snack
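Here's roughly what those steps look like in Java. This is a minimal sketch, not the original code. It assumes both documents have already been read into Strings, the root element (`jobs` in the samples) carries no attributes, and the merged output should declare UTF-8. `NaiveXmlMerge`, `stripHeaderAndRoot` and `merge` are hypothetical names. The sketch uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`.

```java
import java.util.List;
import java.util.regex.Pattern;

public class NaiveXmlMerge {
    // Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
    private static final Pattern XML_DECL = Pattern.compile("<\\?xml[^>]*\\?>");
    // Matches a DOCTYPE with no internal subset, e.g. <!DOCTYPE jobs SYSTEM "jobs.dtd">
    private static final Pattern DOCTYPE = Pattern.compile("<!DOCTYPE[^>\\[]*>");

    /** Drops the declaration, any DOCTYPE, and the outer <root>...</root> tags. */
    static String stripHeaderAndRoot(String xml, String root) {
        String body = XML_DECL.matcher(xml).replaceAll("");
        body = DOCTYPE.matcher(body).replaceAll("").trim();
        String open = "<" + root + ">";   // assumes the root tag has no attributes
        String close = "</" + root + ">";
        if (!body.startsWith(open) || !body.endsWith(close)) {
            throw new IllegalArgumentException("Document is not wrapped in " + open);
        }
        return body.substring(open.length(), body.length() - close.length());
    }

    /** Buffers a new header and root, appends each stripped body, returns the String. */
    static String merge(List<String> docs, String root) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append('<').append(root).append('>');
        for (String doc : docs) {
            sb.append(stripHeaderAndRoot(doc, root));
        }
        sb.append("</").append(root).append('>');
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<jobs><job><jobtitle>Job 1</jobtitle></job></jobs>";
        String doc2 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<jobs><job><jobtitle>Job 2</jobtitle></job></jobs>";
        System.out.println(merge(List.of(doc1, doc2), "jobs"));
    }
}
```

Regex stripping only works here because the headers and root tags are predictable. It is not a general XML processor.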
XML Doc 1:
<?xml version="1.0" encoding="UTF-8"?>
<jobs>
<job>
<jobtitle>Job 1</jobtitle> ...
</job>
</jobs>
XML Doc 2:
<?xml version="1.0" encoding="iso-8859-1"?>
<jobs>
<job>
<jobtitle>Job 2</jobtitle> ...
</job>
</jobs>
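The two samples differ only in the `encoding` attribute of their XML declarations. You can find out how to decode each file without a full parse by reading just that attribute from the first few bytes. The following is a sketch with these assumptions:

* Encodings are ASCII-compatible. UTF-16 and UTF-32 files are not handled.
* It falls back to UTF-8 when nothing is declared.
* `XmlEncodingSniffer` and `declaredCharset` are hypothetical names.

`Charset.forName` throws if the declared name is unknown to the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingSniffer {
    // encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    /** Returns the charset named in the prolog, or UTF-8 when none is declared. */
    static Charset declaredCharset(byte[] xml) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII prolog
        // reads correctly for any ASCII-compatible encoding (not UTF-16/UTF-32).
        String head = new String(xml, 0, Math.min(xml.length, 256), StandardCharsets.ISO_8859_1);
        if (head.startsWith("\u00EF\u00BB\u00BF")) {   // UTF-8 byte order mark
            head = head.substring(3);
        }
        Matcher m = ENCODING.matcher(head);
        return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] doc2 = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n<jobs/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(declaredCharset(doc2));   // prints ISO-8859-1
    }
}
```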
Turns out, that only works if the XML documents you're merging use the same encoding. Any ideas on a workaround?
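Why the encodings matter: characters in the ASCII range have identical bytes in UTF-8 and ISO-8859-1, but anything beyond that range does not. Bytes decoded with the wrong charset come out garbled. Below is a small illustration. The string `Café` is just an example value, not taken from the actual job data, and `EncodingMismatchDemo` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingMismatchDemo {
    /** Encodes as ISO-8859-1, then wrongly decodes the bytes as UTF-8. */
    static String latin1ReadAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Encodes as UTF-8, then wrongly decodes the bytes as ISO-8859-1. */
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String title = "Caf\u00e9";   // "Café", an example value
        // ASCII letters are identical in both encodings; é is not.
        System.out.println(Arrays.toString(title.getBytes(StandardCharsets.ISO_8859_1))); // [67, 97, 102, -23]
        System.out.println(Arrays.toString(title.getBytes(StandardCharsets.UTF_8)));      // [67, 97, 102, -61, -87]
        // 0xE9 alone is not a complete UTF-8 sequence, so the decoder substitutes U+FFFD.
        System.out.println(latin1ReadAsUtf8(title).equals("Caf\uFFFD"));                  // true
        // Each UTF-8 byte becomes its own Latin-1 character: "CafÃ©".
        System.out.println(utf8ReadAsLatin1(title).equals("Caf\u00C3\u00A9"));            // true
    }
}
```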
Solution:
What I did was specify an encoding for a FileWriter object and follow the same process, except that I had to write the merged file to disk in a single, unified encoding and then read the file back.
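Below is a reconstruction of that disk round trip, not the original code. It assumes each input is available as raw bytes and its declared charset is already known. It decodes each document with its own charset and writes everything through a `FileWriter` set to UTF-8, the unified encoding chosen here as an assumption. It then reads the file back. `DiskRoundTripMerge`, `body` and `mergeViaFile` are hypothetical names, and the root element is assumed to have no attributes.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskRoundTripMerge {
    /** Removes the <?xml ...?> declaration and the outer <root>...</root> tags. */
    static String body(String xml, String root) {
        String s = xml.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "").trim();
        String open = "<" + root + ">", close = "</" + root + ">";
        if (!s.startsWith(open) || !s.endsWith(close)) {
            throw new IllegalArgumentException("Document is not wrapped in " + open);
        }
        return s.substring(open.length(), s.length() - close.length());
    }

    /** Decodes each input with its own charset, writes UTF-8 to disk, reads it back. */
    static String mergeViaFile(Path out, String root, byte[] doc1, Charset cs1,
                               byte[] doc2, Charset cs2) throws IOException {
        // FileWriter(File, Charset) exists since Java 11; older code would wrap a
        // FileOutputStream in an OutputStreamWriter with the charset instead.
        try (Writer w = new FileWriter(out.toFile(), StandardCharsets.UTF_8)) {
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<" + root + ">");
            w.write(body(new String(doc1, cs1), root));
            w.write(body(new String(doc2, cs2), root));
            w.write("</" + root + ">");
        }
        return Files.readString(out, StandardCharsets.UTF_8);   // Java 11+
    }

    public static void main(String[] args) throws IOException {
        byte[] a = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><jobs><job>Job 1</job></jobs>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] b = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><jobs><job>Caf\u00e9</job></jobs>"
            .getBytes(StandardCharsets.ISO_8859_1);
        Path tmp = Files.createTempFile("merged", ".xml");
        System.out.println(mergeViaFile(tmp, "jobs", a, StandardCharsets.UTF_8, b, StandardCharsets.ISO_8859_1));
    }
}
```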
This worked OK, but I'm looking at alternatives, such as going to binary and back to a String again. For now, though, this is my best available option.
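Here is one way the "binary and back" alternative might look, keeping everything in memory. It decodes each document's bytes with its own charset, concatenates the Strings, and encodes the result once. This is a hypothetical sketch, and `InMemoryMerge` and its methods are made-up names. It carries over the same assumptions: known charsets, an attribute-free root element, and UTF-8 output. It avoids the temporary file, at the cost of holding the merged document in memory more than once (as a String and as bytes).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class InMemoryMerge {
    /** Removes the <?xml ...?> declaration and the outer <root>...</root> tags. */
    static String body(String xml, String root) {
        String s = xml.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "").trim();
        String open = "<" + root + ">", close = "</" + root + ">";
        if (!s.startsWith(open) || !s.endsWith(close)) {
            throw new IllegalArgumentException("Document is not wrapped in " + open);
        }
        return s.substring(open.length(), s.length() - close.length());
    }

    /** bytes -> String per document charset, concatenate, then String -> bytes once. */
    static byte[] merge(String root, byte[][] docs, Charset[] charsets) {
        if (docs.length != charsets.length) {
            throw new IllegalArgumentException("Need exactly one charset per document");
        }
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append('<').append(root).append('>');
        for (int i = 0; i < docs.length; i++) {
            sb.append(body(new String(docs[i], charsets[i]), root));
        }
        sb.append("</").append(root).append('>');
        // Encode with the same charset the new declaration names.
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><jobs><job>Job 1</job></jobs>"
            .getBytes(StandardCharsets.UTF_8);
        byte[] b = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><jobs><job>Caf\u00e9</job></jobs>"
            .getBytes(StandardCharsets.ISO_8859_1);
        byte[] merged = merge("jobs", new byte[][] {a, b},
            new Charset[] {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1});
        System.out.println(new String(merged, StandardCharsets.UTF_8));
    }
}
```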