[jira] [Commented] (XALANJ-2560) ToXMLStream does not support unicode supplementary characters
Thomas Scheffler (JIRA)
2018-01-04 07:14:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16310888#comment-16310888 ]

Thomas Scheffler commented on XALANJ-2560:
------------------------------------------

Xalan produces invalid XML here, so this is a real showstopper. Sad to see that it is still unresolved.
ToXMLStream does not support unicode supplementary characters
-------------------------------------------------------------
Key: XALANJ-2560
URL: https://issues.apache.org/jira/browse/XALANJ-2560
Project: XalanJ2
Issue Type: Bug
Security Level: No security risk; visible to anyone (Ordinary problems in Xalan projects. Anybody can view the issue.)
Components: Serialization
Affects Versions: 2.7.1
Environment: Xalan 2.7.1 serializer.
Tested on Ubuntu 12.04 with Oracle JDK 1.7.0_05.
Reporter: Damien Guillaume
Labels: serialization, unicode
org.apache.xml.serializer.ToXMLStream (which extends ToStream) does not support serialization of Unicode supplementary characters such as U+1D49C. It creates invalid character entities like "&#55349;&#56476;" (one per UTF-16 surrogate) instead of "&#119964;" (or the bytes F0 9D 92 9C with UTF-8). ToXMLStream is used by LSSerializer when Xalan's serializer is on the classpath.
org.apache.xml.serialize.DOMSerializerImpl (included in Xerces) does not have this problem, but it has been deprecated since Xerces 2.9.0, so this is a regression.
See http://stackoverflow.com/questions/11952289/serializing-supplementary-unicode-characters-into-xml-documents-with-java for more details.
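The symptom described above can be reproduced in isolation with a few lines of plain Java. The following is only a minimal sketch, not Xalan's actual serializer code: the class SupplementaryEscapeSketch and both method names are hypothetical, and the "escape everything above ASCII" rule is an assumption chosen to keep the example short. It contrasts escaping each UTF-16 code unit separately (which yields two surrogate references, neither of which is a legal XML character) with escaping per code point.

```java
public class SupplementaryEscapeSketch {

    // Hypothetical helper mimicking the reported symptom: every UTF-16
    // code unit above ASCII is escaped on its own, so a surrogate pair
    // becomes two separate (invalid) character references.
    static String escapePerCodeUnit(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append("&#").append((int) c).append(';');
            }
        }
        return out.toString();
    }

    // Hypothetical helper showing the expected behaviour: iterate by
    // code point so a surrogate pair is combined before it is escaped.
    static String escapePerCodePoint(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 2 code units for supplementary characters, else 1.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D49C, the character named in this issue.
        String scriptA = new String(Character.toChars(0x1D49C));
        System.out.println(escapePerCodeUnit(scriptA));  // two surrogate references
        System.out.println(escapePerCodePoint(scriptA)); // one reference for the code point
    }
}
```

Iterating with String.codePointAt and Character.charCount is one common way to avoid splitting surrogate pairs; whether a fix in ToStream would take this shape is not something this issue states.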