Discussion:
[jira] [Commented] (XALANJ-2540) Very inefficient default behaviour for looking up DTMManager
Matthew Broadhead (JIRA)
2018-04-04 18:12:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425953#comment-16425953 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

Does anyone know the issues with this?  Is it actually fixable?  What code is this happening in?
Very inefficient default behaviour for looking up DTMManager
------------------------------------------------------------
Key: XALANJ-2540
URL: https://issues.apache.org/jira/browse/XALANJ-2540
Project: XalanJ2
Issue Type: Improvement
Security Level: No security risk; visible to anyone(Ordinary problems in Xalan projects. Anybody can view the issue.)
Components: DTM, XPath
Affects Versions: 2.7.1, 2.7
Reporter: Lukas Eder
Priority: Major
http://stackoverflow.com/questions/6340802/java-xpath-apache-jaxp-implementation-performance
I think the default behaviour of org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() is quite sub-optimal and should be improved, statically. I imagine it is unlikely that this configuration is going to change once classes have been loaded. Hence, the fallback lookup of META-INF/services/org.apache.xml.dtm.DTMManager should only be done once.
----
Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
String result = ((Element) e).getTextContent();
// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();
// Negligible
XPath xpath = factory.newXPath();
// Accounts for 70% (caching a compiled expression doesn't change much...)
String result = (String) xpath.evaluate(
"//SomeElementName", document, XPathConstants.STRING);
org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl
I'm really confused, because it's easy to see how JAXP could optimise the above XPath query to actually execute a simple getElementsByTagName() instead. But it doesn't seem to do that. This problem is limited to around 5-6 frequently used XPath calls, that are abstracted and hidden by an API. Those queries involve simple paths (e.g. /a/b/c, no variables, conditions) against an always available DOM Document only. So, if an optimisation can be done, it will be quite easy to achieve.
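For readers who want to reproduce the comparison, here is a minimal sketch that runs both lookups against the same DOM Document. The class name DomVersusXPath, the helper names and the sample XML are made up for illustration; only the two lookups themselves come from the report. It uses whichever JAXP implementation is on the classpath, so it says nothing about timing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomVersusXPath {

    // Parses a small XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Plain DOM lookup: text of the first element with this tag name, "" if none.
    static String viaDom(Document document, String tagName) {
        Element e = (Element) document.getElementsByTagName(tagName).item(0);
        return e == null ? "" : e.getTextContent();
    }

    // XPath lookup: string value of the first matching element, "" if none.
    static String viaXPath(Document document, String tagName) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (String) xpath.evaluate("//" + tagName, document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<root><SomeElementName>hello</SomeElementName></root>");
        System.out.println(viaDom(document, "SomeElementName"));
        System.out.println(viaXPath(document, "SomeElementName"));
    }
}
```

Both calls return the same text for a simple document like this one, which is why the DOM call is a candidate replacement for the handful of simple paths mentioned above.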
----
I have debugged and profiled my test-case and Xalan/JAXP in general. I managed to identify the major problem in
org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()
It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to look up the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on the ObjectFactory.class itself. When the access fails (by default), then the configuration is loaded from the xalan.jar file's
META-INF/services/org.apache.xml.dtm.DTMManager
-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault
or
-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault
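Where the JVM command line cannot be changed, the same property can presumably be set from code before the first XPath or XSLT call. The sketch below assumes Apache Xalan is the implementation in use (for the JDK-internal copy the com.sun.org.apache... key and value above would apply instead) and that the property is read when the lookup happens, as the report describes. The class name DtmManagerProperty is hypothetical.

```java
public class DtmManagerProperty {

    // Property key and value taken from the JVM flag quoted in the report.
    static final String KEY = "org.apache.xml.dtm.DTMManager";
    static final String VALUE = "org.apache.xml.dtm.ref.DTMManagerDefault";

    // Sets the property only when it is absent, so an explicit -D flag
    // still wins. Returns the value that is in effect afterwards.
    static String configure() {
        if (System.getProperty(KEY) == null) {
            System.setProperty(KEY, VALUE);
        }
        return System.getProperty(KEY);
    }

    public static void main(String[] args) {
        // Call this early, before any XPath or XSLT work is done.
        System.out.println(KEY + "=" + configure());
    }
}
```

In a servlet container this would typically go into startup code, with the caveat that system properties are JVM-wide and so affect every web application in the same process.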
measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |          n/a |
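As a rough illustration of what the first and fourth rows of the table compare, here is a sketch of the "without optimisation" style next to the "reusing XPathExpression" style. The class name XPathReuse, the sample document and the loop are assumptions for illustration, not the reporter's benchmark, and no timings are implied.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathReuse {

    // Parses a small XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // "without optimisation": a new factory and a new XPath object on
    // every call, and the expression string is parsed again each time.
    static String evaluateFresh(Document doc, String expr) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    // "reusing XPathExpression": compile once, keep the result, and call
    // evaluate(doc) on it for each document.
    static XPathExpression compileOnce(String expr) throws XPathExpressionException {
        return XPathFactory.newInstance().newXPath().compile(expr);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<a><b><c>c-text</c></b></a>");
        XPathExpression compiled = compileOnce("/a/b/c");
        String fresh = "";
        String reused = "";
        for (int i = 0; i < 10_000; i++) {
            fresh = evaluateFresh(doc, "/a/b/c");
            reused = compiled.evaluate(doc);
        }
        System.out.println(fresh + " / " + reused);
    }
}
```

Note that the JAXP documentation describes XPathFactory, XPath and XPathExpression as not thread-safe, so a reused instance should stay confined to one thread at a time.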
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-***@xalan.apache.org
For additional commands, e-mail: dev-***@xalan.apache.org
Laszlo Hornyak (JIRA)
2018-04-04 20:05:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426099#comment-16426099 ]

Laszlo Hornyak commented on XALANJ-2540:
----------------------------------------

It is an easy fix, but the project has been abandoned for years. You are better off setting the properties as described.

---------------------------------------------------------------------
Matthew Broadhead (JIRA)
2018-04-09 18:27:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431029#comment-16431029 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

If it is easy to fix, can you explain how to do it? Do you know which code is involved? This issue has 31 upvotes. We have problems with JSP taglibs in Tomcat and TomEE when redeploying webapps (https://bz.apache.org/bugzilla/show_bug.cgi?id=61875). It also blocks using Apache FOP or any XSLT processing in Tomcat and TomEE (it works, but not after a webapp redeploy).

---------------------------------------------------------------------
Matthew Broadhead (JIRA)
2018-04-16 10:08:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439216#comment-16439216 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

?

---------------------------------------------------------------------
Matthew Broadhead (JIRA)
2018-04-16 13:19:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439427#comment-16439427 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

If I go to the Xalan front page [https://xalan.apache.org/], it says the code can be found at [http://svn.apache.org/repos/asf/xalan/xalan-j/trunk/], which just says "Not found". I am trying to find the org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName() function mentioned in the original bug report. Can anyone help?

---------------------------------------------------------------------
Gary Gregory (JIRA)
2018-04-16 13:25:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439437#comment-16439437 ]

Gary Gregory commented on XALANJ-2540:
--------------------------------------

It's been a long time but I am pretty sure I released 2.7.1 out of [https://svn.apache.org/repos/asf/xalan/java/branches/xalan-j_2_7_1_maint/]

Gary

---------------------------------------------------------------------
Maruan Sahyoun (JIRA)
2018-04-16 18:41:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439854#comment-16439854 ]

Maruan Sahyoun commented on XALANJ-2540:
----------------------------------------

Just in case you haven't found it, it's in http://svn.apache.org/repos/asf/xalan/java/trunk/src/org/apache/xml/dtm/ObjectFactory.java

---------------------------------------------------------------------
Gary Gregory (JIRA)
2018-04-16 19:52:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439952#comment-16439952 ]

Gary Gregory commented on XALANJ-2540:
--------------------------------------

Do keep in mind that it has been a long time since a release and if there is a 2.7.3 it will likely be from the branch.

---------------------------------------------------------------------
Matthew Broadhead (JIRA)
2018-05-23 09:21:00 UTC
[ https://issues.apache.org/jira/browse/XALANJ-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16486962#comment-16486962 ]

Matthew Broadhead commented on XALANJ-2540:
-------------------------------------------

[~garydgregory] On your recommendation I have cloned 2.7.1, as that is the highest maintenance release I can see. I have grepped for 2.7.3 but cannot see it mentioned in any of the files.

There is no pom.xml or anything similar, so it looks like it has to be built manually?

[~msahyoun] Thanks, I have looked in ObjectFactory and can see where it is doing lookUpFactoryClassName(). Do you think it is possible to cache the result in a singleton for future requests? Or might this cause clashes?

I could submit a patch for the singleton suggestion, but I am not sure how to build and deploy the project.
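To make the caching idea concrete, here is a minimal sketch of a lookup result being remembered after the first call. This is not Xalan code: the class CachedLookup and the slowLookup function are hypothetical stand-ins for the expensive part of lookUpFactoryClassName(), and the sketch does not reproduce ObjectFactory's actual signature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachedLookup {

    // Remembers the resolved class name per factory id.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Hypothetical stand-in for the expensive lookup (system property,
    // properties file, META-INF/services scan).
    private final Function<String, String> slowLookup;

    CachedLookup(Function<String, String> slowLookup) {
        this.slowLookup = slowLookup;
    }

    // Runs the slow lookup at most once per factory id. A null result is
    // not cached by computeIfAbsent, so failed lookups would be retried.
    String lookUp(String factoryId) {
        return cache.computeIfAbsent(factoryId, slowLookup);
    }

    public static void main(String[] args) {
        AtomicInteger slowCalls = new AtomicInteger();
        CachedLookup lookup = new CachedLookup(id -> {
            slowCalls.incrementAndGet();
            return "org.apache.xml.dtm.ref.DTMManagerDefault";
        });
        for (int i = 0; i < 10_000; i++) {
            lookup.lookUp("org.apache.xml.dtm.DTMManager");
        }
        System.out.println("slow lookups performed: " + slowCalls.get());
    }
}
```

On the question of clashes: one possible concern is that a process-wide cache gives every web application the same answer even when their class loaders would resolve the factory differently. Keying the cache by class loader as well as factory id is one way around that, which this sketch does not attempt.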