CHAINS research project at KTH Royal Institute of Technology

Master Thesis Topics in Project Chains

Project Chains hosts master’s students for their theses. The available topics are listed below; see the main page for completed theses.

Empirical Study of API Difference Tools for Java Dependencies

Contact: Frank Reyes Garcia

Java applications rely extensively on external libraries, which are frequently updated and modified. As these libraries evolve, changes to their public APIs can introduce breaking changes, binary incompatibilities, or subtle behavioral issues that may impact client projects. Detecting and understanding these API changes is critical for maintaining software reliability and facilitating safe dependency updates. Several tools such as roseau, japicmp, Revapi, and Clirr have been developed to analyze and report API differences between library versions. This thesis will conduct a comprehensive comparative study of leading API diff tools, applying them to a diverse set of real-world open-source Java projects. The evaluation will focus on each tool’s ability to detect and classify different types of API changes (e.g., breaking, non-breaking, additions, deprecations). The outcome will be a benchmark and critical analysis of existing API diff tools and a dataset of API changes in real-world Java libraries.
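To make the notion of an API difference concrete, here is a minimal Java sketch. It is not how Roseau, japicmp, Revapi, or Clirr work internally: it assumes both versions of a type can be loaded as `Class` objects and compares only the public method signatures they declare. All names (`ApiDiffSketch`, `OldApi`, `NewApi`, `removedMethods`, `addedMethods`) are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class ApiDiffSketch {
    // Two hypothetical versions of the same library type.
    static class OldApi {
        public int size() { return 0; }
        public void clear() { }
    }

    static class NewApi {
        public int size() { return 0; }
        public boolean isEmpty() { return true; }
    }

    // Render every public method declared by the type as "returnType name(paramTypes)".
    static Set<String> publicSignatures(Class<?> type) {
        Set<String> signatures = new TreeSet<>();
        for (Method m : type.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isSynthetic()) {
                String params = Arrays.stream(m.getParameterTypes())
                        .map(Class::getName)
                        .collect(Collectors.joining(","));
                signatures.add(m.getReturnType().getName() + " " + m.getName() + "(" + params + ")");
            }
        }
        return signatures;
    }

    // Signatures present in the old version but missing in the new one:
    // candidates for breaking changes.
    static Set<String> removedMethods(Class<?> oldType, Class<?> newType) {
        Set<String> removed = new TreeSet<>(publicSignatures(oldType));
        removed.removeAll(publicSignatures(newType));
        return removed;
    }

    // Signatures present only in the new version: additions.
    static Set<String> addedMethods(Class<?> oldType, Class<?> newType) {
        return removedMethods(newType, oldType);
    }
}
```

Calling `ApiDiffSketch.removedMethods(ApiDiffSketch.OldApi.class, ApiDiffSketch.NewApi.class)` reports `void clear()` as removed. This sketch ignores fields, inheritance, generics, and behavioral changes, which are exactly the cases where tools can be expected to differ.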

Related Work:

[1] API evolution and compatibility: A data corpus and tool evaluation

[2] Understanding the Impact of APIs Behavioral Breaking Changes on Client Applications

How prevalent is Maven Class Hijacking?

Contact: Aman Sharma, Frank Reyes Garcia

Maven Class Hijacking [1] is a supply chain attack where a legitimate Java class deep in the dependency tree can act maliciously by shadowing a legitimate Java class that one declares directly. We want to explore how prevalent the condition “infection dependency precedes the gadget dependency” is. In this thesis, we will construct a dataset of Maven projects to answer this question. The two criteria for the dataset can be 1) duplication of fully qualified class names across two different dependencies, and 2) dependencies that could become infectious, identified by analyzing social engineering proxies such as no commits in the past 10 years. In the paper [1], we also recommend a mitigation for this attack. We would like to know how prevalent this mitigation is and in which cases it can break the build, leading to a false positive.

[1] Maven-Hijack: Software Supply Chain Attack Exploiting Packaging Order
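The first dataset criterion, duplicated fully qualified class names across dependencies, can be sketched in Java as follows. This is an illustration under assumptions, not the method of [1]: the class `DuplicateClassFinder` and its methods are hypothetical, dependencies are given as plain jar files, and multi-release jar entries under `META-INF/versions` are not treated specially.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

class DuplicateClassFinder {
    // List the fully qualified names of the classes contained in one jar.
    static List<String> classNamesIn(Path jar) {
        try (JarFile jf = new JarFile(jar.toFile())) {
            return jf.stream()
                    .map(entry -> entry.getName())
                    .filter(n -> n.endsWith(".class") && !n.endsWith("module-info.class"))
                    .map(n -> n.substring(0, n.length() - ".class".length()).replace('/', '.'))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Map each class name to the dependencies that define it, keeping only
    // names defined by at least two dependencies. The owners are listed in
    // the iteration order of the input map.
    static Map<String, List<String>> duplicates(Map<String, List<String>> classesByDependency) {
        Map<String, List<String>> owners = new TreeMap<>();
        classesByDependency.forEach((dependency, classes) -> {
            for (String className : classes) {
                List<String> deps = owners.computeIfAbsent(className, k -> new ArrayList<>());
                if (!deps.contains(dependency)) {
                    deps.add(dependency);
                }
            }
        });
        owners.values().removeIf(deps -> deps.size() < 2);
        return owners;
    }
}
```

If the input is a `LinkedHashMap` filled in classpath order, the first owner listed for each duplicated name is the dependency that precedes the others, which is the ordering condition the thesis wants to measure.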

Related Work:

[2] Will Dependency Conflicts Affect My Program’s Semantics?

[3] DevPhish: Exploring Social Engineering in Software Supply Chain Attacks on Developers

Ahead of Time Compilation Cache Analysis

Contact: Aman Sharma

JEP 483 introduced a performance optimization to improve startup time. It allows creating an “AOT” cache that stores commonly loaded classes in an already loaded and linked form, so that they are available immediately when the JVM starts. In this thesis, we will explore which classfiles are commonly loaded by implementing an AOT cache reader. Next, we can analyze how synthetically generated classfiles are handled. Another question to investigate is whether this cache can be repurposed as an allowlist of classes, similar to the concept of BOMI in SBOM.exe [1].

[1] SBOM.EXE: Countering Dynamic Code Injection based on Software Bill of Materials in Java

Trust Assumptions and Threats in Build Attestation System

Contact: Larissa Schmid

Description: Build attestations are cryptographically verifiable statements that describe how, when, and by whom a software artifact was produced. They are used for strengthening software supply chain security by ensuring that binaries and container images can be traced back to a documented build process. While standards like SLSA and tools such as Sigstore, Tekton Chains, and GitHub's native attestations promise to ensure trust in build outputs, there is no systematic assessment of their capabilities and limitations. This thesis will examine which trust assumptions different build attestation systems make, what attacker models they use, and how well current implementations satisfy their security goals. The work should evaluate potential attack vectors and propose recommendations for more robust, verifiable provenance.
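As a rough illustration of what "cryptographically verifiable statement" means, the following Java sketch signs a statement that binds a builder identity to an artifact digest and then verifies it. It is a toy model under assumptions, not the SLSA provenance format or the Sigstore, Tekton Chains, or GitHub implementation: the statement layout, the class `AttestationSketch`, and all method names are hypothetical. The explicit `trustedKey` parameter makes one trust assumption visible: the verifier must already know which key belongs to the builder.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

class AttestationSketch {
    // Hypothetical statement layout: "<builder id>|sha256:<base64 digest of the artifact>".
    static String statementFor(String builderId, byte[] artifact) {
        return builderId + "|sha256:" + Base64.getEncoder().encodeToString(sha256(artifact));
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static KeyPair newBuilderKey() {
        try {
            return KeyPairGenerator.getInstance("EC").generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The builder signs the statement with its private key.
    static byte[] sign(KeyPair builderKey, String statement) {
        try {
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(builderKey.getPrivate());
            signer.update(statement.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Accept only if the signature is valid under the trusted key AND the
    // statement describes exactly this builder and this artifact.
    static boolean verify(PublicKey trustedKey, String statement, byte[] signature,
                          String expectedBuilderId, byte[] artifact) {
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(trustedKey);
            verifier.update(statement.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature)
                    && statement.equals(statementFor(expectedBuilderId, artifact));
        } catch (GeneralSecurityException e) {
            return false; // malformed signature or unusable key
        }
    }
}
```

Note what the sketch does not cover: a valid signature says nothing about whether the builder itself was compromised or whether the signing key was protected, which are the kinds of assumptions this thesis would examine.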

Empirical study of vulnerability tracking processes in vulnerability reports

Contact: Yekatierina Churakova

Description: Vulnerability scanning tools play a crucial role in the identification and collection of vulnerabilities across different systems and platforms. Having a reliable and accurate report, which lists all associated vulnerabilities for the list of dependencies, is crucial for supply-chain security. [SBOM](https://cyclonedx.org/capabilities/sbom/) and [VEX](https://cyclonedx.org/capabilities/vex/) production tools (e.g. [Trivy](https://trivy.dev/), [Grype](https://github.com/anchore/grype), [DepScan](https://github.com/owasp-dep-scan/dep-scan)) are used for this purpose. Every tool has a number of vulnerability database integrations to provide the most distinct report. However, vulnerability databases often use diverse naming conventions, IDs, and tracking systems, making it difficult to find information about a specific vulnerability. This leads to inconsistency and fragmentation in vulnerability reporting: references to different vulnerability databases may use different identifiers for the same vulnerability, making it difficult to trace and assess risks consistently. In this project, we will explore the area of vulnerability tracking and aim to address the vulnerability naming problem. The thesis will focus on studying an approach for mapping vulnerability identifiers from different databases to their corresponding Common Vulnerabilities and Exposures (CVE) IDs. The aim is to improve vulnerability tracking, propose a way to solve the naming problem, and enhance the accuracy of vulnerability reports.

Related works:

  1. [Impacts of Software Bill of Materials (SBOM) Generation on Vulnerability Detection](https://www.cs.montana.edu/izurieta/pubs/SCORED2024.pdf)

  2. [Minimum Requirements for Vulnerability Exploitability eXchange (VEX)](https://www.cisa.gov/sites/default/files/2023-04/minimum-requirements-for-vex-508c.pdf)

  3. [Enhancing the Container Image Scanning Tool - GRYPE](https://ieeexplore.ieee.org/document/10200828)

  4. [Understanding the Quality of Container Security Vulnerability Detection Tools](https://arxiv.org/pdf/2101.03844)
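The identifier mapping described above can be sketched in Java as a lookup built from alias groups, that is, sets of identifiers that different databases use for the same vulnerability. This is one possible starting point under assumptions, not the approach the thesis will arrive at: the class `VulnIdMapper` is hypothetical, and the identifiers in the usage note are made-up placeholders, not real advisories.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

class VulnIdMapper {
    // CVE IDs have the form CVE-<4-digit year>-<4 or more digits>.
    private static final Pattern CVE = Pattern.compile("CVE-\\d{4}-\\d{4,}");

    // Normalized identifier -> CVE ID of its alias group.
    private final Map<String, String> toCve = new HashMap<>();

    // Register identifiers that refer to the same vulnerability. If the group
    // contains a CVE ID, every member of the group is mapped to it; groups
    // without a CVE ID are ignored by this sketch.
    void addAliasGroup(List<String> ids) {
        Optional<String> cve = ids.stream()
                .map(VulnIdMapper::normalize)
                .filter(id -> CVE.matcher(id).matches())
                .findFirst();
        cve.ifPresent(c -> ids.forEach(id -> toCve.put(normalize(id), c)));
    }

    // Resolve any known identifier to its CVE ID, if one is known.
    Optional<String> resolve(String id) {
        String normalized = normalize(id);
        if (CVE.matcher(normalized).matches()) {
            return Optional.of(normalized);
        }
        return Optional.ofNullable(toCve.get(normalized));
    }

    // Compare identifiers case-insensitively and ignore surrounding whitespace.
    static String normalize(String id) {
        return id.trim().toUpperCase(Locale.ROOT);
    }
}
```

For example, after `addAliasGroup(List.of("GHSA-aaaa-bbbb-cccc", "CVE-0000-0001"))` with these placeholder IDs, `resolve("ghsa-aaaa-bbbb-cccc")` yields `CVE-0000-0001`. The harder cases, such as vulnerabilities with no CVE ID or databases that disagree on aliases, are not handled here.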

Empirical Study of Compilation Reproducibility in Solidity

Contact: Aman Sharma

Description: The reproducibility of software builds is a critical aspect of secure software development. This concept has been pushed forward in the context of Solidity, the programming language used for writing smart contracts on the Ethereum blockchain, with the notion of "verified contracts". In this thesis, you will conduct an empirical study on the reproducibility of compilation in Solidity. You will recompile verified Solidity contracts and analyze the consistency of the results. The datasets for this study will be sourced from Etherscan and Sourcify. This research will contribute to the understanding of software integrity in the growing field of technology and could potentially inform best practices for Solidity development.

  1. Reproducible Builds: Increasing the Integrity of Software Supply Chains

  2. Etherscan

  3. Sourcify

Dynamic Integrity Verification & Repair for Java Applications

Contact: Martin Monperrus

Description: Attackers constantly try to tamper with the code of software applications in production. Chang and Atallah have proposed a technique to not only detect modifications but also repair the code after attacks, using a network of small security units called guards. These guards can be programmed to perform tasks such as checksumming the program code, and they work in concert to create mutual protection. In this thesis, you will devise, implement and evaluate such an approach in the context of modern Java software with dependencies. An open question is how to set up guards inside or around dependency code.
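A single checksumming guard can be sketched in Java as follows. This is a simplified illustration under assumptions, not the guard design of Chang and Atallah: the class `ChecksumGuard` and its methods are hypothetical, the protected code is treated as a byte array (for example a class file), and "repair" simply means falling back to a pristine copy kept by the guard.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ChecksumGuard {
    private final byte[] pristine;        // known-good copy of the protected code
    private final byte[] expectedDigest;  // SHA-256 of the known-good copy

    ChecksumGuard(byte[] pristineCode) {
        this.pristine = pristineCode.clone();
        this.expectedDigest = sha256(pristineCode);
    }

    // Detection: does the current code still hash to the expected value?
    boolean intact(byte[] currentCode) {
        return MessageDigest.isEqual(expectedDigest, sha256(currentCode));
    }

    // Repair: hand back the pristine copy whenever tampering is detected.
    byte[] checkAndRepair(byte[] currentCode) {
        return intact(currentCode) ? currentCode : pristine.clone();
    }

    // Helper: read the class file bytes of a type as a classpath resource.
    static byte[] classFileBytes(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("class file not found: " + resource);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A lone guard like this is itself a tampering target. The mutual protection mentioned above would come from guards that also checksum each other's code, and how to arrange that across dependency boundaries is part of the open question.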

  1. Protecting Software Code by Guards

  2. Reflection as a mechanism for software integrity verification

  3. Practical integrity protection with oblivious hashing

Dynamic Introspection of Dependencies in Java Applications

Contact: Aman Sharma

Description: We aim to design and develop a prototype for dynamic introspection of dependencies in Java applications. This would enable real-time tracking and decisions based on the dependency execution context. By leveraging Java's instrumentation capabilities, the proposed system will monitor and identify the active dependencies at any given point during program execution. The focus will be on minimizing performance overhead to ensure that the introspection process does not significantly impact the application's responsiveness or efficiency, while integrating seamlessly with any existing Java application. Rigorous evaluation against various benchmarks will be done to assess its accuracy, performance, and usability.
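One possible starting point, sketched here under assumptions, is a Java agent built on the `java.lang.instrument` API that records which jar or directory each class is loaded from. The class `DependencyAgent`, its `Recorder`, and the `LOADED` map are hypothetical names. The sketch only tracks class loading, which is a coarse proxy for "active" dependencies, and does not attempt the finer-grained execution context the thesis targets.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.CodeSource;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class DependencyAgent {
    // Origin (jar or directory URL) -> names of the classes loaded from it.
    static final Map<String, Set<String>> LOADED = new ConcurrentHashMap<>();

    // Entry point invoked by the JVM when started with -javaagent:<agent jar>;
    // the jar manifest must name this class in its Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Recorder());
    }

    // Where a class comes from, or "<unknown>" when the JVM provides no code
    // source (for example for classes of the runtime itself).
    static String originOf(ProtectionDomain domain) {
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "<unknown>"
                : source.getLocation().toString();
    }

    // A transformer that observes class loading without changing any bytecode.
    static final class Recorder implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain domain, byte[] classfileBuffer) {
            if (className != null) {
                LOADED.computeIfAbsent(originOf(domain), k -> ConcurrentHashMap.newKeySet())
                        .add(className.replace('/', '.'));
            }
            return null; // null tells the JVM to keep the original bytecode
        }
    }
}
```

Classes loaded before the agent registers its transformer are not seen by this sketch, and mapping an origin URL back to a Maven coordinate is left open. Both gaps, along with the overhead of the transformer itself, would need to be measured in the evaluation.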

Automatic Backporting of Java Libraries to Older Bytecode Versions

Contact: Aman Sharma

Description: With the rapid evolution of Java, libraries often get updated to new bytecode versions. This causes compatibility issues and breakages for applications that are still running on older versions of Java. To address this, a possible solution is to automatically backport Java libraries to older bytecode versions. This thesis will focus on designing and implementing an automated tool for backporting Java libraries. The tool should be capable of translating new bytecode instructions to their older equivalents, maintaining the functional behavior of the library while ensuring compatibility with older Java versions. An open question is how to handle new language features and APIs that do not have direct equivalents in older versions.
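A first building block for such a tool is detecting which bytecode version a class file targets. The following Java sketch reads the version from the class file header, which starts with the magic number `0xCAFEBABE` followed by the minor and major version. The class `ClassFileVersion` and its methods are hypothetical names, and the sketch only detects the need for backporting; it does not rewrite any instructions.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassFileVersion {
    // Major version of a class file, e.g. 52 for Java 8 and 61 for Java 17.
    static int majorVersion(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();        // minor version, not needed here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // For Java 5 and later, the release number is the major version minus 44.
    static int javaRelease(int majorVersion) {
        return majorVersion - 44;
    }

    // True if the class file targets a newer release than the one we must run on.
    static boolean needsBackport(byte[] classFile, int targetRelease) {
        return javaRelease(majorVersion(classFile)) > targetRelease;
    }
}
```

Lowering the version number alone is not a backport: instructions, constant pool entries, and library APIs introduced after the target release still have to be translated, which is the core difficulty of the thesis.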

  1. Back to the past–analysing backporting practices in package dependency networks

  2. Recommending code changes for automatic backporting of Linux device drivers

  3. Transforming C++11 Code to C++03 to Support Legacy Compilation Environments

Reproducible Just-in-Time (JIT) Compilation

Contact: Martin Monperrus

This thesis will explore the concept of reproducible JIT compilation, focusing on the challenges and solutions for ensuring consistent and repeatable execution of program compilation across different runs. By analyzing the factors that contribute to variability in JIT compilation, such as optimization heuristics, runtime conditions, and system configurations, you will propose methodologies to achieve reproducibility. The study will involve the design and implementation of a framework that captures and standardizes the JIT compilation process, enabling developers to reproduce the same JIT compilation results reliably. Through empirical evaluation, the thesis will assess the impact of reproducible JIT compilation on software reliability, debugging, and performance, ultimately contributing to the development of more robust and trustworthy software systems.

  1. Recompilation for debugging support in a JIT-compiler
  2. https://github.com/rschwietzke/jmh-C2-compile