The CHAINS research project on software supply chains @ KTH

Master Thesis Topics in Project Chains

Project Chains hosts master’s students for their theses. The available topics are listed below; see the main page for completed theses.

Reproducible Builds for non-compiled languages like JavaScript

Contact: Eric Cornelissen

Reproducible builds can create trust in the build process of distributed software artifacts through independent verifiers: if multiple independent parties build bit-for-bit identical artifacts from the same source, it is likely that the build has not been tampered with. The Reproducible Builds project has made significant progress on the reproducibility of binary artifacts. But what about non-compiled languages such as JavaScript? In the JavaScript world, it is common to either bundle a codebase (e.g. using the popular webpack bundler) or transpile it from a related language (e.g. from TypeScript). These transformations are similar to compilation and may not be reproducible. Here too, a compromised build system could be used to stealthily inject malicious code into distributed artifacts, and reproducible builds could help detect such attacks.

In this project, we will study the reproducibility of JavaScript build processes in one or more areas (npm packages, client-side bundling, transpiling, GitHub Actions, etc.) to understand the state of reproducibility in JavaScript and, depending on the findings, propose fixes.
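The check that independent verifiers perform can be sketched in a few lines of Python. This is a minimal sketch, assuming each verifier has already produced its artifact (for example, an npm tarball or a webpack bundle); the verifier names and the helper functions are hypothetical. It compares SHA-256 digests and reports which builders disagree with the most common result.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def sha256_hex(artifact: bytes) -> str:
    """Return the SHA-256 digest of a build artifact as a hex string."""
    return hashlib.sha256(artifact).hexdigest()


def check_reproducible(builds: Dict[str, bytes]) -> Tuple[bool, List[str]]:
    """Compare artifacts produced by independent (hypothetical) verifiers.

    Returns (reproducible, outliers): reproducible is True only if every
    build is bit-for-bit identical; outliers lists the verifiers whose
    digest differs from the most common one.
    """
    if not builds:
        raise ValueError("need at least one build")
    digests = {name: sha256_hex(data) for name, data in builds.items()}
    counts = Counter(digests.values())
    majority_digest, _ = counts.most_common(1)[0]
    outliers = sorted(name for name, d in digests.items() if d != majority_digest)
    return len(counts) == 1, outliers


if __name__ == "__main__":
    builds = {
        "ci": b"console.log('hi');",
        "alice": b"console.log('hi');",
        "bob": b"console.log('hi');/*injected*/",
    }
    print(check_reproducible(builds))  # (False, ['bob'])
```

Any disagreement means the build is not reproducible; the majority vote only points at the build worth investigating first, and it is ambiguous when results are evenly split.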

Related work:

  1. Reproducible Builds: Increasing the Integrity of Software Supply Chains
  2. Prototype of reproducing GitHub actions
  3. Reproducible Central

Detection and Mitigation of GitHub Actions smells

Contact: Eric Cornelissen

GitHub Actions is the continuous integration and continuous delivery (CI/CD) solution offered by GitHub. It supports “expressions”, written as ${{ ... }}, which are parts of a workflow that are evaluated at runtime. Their values may come from other parts of the CI/CD workflow or directly from GitHub, for example the title of an issue or pull request. The problem is that an attacker-controlled value used in the wrong place, for instance interpolated directly into a shell script, can lead to compromise of the CI/CD workflow. In this project, we will look into automatically detecting and fixing such misconfigurations in GitHub Actions workflow definitions.
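A common mitigation, recommended in GitHub's security hardening guidance, is to pass an untrusted value through an environment variable instead of interpolating it into a step's run script, so the shell treats it as data rather than code. The sketch below applies that rewrite to the text of a single run script. It is a minimal illustration, assuming a hand-picked, non-exhaustive list of attacker-controllable contexts and a hypothetical UNTRUSTED_ naming scheme; a real tool would parse the workflow YAML and add the returned variables to the step's env block.

```python
import re
from typing import Dict, Tuple

# Illustrative, non-exhaustive list of contexts an outside attacker can control
# (issue/PR titles and bodies, comments, branch names).
UNTRUSTED_PREFIXES = (
    "github.event.issue.title",
    "github.event.issue.body",
    "github.event.pull_request.title",
    "github.event.pull_request.body",
    "github.event.pull_request.head.ref",
    "github.event.comment.body",
    "github.event.review.body",
    "github.head_ref",
)

# Matches ${{ <expression> }} and captures the expression without surrounding spaces.
EXPRESSION = re.compile(r"\$\{\{\s*(.*?)\s*\}\}")


def env_name(expression: str) -> str:
    """Derive a hypothetical variable name; GITHUB_ is avoided because GitHub reserves it."""
    if expression.startswith("github."):
        expression = expression[len("github."):]
    return "UNTRUSTED_" + re.sub(r"[^A-Za-z0-9]+", "_", expression).upper()


def fix_run_script(script: str) -> Tuple[str, Dict[str, str]]:
    """Replace untrusted expressions with environment variable references.

    Returns the rewritten script and the env entries the step would need.
    """
    env: Dict[str, str] = {}

    def replace(match):
        expression = match.group(1)
        if not expression.startswith(UNTRUSTED_PREFIXES):
            return match.group(0)  # leave other expressions untouched
        name = env_name(expression)
        env[name] = "${{ " + expression + " }}"
        # The shell expands the variable at runtime, so its value is not
        # re-parsed as shell code (quoting it also avoids word splitting).
        return "${" + name + "}"

    return EXPRESSION.sub(replace, script), env


if __name__ == "__main__":
    fixed, env = fix_run_script(
        'echo "Title: ${{ github.event.issue.title }}" && git checkout ${{ github.sha }}'
    )
    print(fixed)  # echo "Title: ${UNTRUSTED_EVENT_ISSUE_TITLE}" && git checkout ${{ github.sha }}
    print(env)    # {'UNTRUSTED_EVENT_ISSUE_TITLE': '${{ github.event.issue.title }}'}
```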

Related work:

Academic

  1. ARGUS: A Framework for Staged Static Taint Analysis of GitHub Workflows and Actions
  2. Automatic Security Assessment of GitHub Actions Workflows
  3. Characterizing the Security of GitHub CI Workflows
  4. Ambush From All Sides: Understanding Security Threats in Open-Source Software CI/CD Pipelines
  5. Mitigating Security Issues in GitHub Actions
  6. ActionsRemaker: Reproducing GitHub Actions
  7. Quantifying Security Issues in Reusable JavaScript Actions in GitHub Workflows
  8. A Preliminary Study of GitHub Actions Dependencies
  9. On the outdatedness of workflows in the GitHub Actions ecosystem

Industry

  1. https://github.com/CycodeLabs/raven
  2. https://github.com/boostsecurityio/poutine
  3. https://boostsecurityio.github.io/lotp/
  4. https://github.com/AdnaneKhan/ActionsTOCTOU

Prototype:

Empirical Study of Compilation Reproducibility in Solidity

Contact: Aman Sharma

Description: The reproducibility of software builds is a critical aspect of secure software development. This concept has been pushed forward in the context of Solidity, the programming language used for writing smart contracts on the Ethereum blockchain, with the notion of "verified contracts", i.e. deployed contracts whose published source code has been shown to compile to the on-chain bytecode. In this thesis, you will conduct an empirical study of the reproducibility of compilation in Solidity: you will recompile verified Solidity contracts and analyze the consistency of the results. The datasets for this study will be sourced from Etherscan and Sourcify. This research will contribute to the understanding of software integrity in the growing field of smart contracts and could inform best practices for Solidity development.

  1. Reproducible Builds: Increasing the Integrity of Software Supply Chains

  2. Etherscan

  3. Sourcify
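When recompiled bytecode does not match the deployed bytecode, a first question is whether only the metadata differs. Solidity appends a CBOR-encoded metadata section to the bytecode that includes a hash of the contract's metadata (which covers the source files), so even a change in comments alters this trailer without changing the executable code. The sketch below classifies a comparison as exact, metadata-only, or mismatch, roughly mirroring Sourcify's distinction between full and partial matches. It assumes the compiler appended the usual two-byte big-endian length after the CBOR data, and it ignores other known sources of difference such as immutable variables and linked library addresses; the function names are hypothetical.

```python
def _to_bytes(hex_code: str) -> bytes:
    """Parse hex bytecode, with or without a 0x prefix."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    return bytes.fromhex(hex_code)


def strip_metadata(code: bytes) -> bytes:
    """Drop the CBOR metadata trailer from Solidity bytecode.

    Assumes the last two bytes hold the big-endian length of the CBOR data.
    Returns the input unchanged if that length is implausible.
    """
    if len(code) < 2:
        return code
    cbor_length = int.from_bytes(code[-2:], "big")
    if cbor_length + 2 > len(code):
        return code
    return code[: len(code) - cbor_length - 2]


def compare_bytecode(recompiled_hex: str, deployed_hex: str) -> str:
    """Classify a recompilation as 'exact', 'metadata-only' or 'mismatch'."""
    recompiled, deployed = _to_bytes(recompiled_hex), _to_bytes(deployed_hex)
    if recompiled == deployed:
        return "exact"
    if strip_metadata(recompiled) == strip_metadata(deployed):
        return "metadata-only"
    return "mismatch"
```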

Zero-knowledge software bills of materials

Contact: Javier Ron

Description: Software bills of materials (SBOMs) are complete lists of the components that make up a piece of software [1]. They are helpful for tracing vulnerabilities, checking license compliance, and similar tasks. However, publishing an SBOM also reveals any vulnerable components to malicious actors. Furthermore, some proprietary software developers advocate access control for SBOM distribution [2]. Zero-knowledge proofs allow a party to convey that a statement is true without disclosing any additional information [3]. You will design, develop, and evaluate a zero-knowledge SBOM system that allows developers to disclose limited but verifiable SBOM information to authorized users.

  1. The Minimum Elements For a Software Bill of Materials https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf

  2. An Empirical Study on Software Bill of Materials: Where We Stand and the Road Ahead http://arxiv.org/abs/2301.05362

  3. Zero-knowledge proof https://en.wikipedia.org/wiki/Zero-knowledge_proof

  4. Trust in Software Supply Chains: Blockchain-Enabled SBOM and the AIBOM Future (2024)
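A full zero-knowledge system does not fit in a short example, but a simpler building block for selective disclosure can be sketched: the developer publishes only the root of a Merkle tree over salted hashes of the SBOM entries, then reveals individual entries, each with an inclusion proof, to authorized users. This is a commitment scheme, not a zero-knowledge proof: a disclosed entry is revealed in full, and the proof length leaks the approximate size of the SBOM. A zero-knowledge design could instead prove statements such as "no component matches this vulnerable version" without revealing entries. All names and SBOM entries below are hypothetical.

```python
import hashlib
import os
from typing import List, Tuple

Proof = List[Tuple[bytes, bool]]  # (sibling hash, True if the current node is a right child)


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(component: str, salt: bytes) -> bytes:
    # A random per-component salt prevents verifiers from testing guesses against
    # the sibling hashes a proof reveals; the 0x00 prefix separates leaves from inner nodes.
    return _hash(b"\x00" + salt + component.encode("utf-8"))


def build_tree(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, from the leaves up to the single root."""
    if not leaves:
        raise ValueError("empty SBOM")
    levels = [leaves]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last node on odd levels
        levels.append([_hash(b"\x01" + level[i] + level[i + 1])
                       for i in range(0, len(level), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> Proof:
    """Collect the sibling hashes needed to recompute the root from one leaf."""
    proof: Proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2 == 1))
        index //= 2
    return proof


def verify(root: bytes, component: str, salt: bytes, proof: Proof) -> bool:
    """Check that a disclosed component is committed to by the published root."""
    node = leaf_hash(component, salt)
    for sibling, is_right in proof:
        node = _hash(b"\x01" + sibling + node) if is_right else _hash(b"\x01" + node + sibling)
    return node == root


if __name__ == "__main__":
    sbom = ["log4j-core 2.17.1", "openssl 3.0.13", "zlib 1.3.1"]  # hypothetical SBOM
    salts = [os.urandom(16) for _ in sbom]
    levels = build_tree([leaf_hash(c, s) for c, s in zip(sbom, salts)])
    root = levels[-1][0]  # the only value published
    # Disclose only the second component to an authorized user.
    print(verify(root, sbom[1], salts[1], prove(levels, 1)))  # True
```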

Study of non-reproducible builds in the Java ecosystem

Description: Build reproducibility means that building the same source code in the same build environment always yields bit-by-bit identical output [1]. This property protects against the threat of a compromised build process [2], which makes it an important safeguard for software supply chain security. In the Java ecosystem, Reproducible Central attempts to reproduce Maven/Gradle/sbt artifacts published on Maven Central. It does so by building each artifact from source and comparing the result with the artifact in the Maven Central registry. If the two are bit-by-bit identical, the Maven package is said to be reproducible; otherwise, it is non-reproducible. In this thesis, you will create a taxonomy of the reasons why Maven packages fail to build reproducibly.

  1. https://reproducible-builds.org/

  2. AROMA: Automatic Reproduction of Maven Artifacts
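A first step toward such a taxonomy is localizing where a rebuilt artifact differs from the published one. The sketch below compares two JAR files (which are ZIP archives) entry by entry using only Python's standard library, separating missing entries, differing contents, differing timestamps, and differing entry order. These are plausible first-level categories, not an established taxonomy, and the function names are hypothetical; obtaining both artifacts as bytes (e.g. downloaded from Maven Central and rebuilt locally) is assumed.

```python
import io
import zipfile
from typing import Dict, List, Tuple, Union


def diff_jars(published: bytes, rebuilt: bytes) -> Dict[str, Union[List[str], bool]]:
    """Report where two JAR (ZIP) archives differ, entry by entry."""
    with zipfile.ZipFile(io.BytesIO(published)) as zp:
        with zipfile.ZipFile(io.BytesIO(rebuilt)) as zr:
            p = {info.filename: info for info in zp.infolist()}
            r = {info.filename: info for info in zr.infolist()}
            common = sorted(p.keys() & r.keys())
            content = [n for n in common if zp.read(n) != zr.read(n)]
            # Report timestamps only for entries whose bytes are identical.
            timestamp = [n for n in common
                         if n not in content and p[n].date_time != r[n].date_time]
            # Compare the order of shared entries as stored in each archive.
            order_p = [i.filename for i in zp.infolist() if i.filename in r]
            order_r = [i.filename for i in zr.infolist() if i.filename in p]
            return {
                "only_in_published": sorted(p.keys() - r.keys()),
                "only_in_rebuilt": sorted(r.keys() - p.keys()),
                "content": content,
                "timestamp": timestamp,
                "order_differs": order_p != order_r,
            }


def make_jar(entries: List[Tuple[str, bytes, Tuple[int, int, int, int, int, int]]]) -> bytes:
    """Build an in-memory archive for demonstration: (name, data, date_time) tuples."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        for name, data, when in entries:
            z.writestr(zipfile.ZipInfo(name, when), data)
    return buf.getvalue()
```

For example, a rebuild whose manifest records a different builder and whose class file carries a newer timestamp would show up under "content" and "timestamp" respectively.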

Diverse-double compilation for Java

Contact: Martin Monperrus

Description: Java is a key programming language for enterprise applications. As such, the Java compiler is an ideal target for a trusting trust attack, in which a compromised compiler inserts a backdoor into the programs it compiles, including new builds of the compiler itself [1]. This thesis aims to investigate the feasibility of diverse-double compilation (DDC) [2] to mitigate this problem. You will design, implement and evaluate DDC for Java.

  1. Reflections on Trusting Trust

  2. Countering trusting trust through diverse double-compiling

  3. Diverse Double-Compiling to Harden Cryptocurrency Software (Master's thesis KTH 2023)

(a related crazy idea is to do diverse-double compilation for a JIT compiler)
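The DDC check from [2] can be summarized as: compile the source of the compiler under test with a trusted, independent compiler; use the resulting binary to compile the same source again; and compare that second-stage output bit-for-bit with the compiler binary under test. The Python sketch below models this with a toy in-memory "compiler" so that the logic is runnable. The Run type, toy_run, and the byte-string binaries are hypothetical stand-ins for invoking real compilers, and the sketch assumes deterministic builds and a trusted compiler that is not compromised by the same attacker.

```python
from typing import Callable

# A compiler binary is modelled as bytes; a Run executes a compiler binary on some
# source code and returns the produced binary. Both are hypothetical stand-ins.
Run = Callable[[bytes, bytes], bytes]


def diverse_double_compile(run: Run, trusted: bytes, source_a: bytes, binary_a: bytes) -> bool:
    """Return True if binary_a is regenerated from source_a via the trusted compiler."""
    stage1 = run(trusted, source_a)  # functionally equivalent to binary_a if both are honest
    stage2 = run(stage1, source_a)   # should then be bit-for-bit identical to binary_a
    return stage2 == binary_a


def toy_run(compiler: bytes, source: bytes) -> bytes:
    """Toy semantics: the output depends only on the source, unless the compiler
    binary carries a trojan that re-inserts itself when compiling a compiler."""
    output = b"BIN:" + source
    if b"+TROJAN" in compiler and source.startswith(b"compiler"):
        output += b"+TROJAN"
    return output


if __name__ == "__main__":
    source = b"compiler-source"
    trusted = b"BIN:other-compiler-source"
    trojaned = b"BIN:compiler-source+TROJAN"
    print(toy_run(trojaned, source) == trojaned)                          # True: it reproduces itself
    print(diverse_double_compile(toy_run, trusted, source, trojaned))     # False: DDC exposes it
```

The trojaned binary passes a plain self-compilation check because it regenerates itself, but it fails DDC because the trusted compiler never inserts the trojan.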

Diverse-double compilation in a CI/CD Pipeline

Description: C is a fundamental programming language for system-level software. Given its widespread use, the C compiler is a prime target for trusting trust attacks. This thesis aims to explore the systematic use of diverse-double compilation (DDC) in a modern Continuous Integration/Continuous Deployment (CI/CD) pipeline. You will design, implement and evaluate DDC in a CI/CD environment.

  1. Reflections on Trusting Trust

  2. Countering trusting trust through diverse double-compiling

  3. Diverse Double-Compiling to Harden Cryptocurrency Software (Master's thesis KTH 2023)

Dynamic Integrity Verification & Repair for Java Applications

Contact: Martin Monperrus

Description: Attackers constantly try to tamper with the code of software applications in production. Chang and Atallah proposed a technique that not only detects such modifications but also repairs the code after an attack, using a network of small security units called guards [1]. These guards can be programmed to perform tasks such as checksumming the program code, and they work in concert to protect each other. In this thesis, you will devise, implement and evaluate such an approach in the context of modern Java software with dependencies. An open question is how to place guards inside or around dependency code.

  1. Protecting Software Code by Guards

  2. Reflection as a mechanism for software integrity verification

  3. Practical integrity protection with oblivious hashing
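The detect-and-repair idea can be illustrated with a toy model over a byte array standing in for program code. Each guard records a SHA-256 checksum and a pristine copy of one region, and overlapping regions give a crude form of mutual coverage. This is only a sketch under simplifying assumptions: in the scheme of [1] the guards are code embedded in the program itself, so that guards checksum each other's code, whereas here they live outside the image; the class and function names are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Guard:
    """A toy guard that checksums one region of a code image and can repair it."""
    start: int
    end: int
    expected: bytes = b""
    backup: bytes = b""

    def arm(self, image: bytearray) -> None:
        # Record a checksum and a pristine copy of the protected region.
        region = bytes(image[self.start:self.end])
        self.expected = hashlib.sha256(region).digest()
        self.backup = region

    def check_and_repair(self, image: bytearray) -> bool:
        """Return True if the region was tampered with (it is then restored)."""
        region = bytes(image[self.start:self.end])
        if hashlib.sha256(region).digest() == self.expected:
            return False
        image[self.start:self.end] = self.backup
        return True


def run_guards(guards: List[Guard], image: bytearray) -> int:
    """Run every guard once and return the number of repairs performed."""
    return sum(guard.check_and_repair(image) for guard in guards)


if __name__ == "__main__":
    image = bytearray(b"\x90" * 96)  # stand-in for a code section
    guards = [Guard(0, 48), Guard(32, 80), Guard(64, 96)]  # overlapping regions
    for guard in guards:
        guard.arm(image)
    image[40] = 0xCC  # simulated tampering
    print(run_guards(guards, image))  # 1
```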

Dynamic Introspection of Dependencies in Java Applications

Description: We aim to design and develop a prototype for dynamic introspection of dependencies in Java applications. This would enable real-time tracking of, and decisions based on, the execution context of dependencies. By leveraging Java's instrumentation capabilities, the proposed system will monitor and identify the active dependencies at any given point during program execution. The focus will be on minimizing performance overhead, so that introspection does not significantly impact the application's responsiveness or efficiency, while integrating seamlessly with any existing Java application. The prototype will be rigorously evaluated against various benchmarks to assess its accuracy, performance, and usability.
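One step of such a system is attributing the frames of a call stack to dependencies. In Java, the stack could be captured, for example, with StackWalker (Java 9+) inside instrumented methods; the Python sketch below models only the attribution step. It takes fully qualified class names (innermost frame first) and a package-prefix-to-artifact table, and uses the most specific matching prefix. The table, class names, and functions are hypothetical; a real system would derive the mapping from the JAR files on the classpath and would have to handle packages split across several JARs.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from Java package prefixes to the artifacts that ship them.
PACKAGE_TO_ARTIFACT: Dict[str, str] = {
    "com.fasterxml.jackson.core": "com.fasterxml.jackson.core:jackson-core",
    "com.fasterxml.jackson.databind": "com.fasterxml.jackson.core:jackson-databind",
    "org.apache.commons.lang3": "org.apache.commons:commons-lang3",
}


def owning_artifact(class_name: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the artifact whose package prefix matches class_name most specifically."""
    best: Optional[str] = None
    best_length = -1
    for prefix, artifact in mapping.items():
        if class_name.startswith(prefix + ".") and len(prefix) > best_length:
            best, best_length = artifact, len(prefix)
    return best


def active_dependencies(stack: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return the dependencies present on the stack, innermost first, without duplicates.

    Frames from the application itself (no matching prefix) are skipped.
    """
    seen: List[str] = []
    for class_name in stack:
        artifact = owning_artifact(class_name, mapping)
        if artifact is not None and artifact not in seen:
            seen.append(artifact)
    return seen


if __name__ == "__main__":
    stack = [
        "com.fasterxml.jackson.core.JsonParser",
        "com.fasterxml.jackson.databind.ObjectMapper",
        "com.example.app.Main",
    ]
    print(active_dependencies(stack, PACKAGE_TO_ARTIFACT))
```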

Automatic Backporting of Java Libraries to Older Bytecode Versions

Contact: Aman Sharma

Description: With the rapid evolution of Java, libraries are often updated to target newer bytecode versions. This causes compatibility issues and breakages for applications that still run on older versions of Java. A possible solution is to automatically backport Java libraries to older bytecode versions. This thesis will focus on designing and implementing an automated tool for backporting Java libraries. The tool should be capable of translating new bytecode instructions to their older equivalents, preserving the functional behavior of the library while ensuring compatibility with older Java versions. An open question is how to handle new language features and APIs that have no direct equivalents in older versions.

  1. Back to the past–analysing backporting practices in package dependency networks

  2. Recommending code changes for automatic backporting of Linux device drivers

  3. Transforming C++11 Code to C++03 to Support Legacy Compilation Environments
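Before translating anything, a backporting tool has to find which classes target a newer release than the one required. The class file header records this: after the 4-byte magic number 0xCAFEBABE come a 2-byte minor version and a 2-byte major version, and from Java 5 onward the major version equals the release number plus 44 (52 for Java 8, 61 for Java 17). The sketch below reads that header; the helper names are hypothetical, and it only identifies candidates. Lowering the version field alone is not enough, because newer class files may rely on constructs such as invokedynamic-based string concatenation (Java 9) or nest-based access control (Java 11), as well as APIs missing from older runtimes.

```python
import struct
from typing import Tuple


def class_file_version(data: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the header of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("missing 0xCAFEBABE magic number")
    return major, minor


def java_release(major: int) -> int:
    """Map a class file major version to a Java release (Java 5 and later only)."""
    if major < 49:
        raise ValueError("pre-Java 5 class files are not handled in this sketch")
    return major - 44


def needs_backport(data: bytes, target_release: int) -> bool:
    """True if the class targets a newer Java release than target_release."""
    major, _ = class_file_version(data)
    return java_release(major) > target_release


if __name__ == "__main__":
    java17_header = struct.pack(">IHH", 0xCAFEBABE, 0, 61)
    print(java_release(class_file_version(java17_header)[0]))  # 17
    print(needs_backport(java17_header, 11))                    # True
```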