Paper @ MSR2025

TerraDS: A Dataset for Terraform HCL Programs

David Spielmann presented his paper on Infrastructure from Code, one of the key technologies on which CAPE is based:

TerraDS is the first large-scale dataset of Terraform (by HashiCorp) configurations written in HCL. It is sourced exclusively from open-source repositories with permissive licenses to support reproducible research and tool development. Terraform is among the most established and widely adopted Infrastructure as Code (IaC) tools in use today. Yet, despite its popularity, there has been no comprehensive dataset to study real-world HCL programs at scale.

TerraDS fills this gap, collecting data from over 62,000 repositories, enriched with metadata and original HCL source code. As a case study, we used Checkov, a static analysis tool, to explore security issues in the dataset. For example, hundreds of IAM policies grant full administrative access, posing serious risks in real-world deployments. These insights show how TerraDS can serve as a foundation for improving tooling, analysis, and security in the IaC ecosystem.
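To make the "full administrative access" finding concrete, here is a minimal Python sketch of the pattern such a check looks for: an IAM policy statement that allows every action on every resource. This is not Checkov's implementation and not the method used in the paper. The function names `grants_full_admin` and `_as_list` and the two sample policies are hypothetical, and the sketch assumes the policy document has already been extracted from the HCL source as a JSON string.

```python
import json


def _as_list(value):
    """IAM fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_full_admin(policy_json: str) -> bool:
    """Return True if any Allow statement covers action "*" on resource "*".

    Hypothetical helper for illustration only; a real scanner handles many
    more cases (NotAction, conditions, service-level wildcards, ...).
    """
    policy = json.loads(policy_json)
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            return True
    return False


# Sample policies (made up for this sketch, not taken from the dataset).
ADMIN_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
})

SCOPED_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": ["arn:aws:s3:::example-bucket/*"],
    }],
})

if __name__ == "__main__":
    print(grants_full_admin(ADMIN_POLICY))
    print(grants_full_admin(SCOPED_POLICY))
```

The sketch only flags the literal wildcard pair. It deliberately ignores narrower wildcards such as `s3:*`, which may or may not be risky depending on context.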

Paper: https://lnkd.in/e95CvGpc
Dataset: https://zenodo.org/records/14217386

