Paper @ MSR2025

TerraDS: A Dataset for Terraform HCL Programs

David Spielmann presented his paper on Infrastructure from Code, one of the key technologies on which CAPE is based:

TerraDS is the first large-scale dataset of Terraform (by HashiCorp) configurations written in HCL, sourced exclusively from open-source repositories with permissive licenses to support reproducible research and tool development. Terraform is among the most established and widely adopted Infrastructure as Code (IaC) tools in use today. Yet, despite its popularity, there has been no comprehensive dataset to study real-world HCL programs at scale.

TerraDS fills this gap, collecting data from over 62,000 repositories, enriched with metadata and original HCL source code. As a case study, we used Checkov, a static analysis tool, to explore security issues in the dataset. For example, hundreds of IAM policies grant full administrative access, posing serious risks in real-world deployments. These insights show how TerraDS can serve as a foundation for improving tooling, analysis, and security in the IaC ecosystem.
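To make the IAM finding concrete, here is a minimal Python sketch of the kind of check involved. It is not Checkov's implementation and does not read the TerraDS data. It assumes the policy has already been extracted from the HCL source as a standard IAM JSON document. The function name `grants_full_admin` and both sample policies are hypothetical, chosen only for illustration.

```python
import json


def _as_list(value):
    """IAM accepts either a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def grants_full_admin(policy_json: str) -> bool:
    """Return True if any Allow statement pairs Action "*" with Resource "*".

    Simplified illustration: it ignores NotAction, NotResource, and
    Condition blocks, which a real analyzer would need to consider.
    """
    policy = json.loads(policy_json)
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            return True
    return False


# Hypothetical sample policies (not taken from the dataset).
ADMIN_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
})

SCOPED_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    },
})

if __name__ == "__main__":
    print(grants_full_admin(ADMIN_POLICY))
    print(grants_full_admin(SCOPED_POLICY))
```

In this sketch the first policy is flagged because it allows every action on every resource, while the second is limited to one action on one bucket and is not.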

Paper: https://lnkd.in/e95CvGpc
Dataset: https://zenodo.org/records/14217386
