TerraDS: A Dataset for Terraform HCL Programs
David Spielmann presented his paper on Infrastructure from Code, one of the key technologies on which CAPE is based:
TerraDS is the first large-scale dataset of Terraform (by HashiCorp) configurations written in HCL. It is sourced exclusively from open-source repositories with permissive licenses to support reproducible research and tool development. Terraform is among the most established and widely adopted Infrastructure as Code (IaC) tools in use today. Yet, despite its popularity, there has been no comprehensive dataset to study real-world HCL programs at scale.
TerraDS fills this gap, collecting data from over 62,000 repositories, enriched with metadata and original HCL source code. As a case study, we used Checkov, a static analysis tool, to explore security issues in the dataset. For example, hundreds of IAM policies grant full administrative access, posing serious risks in real-world deployments. These insights show how TerraDS can serve as a foundation for improving tooling, analysis, and security in the IaC ecosystem.
Paper: https://lnkd.in/e95CvGpc
Dataset: https://zenodo.org/records/14217386


