At ngrok, building a robust data platform has been a significant undertaking, managed primarily by a single engineer, Christian Hollinger. The initiative aims to bridge the gap between traditional data engineering practices and the specific requirements of ngrok's infrastructure. The article traces the journey of building the platform and is deliberately transparent about what data is stored and how it is engineered.

The company collects several types of data: customer account information, usage metrics, subscription details, and third-party support interactions. The platform does not store the content of user traffic; it works with metadata only, which protects user privacy.

The data engineering role at ngrok is distinct because it is integrated with broader backend engineering work. The small team collaborates closely with subject matter experts, who handle data modeling themselves, which results in a more holistic approach to data management. This contrasts with traditional data engineering roles, where dedicated teams often operate in silos.

The article details how the architecture evolved from a utilitarian setup that relied heavily on AWS services to a more sophisticated one built on self-hosted open-source tools, including Apache Iceberg, Dagster for orchestration, dbt for data modeling, and Apache Flink for stream processing. The previous architecture suffered from expensive queries and lacked proper data lineage and auditing. The current architecture addresses these issues, improving the efficiency and reliability of data processing.
Christian discusses specific technical challenges encountered during this transition, such as integrating data workflows into a Go monorepo and reconciling complex schemas between Airbyte and AWS Glue. The team developed custom solutions to keep the two compatible and to streamline ingestion, including a post-processing step that normalizes schemas so the data is easier to query.

The article also covers the use of streaming data to combat abuse on the platform, showing how ngrok uses its data to monitor and respond to potentially harmful activity. This proactive approach helps maintain the integrity of the service and strengthens user trust.

In conclusion, the article serves as both a high-level overview and a detailed account of the challenges and solutions involved in building ngrok's data platform. It invites readers to explore ngrok's data capabilities further and encourages potential candidates to consider joining the team and contributing to ongoing improvements in data engineering.
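
To make the schema-normalization post-processing step mentioned above more concrete, here is a minimal Python sketch of one way such a step could work. It is not ngrok's implementation: the summary does not describe the actual logic, so the function name `flatten_record`, the underscore separator, and the choice to serialize lists as JSON strings are all assumptions made for illustration. The idea is simply to turn a nested record into flat, predictably named columns that a query engine can read without complex types.

```python
import json
from typing import Any


def flatten_record(record: dict[str, Any], sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into a single-level dict (hypothetical helper).

    Nested keys are joined with `sep`; lists are serialized to JSON strings
    so every output column holds a scalar value.
    """
    flat: dict[str, Any] = {}

    def walk(value: Any, prefix: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                # Lowercase keys so column names are consistent across sources.
                name = f"{prefix}{sep}{key}" if prefix else str(key)
                walk(child, name.lower())
        elif isinstance(value, list):
            # Assumption: arrays are kept as JSON text rather than exploded.
            flat[prefix] = json.dumps(value)
        else:
            flat[prefix] = value

    walk(record, "")
    return flat


if __name__ == "__main__":
    raw = {"id": 1, "User": {"name": "a", "tags": ["x"]}}
    print(flatten_record(raw))
```

Flattening at ingestion time trades some fidelity (array structure is hidden inside a string) for simpler downstream queries; a real pipeline might instead explode arrays into child tables, depending on how the data is used.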