How Partition Access Visualizations Reduced our Data Lake S3 Cost by 33%
Nick Del Nano, Data Streaming
May 21, 2026
Introduction
In large analytics environments, data teams often struggle to answer deceptively simple questions, such as who their stakeholders are and how their data is being used. At Yelp, we address this by visualizing access patterns: we plot time-based partition key values against access event timestamps. These visualizations reveal distinct usage signatures – ad hoc queries, daily batch jobs, and periodic backfills – that allow data owners to understand their stakeholders and use cases. This deeper insight into data usage has enabled high-impact platform initiatives, including migrating thousands of tables to Apache Iceberg format and identifying storage efficiencies that reduced the cost of...