Data enters Snowflake in many different forms and at many different frequencies, which often leads to overly complex queries. People often try to solve their data pipeline issues by creating views on views on views. However, this can turn into a chain of complexity that obscures the underlying tables from Snowflake’s optimizer. Snowflake will still run the query, but it has to break it down into processable chunks, which results in too many micro-partitions being scanned and slower queries. This is only one of several patterns to be cautious about.

In this session, Rich Hathaway and Arkady Kleyner discuss a number of challenges that arise with complex data pipelines and alternative approaches that don’t bog down the Snowflake optimizer.

You will learn: 

  • The architecture of a curated data pipeline
  • How to limit the view layers
  • Strategies for high-churn vs. low-churn data
  • "Materialized views" the way Snowflake does them
  • Strategies for curating data for a broad user community