Schema-on-read != schema-no-need
With the rising popularity of semi-structured data, schema-on-read also entered the lexicon of big data. Schema-on-read is the idea that, unlike in relational modeling, the schema definition for semi-structured data can be delayed until long after the data has been loaded into the data platform. Delaying this task means there are no bottlenecks within the ETL process for generating and ingesting semi-structured data. However, implicit in the design is that a knowable schema exists underneath the flexible semi-structured form.
In this section, we will learn how to query JSON data and infer details about its contents using SQL and Snowflake-native functions. Let’s begin by extracting some basic attributes for our pirate:
SELECT * FROM pirate_json;
Although we can query a table containing semi-structured data in a VARIANT
column, a simple SELECT *
statement does not return meaningful results, as you can see in the following figure: