About DataHub Schema History
Schema History is a valuable tool for understanding how a Dataset's schema changes over time. It tells Data Practitioners when each of the following changes happened:
- A new field is added
- An existing field is removed
- An existing field changes type
Schema History uses DataHub's Timeline API to compute schema changes.
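To make the three change types above concrete, here is a minimal Python sketch that compares two schema versions and reports added fields, removed fields, and type changes. It is an illustration only, not DataHub's Timeline API or its implementation: the SchemaChange class, the diff_schemas function, and the representation of a schema as a mapping from field name to type string are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SchemaChange:
    """One detected change between two schema versions (hypothetical type)."""
    kind: str                      # "added", "removed", or "type_changed"
    field: str
    old_type: Optional[str] = None
    new_type: Optional[str] = None


def diff_schemas(old: Dict[str, str], new: Dict[str, str]) -> List[SchemaChange]:
    """Compare two schemas given as {field_name: type_name} mappings."""
    changes: List[SchemaChange] = []

    # Fields present only in the new version were added.
    for name in sorted(new.keys() - old.keys()):
        changes.append(SchemaChange("added", name, None, new[name]))

    # Fields present only in the old version were removed.
    for name in sorted(old.keys() - new.keys()):
        changes.append(SchemaChange("removed", name, old[name], None))

    # Fields present in both versions may have changed type.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            changes.append(SchemaChange("type_changed", name, old[name], new[name]))

    return changes


if __name__ == "__main__":
    # Made-up example schemas, not taken from a real Dataset.
    v1 = {"id": "NUMBER", "status": "VARCHAR", "legacy": "VARCHAR"}
    v2 = {"id": "VARCHAR", "status": "VARCHAR", "created_at": "TIMESTAMP"}
    for change in diff_schemas(v1, v2):
        print(change)
```

Treating each schema as a flat name-to-type mapping keeps the sketch short; nested fields and changes to descriptions or glossary terms are deliberately left out.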
Schema History Setup, Prerequisites, and Permissions
Schema History is viewable in the DataHub UI for any Dataset that has had at least one schema change. To view a Dataset, a user must have the View Entity Page privilege, or be assigned to any DataHub Role.
Using Schema History
You can view the Schema History for a Dataset by navigating to that Dataset's Schema Tab. As long as that Dataset has more than one version, you can view what a Dataset looked like at any given version by using the version selector. Here's an example from DataHub's official Demo environment with the Snowflake pets dataset.
If you click on an older version in the selector, you'll be able to see what the schema looked like back then. Notice the changes here to the glossary terms for the status field, and to the descriptions for the created_at and updated_at fields.
You can also toggle the Audit view, which shows when the most recent change was made to each field. Activate it by clicking the Audit icon above the top right of the table.
You can see here that some of these fields were added at the oldest dataset version, while some were added only at this latest version. Some fields were even modified and had a type change at the latest version!
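The idea behind the Audit view can be sketched in a few lines of Python: walk the schema versions from oldest to newest and remember, for each field, the last version in which it was added or changed type. This is a simplified illustration under stated assumptions, not how DataHub computes the view. The function name last_changed_version and the list-of-mappings input are hypothetical, and only additions and type changes are considered here.

```python
from typing import Dict, List


def last_changed_version(versions: List[Dict[str, str]]) -> Dict[str, int]:
    """For each field in the latest schema, return the index of the version
    in which it was most recently added or changed type.

    `versions` is ordered oldest to newest; each entry maps field name to type.
    """
    last_changed: Dict[str, int] = {}
    previous: Dict[str, str] = {}

    for index, schema in enumerate(versions):
        for name, type_name in schema.items():
            # A field counts as changed if it is new (or re-added) in this
            # version, or if its type differs from the previous version.
            if previous.get(name) != type_name:
                last_changed[name] = index
        previous = schema

    # Only report fields that still exist in the latest version.
    latest = versions[-1] if versions else {}
    return {name: last_changed[name] for name in latest}


if __name__ == "__main__":
    # Made-up version history, not taken from a real Dataset.
    history = [
        {"id": "NUMBER", "status": "VARCHAR"},
        {"id": "NUMBER", "status": "VARCHAR", "created_at": "TIMESTAMP"},
        {"id": "VARCHAR", "status": "VARCHAR", "created_at": "TIMESTAMP"},
    ]
    print(last_changed_version(history))
```

In this made-up history, status has been unchanged since the oldest version, created_at was added in the second version, and id changed type in the latest one, which mirrors the mix of old and recent changes described above.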
GraphQL
FAQ and Troubleshooting
What updates are planned for the Schema History feature?
In the future, we plan to add the following features:
- Supporting a linear timeline view where you can see what changes were made to various schema fields over time
- Adding a diff viewer that highlights the differences between two versions of a Dataset