DataHub Roadmap
The DataHub Roadmap has a new home!
Please refer to the new DataHub Roadmap for the most up-to-date details of what we are working on!
If you have suggestions about what we should consider in future cycles, feel free to submit a feature request and/or upvote existing feature requests so we can get a sense of their relative importance!
Historical Roadmap
The following represents the progress made on historical roadmap items as of January 2022. For incomplete roadmap items, we have created Feature Requests to gauge current community interest and impact, to be considered in future cycles. If you see something that is still of high interest to you, please upvote it via the Feature Request portal link and subscribe to the post for updates as we progress through the work in future cycles.
Q4 2021 [Oct - Dec 2021]
Data Lake Ecosystem Integration
- Spark Delta Lake - View in Feature Request Portal
- Apache Iceberg - Included in Q1 2022 Roadmap - Community-Driven Metadata Ingestion Sources
- Apache Hudi - View in Feature Request Portal
Metadata Trigger Framework
View in Feature Request Portal
- Stateful sensors for Airflow
- Receive events that you can use to send alerts and emails
- Slack integration
ML Ecosystem
- Features (Feast)
- Models (Sagemaker)
- Notebooks - [View in Feature Request Portal](https://feature-requests.datahubproject.io/admin/p/jupyter-integration)
Metrics Ecosystem
View in Feature Request Portal
- Measures, Dimensions
- Relationships to Datasets and Dashboards
Data Mesh oriented features
- Data Product modeling
- Analytics to enable Data Meshification
Collaboration
View in Feature Request Portal
- Conversations on the platform
- Knowledge Posts (Gdocs, Gslides, Gsheets)
Q3 2021 [Jul - Sept 2021]
Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- Support for data profiling and preview extraction through ingestion pipeline (column samples, not rows)
Data Quality
Included in Q1 2022 Roadmap - Display Data Quality Checks in the UI
- Support for data profiling and time-series views
- Support for data quality visualization
- Support for data health score based on data quality results and pipeline observability
- Integration with systems like Great Expectations, AWS Deequ, dbt test, etc.
Fine-grained Access Control for Metadata
- Support for role-based access control to edit metadata
- Scope: Access control at the entity level, at the aspect level, and within aspects.
Column-level lineage
Included in Q1 2022 Roadmap - Column Level Lineage
- Metadata Model
- SQL Parsing
Operational Metadata
- Partitioned Datasets - View in Feature Request Portal
- Support for operational signals like completeness, freshness etc.
Q2 2021 [Apr - Jun 2021]
Cloud Deployment
- Production-grade Helm charts for Kubernetes-based deployment
- How-to guides for deploying DataHub to all the major cloud providers
  - AWS
  - Azure
  - GCP
Product Analytics for DataHub
- Helping you understand how your users are interacting with DataHub
- Integration with common systems like Google Analytics etc.
Usage-Based Insights
- Display frequently used datasets, etc.
- Improved search relevance through usage data
Role-based Access Control
- Support for fine-grained access control for metadata operations (read, write, modify)
- Scope: Access control at the entity level, at the aspect level, and within aspects.
- This provides the foundation for Tag Governance, Dataset Preview access control etc.
No-code Metadata Model Additions
Use Case: Developers should be able to add new entities and aspects to the metadata model easily
- No need to write any code (in Java or Python) to store, retrieve, search and query metadata
- No need to write any code (in GraphQL or UI) to visualize metadata
Q1 2021 [Jan - Mar 2021]
React UI
- Build a new UI based on React
- Deprecate open-source support for Ember UI
Python-based Metadata Integration
- Build a Python-based Ingestion Framework
- Support common people repositories (LDAP)
- Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
- Support common transformation sources (dbt, Looker)
- Support for push-based metadata emission from Python (e.g. Airflow DAGs)
Dashboards and Charts
- Support for dashboard and chart entity page
- Support browse, search and discovery
SSO for Authentication
- Support for authentication (login) using OIDC providers (Okta, Google, etc.)
Tags
Use Case: Support for free-form global tags for social collaboration and aiding discovery
- Edit / Create new tags
- Attach tags to relevant constructs (e.g. datasets, dashboards, users, schema_fields)
- Search using tags (e.g. find all datasets with this tag, find all entities with this tag)
Business Glossary
- Support for business glossary model (definition + storage)
- Browse taxonomy
- UI support for attaching business terms to entities and fields
Jobs, Flows / Pipelines
Use Case: Search and discover your pipelines (e.g. Airflow DAGs) and understand lineage with datasets
- Support for Metadata Models + Backend Implementation
- Metadata Integrations with systems like Airflow.
Data Profiling and Dataset Previews
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)
- Support for data profiling and preview extraction through ingestion pipeline
- Out of scope for Q1: Access control of data profiles and sample data