We are pleased to announce the initial release of OpenLineage. This is the culmination of a broad community effort, and establishes a common framework for data lineage collection and analysis.
We want to thank all the contributors, as well as all the projects and companies involved in the design (in alphabetical order): Airflow, Astronomer, Datakin, Data Mesh, dbt, Egeria, GetInData, Great Expectations, Iceberg (and others that I am probably forgetting).
This release includes:
- The initial 1-0-0 release of the OpenLineage specification
- A core lineage model of Jobs, Runs and Datasets
- Core facets
  - Data Quality Metrics and statistics
  - Dataset schema
  - Source code location
  - SQL
- Clients that send OpenLineage events to an HTTP backend
  - Java
  - Python
- Integrations that collect lineage metadata as OpenLineage events
  - Apache Airflow with support for BigQuery, Great Expectations, Postgres, Redshift, Snowflake
  - Apache Spark
  - dbt
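
To make the core model in the list above concrete, here is a minimal sketch of what a single event might look like: one Run of one Job, with input and output Datasets, wrapped in an HTTP POST. It uses only the Python standard library rather than the released clients, whose API is not shown here. The field names reflect our reading of the specification and should be checked against it (the sketch also omits the event's schema reference). The namespaces, job and dataset names, producer URI, and backend URL are all hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request


def build_run_event(event_type, job_namespace, job_name, run_id, inputs=(), outputs=()):
    """Assemble a run event as a plain dict: one Run of one Job, plus its Datasets."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Hypothetical URI identifying the code that emitted the event.
        "producer": "https://example.com/my-pipeline",
    }


def build_request(event, url):
    """Wrap the event in an HTTP POST; nothing is sent until urlopen is called."""
    body = json.dumps(event).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


run_id = str(uuid.uuid4())
event = build_run_event(
    "START",
    "my_namespace",   # hypothetical job namespace
    "daily_orders",   # hypothetical job name
    run_id,
    inputs=[("postgres://db.example.com", "public.orders")],
    outputs=[("postgres://db.example.com", "public.daily_orders")],
)

# Assumed backend URL; replace with the endpoint of your own OpenLineage backend.
req = build_request(event, "http://localhost:5000/api/v1/lineage")
# request.urlopen(req)  # uncomment to actually send the event
```

A matching event with `"COMPLETE"` and the same `runId` would mark the end of the run; reusing the run ID is what lets a backend tie the two events together.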
This is only the beginning. We invite everyone interested to consult and contribute to the roadmap. It currently includes, among other things, support for Kafka, BI dashboards, and column-level lineage, but you can influence it by participating!
Follow the repo to stay updated. And, as always, you can join the conversation on Slack.