@Nafisah Islam has joined the channel
@Antonio Moctezuma has joined the channel
@Joshua Wankowski has joined the channel
@Maciej Obuchowski has joined the channel
Hello, my team would like to contribute to the OpenLineage-Dagster integration work and wanted to start a public channel for general discussion on this topic.
Issue #489 is currently open for review and includes the proposal for the integration. As we proceed with the initial implementation, we’d appreciate feedback from the community to make sure the approach is reasonable and OpenLineage events are captured accurately.
Looking forward to more discussions. Thanks!
FYI, I reached out to the Dagster community and they replied to @Dalin Kim’s ticket: https://github.com/OpenLineage/OpenLineage/issues/489#issuecomment-1020718071
@Michael Robinson has joined the channel
Let me intro @Michael Robinson who among other things is looking for topics to speak about in the OpenLineage monthly meeting
Thanks, @Julien Le Dem. If anyone is interested in speaking at the next OL TSC meeting about their work on the Dagster integration, please reply here or message me. The integration is on the agenda for the upcoming meeting on 2/9 at 9 am PT (https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting). @Dalin Kim
*Thread Reply:* Hi Michael, while I’m still going through an internal review process before creating a PR, I can do a quick demo of the current OpenLineage sensor approach and get some initial feedback, if that is okay.
*Thread Reply:* thanks for the demo!
*Thread Reply:* Thanks for the opportunity!
Hello, I created a pull request for the OpenLineage sensor work here. This initial work handles the basic lifecycles of Dagster jobs & ops (pipelines & steps), and more discussion will be needed to define how we can handle datasets. Hopefully, this PR is acceptable as an initial groundwork, and all feedback is appreciated. Thanks!
*Thread Reply:* Thanks for the PR! I'll take a look tomorrow.
*Thread Reply:* @Dalin Kim looks great! I approved it. One thing left to do is to rebase on main and push with --force-with-lease: it appears that there are some conflicts.
*Thread Reply:* @Maciej Obuchowski Thank you for the review. There was just a small conflict in the CHANGELOG, which has been resolved. For the integration test, do you suggest an approach similar to the Airflow integration's, using Flask?
*Thread Reply:* Also, I reached out to Sandy from Dagster for a review to make sure this is good from the Dagster side of things.
*Thread Reply:* > For integration test, do you suggest a similar approach like airflow integration using flask? Yes, I think we can reuse that part. The most important thing is that we have real Dagster running "real" workloads.
*Thread Reply:* Documentation has been updated based on feedback from Yuhan from the Dagster team, and I believe everything is set from my end.
*Thread Reply:* Great! Let's just get rid of this linting error and I'll merge it then.
*Thread Reply:* @Willy Lulciuc I think you configured it, am I right?
@Dalin Kim one more rebase, please 🙏 I've turned auto-merge on, but unfortunately I can't rebase your branch on fork.
*Thread Reply:* @Dalin Kim merged!
*Thread Reply:* @Maciej Obuchowski Awesome! Thank you so much for all your help!
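For readers following the PR discussion above, here is a minimal sketch of how the "basic lifecycles of Dagster jobs & ops" could map onto OpenLineage run states. The event type names are Dagster's `DagsterEventType` names written as plain strings, and the run states are the standard OpenLineage ones. The mapping tables and the `to_openlineage_state` helper are illustrative assumptions, not the code from the PR.

```python
from typing import Optional

# Job-level (pipeline) lifecycle events. Illustrative assumption,
# not the mapping table used by the merged integration.
JOB_EVENTS = {
    "PIPELINE_START": "START",
    "PIPELINE_SUCCESS": "COMPLETE",
    "PIPELINE_FAILURE": "FAIL",
    "PIPELINE_CANCELED": "ABORT",
}

# Op-level (step) lifecycle events. Also an illustrative assumption.
STEP_EVENTS = {
    "STEP_START": "START",
    "STEP_SUCCESS": "COMPLETE",
    "STEP_FAILURE": "FAIL",
}


def to_openlineage_state(dagster_event_type: str) -> Optional[str]:
    """Return the OpenLineage run state for a Dagster event type name.

    Returns None for event types that carry no lifecycle transition,
    so the caller can skip them. This helper is hypothetical.
    """
    if dagster_event_type in JOB_EVENTS:
        return JOB_EVENTS[dagster_event_type]
    return STEP_EVENTS.get(dagster_event_type)
```

Anything outside these two tables (dataset-related events, for example) returns `None` here, which matches the point above that dataset handling still needs further discussion.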
One small question on CI: the Airflow integration test checks are stuck in the “expected” state. Is this expected, or is there something I missed in the CI update?
*Thread Reply:* Sorry - we're not running integration tests for Airflow on forks for security reasons. If you're not touching any Airflow files then it should not affect you at all 🙂
*Thread Reply:* In other words, it's just as expected.
*Thread Reply:* Thank you for the clarification
@Dominique Tipton has joined the channel
Hi all 👋
I have opened up an issue/proposal on getting datasets incorporated into the Dagster integration. I would love to get some feedback and conversations going with the community on the proposed approach. Thanks!
FYI @Dalin Kim @Dominique Tipton Dagster 0.14.3 broke something, and unit tests started to fail. I've pinned the version to 0.14.2 for now, but can you take a look?
*Thread Reply:* Thanks for letting us know. We’ll take a look and follow up.
*Thread Reply:* Just as an update on findings, it appears that this MR introduced a breaking change for the test helper function that creates a test EventLogRecord.
Here is the PR to fix the failing tests with the latest Dagster version.
@Ofek Braunstein has joined the channel
@David ROBERT has joined the channel
Hello, Dagster tests are failing on main: https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/2675/workflows/fdaf[…]704-ad9b-e33d2b1e956c/jobs/24743/parallel-runs/0/steps/0-105
I think something broke again with 0.14.5
*Thread Reply:* Thanks for the heads up. We’ll look into it and follow up
*Thread Reply:* Here is the PR to fix the error with the latest Dagster version
*Thread Reply:* Thanks! Merged.
*Thread Reply:* Thanks!
I love the pattern of parsing logs in this integration, so much more flexible compared to the Airflow integration
*Thread Reply:* That's definitely the advantage of the log-parsing method. The Airflow integration, especially the most recent version for Airflow 2.3+, has the advantage of being more robust when it comes to delivering lineage in real-time
*Thread Reply:* Thanks for the heads up on this integration!
*Thread Reply:* I suspect future integrations will move towards this pattern as well? Sorry, off-topic for this channel, but this was the first place I've seen this metadata collection method in this project.
*Thread Reply:* I would be curious to explore similar external executor integrations for Airflow, say for Papermill or Kubernetes (via the corresponding operators). I suppose one would need to pass job metadata through to the respective platform where the logs are actually collected.
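To make the log-parsing pattern from this thread concrete, here is a minimal sketch of a cursor-based poller: it reads only the records stored after the last position it processed, hands each one to an emitter, and returns the new cursor. `LogRecord`, `poll_event_log`, and the `emit` callback are hypothetical names for illustration. This is not the Dagster or OpenLineage API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class LogRecord:
    """Hypothetical stand-in for one stored orchestrator event log row."""
    storage_id: int   # monotonically increasing position in the log
    run_id: str
    event_type: str


def poll_event_log(
    records: Iterable[LogRecord],
    cursor: int,
    emit: Callable[[Dict[str, str]], None],
) -> int:
    """Emit one lineage event per record newer than `cursor`.

    Returns the updated cursor, which the caller persists between polls
    so that records are not emitted twice.
    """
    for record in sorted(records, key=lambda r: r.storage_id):
        if record.storage_id <= cursor:
            continue  # already handled on an earlier poll
        emit({"run_id": record.run_id, "event_type": record.event_type})
        cursor = record.storage_id
    return cursor
```

A short usage example, assuming the cursor starts at 0:

```python
log = [LogRecord(1, "r1", "PIPELINE_START"), LogRecord(2, "r1", "STEP_START")]
seen = []
cursor = poll_event_log(log, 0, seen.append)       # emits records 1 and 2
log.append(LogRecord(3, "r1", "STEP_SUCCESS"))
cursor = poll_event_log(log, cursor, seen.append)  # emits only record 3
```

Because the poller runs outside the job, it needs no hooks in the executor, which is the flexibility mentioned above. The cost is that lineage arrives only as often as the poll runs, which is the real-time trade-off raised in the first reply.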
On Tuesday, the Dagster team will be showcasing Embedded ELT, a way to save tens of thousands of dollars on data movement tasks. Come catch the session with @Pedram (Dagster) https://dagster.io/events/embedded-elt-dec-2023