Dalin Kim (dalinkim@northwesternmutual.com)
2022-01-21 21:26:13

@Dalin Kim has joined the channel

Kevin Mellott (kevin.r.mellott@gmail.com)
2022-01-21 21:28:55

@Kevin Mellott has joined the channel

Nafisah Islam (nafisahislam@northwesternmutual.com)
2022-01-21 21:28:56

@Nafisah Islam has joined the channel

Antonio Moctezuma (antoniomoctezuma@northwesternmutual.com)
2022-01-21 21:28:56

@Antonio Moctezuma has joined the channel

Joshua Wankowski (joshuawankowski@northwesternmutual.com)
2022-01-21 21:28:56

@Joshua Wankowski has joined the channel

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-01-21 21:28:56

@Maciej Obuchowski has joined the channel

Julien Le Dem (julien@apache.org)
2022-01-21 21:28:56

@Julien Le Dem has joined the channel

Dalin Kim (dalinkim@northwesternmutual.com)
2022-01-21 22:17:56

Hello, my team would like to contribute to the OpenLineage-Dagster integration work and wanted to start a public channel for general discussion on this topic.

Issue #489 is currently open for review and includes the proposal for the integration. As we proceed with the initial implementation, we’d appreciate feedback from the community to make sure the approach is reasonable and OpenLineage events are captured accurately.

Looking forward to more discussions. Thanks!

👏 Eric Veleker
firas (firas.omrane.contact@gmail.com)
2022-01-22 17:08:23

@firas has joined the channel

Laurent Paris (laurent@datakin.com)
2022-01-24 11:49:36

@Laurent Paris has joined the channel

Julien Le Dem (julien@apache.org)
2022-01-24 20:40:19

FYI, I reached out to the Dagster community and they replied to @Dalin Kim’s ticket: https://github.com/OpenLineage/OpenLineage/issues/489#issuecomment-1020718071

Julien Le Dem (julien@apache.org)
2022-01-24 20:40:41

Thank you for getting this going @Dalin Kim!

Michael Robinson (michael.robinson@astronomer.io)
2022-02-04 15:53:24

@Michael Robinson has joined the channel

👋 Michael Robinson
Julien Le Dem (julien@apache.org)
2022-02-04 15:59:17

Let me intro @Michael Robinson who among other things is looking for topics to speak about in the OpenLineage monthly meeting

Michael Robinson (michael.robinson@astronomer.io)
2022-02-04 16:08:51

Thanks, @Julien Le Dem. If anyone is interested in speaking at the next OL TSC meeting about their work on the Dagster integration, please reply here or message me. The integration is on the agenda for the upcoming meeting on 2/9 at 9 am PT (https://wiki.lfaidata.foundation/display/OpenLineage/Monthly+TSC+meeting). @Dalin Kim

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-04 17:18:32

*Thread Reply:* Hi Michael, while I’m currently going through an internal review process before creating a PR, I can do a quick demo of the current OpenLineage sensor approach and get some initial feedback, if that is okay.

Julien Le Dem (julien@apache.org)
2022-02-09 13:04:24

*Thread Reply:* thanks for the demo!

👍 Dalin Kim
Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-10 22:37:06

*Thread Reply:* Thanks for the opportunity!

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-15 13:53:17

Hello, I created a pull request for the OpenLineage sensor work here. This initial work handles the basic lifecycles of Dagster jobs & ops (pipelines & steps), and more discussion will be needed to define how we can handle datasets. Hopefully, this PR is acceptable as an initial groundwork, and all feedback is appreciated. Thanks!

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-15 15:39:14

*Thread Reply:* Thanks for the PR! I'll take a look tomorrow.

👍 Dalin Kim
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-16 09:37:27

*Thread Reply:* @Dalin Kim looks great! I approved it. One thing to do is to rebase on main and force-with-lease push: it appears that there are some conflicts.

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-16 11:59:49

*Thread Reply:* @Maciej Obuchowski Thank you for the review. Just had a small conflict in CHANGELOG, which has been resolved. For integration test, do you suggest a similar approach like airflow integration using flask?

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-16 12:03:08

*Thread Reply:* Also, I reached out to Sandy from Dagster for a review to make sure this is good from the Dagster side of things.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-16 12:09:12

*Thread Reply:* > For integration test, do you suggest a similar approach like airflow integration using flask?

Yes, I think we can reuse that part. The most important thing is that we have real Dagster running "real" workloads.

👍 Dalin Kim
Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-17 00:45:37

*Thread Reply:* Documentation has been updated based on feedback from Yuhan from the Dagster team, and I believe everything is set from my end.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-17 05:25:16

*Thread Reply:* Great! Let's just get rid of this linting error and I'll merge it then.

https://app.circleci.com/pipelines/github/OpenLineage/OpenLineage/2204/workflows/cdb412e6-b41a-4fab-bc8e-d8bee71d051d/jobs/18963

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-17 11:27:41

*Thread Reply:* Thank you. I overlooked this when updating docstring. All should be good now.

One final question - should we make the dagster unit test job “required” in the ci and how can that be configured?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-17 11:54:08

*Thread Reply:* @Willy Lulciuc I think you configured it, am I right?

@Dalin Kim one more rebase, please 🙏 I've turned auto-merge on, but unfortunately I can't rebase your branch on fork.

👍 Dalin Kim
Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-17 11:58:34

*Thread Reply:* Rebased and pushed

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-17 12:23:44

*Thread Reply:* @Dalin Kim merged!

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-17 12:24:28

*Thread Reply:* @Maciej Obuchowski Awesome! Thank you so much for all your help!

🙌 Maciej Obuchowski
Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-15 14:01:43

One small question on CI - the Airflow integration test checks are stuck in the “expected” state. Is this expected, or is there something I missed in the CI update?

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-15 14:47:49

*Thread Reply:* Sorry - we're not running integration tests for Airflow on forks for security reasons. If you're not touching any Airflow files then it should not affect you at all 🙂

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-02-15 14:48:23

*Thread Reply:* In other words, it's just as expected.

Dalin Kim (dalinkim@northwesternmutual.com)
2022-02-15 15:09:42

*Thread Reply:* Thank you for the clarification

Willy Lulciuc (willy@datakin.com)
2022-02-17 11:54:18

@Willy Lulciuc has joined the channel

Nicola Monger (nicola.monger@moonpig.com)
2022-02-17 16:52:44

@Nicola Monger has joined the channel

John Thomas (john@datakin.com)
2022-02-18 13:03:29

@John Thomas has joined the channel

Dominique Tipton (dominiquetipton@northwesternmutual.com)
2022-03-01 17:21:49

@Dominique Tipton has joined the channel

Dominique Tipton (dominiquetipton@northwesternmutual.com)
2022-03-01 17:36:02

Hi all 👋

I have opened up an issue/proposal on getting datasets incorporated with the dagster integration. I would love to get some feedback and conversations going with the community on the proposed approach. Thanks!

👍 Dalin Kim, Maciej Obuchowski
Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-04 06:59:34

FYI @Dalin Kim @Dominique Tipton Dagster 0.14.3 broke something, and unit tests started to fail. I've pinned version to 0.14.2 for now, but can you take a look?

Dalin Kim (dalinkim@northwesternmutual.com)
2022-03-04 09:21:58

*Thread Reply:* Thanks for letting us know. We’ll take a look and follow up.

Dalin Kim (dalinkim@northwesternmutual.com)
2022-03-04 12:32:26

*Thread Reply:* Just as an update on findings, it appears that this MR introduced a breaking change for the test helper function that creates a test EventLogRecord.

Dalin Kim (dalinkim@northwesternmutual.com)
2022-03-04 14:38:59

Here is the PR to fix the failing tests with the latest Dagster version.

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-04 15:35:01

Thanks! Merged.

Ofek Braunstein (ofekbraunshtein@gmail.com)
2022-03-06 12:35:36

@Ofek Braunstein has joined the channel

David ROBERT (david.robert.ext@louisvuitton.com)
2022-03-18 11:19:18

@David ROBERT has joined the channel

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-21 11:11:09
Dominique Tipton (dominiquetipton@northwesternmutual.com)
2022-03-21 12:56:33

*Thread Reply:* Thanks for the heads up. We’ll look into it and follow up

Dominique Tipton (dominiquetipton@northwesternmutual.com)
2022-03-22 16:02:01

*Thread Reply:* Here is the PR to fix the error with the latest Dagster version

Maciej Obuchowski (maciej.obuchowski@getindata.com)
2022-03-23 08:20:41

*Thread Reply:* Thanks! Merged.

Dominique Tipton (dominiquetipton@northwesternmutual.com)
2022-03-23 09:49:53

*Thread Reply:* Thanks!

marc_pan (pxy0592@gmail.com)
2022-03-28 23:03:45

@marc_pan has joined the channel

Nico Ritschel (nico@antmoney.com)
2022-03-30 20:23:13

@Nico Ritschel has joined the channel

Orbit
2022-03-31 13:21:47

@Orbit has joined the channel

Nico Ritschel (nico@antmoney.com)
2022-03-31 14:18:06

I love the pattern of parsing logs in this integration, so much more flexible compared to the Airflow integration

John Thomas (john@datakin.com)
2022-03-31 15:59:33

*Thread Reply:* That's definitely the advantage of the log-parsing method. The Airflow integration, especially the most recent version for Airflow 2.3+, has the advantage of being more robust when it comes to delivering lineage in real-time

🙌 Nico Ritschel
Nico Ritschel (nico@antmoney.com)
2022-03-31 19:18:29

*Thread Reply:* Thanks for the heads up on this integration!

Nico Ritschel (nico@antmoney.com)
2022-03-31 19:20:10

*Thread Reply:* I suspect future integrations will move towards this pattern as well? Sorry, off-topic for this channel, but this was the first place I've seen this metadata collection method in this project.

Nico Ritschel (nico@antmoney.com)
2022-03-31 19:22:14

*Thread Reply:* I would be curious to explore similar external executor integrations for Airflow, say for Papermill or Kubernetes (via the corresponding operators). I suppose one would need to pass job metadata through to the respective platform where logs are actually collected.
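
For readers unfamiliar with the log-parsing pattern discussed in this thread, here is a minimal sketch of the general idea: poll an orchestrator's event log from a saved cursor and translate recognized entries into lineage run events. It is not the actual integration code. The `LogRecord` type, the event names on the left of `EVENT_TYPE_MAP`, the `example` namespace, and the `poll` function are all hypothetical; only the `eventType` values (`START`, `COMPLETE`, `FAIL`) and the `run`/`job` field layout follow the OpenLineage run event shape.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical mapping from orchestrator log event names to OpenLineage
# run-event types. The keys are illustrative assumptions, not a real API.
EVENT_TYPE_MAP: Dict[str, str] = {
    "PIPELINE_START": "START",
    "PIPELINE_SUCCESS": "COMPLETE",
    "PIPELINE_FAILURE": "FAIL",
    "STEP_START": "START",
    "STEP_SUCCESS": "COMPLETE",
    "STEP_FAILURE": "FAIL",
}


@dataclass
class LogRecord:
    """Hypothetical event-log entry; a real orchestrator's record differs."""
    record_id: int               # monotonically increasing position in the log
    event_type: str
    run_id: str
    job_name: str
    timestamp: str               # ISO 8601 string
    step_key: Optional[str] = None


def to_lineage_event(record: LogRecord) -> Optional[dict]:
    """Translate one log record into a lineage event dict, or None to skip it."""
    ol_type = EVENT_TYPE_MAP.get(record.event_type)
    if ol_type is None:
        return None  # unrecognized log entries are ignored
    name = record.job_name
    if record.step_key is not None:
        name = f"{record.job_name}.{record.step_key}"
    # Simplification: job and step events share one run ID here. A real
    # integration would likely give each step its own run ID (a UUID).
    return {
        "eventType": ol_type,
        "eventTime": record.timestamp,
        "run": {"runId": record.run_id},
        "job": {"namespace": "example", "name": name},
    }


def poll(records: Iterable[LogRecord], cursor: int) -> Tuple[List[dict], int]:
    """Process records newer than `cursor`; return the events and the new cursor."""
    events: List[dict] = []
    for record in sorted(records, key=lambda r: r.record_id):
        if record.record_id <= cursor:
            continue  # already handled on a previous poll
        event = to_lineage_event(record)
        if event is not None:
            events.append(event)
        cursor = record.record_id
    return events, cursor
```

Because the cursor advances past every record it has seen, calling `poll` again with the returned cursor yields no duplicate events. The trade-off mentioned in the thread also shows up here: events are emitted only when the log is next polled, not at the moment the run changes state.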

Sudhir Rao (sudhir@zemosolabs.com)
2022-04-01 13:25:58

@Sudhir Rao has joined the channel

Fraser Marlow (fraser@elementl.com)
2023-12-02 21:43:01

On Tuesday, the Dagster team will be showcasing Embedded ELT, a way to save tens of thousands of dollars on data movement tasks. Come catch the session with @Pedram (Dagster) https://dagster.io/events/embedded-elt-dec-2023
