Summary

In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us, don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra.
You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, as in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular I appreciate the ability to hear about the challenges that enterprise-scale businesses are tackling in this fast-moving field. The Data Citizens Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.

Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
How does that reconciliation relate to the practice of "master data management"?
What are the scaling challenges with the current set of practices for reconciling data?
ML has been applied to data cleaning for a long time in the form of entity resolution, etc.
How has the landscape evolved or matured in recent years?
What (if any) transformative capabilities do LLMs introduce?
What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
When is ML/AI the wrong choice for data cleaning/reconciliation?
What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email
[email protected] with your story.

Links

Tamr
Master Data Management
CERN
LHC
Michael Stonebraker
Conway's Law
Expert Systems
Information Retrieval
Active Learning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA