The Data Flowcast: Mastering Airflow for Data Engineering & AI
Astronomer
Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airf...
Powering Finance With Advanced Data Solutions at Ramp with Ryan Delgado
Data is the backbone of every modern business, but unlocking its full potential requires the right tools and strategies. In this episode, Ryan Delgado, Director of Engineering at Ramp, joins us to explore how innovative data platforms can transform business operations and fuel growth. He shares insights on integrating Apache Airflow, optimizing data workflows and leveraging analytics to enhance customer experiences.

Key Takeaways:
(01:52) Data is the lifeblood of Ramp, touching every vertical in the company.
(03:18) Ramp’s data platform team enables high-velocity scaling through tailored tools.
(05:27) Airflow powers Ramp’s enterprise data warehouse integrations for advanced analytics.
(07:55) Centralizing data in Snowflake simplifies storage and analytics pipelines.
(12:08) Machine learning models at Ramp integrate seamlessly with Airflow for operational excellence.
(14:11) Leveraging Airflow datasets eliminates inefficiencies in DAG dependencies.
(17:22) Platforms evolve from solving narrow business problems to scaling organizationally.
(18:55) ClickHouse enhances Ramp’s OLAP capabilities with 100x performance improvements.
(19:47) Ramp’s OLAP platform improves performance by reducing joins and leveraging ClickHouse.
(21:46) Ryan envisions a lighter-weight, more Python-native future for Airflow.

Resources Mentioned:
Ryan Delgado - https://www.linkedin.com/in/ryan-delgado-69544568/
Ramp - https://www.linkedin.com/company/ramp/
Apache Airflow - https://airflow.apache.org/
Snowflake - https://www.snowflake.com/
ClickHouse - https://clickhouse.com/
dbt - https://www.getdbt.com/

Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.

#AI #Automation #Airflow #MachineLearning
--------
24:35
Exploring the Power of Airflow 3 at Astronomer with Amogh Desai
What does it take to go from fixing a broken link to becoming a committer for one of the world’s leading open-source projects? Amogh Desai, Senior Software Engineer at Astronomer, takes us through his journey with Apache Airflow. From small contributions to building meaningful connections in the open-source community, Amogh’s story provides actionable insights for anyone on the cusp of their open-source journey.

Key Takeaways:
(02:09) Building data engineering platforms at Cloudera with Kubernetes.
(04:00) Brainstorming led to contributing to Apache Airflow.
(05:17) Starting small with link fixes, progressing to Breeze development.
(07:00) Becoming a committer for Apache Airflow in September 2023.
(09:51) The steep learning curve for contributing to Airflow.
(16:30) Using GitHub’s “good-first-issue” label to get started.
(18:15) Setting up a development environment with Breeze.
(22:00) Open-source contributions enhance your resume and career.
(24:51) Amogh’s advice: Start small and stay consistent.
(28:12) Engage with the community via Slack, email lists and meetups.

Resources Mentioned:
Amogh Desai - https://www.linkedin.com/in/amogh-desai-385141157/?originalSubdomain=in
Astronomer - https://www.linkedin.com/company/astronomer/
Apache Airflow GitHub Repository - https://github.com/apache/airflow
Contributors Quick Guide - https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst
Breeze Development Tool - https://github.com/apache/airflow/tree/main/dev/breeze
--------
30:24
Using Airflow To Power Machine Learning Pipelines at Optimove with Vasyl Vasyuta
Data orchestration and machine learning are shaping how organizations handle massive datasets and drive customer-focused strategies. Tools like Apache Airflow are central to this transformation. In this episode, Vasyl Vasyuta, R&D Team Leader at Optimove, joins us to discuss how his team leverages Airflow to optimize data processing, orchestrate machine learning models and create personalized customer experiences.

Key Takeaways:
(01:59) Optimove tailors marketing notifications with personalized customer journeys.
(04:25) Airflow orchestrates Snowflake procedures for massive datasets.
(05:11) DAGs manage workflows with branching and replay plugins.
(05:41) The "Joystick" plugin enables seamless data replays.
(09:33) Airflow supports MLOps for customer data grouping.
(11:15) Machine learning predicts customer behavior for better campaigns.
(13:20) Thousands of DAGs run every five minutes for data processing.
(15:36) Custom versioning allows rollbacks and gradual rollouts.
(18:00) Airflow logs enhance operational observability.
(23:00) DAG versioning in Airflow 3.0 could boost efficiency.

Resources Mentioned:
Vasyl Vasyuta - https://www.linkedin.com/in/vasyl-vasyuta-3270b54a/
Optimove - https://www.linkedin.com/company/optimove/
Apache Airflow - https://airflow.apache.org/
Snowflake - https://www.snowflake.com/
Datadog - https://www.datadoghq.com/
Apache Airflow Survey - https://astronomer.typeform.com/airflowsurvey24
--------
24:11
Maximizing Business Impact Through Data at GlossGenius with Katie Bauer
Bridging the gap between data teams and business priorities is essential for maximizing impact and building value-driven workflows. Katie Bauer, Senior Director of Data at GlossGenius, joins us to share her principles for creating effective, aligned data teams. In this episode, Katie draws from her experience at GlossGenius, Reddit and Twitter to highlight the common pitfalls data teams face and how to overcome them. She offers practical strategies for aligning team efforts with organizational goals and fostering collaboration with stakeholders.

Key Takeaways:
(02:36) GlossGenius provides an all-in-one platform for beauty professionals.
(03:59) Airflow orchestrates data and MLOps workflows at GlossGenius.
(04:41) Focusing on value helps data teams achieve greater impact.
(06:23) Aligning team priorities with company goals minimizes friction.
(08:44) Building strong stakeholder relationships requires curiosity.
(12:46) Treating roles as flexible fosters team innovation.
(13:21) Adapting to new technologies improves effectiveness.
(18:28) Acting like your time is valuable earns respect.
(23:38) Proactive data initiatives drive strategic value.
(24:20) Usage data offers critical insights into tool effectiveness.

Resources Mentioned:
Katie Bauer - https://www.linkedin.com/in/mkatiebauer/
GlossGenius - https://www.linkedin.com/company/glossgenius/
Apache Airflow - https://airflow.apache.org/
DBT - https://www.getdbt.com/
Cosmos - https://cosmos.apache.org/
Apache Airflow Survey - https://astronomer.typeform.com/airflowsurvey24
--------
25:49
Optimizing Large-Scale Deployments at LinkedIn with Rahul Gade
Scaling deployments for a billion users demands innovation, precision and resilience. In this episode, we dive into how LinkedIn optimizes its continuous deployment process using Apache Airflow. Rahul Gade, Staff Software Engineer at LinkedIn, shares his insights on building scalable systems and democratizing deployments for over 10,000 engineers. Rahul discusses the challenges of managing large-scale deployments across 6,000 services and how his team leverages Airflow to enhance efficiency, reliability and user accessibility.

Key Takeaways:
(01:36) LinkedIn minimizes human involvement in production to reduce errors.
(02:00) Airflow powers LinkedIn’s Continuous Deployment platform.
(05:43) Continuous deployment adoption grew from 8% to a targeted 80%.
(11:25) Kubernetes ensures scalability and flexibility for deployments.
(12:04) A custom UI offers real-time deployment transparency.
(16:23) No-code YAML workflows simplify deployment tasks.
(17:18) Canaries and metrics ensure safe deployments across fabrics.
(20:45) A gateway service ensures redundancy across Airflow clusters.
(24:22) Abstractions let engineers focus on development, not logistics.
(25:20) Multi-language support in Airflow 3.0 simplifies adoption.

Resources Mentioned:
Rahul Gade - https://www.linkedin.com/in/rahul-gade-68666818/
LinkedIn - https://www.linkedin.com/company/linkedin/
Apache Airflow - https://airflow.apache.org/
Kubernetes - https://kubernetes.io/
Open Policy Agent (OPA) - https://www.openpolicyagent.org/
Backstage - https://backstage.io/
Apache Airflow Survey - https://astronomer.typeform.com/airflowsurvey24
About The Data Flowcast: Mastering Airflow for Data Engineering & AI
Welcome to The Data Flowcast: Mastering Airflow for Data Engineering & AI — the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward.
Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems.
Podcast Webpage: https://www.astronomer.io/podcast/