Data Council
  • 915
  • 3 804 991
Building an Ecosystem for Open Foundation Models, Together
In this talk, Ce Zhang shares experiences in building the open source foundation model ecosystem through collaboration with the community. He delves into how balancing data quality, model architecture and infrastructure presents both opportunities and challenges. He also discusses navigating the extensive scale and cost of GPU clusters and optimizing their usage. Most importantly, he explores how data quality can be reasoned about in a structured manner to boost model quality.
This video provides a unique perspective on managing technical issues in open source ecosystems and is a must-watch for those interested in understanding the behind-the-scenes of data science and AI development.
👉 Sign up for our "No BS" Newsletter to get the latest technical data & AI content: hubs.li/Q02vz6xC0
#opensource #gpu #dataquality
ABOUT DATA COUNCIL:
Data Council brings together the brightest minds in data to share industry knowledge, technical architectures and best practices in building cutting edge data & AI systems and tools.
FIND US:
Twitter: datacouncilai
LinkedIn: www.linkedin.com/company/datacouncil-ai/
Website: www.datacouncil.ai/
Views: 255

Videos

Stochastic | AI Launchpad '24
Views: 250 · 2 months ago
Stochastic is an end-to-end AI platform for enterprise knowledge work that provides personalized AI agents with zero setup or coding. ABOUT THE SPEAKER: Glenn Ko, Co-founder & CEO, Stochastic. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference i...
sea.dev | AI Launchpad '24
Views: 260 · 2 months ago
sea.dev is breaking the constraints of existing data systems and NL2SQL with graph-based tools to allow LLM apps to reliably act on fintech data. ABOUT THE SPEAKERS: Matt Arderne, Co-founder, sea.dev; Marya Bazzi, Co-founder, sea.dev; Vladimirs Murevics, Co-founder, sea.dev. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on sta...
Phaselab | AI Launchpad '24
Views: 87 · 2 months ago
Phaselab builds smart automation to make companies’ data privacy programs more effective and efficient. ABOUT THE SPEAKER: Josh Schwartz, Co-founder & CEO, Phaselab. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Austin 2024. 👉 Sign up fo...
Parea | AI Launchpad '24
Views: 278 · 2 months ago
Parea builds developer tools for evaluating, testing and monitoring LLM-powered applications. ABOUT THE SPEAKER: Joel Alexander, Co-founder, Parea. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Austin 2024. 👉 Sign up for our “No BS” News...
InQuery | AI Launchpad '24
Views: 236 · 2 months ago
InQuery simplifies data lakehouse maintenance, saving your data team time and money. ABOUT THE SPEAKERS: Erick Enriquez, Co-founder & CEO, InQuery; Khalil Miri, Co-founder & CTO, InQuery. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Aust...
Dataland | AI Launchpad '24
Views: 199 · 2 months ago
Dataland is the AI-powered internal tools platform. It is the easiest way to deliver high-quality internal tools to your business users. ABOUT THE SPEAKER: Arthur Wu, Co-founder, Dataland. AI LAUNCHPAD: Data Council and Zero Prime Ventures partnered to give six AI-first startups a chance to present brief demos on stage to top investors and elite founders during Data Council's annual conference in Aus...
Rising Tides with Radical Transparency: Why and How to Open Source Your Data Platform
Views: 125 · 2 months ago
Join Tim Castillo from Dagster Labs for an insightful journey into how their data platform became successfully open-sourced. Discover the hurdles, cultural shifts and innovative implementations behind this strategic decision. Data engineers, analytics engineers and data platform engineers - learn how to leverage open source to enhance your projects and contribute to the data community. 👉 Sign u...
Case Studies from a Methodologist on an Experimentation Platform
Views: 308 · 2 months ago
Dive into the world of A/B testing with Microsoft's Experimentation Platform Team. Join Laura Cosgrove for an exclusive tech talk where she uncovers the secrets behind Microsoft’s cutting-edge statistical evaluation and simulation frameworks. In this video, discover the power of Microsoft's variance reduction estimator and its game-changing impact on service efficacy. Ready to elevate your A/B ...
A 101 in Time Series Analytics with Apache Arrow, Pandas and Parquet
Views: 1K · 2 months ago
Dive deep into the world of databases and analytics in this talk from Zoe Steinkamp of InfluxData. Learn how you can unleash the potential of Apache Arrow and Apache Parquet for efficient, scalable handling of time-series data. Equip your toolbox with cutting-edge open-source technologies and industry-standard analytics libraries to build the foundation of a high performance analytics applicati...
Unified Stream/Batch Execution with Ibis
Views: 519 · 2 months ago
This talk is a deep dive exploration into the powerful world of Ibis, as Voltron Data showcases their recent work merging batch and streaming concepts and introducing an Apache Flink backend. This comprehensive tutorial will provide you with invaluable insights for working with data across a variety of platforms. Watch the full video to explore the potential of a unified approach for both batch...
How Beam Uses Code-Based Dashboards to Scale Analytics Products
Views: 317 · 2 months ago
In this talk, Emilio Tamez unravels the magic behind dashboards-as-code. From Python scripts to modular design, Beam is breaking down the barriers between complexity and simplicity. The dashboards-as-code methodology has allowed Beam to incrementally approach their goals by building boilerplate dashboards as a series of code-defined, standardized modules which can be arranged into a dashboard i...
Building Responsible and Trustworthy Generative AI Products at LinkedIn
Views: 534 · 2 months ago
Dive into the heart of LinkedIn's commitment to ethical AI development, where revolutionary Generative AI meets responsibility. Listen in to this insightful exploration as Daniel Olmedilla unveils the foundational principles and architecture guiding LinkedIn's AI journey. With a special focus on their cutting-edge Generative AI products and features, this talk gives an exclusive look into Linke...
What Makes for an Effective Data Practitioner in 2024?
Views: 406 · 2 months ago
Listen in as Marck Vaisman shares insights from his years of experience and demystifies the complexities of the data practitioner role, while providing a roadmap for skill development across all levels. Whether you're a seasoned leader aiming to upskill your team or a novice stepping into the realm of data, this video offers valuable guidance to propel your career in the right direction. 👉 Sign...
Is Kubernetes a Database?
Views: 501 · 2 months ago
Uncover how Kubernetes extends beyond stateless apps and now supports stateful workloads and database management with Custom Resources. In this video, discover the potential to eliminate traditional databases by transforming the Kubernetes API into a potent database and metastore. Don't miss this chance to learn how leveraging Kubernetes can revolutionize your tech projects. 👉 Sign up for our "...
How Developers Should Think About the Emerging AI Stack | Together, Pinecone, Anthropic
Views: 572 · 2 months ago
From Playgrounds to Production: The Evolution of AI Evaluation at Coda
Views: 98 · 2 months ago
Events Sourcing with Kafka at Scale
Views: 148 · 2 months ago
Creating a Competitive Advantage in the Age of Intelligence as a Service
Views: 105 · 2 months ago
Build Faster, More Responsive Analytics with a Semantic Layer | Cube Workshop
Views: 276 · 2 months ago
Streaming CDC data from PostgreSQL to Snowflake, challenges and solutions
Views: 456 · 2 months ago
OttoBot: Productionizing LLM Models
Views: 145 · 2 months ago
Building a User-Level Targeting Platform
Views: 136 · 2 months ago
Data Culture 2.0: Leveraging AI to Build Human Connections and Expand Your Influence
Views: 97 · 2 months ago
Beyond Kafka: Cutting Costs and Complexity with WarpStream and S3
Views: 262 · 2 months ago
Ten Years of Building Open Source Standards
Views: 248 · 2 months ago
Move Fast and Don't Break Things -- How to Build a Data Platform that Scales with your Organization
Views: 310 · 2 months ago
Redefining Database Workloads: The Future with Modern Object Storage
Views: 96 · 2 months ago
Beyond MLOps: Building AI systems with Metaflow
Views: 627 · 2 months ago
How to Align AI Capabilities with Product Strategy so You Can Innovate
Views: 216 · 2 months ago

COMMENTS

  • @Anhar001
    @Anhar001 7 hours ago

    all this jank just to solve the issue which is basically Python. Just write a fully statically compiled binary and shove that on a NFS, then just use rsync between dev machines and NFS. Have a shell script watch binary file changes and relaunch when file is changed. Look ma, I just replaced entire solid with a few bash scripts 😂

  • @jimshtepa5423
    @jimshtepa5423 16 hours ago

    10:55 what's wrong with uzbekistan?))))

  • @krishnapraveen777
    @krishnapraveen777 21 hours ago

    Chad engineer

  • @hemantishwaran5741
    @hemantishwaran5741 1 day ago

    It’s great for ggplot and webpages. But if you ever write a textbook go straight to latex from the command line.

  • @malware_creations2606
    @malware_creations2606 1 day ago

    Also, I've read that Kafka has an issue with consumer lag. How do you handle that?

  • @zuowang5185
    @zuowang5185 11 days ago

    Is there an updated version of the logging pipeline 4 years later?

  • @bluejinux
    @bluejinux 14 days ago

    One of the best presentations on the purpose of the data warehouse and the data lakehouse, and on where the future of data is going.

  • @randomhandle307
    @randomhandle307 17 days ago

    Very nice. Thanks

  • @AndreaMontes_
    @AndreaMontes_ 20 days ago

    I'm rewatching this talk, the speaker is quite good. Taking some notes to prepare my own talk

  • @hannahnelson4569
    @hannahnelson4569 25 days ago

    Very cool talk! The idea of learning heuristics was very cool! I didn't quite understand how the criterion for splitting down multiple paths works! I will check out the source code! Thank you for hosting this talk!

  • @fb-gu2er
    @fb-gu2er 28 days ago

    Backend in Python? Yikes

  • @guykerem7874
    @guykerem7874 29 days ago

    One of the best talks on data in 2024. Thank you Abhi! You never miss a chance to inspire and impress

  • @tessafelice2181
    @tessafelice2181 1 month ago

    I love the name mother duck. I feel it’s a respectful tribute to the female source of life and code.

  • @CreativeInspireP380
    @CreativeInspireP380 1 month ago

    This was an extremely informative talk - especially the section on challenges - and one I wish would receive more attention due to how useful it is as an overview to quite a few complex and highly relevant issues. It would be nice if it were re-elaborated and presented in a non-live presentation format.

  • @the-ghost-in-the-machine1108
    @the-ghost-in-the-machine1108 1 month ago

    thanks

  • @nosh3019
    @nosh3019 1 month ago

    Great talk 🎉

  • @jayleejw1801
    @jayleejw1801 1 month ago

    The amount of background noise in this video is absurd.

  • @tratkotratkov126
    @tratkotratkov126 1 month ago

    Great, much-needed and promising project! However, it is not quite clear what you mean by data versioning (DV): do you version the data as LakeFS does, or are you just versioning the source code that produces this data? Also, I find the diagrams in the presentation (Virtual/Physical layers) confusing and not easy to grasp at first glance. In the next iteration it would be nice to use real-world, practical entities such as customer, product and sales to describe the demo objects instead of just “source”, and to wrap the demo in a quick story like “Meet Alex, the data engineer at TechCorp, a rapidly growing tech company. Alex is responsible for managing the company’s data pipelines, ensuring that data from various sources is clean, consistent, and available for analysis” etc.; you get the idea. Finally, I would suggest you switch the sequence and the time you spend on the theory and the demo: show your fantastic open source project demo first and how easy it is to implement the 3 concepts in a meaningful story, then after each segment just mention the theoretical part, but don’t allow the theory to consume 75% of your presentation unless you want to be considered one of the many Data Governance “gurus” who present on this channel. Wishing you all good luck with this fantastic project!

  • @LucasCardoso-mw4ok
    @LucasCardoso-mw4ok 1 month ago

    Hi! Nice video. I'm a little concerned about how I can get my development data from Copilot.

  • @KC53557
    @KC53557 1 month ago

    A good example of not getting AI right is the creation of the Maga loon and Jan 6.

  • @68sahil56
    @68sahil56 1 month ago

    30:29

  • @68sahil56
    @68sahil56 1 month ago

    18:19

  • @VipulVaibhaw
    @VipulVaibhaw 1 month ago

    Fantastic talk!

  • @allthingsdata
    @allthingsdata 1 month ago

    Loved it.

  • @AshishKumar-ll2mt
    @AshishKumar-ll2mt 1 month ago

    Looks like this field never took off the way it should have

  • @yogeshbharadwaj6200
    @yogeshbharadwaj6200 1 month ago

    Very nice demo..Tks..

  • @compilation_exe3821
    @compilation_exe3821 1 month ago

    Amazing

  • @timothymcglynn1935
    @timothymcglynn1935 1 month ago

    HI 👋

  • @HikarusVibrator
    @HikarusVibrator 1 month ago

    If someone can explain to me how you’re supposed to do a major version DB upgrade with a Debezium connector. It’s such an unbelievable pain that it’s a total dealbreaker. Unless I’m missing something

  • @Eriddoch
    @Eriddoch 1 month ago

    Dang, Miriah you are an AMAZING speaker, and as someone who works on data engineering systems but doesn't own them (MLOps), this is really valuable.

  • @420_gunna
    @420_gunna 1 month ago

    bullshit buzzwords "cognitive analytics" vomit and a saccharine exhortative tone "quantum computing + graphene + ai" come on

  • @paoloogr
    @paoloogr 2 months ago

    Nice talk! Thanks.

  • @ex-cursion
    @ex-cursion 2 months ago

    I loved this and wish there was more of it. Thank you! But as noted: 'invoice reconciliation is boring'. I feel like the survival of our species will pivot not on our curiosity, but on our capacity to constrain our desire for novelty enough to solve boring problems.

  • @matthewborn
    @matthewborn 2 months ago

    This is an excellent talk. Thank you, Abhi!

  • @malcolmgdavis
    @malcolmgdavis 2 months ago

    Pointer vs. Value discussion: Based on the Method vs. Function discussion, ADT should be strictly adhered to. Operations that modify the ADT are modeled as functions that take the old state as an argument and return the new state as part of the result. In other words, a function should enforce immutability. The ADT approach helps with concurrency, making the code cleaner and easier to read. As an API user, I shouldn't worry about the state changing when I pass a structure. Of course, the pure ADT model's problem is memory consumption. That's why ADT models are generally implemented in VMs that can routinely find old structures without references and remove them from memory.
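
A minimal sketch of the pattern this comment describes, where an operation takes the old state and returns a new one instead of mutating in place. It is written in Python for brevity, and the `Account` type and `deposit` function are hypothetical names chosen for illustration; they do not come from the talk or the comment.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)  # frozen: attribute assignment after construction raises an error
class Account:
    """Hypothetical immutable value type used only for this illustration."""
    owner: str
    balance: int


def deposit(account: Account, amount: int) -> Account:
    """Return a new Account with the updated balance; the argument is left untouched."""
    return replace(account, balance=account.balance + amount)


before = Account(owner="alice", balance=100)
after = deposit(before, 50)
```

Because every update allocates a fresh object, superseded values stay in memory until the runtime's garbage collector reclaims them, which is the memory cost the comment mentions.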

  • @malcolmgdavis
    @malcolmgdavis 2 months ago

    The method vs. function debate is absurd. The presenter needs to learn or spend time with OO programming. Class methods don't have to be logically connected to states. I developed in C during the 80s. The problem with structs is that the data is the point of coupling. The class hides data. In OO, the focus is on behavior and not the state. The OO state can be anywhere and can change. The strategy allows the implementation of the module to be changed without disturbing the client programs.

  • @1988YUVAL
    @1988YUVAL 2 months ago

    Very interesting presentation. Looks like a very well thought out solution for managing data transformations. I wonder if it will take off like dbt.

  • @Jack-lg9mq
    @Jack-lg9mq 2 months ago

    Good presentation. Also nice to see that Jimmi Simpson is expanding his horizons.

  • @mattbahr228
    @mattbahr228 2 months ago

    Awesome presentation!

  • @wonlee4138
    @wonlee4138 2 months ago

    Thanks for the great presentation!

  • @prashant776
    @prashant776 2 months ago

    Really good and informative. Congratulations to PeerDB on their recently secured seed round. I see a lot of potential in PeerDB wherever organisations are looking to stream their data to a warehouse. I had a very unique need; I wish PeerDB had been a choice back then.

  • @AndreaMontes_
    @AndreaMontes_ 2 months ago

    Great speaker 👏👏

  • @thrawn01
    @thrawn01 2 months ago

    This was super useful, I learned a lot, Thank you!

  • @IbraheemFaiq
    @IbraheemFaiq 2 months ago

    Great

  • @samhughes1747
    @samhughes1747 2 months ago

    I really enjoyed this. It was high-level, but hey, a hype-free, facts-only talk about working with generative models? I'll take it!

  • @Shikara_Animals
    @Shikara_Animals 2 months ago

    Best teacher ❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤❤

  • @VijayasarathyMuthu
    @VijayasarathyMuthu 2 months ago

    You should include LightDash

  • @whatSriBishnusRajDharmaN-ek1hl
    @whatSriBishnusRajDharmaN-ek1hl 2 months ago

    mother chods what doing here canot learn me detect leran mine concern your life risk at usa houston

  • @clarkylifehacks8220
    @clarkylifehacks8220 2 months ago

    This is great. Not the same context (not data), but I do 3 of the 4 roles under incident management, it can get messy!

  • @HwansungMedicalCharitySe-pn4vf
    @HwansungMedicalCharitySe-pn4vf 2 months ago

    Beautiful topic, ugly tune. Reason for less likes. Suggestion: improve your tune and try to relax and be calmed.