ICML 2025 Workshop on


DataWorld: Unifying Data Curation Frameworks Across Domains





Overview

One area that remains relatively underexplored is how data-centric methods perform across different data modalities, domains, and downstream applications. A key question is: which lessons can be shared across these settings, and which are domain-specific? Understanding these nuances is critical for building data-centric methods that are robust, efficient, and adaptable across domains.

This ICML 2025 workshop aims to bring together researchers and practitioners to bridge the gap between domain-specific data-centric approaches and identify generalizable principles. Participants will explore theoretical frameworks, empirical findings, and practical tools that enable effective knowledge transfer across the diverse landscape of data-centric AI.

Call for Papers

We invite submissions from researchers in the field of data-centric ML. Our topics of interest include, but are not limited to:

  • Domain-specific data issues: Challenges and best practices in data curation across modalities and domains.
  • Human-in-the-loop: Standards and trade-offs in annotation quality, crowd-sourcing vs. expert labeling.
  • Data & Society: Ethical sourcing, privacy, fairness, and adversarial risks in real-world datasets.
  • Benchmarks & evaluation: Rigorous evaluation of data pipelines and generalizable data quality metrics.

Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.

Length and Formatting: Submitted papers must be between 4 and 9 pages, including figures and tables, in PDF format using the XXX Style Files or ZZZ Style Files. Authors may upload unlimited supplementary materials and references with their submissions. We will use a double-blind review process.

Important Dates:

  • Paper Submission Deadline: TBD
  • Author Notification: TBD
  • Camera-ready Deadline: TBD

If you have any questions, please email us at data_world@googlegroups.com.


Schedule

Morning Session


Time           Type of Event                  Speakers
08:55 - 09:05  Opening Remarks                Organizers
09:05 - 09:35  Invited Talk                   Pang Wei Koh
09:35 - 10:05  Invited Talk                   Alex Dimakis
10:05 - 11:20  Poster Session                 Accepted poster presenters
11:20 - 11:30  Coffee Break
11:30 - 12:15  Panel                          Invited panelists & moderator
12:15 - 12:45  Oral Presentations (2 papers)  Accepted oral presenters
12:45 - 13:30  Lunch Break

Afternoon Session


Time           Type of Event                  Speakers
13:30 - 14:00  Invited Talk                   Aditi Raghunathan
14:00 - 14:30  Invited Talk                   Ari Morcos
14:30 - 15:00  Oral Presentations (2 papers)  Accepted oral presenters
15:00 - 15:10  Coffee Break
15:10 - 16:20  Poster Session                 Accepted poster presenters
16:20 - 16:50  Invited Talk                   James Zou
16:50 - 17:00  Closing Remarks                Organizers
 

To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and facilitate questions from online attendees via ZZZ.

Invited Speakers




Pang Wei Koh

University of Washington

Alex Dimakis

UT Austin, Bespoke Labs

Ari Morcos

DatologyAI

Aditi Raghunathan

Carnegie Mellon University

James Zou

Stanford University

Panelists




Irene Chen

UC Berkeley

Workshop Organizers




Sara Beery

Massachusetts Institute of Technology

Benjamin Feuer

New York University

Neha Hulkund

Massachusetts Institute of Technology

Thao Nguyen

University of Washington, Meta AI Research

Sewoong Oh

University of Washington, Google



Ludwig Schmidt

Stanford University, Anthropic

Serena Yeung-Levy

Stanford University

Yuhui Zhang

Stanford University

Niv Cohen

New York University