ICML 2025 Workshop on DataWorld: Unifying Data Curation Frameworks Across Domains





Overview

One area that remains relatively underexplored is how data-centric methods perform across different data modalities, domains, and downstream applications. A key question is: which lessons can be shared across these settings, and which are domain-specific? Understanding these nuances is critical for building data-centric methods that are robust, efficient, and adaptable across domains.

This ICML 2025 workshop aims to bring together researchers and practitioners to bridge the gap between domain-specific data-centric approaches and identify generalizable principles. Participants will explore theoretical frameworks, empirical findings, and practical tools that enable effective knowledge transfer across the diverse landscape of data-centric AI.

📅 Workshop Date: Saturday, July 19
📍 Room: Meeting Room 208-209

Call for Papers

We invite submissions from researchers in the field of data-centric ML. Our topics of interest include, but are not limited to:

  • Domain-specific data issues: Challenges and best practices in data curation across modalities and domains.
  • Human-in-the-loop: Standards and trade-offs in annotation quality, crowd-sourcing vs. expert labeling.
  • Data & Society: Ethical sourcing, privacy, fairness, and adversarial risks in real-world datasets.
  • Benchmarks & evaluation: Rigorous evaluation of data pipelines and generalizable data quality metrics.

Non-Archival: This workshop will not have formal proceedings.

Submission URL: Please submit your work via OpenReview. To help maintain the quality of the review process, we kindly ask you to nominate a potential reviewer by providing their email address in the OpenReview submission.

Length and Formatting: Submitted papers must be 4 to 9 pages long, including figures and tables, in PDF format using the ICML 2025 style files. Authors may include unlimited references and upload unlimited supplementary materials with their submissions. We will use a double-blind review process.

Important Dates:

  • Paper Submission Deadline: May 24
  • Author Notification: June 17
  • Camera-ready Deadline: June 27

We are sponsored by DatologyAI and will be awarding prizes to the best submissions!

  • 🥇 $1,000 Best Paper Award
  • 🥈 $250 Honorable Mention (×2)

If you have any questions, please send us an email at data_world@googlegroups.com


Schedule

Morning Session


Time | Type of Event | Speakers
08:55 - 09:05 | Opening Remarks | Organizers
09:05 - 09:35 | Invited Talk | Pang Wei Koh
09:35 - 10:05 | Invited Talk | Alex Dimakis
10:05 - 11:20 | Poster Session | Accepted poster presenters
11:20 - 11:30 | Coffee Break
11:30 - 12:15 | Panel | Invited panelists & moderator
12:15 - 12:45 | Oral Presentations (2 papers) | Accepted oral presenters
12:45 - 13:30 | Lunch Break

Afternoon Session


Time | Type of Event | Speakers
13:30 - 14:00 | Invited Talk | Aditi Raghunathan
14:00 - 14:30 | Invited Talk | Ari Morcos
14:30 - 15:00 | Oral Presentations (2 papers) | Accepted oral presenters
15:00 - 15:10 | Coffee Break
15:10 - 16:20 | Poster Session | Accepted poster presenters
16:20 - 16:50 | Invited Talk | James Zou
16:50 - 17:00 | Closing Remarks | Organizers
 

To ensure the accessibility of our workshop for virtual attendees, we will stream all presentations and facilitate questions from online attendees via TBD.

Invited Speakers




Pang Wei Koh

University of Washington

Alex Dimakis

UC Berkeley, Bespoke Labs

Ari Morcos

DatologyAI

Aditi Raghunathan

Carnegie Mellon University

James Zou

Stanford University

Panelists




Irene Chen

UC Berkeley

Liam Parker

Polymathic AI

Priya L. Donti

Massachusetts Institute of Technology

Workshop Organizers




Sara Beery

Massachusetts Institute of Technology

Benjamin Feuer

New York University

Neha Hulkund

Massachusetts Institute of Technology

Thao Nguyen

University of Washington, Meta AI Research

Sewoong Oh

University of Washington



Ludwig Schmidt

Stanford University, Anthropic

Serena Yeung-Levy

Stanford University

Yuhui Zhang

Stanford University

Niv Cohen

New York University
