A DataComp and Bespoke Labs community effort to curate the best open reasoning datasets.
Our first goal is to curate a reasoning dataset for training state-of-the-art small reasoning models that surpass DeepSeek-R1-Distill-32B and DeepSeek-R1-Distill-7B on math and code reasoning benchmarks.