Xiao et al: "Chain-of-Experts: When LLMs Meet Complex Operations Research Problems", ICLR (2024).
- abstract modeling
- contains 37 instances collected from both industrial and academic scenarios
AhmadiTeshnizi et al: "OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models", arXiv preprint arXiv:2407.19633 (2024).
- abstract modeling
- extends the number of instances to 269
Wang et al: "OptiBench: Benchmarking Large Language Models in Optimization Modeling with Equivalence-Detection Evaluation", ICLR (2025).
- abstract modeling
- offers a collection of 816 instances
Huang et al: "ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling", arXiv preprint arXiv:2405.17743 (2024).
- covers a variety of problem types, including MIP and NLP
- features descriptions with or without tabular data
- suffers from quality-control issues, which result in a high error rate
Huang et al: "Mamo: a Mathematical Modeling Benchmark with Solvers", arXiv preprint arXiv:2405.13144 (2024).
- includes optimal variable information, offering additional perspectives for evaluating model correctness
- categorizes problems into three classes: EasyLP, ComplexLP, and ODE
Ramamonjison et al: "NL4Opt Competition: Formulating Optimization Problems Based on Their Natural Language Descriptions", NeurIPS (2022).
- primarily focuses on simple optimization modeling problems
- the first optimization modeling benchmark proposed in a competition
- features a test set of 289 instances
Yang et al: "OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling", ICML (2024).
- introduces a comprehensive framework that applies multiple filters to remove erroneous cases
- expands the test set to 605 instances
Parashar et al: "WIQOR: A dataset for what-if analysis of Operations Research problems", ICLR (2025).
- employs what-if analysis to assess performance