intercode-benchmark.github.io - InterCode

Description: InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

Example domain paragraphs

Build interactive code environments for training, testing, and augmenting code and decision-making agents

Overview of InterCode. Setting up an interactive code environment with InterCode requires a Dockerfile, a dataset, a reward function definition, and a small amount of subclass implementation. The interactive loop between agent and environment closely mirrors real-world software development processes.
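
The agent-environment loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the InterCode API: the names ToyCodeEnv, scripted_agent, and run_episode are hypothetical, the one-item dataset is made up for the example, and an in-process exec() stands in for the Docker container that InterCode uses to isolate execution. It shows the three pieces a task needs (a dataset entry, a reward function, and a step method) and how execution feedback lets an agent correct a failed first attempt.

```python
import contextlib
import io


class ToyCodeEnv:
    """Hypothetical stand-in for an interactive code environment.

    Not the InterCode API: code runs in-process via exec() rather than
    inside a Docker container, so it offers no isolation.
    """

    def __init__(self, dataset):
        # dataset: list of (instruction, expected_output) pairs (assumed shape)
        self.dataset = dataset
        self.task = None
        self.namespace = {}

    def reset(self, index=0):
        # Select a task and start from a clean execution namespace.
        self.task = self.dataset[index]
        self.namespace = {}
        return self.task[0]

    def reward(self, observation):
        # Toy reward function: exact match against the expected output.
        return 1.0 if observation == self.task[1] else 0.0

    def step(self, action):
        # Execute the agent's code and return what it printed, or the error.
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(action, self.namespace)
            observation = buffer.getvalue().strip()
        except Exception as exc:
            observation = f"{type(exc).__name__}: {exc}"
        reward = self.reward(observation)
        return observation, reward, reward == 1.0


def scripted_agent(instruction, history):
    # Stand-in for a model: the first attempt is buggy, the second fixes it.
    attempts = ["print(total)", "total = sum(range(1, 11))\nprint(total)"]
    return attempts[min(len(history), len(attempts) - 1)]


def run_episode(env, agent, max_turns=5):
    # Interactive loop: act, observe execution feedback, repeat until solved.
    instruction = env.reset()
    history = []
    for _ in range(max_turns):
        action = agent(instruction, history)
        observation, reward, done = env.step(action)
        history.append((action, observation, reward))
        if done:
            break
    return history


env = ToyCodeEnv([("Print the sum of the integers 1 through 10.", "55")])
history = run_episode(env, scripted_agent)
for action, observation, reward in history:
    print(repr(action), "->", observation, "| reward:", reward)
```

In this sketch the first action fails with an error message, which is returned to the agent as the observation; the second action succeeds and ends the episode. A real setup would replace exec() with commands sent to a container and the scripted agent with a model call.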

Humans write code in a fundamentally interactive manner and rely on constant execution feedback to correct errors, resolve ambiguities, and decompose tasks. While LLMs have recently exhibited promising coding capabilities, current coding benchmarks mostly consider a static instruction-to-code sequence transduction process, which has the potential for error propagation and a disconnect between the generated code and its final execution environment. To address this gap, we introduce InterCode, a lightweight,

Links to intercode-benchmark.github.io (3)

ysymyth.github.io About – Shunyu Yao – 姚顺雨
os-world.github.io OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
aksh555.github.io Akshara Prabhakar