os-world.github.io - OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Key statistics of OSWorld. “Supp. tasks” denotes the Windows-based tasks, which can only be used after activation because of copyright restrictions.

Distribution of task instructions in OSWorld by app domain and operation type, giving an intuitive overview of the benchmark's content.

Sampling settings: temperature t = 1.0, top-p = 0.9.
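The temperature and top-p values above control how a model picks each output token. The following is a minimal, generic sketch of temperature scaling followed by nucleus (top-p) sampling over a list of logits. It is not OSWorld's code, and the function names `softmax`, `nucleus`, and `sample` are hypothetical. With t = 1.0 the logits are left unscaled, and with top-p = 0.9 only the smallest set of most-likely tokens whose probabilities add up to at least 0.9 can be drawn.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    # t = 1.0 leaves the logits unchanged; smaller t sharpens, larger t flattens.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus(probs, top_p=0.9):
    """Return the smallest set of token indices (most likely first) whose
    cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample(logits, temperature=1.0, top_p=0.9, rng=None):
    """Draw one token index using temperature plus nucleus (top-p) sampling."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    # random.choices renormalizes the weights over the kept tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

For example, with token probabilities 0.5, 0.3, 0.15 and 0.05, the cumulative mass first reaches 0.9 at the third token (0.95), so `nucleus` keeps indices `[0, 1, 2]` and the 0.05 tail token is never sampled.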