AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

¹Fudan University, ²Tongji University, ³National University of Singapore, ⁴University of Washington, ⁵The Chinese University of Hong Kong

Figure: AdaReasoner overview. AdaReasoner uses tools adaptively and generalizes that tool use across tasks: the model learns to adopt beneficial tools, discard irrelevant ones, and adjust how often it calls tools to match task demands.

Abstract

While augmenting Multimodal Large Language Models (MLLMs) with tools is a promising direction, existing approaches face critical limitations. They often rely on single, atomic tools, lack support for multi-turn planning, and fail to equip models with the ability to select and coordinate effective tool combinations for complex tasks. To overcome these challenges, we introduce AdaReasoner, a new family of models designed for dynamic tool orchestration in iterative visual reasoning. Our paradigm integrates three key components: a scalable data curation methodology, a tailored Tool-GRPO optimization algorithm, and an adaptive learning mechanism. As a result, AdaReasoner achieves state-of-the-art performance, delivering substantial improvements over its base models (e.g., +24.9% on average for the 7B variant) and even surpassing strong proprietary models such as GPT-5 on challenging benchmarks like VSP and Jigsaw.

Key Features

🎯 Adaptive Tool Selection

Learns to autonomously adopt beneficial tools, discard irrelevant ones, and modulate usage frequency.

🔄 Multi-Turn Planning

Supports complex multi-turn tool interactions with reflection and backtracking capabilities.

🌐 Strong Generalization

Generalizes to unseen tools and novel tasks beyond training distribution.
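
The multi-turn behaviour listed above can be pictured as a loop in which the model either calls a tool or commits to an answer. What follows is a minimal sketch under stated assumptions, not the AdaReasoner implementation: the `Step` record, the `run_episode` function, the `policy` callable, and the toy OCR tool are all hypothetical stand-ins introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One decision by the model (hypothetical format): call a tool or answer."""
    tool: Optional[str] = None    # tool name, or None when giving a final answer
    argument: str = ""            # input passed to the tool
    answer: Optional[str] = None  # final answer, used only when tool is None


def run_episode(
    policy: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_turns: int = 4,
) -> Tuple[Optional[str], List[str]]:
    """Alternate between model decisions and tool observations."""
    history = [question]
    for _ in range(max_turns):
        step = policy(history)
        if step.tool is None:
            return step.answer, history  # the model chose to stop and answer
        if step.tool not in tools:
            # Feed the failure back so the model can reconsider on the next turn.
            history.append(f"error: unknown tool {step.tool}")
            continue
        observation = tools[step.tool](step.argument)
        history.append(f"{step.tool} -> {observation}")
    return None, history  # turn budget exhausted without an answer


def scripted_policy(history: List[str]) -> Step:
    """Toy stand-in for a model: read the text once, then answer with it."""
    if len(history) == 1:
        return Step(tool="OCR", argument="sign.png")
    return Step(answer=history[-1].split(" -> ")[1])


toy_tools = {"OCR": lambda image_name: "EXIT"}
answer, history = run_episode(scripted_policy, toy_tools, "What does the sign say?")
```

In this sketch, deciding when to stop calling tools is left entirely to the policy, which is the part a trained model would supply; the loop itself only routes calls and records observations.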

Main Results

AdaReasoner achieves substantial gains across diverse benchmarks, outperforming both base models and strong proprietary systems like GPT-5 and Claude Sonnet 4.

Method Overview

Our framework consists of three main stages: (a) Tool Cold Start (TC) with high-quality trajectory data curation, (b) Adaptive Learning for improved generalization, and (c) Multi-Turn Tool GRPO (TG) for reinforcement learning with tool interaction.
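
This page does not describe the reward design or the multi-turn specifics of Tool GRPO. As background only, the sketch below shows the group-relative advantage normalization that GRPO is generally known for: several responses are sampled for the same prompt, and each response's reward is standardized against the group's mean and standard deviation. The function name, the use of the population standard deviation, and the `eps` stabilizer are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each reward against the other samples in its group.

    rewards: scalar rewards for responses sampled from the same prompt.
    eps: small constant (assumed) that avoids division by zero when
         every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With rewards `[1.0, 0.0, 0.0, 1.0]`, the two successful responses receive advantages close to +1 and the two failures close to -1. When all rewards in a group are equal, every advantage is zero, so that group contributes no learning signal.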

Qualitative Examples

AdaReasoner-7B demonstrates advanced capabilities for multi-turn, tool-assisted reasoning and reflection.

Vision Toolset

Tool              Description                       Type
Point             Precise object localization       Online
DRAW2DPATH        Visualization and verification    Offline
ASTAR             Shortest path planning            Offline
OCR               Text recognition                  Online
CROP              Region extraction                 Offline
DETECTBLACKAREA   Missing region detection          Offline
INSERTIMAGE       Hypothesis testing                Offline
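
To make the shortest-path entry in the table concrete, here is a minimal sketch of textbook A* search on a small grid. It is not the ASTAR tool itself: the page does not specify that tool's interface, so the grid encoding (`#` for blocked cells), the 4-connected moves, and the function signature are assumptions chosen for illustration.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, column)


def astar(grid: List[str], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a shortest 4-connected path from start to goal, or None."""

    def heuristic(cell: Cell) -> int:
        # Manhattan distance never overestimates on a unit-cost 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    frontier = [(heuristic(start), 0, start)]  # (estimated total, cost so far, cell)
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    cost: Dict[Cell, int] = {start: 0}

    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            node: Optional[Cell] = current
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry; a cheaper route was already found
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = g + 1
                if new_cost < cost.get((nr, nc), float("inf")):
                    cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(
                        frontier, (new_cost + heuristic((nr, nc)), new_cost, (nr, nc))
                    )
    return None  # goal unreachable
```

A planner of this kind returns exact coordinates, which a drawing step could then overlay on the image so the model can check the route visually.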

BibTeX

@article{adareasoner2025,
  title={AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning},
  author={Song, Mingyang and Sun, Haoyu and Gu, Jiawei and Li, Linjie and Krishna, Ranjay and Cheng, Yu},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}