In-depth analysis of DS-STAR
Next, we conducted ablation studies to verify the effectiveness of DS-STAR’s individual components and to analyze the impact of the number of refinement rounds, measured as the number of iterations required to produce a sufficient plan.
Data File Analyzer: This agent is essential for high performance. Without the descriptions it generates (Variant 1), DS-STAR’s accuracy on hard tasks in the DABStep benchmark dropped sharply to 26.98%, underscoring the importance of rich data context for effective planning and implementation.
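To make the role of these descriptions concrete, the sketch below shows one way such an analyzer could turn a raw data file into a textual profile for the planner. This is a simplified illustration, not DS-STAR’s implementation; `call_llm`, `describe_data_file`, and the prompt wording are hypothetical placeholders.

```python
import pandas as pd

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for the base model API (e.g., Gemini-2.5-Pro);
    # replace with a real client call.
    return "<model-written summary of the file>"

def describe_data_file(path: str, n_sample: int = 3) -> str:
    """Build a textual description of a tabular data file for the planner."""
    df = pd.read_csv(path)
    lines = [f"File: {path}", f"{len(df)} rows x {len(df.columns)} columns"]
    for col in df.columns:
        sample = df[col].dropna().head(n_sample).tolist()
        lines.append(f"- {col} ({df[col].dtype}): sample values {sample}")
    profile = "\n".join(lines)
    summary = call_llm(
        "Summarize what this file contains and how it could be used to "
        f"answer data-analysis questions:\n{profile}"
    )
    return profile + "\n\nSummary: " + summary
```

A description of this kind gives the planning and coding stages concrete column names, types, and example values to work with, which is the “rich data context” the ablation shows to be critical.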
Router: The Router agent’s ability to determine whether a new step is needed or an existing incorrect step should be fixed is vital. When we removed it (Variant 2), DS-STAR could only add new steps sequentially, leading to worse performance on both easy and hard tasks. This demonstrates that correcting mistakes in a plan is more effective than continuing to add potentially flawed steps.
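The following sketch is a simplified, hypothetical rendering of this refinement loop that makes the routing decision explicit: without a “fix” action (as in Variant 2), the loop can only keep appending steps. The helper functions stand in for LLM calls and are not DS-STAR’s API.

```python
from typing import List, Tuple

def verify_plan(plan: List[str]) -> Tuple[bool, str]:
    # Hypothetical verifier: judges whether the plan is sufficient and,
    # if not, returns feedback on what is missing or wrong.
    return False, "feedback on the current plan"

def route(plan: List[str], feedback: str) -> Tuple[str, int]:
    # Hypothetical router decision: either add a new step at the end of the
    # plan ("add") or repair the step at the returned index ("fix").
    return "add", len(plan)

def draft_step(plan: List[str], feedback: str) -> str:
    # Hypothetical step generator (an LLM call in practice).
    return "a new or corrected analysis step"

def refine_plan(max_rounds: int = 5) -> List[str]:
    """Iteratively refine a plan until the verifier deems it sufficient."""
    plan: List[str] = []
    for _ in range(max_rounds):
        sufficient, feedback = verify_plan(plan)
        if sufficient:
            break
        action, idx = route(plan, feedback)
        step = draft_step(plan, feedback)
        if action == "fix":
            plan[idx] = step      # repair an incorrect existing step
        else:
            plan.append(step)     # append a new step (the only option in Variant 2)
    return plan
```

In this framing, removing the Router amounts to forcing every round into the `append` branch, so an early mistake can never be revised and later rounds build on a flawed foundation.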
Generalizability Across LLMs: We also tested DS-STAR’s adaptability by using GPT-5 as the base model. This yielded promising results on the DABStep benchmark, indicating the framework’s generalizability. Interestingly, DS-STAR with GPT-5 performed better on easy tasks, while the Gemini-2.5-Pro version performed better on hard tasks.