A state-of-the-art versatile data science agent


In-depth analysis of DS-STAR

Next, we conducted ablation studies to verify the effectiveness of DS-STAR’s individual components. We also analyzed the impact of the number of refinement rounds by measuring how many iterations were required to generate a sufficient plan.

Data File Analyzer: This agent is essential for high performance. Without the descriptions it generates (Variant 1), DS-STAR’s accuracy on difficult tasks within the DABStep benchmark dropped sharply to 26.98%, underscoring the importance of rich data context for effective planning and implementation.
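
To give a rough sense of what a data file description might contain, here is a minimal sketch. It is not DS-STAR’s actual analyzer, which this post does not detail. The function name `describe_csv`, the output layout, and the choice to handle only CSV text are all assumptions made for illustration.

```python
import csv
import io


def describe_csv(name, text, sample_rows=2):
    """Build a short textual description of CSV content.

    Hypothetical helper: the description layout is an assumption,
    not DS-STAR's real output format.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []

    lines = [
        f"File: {name}",
        f"Rows: {len(rows)}",
        f"Columns: {', '.join(columns)}",
    ]
    # Show a few example values per column so a planner sees the data's shape.
    for col in columns:
        samples = [row[col] or "" for row in rows[:sample_rows]]
        lines.append(f"  {col}: e.g. {', '.join(samples)}")
    return "\n".join(lines)


# Toy input, invented for this example.
text = "merchant,fee\nacme,0.5\nbolt,0.7\nzed,0.1\n"
desc = describe_csv("fees.csv", text)
print(desc)
```

A description like this would be handed to the planning step as context. Removing it, as in Variant 1, leaves the planner without column names or value formats to work from.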

Router: The Router agent’s ability to decide whether a new step is needed or an incorrect step must be fixed is vital. When we removed it (Variant 2), DS-STAR only added new steps sequentially, leading to worse performance on both easy and hard tasks. This shows that correcting mistakes in a plan is more effective than continuing to add potentially flawed steps.
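
The difference between the two variants can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the DS-STAR implementation: `RouterDecision`, `refine_plan`, and the toy `decide` and `propose_step` callables are hypothetical, and fixing a step by truncating the plan at that step and re-planning from there is an assumption, since this post does not describe the mechanism.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RouterDecision:
    action: str                       # "add_step" or "fix_step"
    step_index: Optional[int] = None  # index of the step to fix, if any


def refine_plan(
    plan: List[str],
    decide: Callable[[List[str]], RouterDecision],
    propose_step: Callable[[List[str]], str],
    with_router: bool = True,
) -> List[str]:
    """Run one refinement round and return a new plan (input is not mutated)."""
    new_plan = list(plan)
    # Without a router, the only possible move is to append another step.
    decision = decide(new_plan) if with_router else RouterDecision("add_step")
    if decision.action == "fix_step" and decision.step_index is not None:
        # Assumption: drop the flawed step and everything built on top of it.
        new_plan = new_plan[: decision.step_index]
    new_plan.append(propose_step(new_plan))
    return new_plan


# Toy stand-ins for the LLM-backed components (hypothetical).
def toy_decide(plan: List[str]) -> RouterDecision:
    for i, step in enumerate(plan):
        if "wrong" in step:
            return RouterDecision("fix_step", i)
    return RouterDecision("add_step")


def toy_propose(plan: List[str]) -> str:
    return f"step {len(plan) + 1}"


plan = ["load data", "wrong filter"]
with_router = refine_plan(plan, toy_decide, toy_propose)
without_router = refine_plan(plan, toy_decide, toy_propose, with_router=False)
print(with_router)
print(without_router)
```

In this toy run, the routed plan replaces the flawed second step, while the unrouted plan keeps it and stacks a third step on top, which mirrors the failure mode described for Variant 2.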

Generalizability Across LLMs: We also tested DS-STAR’s adaptability by using GPT-5 as the base model. This yielded promising results on the DABStep benchmark, indicating the framework’s generalizability. Interestingly, DS-STAR with GPT-5 performed better on easy tasks, while the Gemini-2.5-Pro version performed better on hard tasks.
