SCIENTIST - Research Agent

The SCIENTIST agent conducts research on the selected competition. It analyzes leaderboards, searches for relevant papers and notebooks, and summarizes strategies.

Role

Analyze leaderboard score distribution
Surface relevant notebooks and research
Summarize dataset characteristics
Recommend modeling approaches

Tools

Tool	Purpose
`analyze_leaderboard`	Summarize leaderboard stats
`get_kaggle_notebooks`	Find top notebooks for the competition
`analyze_data_characteristics`	Inspect dataset structure
`compute_baseline_estimate`	Estimate a baseline score
`kaggle_*` toolset	Kaggle API helper tools
`web_search` (builtin)	Retrieve papers and discussions
`memory` (builtin)	Shared notes (Anthropic only)

Basic Usage

from agent_k.agents.scientist import ScientistDeps, scientist_agent

competition = await kaggle_adapter.get_competition("titanic")

deps = ScientistDeps(
    http_client=http,
    platform_adapter=kaggle_adapter,
    competition=competition,
)

run_result = await scientist_agent.run(
    "Research the provided competition and summarize approaches",
    deps=deps,
)

output = run_result.output
print(output.recommended_approaches)

Dependencies

from dataclasses import dataclass, field
from typing import Any
import httpx

@dataclass
class ScientistDeps:
    """Dependencies for the SCIENTIST agent."""

    http_client: httpx.AsyncClient
    platform_adapter: PlatformAdapter
    competition: Competition
    leaderboard: list[LeaderboardEntry] = field(default_factory=list)
    research_cache: dict[str, Any] = field(default_factory=dict)

Output Model

class ResearchFinding(BaseModel):
    """Individual research finding."""

    category: str
    title: str
    summary: str
    relevance_score: float
    sources: list[str]

class LeaderboardAnalysis(BaseModel):
    """Leaderboard statistics summary."""

    top_score: float
    median_score: float
    score_distribution: str
    common_approaches: list[str]
    improvement_opportunities: list[str]

class ResearchReport(BaseModel):
    """Output from SCIENTIST research."""

    competition_id: str
    domain_findings: list[ResearchFinding]
    technique_findings: list[ResearchFinding]
    leaderboard_analysis: LeaderboardAnalysis | None
    recommended_approaches: list[str]
    estimated_baseline_score: float | None
    key_challenges: list[str]

Notes

The SCIENTIST output is converted into a simplified ResearchFindings object by the mission graph.
The memory tool is only available for Anthropic models.