Business Agent

AI agent for autonomous business decisions via MCTS and Bayesian inference

A simulated business environment in which an AI agent autonomously makes weekly pricing, marketing budget, and inventory decisions. It is designed as a controlled experimental laboratory for one question: which components of the agent's intelligence (inference, planning, and action generation) actually contribute to business decision performance, and under what conditions?

Inspired by the PlanU paper, the system uses Monte Carlo Tree Search with distributional (quantile) value estimates, Bayesian belief tracking via particle filters, and LLM-based action proposals via the Claude API.

Developed April to May 2026. Built iteratively with Claude Code.

Simplified architecture: each week the agent observes the environment, updates its beliefs via Bayesian inference, generates candidate actions from proposers, and selects an action via planning (Greedy or MCTS) using a forward model.

Architecture

Environment — A multi-product retail business simulator with constant-elasticity demand, Poisson noise, seasonal effects, cross-elasticities, and inventory with reorder lag, holding costs, and stockout penalties. Four progressive complexity phases.
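To make the demand model concrete, here is a minimal single-product sketch of constant-elasticity demand with Poisson noise and stockout censoring. The function names, default parameter values, and the single multiplicative seasonal factor are illustrative assumptions, not the project's actual code; cross-elasticities, reorder lag, and costs are left out.

```python
import numpy as np

def expected_demand(price, base_demand=100.0, ref_price=10.0,
                    elasticity=1.5, seasonal=1.0):
    """Mean weekly demand under a constant-elasticity curve.

    All defaults are hypothetical values chosen for illustration.
    """
    return base_demand * (price / ref_price) ** (-elasticity) * seasonal

def simulate_week(price, inventory, rng, **demand_kwargs):
    """Draw Poisson demand around the mean and cap sales at inventory."""
    lam = expected_demand(price, **demand_kwargs)
    demand = int(rng.poisson(lam))
    sales = min(demand, inventory)
    # When demand reaches or exceeds inventory, the agent only learns
    # that demand was at least `sales` (a right-censored observation).
    censored = demand >= inventory
    return sales, censored

rng = np.random.default_rng(0)
sales, censored = simulate_week(price=12.0, inventory=60, rng=rng)
```

With an elasticity of 1.5, raising the price from 10 to 12 lowers mean demand by roughly a quarter, and any week in which sales hit the inventory cap is flagged as censored.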

Inference — Bayesian belief tracking over hidden demand parameters. Implementations include point estimation (MLE), grid approximation, particle filtering (bootstrap SMC with systematic resampling), and an oracle baseline. Uses Poisson log-likelihood with right-censoring for stockout weeks.
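The particle-filter update described above might look roughly like the following sketch. It assumes, for simplicity, that each particle is a candidate Poisson demand rate and that resampling is triggered by an effective-sample-size threshold; both choices, and all names here, are assumptions for illustration. The censored branch uses the Poisson survival function, since a stockout week only reveals that demand was at least the observed sales.

```python
import numpy as np
from scipy.stats import poisson

def log_likelihood(sales, lam, censored):
    """Poisson log-likelihood of one week's sales at demand rate(s) lam."""
    if censored:
        # Stockout: log P(D >= sales) = logsf(sales - 1).
        return poisson.logsf(sales - 1, lam)
    return poisson.logpmf(sales, lam)

def systematic_resample(weights, rng):
    """Systematic resampling: one uniform offset, n evenly spaced points."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cumulative, positions, side="right")

def particle_update(particles, weights, sales, censored, rng, ess_frac=0.5):
    """Reweight particles by the likelihood; resample if ESS drops too low."""
    log_w = np.log(weights) + log_likelihood(sales, particles, censored)
    log_w -= log_w.max()  # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)
    if ess < ess_frac * len(w):  # hypothetical resampling trigger
        idx = systematic_resample(w, rng)
        particles = particles[idx]
        w = np.full(len(w), 1.0 / len(w))
    return particles, w
```

This sketch omits any particle jitter or rejuvenation step, so with static parameters the particle set will lose diversity over long runs.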

Planning — Action selection given beliefs and candidate proposals. A greedy 1-step planner, and a full MCTS implementation (~400 lines) with PUCT selection, quantile-based backup (PlanU-style), random-policy rollouts, and root parallelism.
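As a rough illustration of two of these pieces, the sketch below shows a standard PUCT score and a simple quantile-regression update for a node's return distribution. The exact backup rule used in PlanU and in this project is not given in the text, so the quantile update here is an assumed stand-in, and the dictionary-based node layout, the exploration constant, and the learning rate are hypothetical.

```python
import math
import numpy as np

def puct_score(q_value, prior, parent_visits, child_visits, c_puct=1.5):
    """Value estimate plus a prior-weighted exploration bonus."""
    bonus = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + bonus

def select_child(children, parent_visits, c_puct=1.5):
    """Pick the child with the highest PUCT score.

    Each child is a dict with keys "q", "prior", and "visits"
    (an illustrative layout, not the project's node class).
    """
    return max(
        children,
        key=lambda ch: puct_score(ch["q"], ch["prior"], parent_visits,
                                  ch["visits"], c_puct),
    )

def quantile_backup(quantiles, sampled_return, taus, lr=0.01):
    """Move each quantile estimate one small step toward the sampled return.

    Estimate i rises by lr * tau_i when the return is at or above it and
    falls by lr * (1 - tau_i) when the return is below it.
    """
    below = (sampled_return < quantiles).astype(float)
    return quantiles + lr * (taus - below)
```

Keeping several quantiles per node rather than a single mean lets the planner distinguish actions with similar expected return but different downside risk.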

Proposals — Candidate action generation for planners to score. Random sampling, a fixed heuristic policy, an LLM proposer (Claude API with formatted state summaries), and a structured mixture proposer with promo-aware reorder strategies.

Forward Model — The agent’s internal demand model, used by both Greedy and MCTS for one-step simulation and multi-step tree search. Each MCTS simulation draws one coherent parameter set from the inference posterior.
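The idea of drawing one coherent parameter set per simulation can be sketched as follows: sample a single particle from the weighted posterior, then hold those parameters fixed for every step of that rollout instead of resampling each week. The two-parameter particle layout (base demand, elasticity), the profit formula, and the cost and reference-price defaults are assumptions made for this example.

```python
import numpy as np

def sample_parameters(particles, weights, rng):
    """Draw one particle (one joint parameter set) from the posterior."""
    idx = rng.choice(len(weights), p=weights)
    return particles[idx]

def rollout_return(prices, params, rng, unit_cost=4.0, ref_price=10.0):
    """Simulate a price sequence under a single fixed parameter set."""
    base_demand, elasticity = params
    total = 0.0
    for price in prices:
        lam = base_demand * (price / ref_price) ** (-elasticity)
        demand = rng.poisson(lam)
        total += (price - unit_cost) * demand  # hypothetical weekly profit
    return total

rng = np.random.default_rng(0)
particles = np.array([[100.0, 1.5], [80.0, 2.0]])  # (base_demand, elasticity)
weights = np.array([0.7, 0.3])
params = sample_parameters(particles, weights, rng)
value = rollout_return([12.0, 11.0, 12.0], params, rng)
```

Sampling the parameters jointly preserves posterior correlations (for example, between base demand and elasticity) that would be lost if each parameter were drawn independently.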

Experimental Framework

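The project defines a systematic baseline comparison matrix in which each arm differs from at least one other arm in exactly one variable, so that paired comparisons isolate a single component (for example, B1 vs. B2 changes only the proposer, and B2 vs. B4 changes only the planner):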

Arm   Planning  Uncertainty     Proposals  Tests
B1    Greedy    Point estimate  Random     Floor baseline
B2    Greedy    Point estimate  LLM        Value of LLM proposals
B3    Greedy    Bayesian        LLM        Value of Bayesian inference
B4    MCTS      Point estimate  LLM        Value of multi-step planning
B5    MCTS      Bayesian        Random     MCTS+Bayesian without LLM
Full  MCTS      Bayesian        LLM        All components

Results

Codebase

~3,500 lines of source code plus ~3,100 lines of tests. Built with Python, NumPy, SciPy, and the Anthropic SDK (Claude API for LLM proposals).

GitHub repository