Auto-Setup Wizard

The Auto-Setup Wizard is the fastest way to initialize a robust, AgentCommander-compatible experiment environment. The easiest way to start is the built-in Experiment Setup wizard, accessed directly from the web UI.

Using the Wizard

  1. Navigate to "Setup": Click the "Experiment Setup" tab in the UI sidebar.
  2. Select a Template:
    • [Case: You only have Dataset]: Corresponds to Scenario 1 below.
    • [Case: You have Training Code]: Corresponds to Scenario 2 below.
  3. Configure: Fill in the required fields (e.g., Project Name, Absolute Path to Data).
  4. Launch: Click 🚀 Run Setup Script.
    • The integrated console will show the setup progress as it creates directories, splits data, and generates the initial evaluator.py.

Scenario 1: Data-Only (ml_autosetup_1)

Use Case: You have a dataset (X.npy, Y.npy) but no model code. You want the Agent to build a model from scratch.

Workflow

  1. Input: Path to your data directory.
  2. Splitting: The script runs split_data.py to create X_train.npy, X_test.npy, etc. (a minimal sketch of such a split follows this list).
  3. Generation: The Agent generates an initial strategy.py and metric.py based on your description.
  4. Verification: Performs a dry run to confirm the generated code executes without errors.
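
The split script is generated for you, but conceptually it does something like the following. This is a minimal sketch only: the 80/20 ratio, the fixed seed, and the Y_train.npy/Y_test.npy file names are assumptions and may not match the generated script.

    # split_data.py -- illustrative sketch; the generated script may differ
    import numpy as np

    rng = np.random.default_rng(42)    # fixed seed so the split is reproducible
    X, Y = np.load("X.npy"), np.load("Y.npy")

    idx = rng.permutation(len(X))      # shuffle sample indices once
    cut = int(0.8 * len(X))            # assumed 80/20 train/test ratio
    train, test = idx[:cut], idx[cut:]

    np.save("X_train.npy", X[train])
    np.save("Y_train.npy", Y[train])
    np.save("X_test.npy", X[test])
    np.save("Y_test.npy", Y[test])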

Scenario 2: Bring Your Own Code (ml_autosetup_2)

Use Case: You already have a training script (strategy.py) and want the Agent to optimize hyperparameters or architecture.

Requirements (The BYOC Protocol)

To use this mode, your code must adhere to a simple interface contract so the Evaluator can judge it:

  1. Weight Saving: Your script must save the best model weights to a file (e.g., best_fast.pt).
  2. Loading Interface: You must implement a factory function that rebuilds the model from that file. A minimal sketch, assuming PyTorch and that the entire model object (not just a state_dict) was serialized:
    import torch

    def load_trained_model(path, device):
        # Rebuild the trained model from the saved file on the requested device
        return torch.load(path, map_location=device)
    
  3. Data Protocol: Your code should load data using the shared experiment_setup.py module (generated by the wizard) to ensure train/test splits are consistent between the Player (Strategy) and the Judge (Evaluator).
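
Putting the three requirements together, the skeleton of a BYOC-compliant strategy.py could look like the sketch below. Everything in it is illustrative: the placeholder model, the elided training loop, and in particular the experiment_setup.load_train_split() helper, whose name is an assumption about the wizard-generated module rather than a documented API.

    # strategy.py -- illustrative BYOC skeleton, not wizard-generated code
    import torch
    import experiment_setup  # wizard-generated module; the helper name below is assumed

    def train():
        # Data Protocol: pull the shared train split from experiment_setup (assumed helper)
        X_train, y_train = experiment_setup.load_train_split()
        model = torch.nn.Linear(X_train.shape[1], 1)   # placeholder model
        # ... training loop elided ...
        torch.save(model, "best_fast.pt")              # Weight Saving: persist the best model

    def load_trained_model(path, device):
        # Loading Interface: factory the Evaluator calls to reload the trained model
        return torch.load(path, map_location=device)

    if __name__ == "__main__":
        train()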

What it Generates

  • experiment_setup.py: Locks the random seed and data splits. Treat it as immutable.
  • evaluator_ref.py: A template evaluator that loads your model and tests it against the reserved test set.
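
Conceptually, the generated experiment_setup.py is a small module along these lines. The sketch below is an assumption about its shape (matching the helper names assumed in the strategy.py sketch above); the real generated file may expose a different API.

    # experiment_setup.py -- conceptual sketch of the wizard-generated module
    import random
    import numpy as np

    SEED = 42  # fixed by the wizard at setup time (value assumed here)

    def lock_seed():
        # Pin the common RNG sources so Strategy and Evaluator behave identically
        random.seed(SEED)
        np.random.seed(SEED)

    def load_train_split():
        return np.load("X_train.npy"), np.load("Y_train.npy")

    def load_test_split():
        return np.load("X_test.npy"), np.load("Y_test.npy")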

Common Features

  • Metric Standardization: Both modes verify that the evaluator prints Best metric: {val} (lowercase 'm') for the workflow to parse.
  • Safety Checks: Both modes include anti-cheating checks (e.g., verifying y_test was not modified in memory).
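
As an illustration of both conventions, the tail end of an evaluator might look like the sketch below. Only the Best metric: output format is prescribed by the workflow; the checksum-based integrity check and the placeholder score are assumptions about how such a safety check could be implemented.

    # end of a hypothetical evaluator -- illustrates the two conventions above
    import hashlib
    import numpy as np

    y_test = np.load("Y_test.npy")
    checksum_before = hashlib.sha256(y_test.tobytes()).hexdigest()

    # ... load the model via load_trained_model() and score it on the test set ...
    score = 0.93  # placeholder value

    # Safety check (one possible form): fail if y_test was mutated during evaluation
    assert hashlib.sha256(y_test.tobytes()).hexdigest() == checksum_before, "y_test was modified"

    # Metric standardization: the workflow parses exactly this line (lowercase 'm')
    print(f"Best metric: {score}")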