Tutorial

This tutorial uses the covid_contagion minimal example to demonstrate the Melodie architecture. The code is kept simple but covers all core components. The project structure is as follows:

examples/covid_contagion
├── core
│   ├── agent.py
│   ├── environment.py
│   ├── model.py
│   ├── scenario.py
│   └── data_collector.py
├── data
│   ├── input
│   │   ├── SimulatorScenarios.csv
│   │   └── ID_HealthState.csv
│   └── output
│       ├── Result_Simulator_Agents.csv
│       └── Result_Simulator_Environment.csv
├── main.py
└── README.md

Conceptually, Melodie encourages separation of concerns in an ABM. In this example, the main components are:

  • Model (core/model.py) wires all components together and defines the main time loop.

  • Scenario (core/scenario.py) is the single entry point for all model inputs. It loads all data from data/input (including SimulatorScenarios), and other components access this data via self.scenario.xxx.

  • Environment (core/environment.py) implements the model’s macro-level logic. It coordinates agent interactions and calculates population-level statistics.

  • Agents (core/agent.py) implement the model’s micro-level logic. They only know their own state and how to update it based on scenario parameters or environment instructions.

  • DataCollector (core/data_collector.py) specifies what data to record from agents and the environment, and saves it to data/output.

The covid_contagion example is deliberately simple to make this architecture clear. It contains one type of agent, one environment, one model, and one scenario table.

Model

The Model creates components, drives the simulation steps, and collects outputs. It orchestrates the simulation but does not contain domain-specific logic itself—that belongs in the Agent and Environment classes.

from typing import TYPE_CHECKING

from Melodie import Model
from .agent import CovidAgent
from .environment import CovidEnvironment
from .data_collector import CovidDataCollector
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidModel(Model):
    scenario: "CovidScenario"
    agents: "AgentList[CovidAgent]"
    environment: CovidEnvironment
    data_collector: CovidDataCollector

    def create(self) -> None:
        """
        This method is called once at the beginning of the simulation
        to create the core components of the model.
        """
        self.agents = self.create_agent_list(CovidAgent)
        self.environment = self.create_environment(CovidEnvironment)
        self.data_collector = self.create_data_collector(CovidDataCollector)

    def setup(self) -> None:
        """
        This method is called once after `create` to set up the initial state
        of the model components.
        """
        self.agents.setup_agents(self.scenario.agent_num)
        self.environment.setup_infection(self.agents)

    def run(self) -> None:
        """
        This method defines the main simulation loop that runs for `scenario.period_num` periods.
        """
        for t in self.iterator(self.scenario.period_num):
            self.environment.agents_interaction(self.agents)
            self.environment.agents_recover(self.agents)
            self.environment.update_population_stats(self.agents)
            self.data_collector.collect(t)
        self.data_collector.save()

Notes:

  • create: Builds the AgentList, Environment, and DataCollector.

  • setup: Initializes agents and sets up the initial infection.

  • run: Executes the main simulation loop, calling the environment for agent interaction and recovery, and then saving the results.

Think of the Model as the orchestrator of components: it decides what components exist, in what order they are called each period, and delegates the concrete simulation logic to them. This allows you to reuse Agent and Environment classes in different experimental setups by only changing the run logic in the Model.

Scenario Definition

The Scenario defines parameters and loads static data tables. It acts as the single source of truth for all configuration and input data, allowing the other components to focus on behavior rather than I/O.

from Melodie import Scenario


class CovidScenario(Scenario):
    """
    Defines the parameters for each simulation run of the virus contagion model.
    It inherits from the base `Melodie.Scenario`.

    Special Melodie attributes (automatically populated from `SimulatorScenarios`):
    - `id`: A unique identifier for the scenario.
    - `run_num`: How many times to repeat this scenario (for stochastic models). Defaults to 1.
    - `period_num`: How many simulation steps (periods) for each run. Defaults to 0.
    """

    def setup(self) -> None:
        """
        Declares custom scenario parameters for type hinting and clarity.
        These are automatically populated from columns in `SimulatorScenarios`.
        """
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.infection_prob: float = 0.0
        self.recovery_prob: float = 0.0

    def load_data(self) -> None:
        """
        This method is automatically called by Melodie after scenario parameters are set.
        It is the recommended place to load all static input dataframes, making them
        accessible via `self.scenario.*` from `Model`, `Agent`, and `Environment`.
        """
        self.health_states = self.load_dataframe("ID_HealthState.csv")

Some column names in SimulatorScenarios have a special meaning in Melodie:

  • id: A unique identifier for the scenario, used in output files.

  • run_num: How many times to repeat the same scenario. This is useful for analyzing stochastic models to observe the variability of outcomes. Defaults to 1 if omitted.

  • period_num: How many periods the model will run. Defaults to 0 if omitted (the simulation loop will not execute).

These special attributes are defined in the base Melodie.Scenario class. They are typically controlled by adding corresponding columns to SimulatorScenarios, but you do not need to redeclare them in your subclass’s setup method unless you want explicit type hints. Other columns are user-defined.
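To make the column convention concrete, a SimulatorScenarios.csv for this example could look like the following. The column names match the attributes declared in CovidScenario.setup plus the special Melodie columns; the values shown are purely illustrative, not the actual file contents:

```
id,run_num,period_num,agent_num,initial_infected_percentage,infection_prob,recovery_prob
0,1,100,1000,0.05,0.2,0.1
1,1,100,1000,0.10,0.3,0.1
```

Each row defines one scenario; the Simulator iterates over all rows, repeating each one run_num times.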

The load_data method is a special hook that is automatically called by Melodie, so you should not change its name. It runs after parameters are loaded from the current SimulatorScenarios row. Inside load_data, you can load any other required data tables and attach them as attributes to the scenario instance. All other components (Model, Agent, Environment) can then access this data uniformly via self.scenario.xxx. For data-heavy models, you can also pre-process large tables in load_data (e.g., into dictionaries) to improve performance.
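The dictionary pre-processing idea can be sketched with the standard library alone. The helper name `index_by_id` is illustrative (not part of the Melodie API), and the inline CSV stands in for a table like ID_HealthState.csv:

```python
import csv
import io

def index_by_id(rows, key="id", value="name"):
    """Build an O(1) id -> name lookup from an iterable of row dicts."""
    return {row[key]: row[value] for row in rows}

# Inline sample mirroring the columns of ID_HealthState.csv.
csv_text = "id,name\n0,susceptible\n1,infected\n2,recovered\n"
lookup = index_by_id(csv.DictReader(io.StringIO(csv_text)))
# Inside CovidScenario.load_data you could attach such a lookup to the
# scenario instance (attribute name hypothetical):
#     self.health_state_names = {...}
```

Agents and the environment can then read the mapping via `self.scenario` without repeatedly scanning a dataframe.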

Note

Melodie automatically loads several standard scenario tables by recognizing their filenames. These tables are essential for the Simulator, Trainer, and Calibrator to function correctly. The recognized names are:

  • SimulatorScenarios

  • CalibratorScenarios

  • CalibratorParamsScenarios

  • TrainerScenarios

  • TrainerParamsScenarios

These files are automatically loaded and attached to each Scenario object, becoming accessible via attributes like self.scenario.simulator_scenarios or self.scenario.calibrator_params_scenarios. You do not need to load them manually in load_data.

Environment Logic

The Environment coordinates interactions, the initial infection, and recovery. It acts as the “director” that arranges agent interactions and maintains macro-level summaries.

import random
from typing import TYPE_CHECKING

from Melodie import Environment
from .agent import CovidAgent
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidEnvironment(Environment):
    scenario: "CovidScenario"

    def setup(self) -> None:
        # Macro-level counters for population statistics.
        # These are updated each period by `update_population_stats` and recorded by the DataCollector.
        self.num_susceptible: int = 0
        self.num_infected: int = 0
        self.num_recovered: int = 0

    def setup_infection(self, agents: "AgentList[CovidAgent]") -> None:
        # Sets the initial percentage of infected agents based on scenario parameters.
        for agent in agents:
            if (
                agent.health_state == 0
                and random.random() < self.scenario.initial_infected_percentage
            ):
                agent.health_state = 1

    def agents_interaction(self, agents: "AgentList[CovidAgent]") -> None:
        # Simulates random interactions where infected agents can spread the virus.
        for agent in agents:
            if agent.health_state == 1:
                # Randomly meet another agent
                # Note: Melodie's AgentList.random_sample returns a list
                other_agent = agents.random_sample(1)[0]
                if other_agent.health_state == 0:
                    if random.random() < self.scenario.infection_prob:
                        other_agent.health_state = 1

    def agents_recover(self, agents: "AgentList[CovidAgent]") -> None:
        # Triggers the recovery process for all infected agents.
        for agent in agents:
            agent.health_state_update(self.scenario.recovery_prob)

    def update_population_stats(self, agents: "AgentList[CovidAgent]") -> None:
        # Aggregates agent states into environment-level population counts.
        self.num_susceptible = 0
        self.num_infected = 0
        self.num_recovered = 0

        for agent in agents:
            if agent.health_state == 0:
                self.num_susceptible += 1
            elif agent.health_state == 1:
                self.num_infected += 1
            elif agent.health_state == 2:
                self.num_recovered += 1

Notes:

  • setup_infection: Sets the initial number of infected agents via a Bernoulli trial for each susceptible agent.

  • agents_interaction: A simple mean-field interaction where each infected agent randomly meets one other agent and may spread the virus.

  • agents_recover: Delegates the recovery logic to each agent.

  • update_population_stats: Aggregates micro-level agent states into macro-level population counts.

Agent Behavior

The Agent defines micro-level state and behavior. Agents in Melodie are deliberately lightweight: they only store their own state and expose methods for state transitions. The Environment decides when these methods are called.

import random
from Melodie import Agent


class CovidAgent(Agent):
    def setup(self) -> None:
        """
        Initializes the agent's state.
        `health_state`: 0 = susceptible, 1 = infected, 2 = recovered.
        """
        self.health_state: int = 0  # All agents start as susceptible.

    def health_state_update(self, recovery_prob: float) -> None:
        """
        Agent-level logic for transitioning from infected to recovered.
        """
        if self.health_state == 1 and random.random() < recovery_prob:
            self.health_state = 2

Notes:

  • State: health_state (0: susceptible, 1: infected, 2: recovered).

  • Method: health_state_update contains the logic for an agent to recover from infection.

This “smart environment, simple agents” pattern is a core design principle in Melodie. It keeps agent classes simple and testable, centralizes interaction logic in the Environment, and makes it easier to modify interaction rules (e.g., from random-mixing to a grid-based model) without changing agent code.
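As a sketch of how this pattern pays off, here is an alternative interaction rule implemented purely at the environment level. `MiniAgent` and `pairwise_interaction` are illustrative stand-ins (not Melodie classes); the agent contract (a `health_state` of 0/1/2) is exactly the one above and is untouched:

```python
import random

class MiniAgent:
    """Stand-in with the same state contract as CovidAgent."""
    def __init__(self, health_state=0):
        self.health_state = health_state

def pairwise_interaction(agents, infection_prob, rng=random):
    """Alternative rule: shuffle the population and meet in disjoint pairs,
    instead of each infected agent sampling one random partner."""
    shuffled = list(agents)
    rng.shuffle(shuffled)
    for a, b in zip(shuffled[::2], shuffled[1::2]):
        # Infection can pass in either direction within a pair.
        for x, y in ((a, b), (b, a)):
            if x.health_state == 1 and y.health_state == 0:
                if rng.random() < infection_prob:
                    y.health_state = 1
```

Swapping `agents_interaction` for a rule like this (or a grid-based one) requires no change to the agent class, only to the environment.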

Data Collection Setup

The Data Collector specifies which micro and macro results to save to data/output.

from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self) -> None:
        """
        Registers which properties of agents and the environment should be recorded.

        In this minimal example we record:
        - micro-level results: each agent's ``health_state``;
        - macro-level results: population counts on the environment
          (``num_susceptible``, ``num_infected``, ``num_recovered``).
        """
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("num_susceptible")
        self.add_environment_property("num_infected")
        self.add_environment_property("num_recovered")

The key design idea is that data collection is explicit: you register exactly which properties to track, and Melodie handles the indexing by scenario, run, period, and agent ID in a consistent format. This separates the simulation logic from the data storage logic.

Run the model

To run the example from the repository root (after activating your virtual environment):

python -m examples.covid_contagion.main

This main.py file is the entry-point module that loads the configuration and starts the simulation. The same module-style invocation pattern can be used for the other examples under examples/.

import os
from Melodie import Config, Simulator
from examples.covid_contagion.core.model import CovidModel
from examples.covid_contagion.core.scenario import CovidScenario

if __name__ == "__main__":
    config = Config(
        project_name="covid_contagion",
        project_root=os.path.dirname(os.path.abspath(__file__)),
        input_folder=os.path.join(
            os.path.dirname(os.path.abspath(__file__)), "data", "input"
        ),
        output_folder=os.path.join(
            os.path.dirname(os.path.abspath(__file__)), "data", "output"
        ),
    )

    simulator = Simulator(
        config=config,
        model_cls=CovidModel,
        scenario_cls=CovidScenario,
    )
    simulator.run()
    # simulator.run_parallel(cores=8)
    # simulator.run_parallel(cores=8, parallel_mode="thread")

Notes:

  • It is recommended to use absolute paths for input/output folders to avoid ambiguity.

  • The Simulator automatically finds the scenario table by its name, SimulatorScenarios (both .csv and .xlsx are supported).

  • The Config object tells Melodie where data lives on disk.

  • The Simulator object knows how to iterate over scenarios and runs.

When you start a simulation, the Simulator automatically calls a set of lifecycle methods on your components in a fixed order:

  • On each Scenario: setup(), then load_data(), then setup_data() (if implemented).

  • On the Model: create(), then setup(), then run().

  • On components like Environment, AgentList, and DataCollector: their respective setup() methods.

This is why these method names are a fixed convention. You rarely call these methods directly—the Simulator manages the full execution loop for you.
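The calling sequence can be made concrete with plain stub classes. This is only an illustration of the order described above, not the real Simulator internals:

```python
calls = []

class StubScenario:
    def setup(self):
        calls.append("scenario.setup")
    def load_data(self):
        calls.append("scenario.load_data")

class StubModel:
    def create(self):
        calls.append("model.create")
    def setup(self):
        calls.append("model.setup")
    def run(self):
        calls.append("model.run")

def run_lifecycle(scenario, model):
    # Mirrors the fixed order: scenario hooks first, then the model hooks.
    scenario.setup()
    scenario.load_data()
    model.create()
    model.setup()
    model.run()

run_lifecycle(StubScenario(), StubModel())
```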

After a run, the data/output folder will contain CSV files for analysis:

  • Result_Simulator_Agents.csv: Per-period agent states.

  • Result_Simulator_Environment.csv: Per-period environment aggregates (the macro metrics registered in the DataCollector).
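As a sketch of post-processing, the environment results can be summarized with the standard library alone. The column names used here (`period`, `num_infected`) are assumptions for illustration and should be checked against your actual output file:

```python
import csv
import io

def peak_infections(csv_file, period_col="period", infected_col="num_infected"):
    """Return (period, count) at the infection peak of an environment result CSV."""
    reader = csv.DictReader(csv_file)
    best = max(reader, key=lambda row: int(row[infected_col]))
    return int(best[period_col]), int(best[infected_col])

# Inline sample standing in for Result_Simulator_Environment.csv:
sample = (
    "period,num_susceptible,num_infected,num_recovered\n"
    "0,90,10,0\n"
    "1,70,25,5\n"
    "2,60,20,20\n"
)
peak = peak_infections(io.StringIO(sample))
```

The same approach works per scenario and run by grouping on the corresponding index columns before taking the maximum.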

Parallel Execution

For large-scale experiments, running simulations sequentially can be time-consuming. Melodie provides a single parallel entry point on the Simulator object.

``run_parallel()``: Auto-Selected Parallelism

This is the recommended high-level entry point for parallelization in Melodie.

  • Mechanism: If you do not specify a mode, Melodie automatically chooses thread-based execution on Python 3.13+ and process-based execution on older Python versions. You can still override the choice manually with parallel_mode="process" or parallel_mode="thread".

  • Use Case: This is the default API to use when you want Melodie to choose a sensible parallel backend for the current interpreter.

  • Usage:

# In main.py, instead of simulator.run():
simulator.run_parallel(cores=4)  # Use 4 CPU cores

# Or override the automatic choice explicitly:
simulator.run_parallel(cores=4, parallel_mode="process")
simulator.run_parallel(cores=4, parallel_mode="thread")

Performance Comparison: A Quick Case Study

To demonstrate the difference, we ran the covid_contagion example with 24 scenarios (each with run_num = 1) on a machine with a Python 3.11 environment and 8 cores.

  • `run_parallel(cores=8)`:
    • Total Time: ~0.90 seconds

    • Mechanism: Spawns 8 separate Python processes. Each process has its own memory and GIL, allowing them to run computations on different cores simultaneously. This is highly effective for CPU-bound tasks like ABM.

  • `run_parallel(cores=8, parallel_mode="thread")`:
    • Total Time: ~1.01 seconds

    • Mechanism: Spawns 8 threads within a single Python process. In Python versions before 3.13, the GIL prevents these threads from executing Python code on more than one core at a time. The overhead of thread management can even make it slightly slower than the process-based approach.

Update: Test Results on Python 3.14.2

On Python 3.14.2, the performance dynamic shifts because the thread-based backend avoids process startup and pickling overhead. We ran the same test on the same 8-core machine:

  • `run_parallel(cores=8)`:
    • Total Time: ~5.37 seconds

    • Mechanism: Still effective, but the overhead of creating 8 separate processes and pickling data for communication is now more apparent compared to the lightweight thread-based alternative.

  • `run_parallel(cores=8, parallel_mode="thread")`:
    • Total Time: ~1.56 seconds

    • Mechanism: The 8 threads now run on 8 cores in true parallelism within a single process. By avoiding the overhead of process creation and data serialization, this method is now over 3.4x faster for this specific task.

Conclusion: For most use cases, call run_parallel() and let Melodie choose the backend automatically. If you want to force a specific backend, use parallel_mode=... explicitly.

A Note on Performance Trade-offs

An astute observer might notice that the absolute execution times on Python 3.14.2 for both methods were slower than on Python 3.11. This is an expected trade-off. The newer interpreter has a higher startup overhead for each process in this micro-benchmark.

In our test, the simulation task for each scenario is very short (milliseconds). Consequently, the overhead of creating new Python processes for run_parallel() becomes a significant portion of the total time, causing its slowdown from ~0.9s to ~5.37s.

The key takeaway is not the absolute speed on this micro-task, but the relative speedup. The test clearly demonstrates that on Python 3.14 and newer builds, run_parallel(..., parallel_mode="thread") effectively eliminates this high process-creation overhead, making it the superior architecture for computationally intensive models where the simulation time far outweighs the initial setup time.

A Note on Paths and Parallel Execution

All examples in the examples/ directory should be executed from the repository root as Python modules, for example python -m examples.covid_contagion.main.

This keeps imports consistent for both normal execution and parallel worker processes. The example directories are structured as packages, so no sys.path manipulation is needed.