Tutorial
This tutorial uses the covid_contagion minimal example to demonstrate the Melodie architecture. The code is kept simple but covers all core components. The project structure is as follows:
examples/covid_contagion
├── core
│   ├── agent.py
│   ├── environment.py
│   ├── model.py
│   ├── scenario.py
│   └── data_collector.py
├── data
│   ├── input
│   │   ├── SimulatorScenarios.csv
│   │   └── ID_HealthState.csv
│   └── output
│       ├── Result_Simulator_Agents.csv
│       └── Result_Simulator_Environment.csv
├── main.py
└── README.md
Conceptually, Melodie encourages separation of concerns in an ABM. In this example, the main components are:
- Model (core/model.py) wires all components together and defines the main time loop.
- Scenario (core/scenario.py) is the single entry point for all model inputs. It loads all data from data/input (including SimulatorScenarios), and other components access this data via self.scenario.xxx.
- Environment (core/environment.py) implements the model’s macro-level logic. It coordinates agent interactions and calculates population-level statistics.
- Agents (core/agent.py) implement the model’s micro-level logic. They only know their own state and how to update it based on scenario parameters or environment instructions.
- DataCollector (core/data_collector.py) specifies what data to record from agents and the environment, and saves it to data/output.
The covid_contagion example is deliberately simple to make this architecture clear. It contains one type of agent, one environment, one model, and one scenario table.
Model
The Model creates components, drives the simulation steps, and collects outputs. It orchestrates the simulation but does not contain domain-specific logic itself—that belongs in the Agent and Environment classes.
from typing import TYPE_CHECKING

from Melodie import Model
from .agent import CovidAgent
from .environment import CovidEnvironment
from .data_collector import CovidDataCollector
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidModel(Model):
    scenario: "CovidScenario"
    agents: "AgentList[CovidAgent]"
    environment: CovidEnvironment
    data_collector: CovidDataCollector

    def create(self) -> None:
        """
        This method is called once at the beginning of the simulation
        to create the core components of the model.
        """
        self.agents = self.create_agent_list(CovidAgent)
        self.environment = self.create_environment(CovidEnvironment)
        self.data_collector = self.create_data_collector(CovidDataCollector)

    def setup(self) -> None:
        """
        This method is called once after `create` to set up the initial state
        of the model components.
        """
        self.agents.setup_agents(self.scenario.agent_num)
        self.environment.setup_infection(self.agents)

    def run(self) -> None:
        """
        This method defines the main simulation loop that runs for `scenario.period_num` periods.
        """
        for t in self.iterator(self.scenario.period_num):
            self.environment.agents_interaction(self.agents)
            self.environment.agents_recover(self.agents)
            self.environment.update_population_stats(self.agents)
            self.data_collector.collect(t)
        self.data_collector.save()
Notes:
- create: Builds the AgentList, Environment, and DataCollector.
- setup: Initializes agents and sets up the initial infection.
- run: Executes the main simulation loop, calling the environment for agent interaction and recovery, and then saving the results.
Think of the Model as the orchestrator of components: it decides what components exist, in what order they are called each period, and delegates the concrete simulation logic to them. This allows you to reuse Agent and Environment classes in different experimental setups by only changing the run logic in the Model.
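As a sketch of that idea, an alternative experiment could subclass the model above and change only run(). The CovidModelNoRecovery class below is hypothetical and uses only the methods already shown in this tutorial:

from .model import CovidModel


class CovidModelNoRecovery(CovidModel):
    """Hypothetical variant for illustration: same components, different schedule."""

    def run(self) -> None:
        # Reuse the agents, environment, and data collector built by `create`
        # and `setup`, but skip the recovery step in every period.
        for t in self.iterator(self.scenario.period_num):
            self.environment.agents_interaction(self.agents)
            self.environment.update_population_stats(self.agents)
            self.data_collector.collect(t)
        self.data_collector.save()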
Scenario Definition
The Scenario defines parameters and loads static data tables. It acts as the single source of truth for all configuration and input data, allowing the other components to focus on behavior rather than I/O.
from Melodie import Scenario


class CovidScenario(Scenario):
    """
    Defines the parameters for each simulation run of the virus contagion model.
    It inherits from the base `Melodie.Scenario`.

    Special Melodie attributes (automatically populated from `SimulatorScenarios`):
    - `id`: A unique identifier for the scenario.
    - `run_num`: How many times to repeat this scenario (for stochastic models). Defaults to 1.
    - `period_num`: How many simulation steps (periods) for each run. Defaults to 0.
    """

    def setup(self) -> None:
        """
        Declares custom scenario parameters for type hinting and clarity.
        These are automatically populated from columns in `SimulatorScenarios`.
        """
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.infection_prob: float = 0.0
        self.recovery_prob: float = 0.0

    def load_data(self) -> None:
        """
        This method is automatically called by Melodie after scenario parameters are set.
        It is the recommended place to load all static input dataframes, making them
        accessible via `self.scenario.*` from `Model`, `Agent`, and `Environment`.
        """
        self.health_states = self.load_dataframe("ID_HealthState.csv")
Some column names in SimulatorScenarios have a special meaning in Melodie:
- id: A unique identifier for the scenario, used in output files.
- run_num: How many times to repeat the same scenario. This is useful for analyzing stochastic models to observe the variability of outcomes. Defaults to 1 if omitted.
- period_num: How many periods the model will run. Defaults to 0 if omitted (the simulation loop will not execute).
These special attributes are defined in the base Melodie.Scenario class. They are typically controlled by adding corresponding columns to SimulatorScenarios, but you do not need to redeclare them in your subclass’s setup method unless you want explicit type hints. Other columns are user-defined.
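For illustration, a SimulatorScenarios.csv for this example could look as follows; the column names come from the scenario class shown above, while the parameter values are made up:

id,run_num,period_num,agent_num,initial_infected_percentage,infection_prob,recovery_prob
0,1,50,1000,0.05,0.3,0.1
1,1,50,1000,0.05,0.5,0.1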
The load_data method is a special hook that is automatically called by Melodie, so you should not change its name. It runs after parameters are loaded from the current SimulatorScenarios row. Inside load_data, you can load any other required data tables and attach them as attributes to the scenario instance. All other components (Model, Agent, Environment) can then access this data uniformly via self.scenario.xxx. For data-heavy models, you can also pre-process large tables in load_data (e.g., into dictionaries) to improve performance.
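As a sketch of that pre-processing pattern, CovidScenario.load_data could be extended like this (it assumes load_dataframe returns a pandas DataFrame and that ID_HealthState has "id" and "name" columns, which is an assumption made here for illustration):

    def load_data(self) -> None:
        # Load the raw table as in the example above.
        self.health_states = self.load_dataframe("ID_HealthState.csv")
        # Hypothetical pre-processing: build an id -> name dictionary so other
        # components can do constant-time lookups instead of repeatedly
        # filtering the dataframe. Column names are assumed for illustration.
        self.health_state_names = {
            int(row["id"]): row["name"] for _, row in self.health_states.iterrows()
        }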
Note
Melodie automatically loads several standard scenario tables by recognizing
their filenames. These tables are essential for the Simulator,
Trainer, and Calibrator to function correctly. The recognized names
are:
- SimulatorScenarios
- CalibratorScenarios
- CalibratorParamsScenarios
- TrainerScenarios
- TrainerParamsScenarios
These files are automatically loaded and attached to each Scenario
object, becoming accessible via attributes like
self.scenario.simulator_scenarios or
self.scenario.calibrator_params_scenarios. You do not need to load
them manually in load_data.
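For instance, a component method could inspect the full scenario table through the scenario object. This is a minimal sketch (the helper method is hypothetical, and it assumes the table is attached as a pandas DataFrame):

    def report_scenario_count(self) -> None:
        # `simulator_scenarios` is attached automatically by Melodie, as noted above.
        all_scenarios = self.scenario.simulator_scenarios
        print(f"Loaded {len(all_scenarios)} scenario rows")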
Environment Logic
The Environment coordinates interactions, the initial infection, and recovery. It acts as the “director” that arranges agent interactions and maintains macro-level summaries.
import random
from typing import TYPE_CHECKING

from Melodie import Environment
from .agent import CovidAgent
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidEnvironment(Environment):
    scenario: "CovidScenario"

    def setup(self) -> None:
        # Macro-level counters for population statistics.
        # These are updated each period by `update_population_stats` and recorded by the DataCollector.
        self.num_susceptible: int = 0
        self.num_infected: int = 0
        self.num_recovered: int = 0

    def setup_infection(self, agents: "AgentList[CovidAgent]") -> None:
        # Sets the initial percentage of infected agents based on scenario parameters.
        for agent in agents:
            if (
                agent.health_state == 0
                and random.random() < self.scenario.initial_infected_percentage
            ):
                agent.health_state = 1

    def agents_interaction(self, agents: "AgentList[CovidAgent]") -> None:
        # Simulates random interactions where infected agents can spread the virus.
        for agent in agents:
            if agent.health_state == 1:
                # Randomly meet another agent
                # Note: Melodie's AgentList.random_sample returns a list
                other_agent = agents.random_sample(1)[0]
                if other_agent.health_state == 0:
                    if random.random() < self.scenario.infection_prob:
                        other_agent.health_state = 1

    def agents_recover(self, agents: "AgentList[CovidAgent]") -> None:
        # Triggers the recovery process for all infected agents.
        for agent in agents:
            agent.health_state_update(self.scenario.recovery_prob)

    def update_population_stats(self, agents: "AgentList[CovidAgent]") -> None:
        # Aggregates agent states into environment-level population counts.
        self.num_susceptible = 0
        self.num_infected = 0
        self.num_recovered = 0

        for agent in agents:
            if agent.health_state == 0:
                self.num_susceptible += 1
            elif agent.health_state == 1:
                self.num_infected += 1
            elif agent.health_state == 2:
                self.num_recovered += 1
Notes:
- setup_infection: Sets the initial number of infected agents via a Bernoulli trial for each susceptible agent.
- agents_interaction: A simple mean-field interaction where each infected agent randomly meets one other agent and may spread the virus.
- agents_recover: Delegates the recovery logic to each agent.
- update_population_stats: Aggregates micro-level agent states into macro-level population counts.
Agent Behavior
The Agent defines micro-level state and behavior. Agents in Melodie are deliberately lightweight: they only store their own state and expose methods for state transitions. The Environment decides when these methods are called.
import random
from Melodie import Agent


class CovidAgent(Agent):
    def setup(self) -> None:
        """
        Initializes the agent's state.
        `health_state`: 0 = susceptible, 1 = infected, 2 = recovered.
        """
        self.health_state: int = 0  # All agents start as susceptible.

    def health_state_update(self, recovery_prob: float) -> None:
        """
        Agent-level logic for transitioning from infected to recovered.
        """
        if self.health_state == 1 and random.random() < recovery_prob:
            self.health_state = 2
Notes:
- State: health_state (0: susceptible, 1: infected, 2: recovered).
- Method: health_state_update contains the logic for an agent to recover from infection.
This “smart environment, simple agents” pattern is a core design principle in Melodie. It keeps agent classes simple and testable, centralizes interaction logic in the Environment, and makes it easier to modify interaction rules (e.g., from random-mixing to a grid-based model) without changing agent code.
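To make this concrete, here is a hypothetical Environment variant in which each infected agent meets several random contacts per period. It assumes AgentList.random_sample accepts a sample size (the example above only shows random_sample(1)), and contacts_per_period is an illustrative constant rather than a scenario parameter:

import random
from typing import TYPE_CHECKING

from .environment import CovidEnvironment

if TYPE_CHECKING:
    from Melodie import AgentList
    from .agent import CovidAgent


class CovidEnvironmentMultiContact(CovidEnvironment):
    """Hypothetical variant for illustration: more contacts per period, same agents."""

    def agents_interaction(self, agents: "AgentList[CovidAgent]") -> None:
        contacts_per_period = 3  # assumed constant; could instead come from the scenario
        for agent in agents:
            if agent.health_state == 1:
                # random_sample returns a list, as noted in the example above.
                for other_agent in agents.random_sample(contacts_per_period):
                    if (
                        other_agent.health_state == 0
                        and random.random() < self.scenario.infection_prob
                    ):
                        other_agent.health_state = 1

Note that the Agent class is untouched: only the Environment's interaction rule changes.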
Data Collection Setup
The DataCollector specifies which micro-level and macro-level results to save to data/output.
from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self) -> None:
        """
        Registers which properties of agents and the environment should be recorded.

        In this minimal example we record:
        - micro-level results: each agent's ``health_state``;
        - macro-level results: population counts on the environment
          (``num_susceptible``, ``num_infected``, ``num_recovered``).
        """
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("num_susceptible")
        self.add_environment_property("num_infected")
        self.add_environment_property("num_recovered")
The key design idea is that data collection is explicit: you register exactly which properties to track, and Melodie handles the indexing by scenario, run, period, and agent ID in a consistent format. This separates the simulation logic from the data storage logic.
Run the Model
To run the example from the repository root (after activating your virtual environment):
python examples/covid_contagion/main.py
This main.py file is the entry point that loads the configuration and starts the simulation. It mirrors how you would run any Melodie project from a script.
import os
from Melodie import Config, Simulator
from examples.covid_contagion.core.model import CovidModel
from examples.covid_contagion.core.scenario import CovidScenario

if __name__ == "__main__":
    config = Config(
        project_name="covid_contagion",
        project_root=os.path.dirname(os.path.abspath(__file__)),
        input_folder=os.path.join(
            os.path.dirname(os.path.abspath(__file__)), "data", "input"
        ),
        output_folder=os.path.join(
            os.path.dirname(os.path.abspath(__file__)), "data", "output"
        ),
    )

    simulator = Simulator(
        config=config,
        model_cls=CovidModel,
        scenario_cls=CovidScenario,
    )
    simulator.run()
    # simulator.run_parallel(cores=8)
    # simulator.run_parallel_multithread(cores=8)
Notes:
- It is recommended to use absolute paths for input/output folders to avoid ambiguity.
- The Simulator automatically finds the scenario table by its name, SimulatorScenarios (both .csv and .xlsx are supported).
- The Config object tells Melodie where data lives on disk.
- The Simulator object knows how to iterate over scenarios and runs.
When you start a simulation, the Simulator automatically calls a set of lifecycle methods on your components in a fixed order:
- On each Scenario: setup(), then load_data(), then setup_data() (if implemented).
- On the Model: create(), then setup(), then run().
- On components like Environment, AgentList, and DataCollector: their respective setup() methods.
This is why these method names are a fixed convention. You rarely call these methods directly—the Simulator manages the full execution loop for you.
After a run, the data/output folder will contain CSV files for analysis:
- Result_Simulator_Agents.csv: Per-period agent states.
- Result_Simulator_Environment.csv: Per-period environment aggregates (the macro metrics registered in the DataCollector).
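As a quick sketch of post-processing with pandas: the macro metric columns are the ones registered in the DataCollector, while the exact names of the index columns for scenario, run, and period are not shown in this tutorial, so inspect the file first.

import os

import pandas as pd

# Path relative to the repository root; adjust to your setup.
output_dir = os.path.join("examples", "covid_contagion", "data", "output")
env_df = pd.read_csv(os.path.join(output_dir, "Result_Simulator_Environment.csv"))

# Inspect the available columns, then summarize the macro metrics.
print(env_df.columns.tolist())
print(env_df[["num_susceptible", "num_infected", "num_recovered"]].describe())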
Parallel Execution
For large-scale experiments, running simulations sequentially can be time-consuming. Melodie provides two methods for parallel execution on multi-core machines, available on the Simulator object.
1. ``run_parallel()``: Process-Based Parallelism
This is the recommended and most robust method for parallelization in Melodie.
- Mechanism: It uses Python’s multiprocessing module to spawn multiple independent worker processes. Each worker runs a subset of the simulation scenarios/runs on a separate CPU core.
- Use Case: Ideal for any substantial simulation task. It scales well as it bypasses Python’s Global Interpreter Lock (GIL), allowing for true parallel computation on CPU-bound models.
Usage:
# In main.py, instead of simulator.run():
simulator.run_parallel(cores=4) # Use 4 CPU cores
2. ``run_parallel_multithread()``: Thread-Based Parallelism (Experimental)
This method is an experimental feature designed to leverage modern Python versions (3.13+).
- Mechanism: It uses a thread pool instead of a process pool. This avoids the overhead of creating new processes and serializing (pickling) data between them.
- Use Case:
  - Python 3.13+ (with free-threading mode): This method can offer significant performance gains over run_parallel() by running threads on multiple cores without the GIL.
  - Older Python versions: It will run concurrently but will be limited by the GIL. For CPU-bound ABM simulations, it is unlikely to provide a speedup and may even be slower than a sequential run.
Usage:
# In main.py, for experiments on Python 3.13+
simulator.run_parallel_multithread(cores=4)
Performance Comparison: A Quick Case Study
To demonstrate the difference, we ran the covid_contagion example with 24 scenarios (each with run_num = 1) on an 8-core machine running Python 3.11.
- `run_parallel(cores=8)`:
  - Total Time: ~0.90 seconds
  - Mechanism: Spawns 8 separate Python processes. Each process has its own memory and its own GIL, allowing them to run computations on different cores simultaneously. This is highly effective for CPU-bound tasks like ABM.
- `run_parallel_multithread(cores=8)`:
  - Total Time: ~1.01 seconds
  - Mechanism: Spawns 8 threads within a single Python process. In Python versions before 3.13, the GIL prevents these threads from executing Python code on more than one core at a time. The overhead of thread management can even make it slightly slower than the process-based approach.
Update: Test Results on Python 3.14.2 (Free-Threaded)
With official support for the free-threaded (no-GIL) build in Python 3.14+, the performance dynamic has shifted dramatically, as predicted above. We ran the same test on the same 8-core machine using Python 3.14.2:
- `run_parallel(cores=8)`:
  - Total Time: ~5.37 seconds
  - Mechanism: Still effective, but the overhead of creating 8 separate processes and pickling data for communication is now more apparent compared to the lightweight thread-based alternative.
- `run_parallel_multithread(cores=8)`:
  - Total Time: ~1.56 seconds
  - Mechanism: The 8 threads now run on 8 cores in true parallelism within a single process. By avoiding the overhead of process creation and data serialization, this method is now over 3.4x faster for this specific task.
Conclusion: For CPU-bound agent-based models, run_parallel() remains a robust choice for all Python versions. However, if you are using Python 3.14+, run_parallel_multithread() is now the highly recommended method for achieving superior performance on multi-core systems, thanks to the removal of the GIL.
A Note on Performance Trade-offs
An astute observer might notice that the absolute execution times on Python 3.14.2 were slower than on Python 3.11 for both methods. This is an expected trade-off: the Python 3.14+ interpreter is more complex in order to support features like free-threading, which leads to higher startup overhead for each process.
In our test, the simulation task for each scenario is very short (milliseconds). Consequently, the overhead of creating new Python processes for run_parallel() becomes a significant portion of the total time, causing its slowdown from ~0.9s to ~5.37s.
The key takeaway is not the absolute speed on this micro-task, but the relative speedup. The test clearly demonstrates that for Python 3.14+, run_parallel_multithread() effectively eliminates this high process-creation overhead, making it the superior architecture for computationally intensive models where the simulation time far outweighs the initial setup time.
A Note on Paths and Parallel Execution
You may have noticed that some examples in the examples/ directory require special handling to run in parallel, such as being executed as a module (e.g., python -m examples.covid_contagion.main) and having __init__.py files in their directories.
This is a specific consequence of these examples being nested inside a larger project structure (the Melodie repository itself). When run_parallel() creates new processes, those processes need to be able to import the model’s code (like core.model). The path manipulation ensures they can find the examples package from the project’s root.
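A common pattern for this (a sketch, not the exact code used in the repository) is to make the repository root importable at the top of main.py before importing the model, so that the worker processes created by run_parallel() can also resolve the examples package:

import os
import sys

# Sketch: assumes main.py sits at <repo_root>/examples/covid_contagion/main.py.
REPO_ROOT = os.path.dirname(
    os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from examples.covid_contagion.core.model import CovidModel  # noqa: E402

In your own projects, where main.py typically lives at the project root defined in Config, this extra path handling is usually unnecessary.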