CovidContagionCalibrator
This example demonstrates how to use Melodie’s Calibrator module. It extends the base covid_contagion model by using a genetic algorithm (GA) to find the value of the infection_prob parameter that results in a target final infected ratio.
Calibrator: Project Structure
examples/covid_contagion_calibrator
│ ├── core/
│ │ ├── calibrator.py
│ │ ├── scenario.py
│ │ ├── data_collector.py
│ │ └── model.py
│ ├── data/
│ │ ├── CalibratorScenarios.csv
│ │ └── CalibratorParamsScenarios.csv
│ └── output/
└── main.py
Calibrator: Key Changes
The primary changes for the calibrator example are:
- A ``calibrator.py`` Module: This new file defines the ``CovidCalibrator`` class, which inherits from ``Melodie.Calibrator``.
- Calibrator Scenarios: The input data now includes ``CalibratorScenarios.csv`` and ``CalibratorParamsScenarios.csv`` to control the calibration process. ``CalibratorScenarios.csv`` is similar to ``SimulatorScenarios`` but does not use a ``run_num`` column.
- Path Modification in ``main.py``: A special modification to ``sys.path`` is included. This is necessary because the ``Calibrator`` uses multiprocessing for parallel execution. Worker processes are spawned in a new environment and need to be able to find and import the project’s modules (e.g., from examples.covid_contagion_calibrator.core…). This path modification is a robust way to ensure modules are discoverable when running examples directly from the Melodie repository, which is not a standard installed package.
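A path modification of this kind can be sketched as follows. This is a minimal illustration, not the code from the repository's ``main.py``: the helper name ``add_repo_root_to_sys_path`` is hypothetical, and it assumes the script lives at ``<repo>/examples/<example_name>/main.py``, so the repository root is the third ancestor of the script file.

```python
import sys
from pathlib import Path


def add_repo_root_to_sys_path(script_file: str, levels_up: int = 2) -> str:
    """Prepend the repository root to sys.path and return it.

    Assumption: `script_file` is <repo>/examples/<example_name>/main.py,
    so parents[2] of the resolved path is the repository root.
    """
    repo_root = str(Path(script_file).resolve().parents[levels_up])
    if repo_root not in sys.path:
        # Insert at the front so the local checkout is found first.
        sys.path.insert(0, repo_root)
    return repo_root
```

In a script, the call ``add_repo_root_to_sys_path(__file__)`` would have to run before any ``from examples...`` import, so that worker processes importing the same script can resolve those modules as well.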
Calibrator: GA Concepts
The primary goal of the Calibrator is to automatically tune one or more model parameters to match observed, real-world data. It treats this as an optimization problem: finding the set of parameters that minimizes the “distance” between the model’s output and a specified target.
Core Idea: Minimizing Distance
- Target: You define a target outcome. In this example, the target is for the final proportion of the population that has been infected to be 80%. This target is hard-coded in the ``distance`` function.
- Parameters: You specify which scenario parameter(s) the Calibrator should adjust. Here, it’s the ``infection_prob``.
- Distance Function: You must implement a ``distance(model)`` method. This function is the core of the calibration. It runs after a simulation is complete, calculates a metric from the model’s final state, and returns a single float value representing how far off that metric is from the target. The Calibrator’s goal is to make this value as close to zero as possible.
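To make the distance concrete, the squared-error calculation can be tried outside Melodie. The sketch below uses plain functions with hypothetical names (``infected_ratio``, ``squared_distance``) and made-up end-of-run counts; only the formula itself mirrors the ``distance`` method of this example.

```python
def infected_ratio(num_susceptible: int, agent_num: int) -> float:
    """Share of agents that were ever infected: 1 - susceptible / total."""
    return 1 - num_susceptible / agent_num


def squared_distance(num_susceptible: int, agent_num: int, target: float = 0.8) -> float:
    """Squared error between the final infected ratio and the target."""
    return (infected_ratio(num_susceptible, agent_num) - target) ** 2


# Hypothetical end-of-run counts for a population of 1000 agents.
for susceptible in (500, 250, 200):
    print(susceptible, squared_distance(susceptible, 1000))
```

With 200 susceptible agents left out of 1000, the infected ratio is 0.8 and the distance is zero (up to floating-point rounding); the further the ratio is from 0.8, the larger the distance.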
Genetic Algorithm (GA) for Optimization
Melodie uses a genetic algorithm to search the parameter space efficiently.
- Chromosome: A single set of the parameters being calibrated (in this case, just a single value for ``infection_prob``) is treated as a “chromosome.”
- Population: The GA starts with a “population” of these chromosomes, i.e., many different random values for ``infection_prob``.
- Fitness: For each chromosome, the model is run, and the ``distance`` function is calculated. This distance is the fitness score (where lower is better).
- Evolution: The GA then proceeds through “generations.” In each generation, it selects the best-performing chromosomes (those that produced the smallest distance), “breeds” them (crossover), and introduces random changes (mutation) to create a new population of parameter sets to test. This process iteratively converges toward the parameter set that produces the best fit to the target.
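The loop described above can be sketched with a toy genetic algorithm. This is not Melodie's implementation: the selection scheme (keep the better half, carry the best chromosome over unchanged), the single-point crossover, and the ``toy_distance`` function that stands in for "run the model and call ``distance``" are all assumptions chosen to keep the example short.

```python
import random


def decode(bits: str, low: float, high: float) -> float:
    """Map a binary string linearly onto the interval [low, high]."""
    return low + int(bits, 2) / (2 ** len(bits) - 1) * (high - low)


def toy_distance(infection_prob: float) -> float:
    """Stand-in for a model run: assumed toy response with optimum at 0.4."""
    final_infected_ratio = min(1.0, 2.0 * infection_prob)
    return (final_infected_ratio - 0.8) ** 2


def fitness(bits: str) -> float:
    """Distance of a chromosome; lower is better."""
    return toy_distance(decode(bits, 0.0, 1.0))


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(bits: str, mutation_prob: float, rng: random.Random) -> str:
    """Flip each bit independently with probability mutation_prob."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < mutation_prob else bit
        for bit in bits
    )


def run_toy_ga(generations=30, population_size=20, code_length=10,
               mutation_prob=0.02, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice("01") for _ in range(code_length))
        for _ in range(population_size)
    ]
    history = []  # best distance found in each generation
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        history.append(fitness(ranked[0]))
        parents = ranked[: population_size // 2]  # keep the better half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng),
                   mutation_prob, rng)
            for _ in range(population_size - 1)
        ]
        population = [ranked[0]] + children  # best chromosome survives unchanged
    best = min(population, key=fitness)
    return decode(best, 0.0, 1.0), history


best_prob, history = run_toy_ga()
print(f"best infection_prob ~ {best_prob:.3f}, distance {toy_distance(best_prob):.6f}")
```

Because the best chromosome is always carried into the next generation, the recorded best distance can never get worse from one generation to the next; how quickly it approaches zero depends on the seed and the settings.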
Parameter Encoding: From Float to Binary
A key aspect of the genetic algorithm is how it represents continuous parameters like infection_prob (a float between 0.0 and 1.0). The GA does not work with the float values directly; instead, it converts them into a binary string of a fixed length. This process is called discretization.
- Encoding: The range of a parameter (e.g., from infection_prob_min to infection_prob_max) is mapped to a sequence of integers, which are then represented as binary numbers. For instance, a float value is converted into a binary string like ``01101``.
- GA Operations: All genetic operations, such as crossover (swapping parts of two binary strings) and mutation (flipping a random bit, e.g., ``01101`` -> ``01111``), are performed on these binary representations.
- Decoding: When a simulation needs to be run, the binary string is decoded back into its corresponding float value. This is done in three steps:
  1. The binary string is converted to its decimal integer representation.
  2. This integer is normalized to a float between 0.0 and 1.0 by dividing it by the maximum possible integer for that bit length (which is 2^length - 1).
  3. This normalized float is linearly mapped to the parameter’s defined range (e.g., from infection_prob_min to infection_prob_max).
- Precision: The ``strategy_param_code_length`` parameter in the ``.csv`` file determines the length of this binary string. A longer string allows for a more fine-grained representation of the parameter space (more steps between the min and max bounds), thus offering higher precision at the cost of a larger search space.
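The encoding and the three decoding steps can be written out in a few lines. The function names below (``decode``, ``encode``, ``flip_bit``) are hypothetical helpers for illustration and are not taken from Melodie's source; rounding to the nearest representable value in ``encode`` is an assumption.

```python
def decode(bits: str, low: float, high: float) -> float:
    """Decode a binary string into a float in [low, high]."""
    as_int = int(bits, 2)                       # 1. binary -> integer
    normalized = as_int / (2 ** len(bits) - 1)  # 2. integer -> [0.0, 1.0]
    return low + normalized * (high - low)      # 3. [0.0, 1.0] -> [low, high]


def encode(value: float, low: float, high: float, length: int) -> str:
    """Encode a float as the nearest representable binary string."""
    normalized = (value - low) / (high - low)
    as_int = round(normalized * (2 ** length - 1))
    return format(as_int, f"0{length}b")


def flip_bit(bits: str, index: int) -> str:
    """Mutation on the encoded form: flip the bit at `index` (0-based)."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


print(decode("01101", 0.0, 1.0))  # the integer 13 divided by 31
print(flip_bit("01101", 3))       # 01111
print(encode(0.5, 0.0, 1.0, 5))   # nearest of the 32 representable values
```

With five bits there are only 32 representable values between the bounds, so neighbouring values differ by 1/31 of the range; each additional bit roughly halves that step.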
Parameter Configuration (`CalibratorParamsScenarios.csv`)
This file controls the behavior of the genetic algorithm:
- ``id``: Unique identifier for a set of GA parameters.
- ``path_num``: How many independent calibration processes (paths) to run. Each path is a complete run of the GA from a random start, which helps ensure the result is robust and not just a local minimum.
- ``generation_num``: The number of generations the GA will run for in each path.
- ``strategy_population``: The size of the population in each generation (how many different parameter sets are tested).
- ``mutation_prob``: The probability of a random mutation occurring during breeding.
- ``strategy_param_code_length``: The precision of the parameters, defined by the bit-length of the chromosome.
- ``infection_prob_min``, ``infection_prob_max``: The lower and upper bounds for the ``infection_prob`` parameter search space.
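These settings jointly determine how fine the search grid is and how many simulations a calibration costs. The sketch below uses made-up values (they are not the values shipped with the example) and assumes one model run per chromosome per generation, which may not match Melodie's exact bookkeeping.

```python
# Hypothetical settings: the keys are the column names described above,
# the values are invented for illustration.
params = {
    "path_num": 2,
    "generation_num": 10,
    "strategy_population": 20,
    "strategy_param_code_length": 10,
    "infection_prob_min": 0.0,
    "infection_prob_max": 1.0,
}


def grid_step(p: dict) -> float:
    """Smallest change in infection_prob that the binary encoding can express."""
    levels = 2 ** p["strategy_param_code_length"] - 1
    return (p["infection_prob_max"] - p["infection_prob_min"]) / levels


def model_runs(p: dict) -> int:
    """Rough simulation budget: one run per chromosome, per generation, per path."""
    return p["path_num"] * p["generation_num"] * p["strategy_population"]


print(f"step between neighbouring values: {grid_step(params):.6f}")
print(f"approximate number of model runs: {model_runs(params)}")
```

Under these assumptions, doubling ``strategy_population`` or ``generation_num`` doubles the number of model runs, while increasing ``strategy_param_code_length`` refines the grid without changing the run count.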
Calibrator: Input Data
In this example, the distance is calculated based on the model’s state at the end of the simulation. If you need to use data from all periods to calculate the target metric, you can compute and store the required value in an environment property throughout the simulation, ensuring it holds the final desired value at the last period.
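As a sketch of that pattern, the class below keeps a running maximum of the infected count so that the value is available at the last period. It is a plain-Python stand-in rather than a Melodie ``Environment`` subclass, and the ``peak_infected`` property, the ``peak_distance`` function, and the 0.3 target are hypothetical.

```python
class PeakTrackingEnvironment:
    """Stand-in for an environment class; not a Melodie class."""

    def __init__(self) -> None:
        self.num_infected = 0
        self.peak_infected = 0  # hypothetical property, updated every period

    def update_population_stats(self, health_states: list) -> None:
        # Count currently infected agents (state 1) and keep the running maximum.
        self.num_infected = sum(1 for state in health_states if state == 1)
        self.peak_infected = max(self.peak_infected, self.num_infected)


def peak_distance(env: PeakTrackingEnvironment, agent_num: int,
                  target_peak_ratio: float = 0.3) -> float:
    """Squared error between the peak infected share and a hypothetical target."""
    return (env.peak_infected / agent_num - target_peak_ratio) ** 2


env = PeakTrackingEnvironment()
# Three hypothetical periods for four agents (0 = susceptible, 1 = infected, 2 = recovered).
for states in ([0, 1, 0, 0], [1, 1, 1, 0], [2, 2, 1, 0]):
    env.update_population_stats(states)
print(env.num_infected, env.peak_infected, peak_distance(env, agent_num=4))
```

At the end of the run ``num_infected`` only reflects the last period, while ``peak_infected`` still holds the maximum over all periods, which is what a ``distance`` method evaluated after the simulation would need.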
Calibrator: Running the Model
You can run the calibrator using the main script:
python examples/covid_contagion_calibrator/main.py
This will execute the genetic algorithm, running multiple simulations in parallel to find the optimal infection_prob. The results, including the progression of parameters and distances across generations, are saved to the data/output folder.
Parallel Execution Mode
The Calibrator supports two parallelization modes, controlled by the parallel_mode parameter when creating the calibrator instance:
- ``parallel_mode="process"`` (default): Uses subprocess-based parallelism via ``multiprocessing``. This is the traditional approach and works on all Python versions. It is recommended for most use cases.
- ``parallel_mode="thread"``: Uses thread-based parallelism via ``ThreadPoolExecutor``. This mode is recommended for Python 3.13+ (free-threaded/No-GIL builds) as it can provide better performance by avoiding the overhead of process creation and data serialization. In older Python versions, this mode will still run but may be limited by the Global Interpreter Lock (GIL).
You can specify the mode when creating the calibrator:
calibrator = CovidCalibrator(
    config=cfg,
    model_cls=CovidModel,
    scenario_cls=CovidScenario,
    processors=8,
    parallel_mode="thread",  # or "process" (default)
)
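If the same script has to run on both standard and free-threaded interpreters, the mode can be chosen at runtime. The helper below is a hypothetical convenience, not part of Melodie; it relies on ``sys._is_gil_enabled()``, which CPython provides from version 3.13 onward, and falls back to ``"process"`` when that function is missing.

```python
import sys


def pick_parallel_mode() -> str:
    """Return "thread" on a free-threaded interpreter, otherwise "process"."""
    # sys._is_gil_enabled exists on CPython 3.13+; older versions always have a GIL.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    if gil_check is not None and not gil_check():
        return "thread"
    return "process"


print(pick_parallel_mode())
```

The returned string could then be passed as the ``parallel_mode`` argument when creating the calibrator.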
Calibrator: Code
This section shows the key code implementation for the calibrator model. Files that are identical to the base covid_contagion model are noted.
Calibrator Definition
Defined in core/calibrator.py.
import os
from typing import List

import pandas as pd
from Melodie import Calibrator

from examples.covid_contagion_calibrator.core.model import CovidModel
from examples.covid_contagion_calibrator.core.scenario import CovidScenario


class CovidCalibrator(Calibrator):
    """
    Simple calibrator that tunes `infection_prob` so the final infected ratio matches a target.

    In this example, the target infected ratio is hardcoded to 0.8 inside the `distance` method.
    The calibrator minimizes the squared difference between the actual infected ratio and this target.
    """

    scenario_cls: type[CovidScenario]
    model_cls: type[CovidModel]

    def setup(self) -> None:
        """
        Setup the calibrator by defining which properties to tune and which to record.
        """
        # Calibrate the `infection_prob` parameter in the scenario.
        # This tells the Genetic Algorithm (GA) to optimize this specific attribute.
        self.add_scenario_calibrating_property("infection_prob")

        # Record environment-level data for analysis.
        # 'num_susceptible' will be recorded in `Result_Calibrator_Environment.csv`
        # allowing us to verify the calibration result.
        self.add_environment_property("num_susceptible")

    def distance(self, model: CovidModel) -> float:
        """
        Calculates the distance (error) between the model's result and the target.
        The GA minimizes this value.
        """
        env = model.environment

        # Calculate the ratio of agents who were infected (including recovered)
        # 1 - (susceptible / total)
        infected_ratio = 1 - env.num_susceptible / env.scenario.agent_num

        # Return squared error.
        # Target is hardcoded as 0.8 (80% infection rate).
        return (infected_ratio - 0.8) ** 2
Scenario Definition
Defined in core/scenario.py.
from Melodie import Scenario


class CovidScenario(Scenario):
    """
    Defines the parameters for each simulation run of the virus contagion model.
    It inherits from the base `Melodie.Scenario`.

    Special Melodie attributes (automatically populated from `SimulatorScenarios`):
    - `id`: A unique identifier for the scenario.
    - `run_num`: How many times to repeat this scenario (for stochastic models). Defaults to 1.
    - `period_num`: How many simulation steps (periods) for each run. Defaults to 0.
    """

    def setup(self) -> None:
        """
        Declares custom scenario parameters for type hinting and clarity.
        These are automatically populated from columns in `SimulatorScenarios`.
        """
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.infection_prob: float = 0.0
        self.recovery_prob: float = 0.0

    def load_data(self) -> None:
        """
        This method is automatically called by Melodie after scenario parameters are set.
        It is the recommended place to load all static input dataframes, making them
        accessible via `self.scenario.*` from `Model`, `Agent`, and `Environment`.
        """
        self.health_states = self.load_dataframe("ID_HealthState.csv")
Model Structure
Identical to the base model. Defined in core/model.py.
from typing import TYPE_CHECKING

from Melodie import Model
from .agent import CovidAgent
from .environment import CovidEnvironment
from .data_collector import CovidDataCollector
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidModel(Model):
    scenario: "CovidScenario"
    agents: "AgentList[CovidAgent]"
    environment: CovidEnvironment
    data_collector: CovidDataCollector

    def create(self) -> None:
        """
        This method is called once at the beginning of the simulation
        to create the core components of the model.
        """
        self.agents = self.create_agent_list(CovidAgent)
        self.environment = self.create_environment(CovidEnvironment)
        self.data_collector = self.create_data_collector(CovidDataCollector)

    def setup(self) -> None:
        """
        This method is called once after `create` to set up the initial state
        of the model components.
        """
        self.agents.setup_agents(self.scenario.agent_num)
        self.environment.setup_infection(self.agents)

    def run(self) -> None:
        """
        This method defines the main simulation loop that runs for `scenario.period_num` periods.
        """
        for t in self.iterator(self.scenario.period_num):
            self.environment.agents_interaction(self.agents)
            self.environment.agents_recover(self.agents)
            self.environment.update_population_stats(self.agents)
            self.data_collector.collect(t)
        self.data_collector.save()
Environment Logic
Identical to the base model. Defined in core/environment.py.
import random
from typing import TYPE_CHECKING

from Melodie import Environment
from .agent import CovidAgent
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidEnvironment(Environment):
    scenario: "CovidScenario"

    def setup(self) -> None:
        # Macro-level counters for population statistics.
        # These are updated each period by `update_population_stats` and recorded by the DataCollector.
        self.num_susceptible: int = 0
        self.num_infected: int = 0
        self.num_recovered: int = 0

    def setup_infection(self, agents: "AgentList[CovidAgent]") -> None:
        # Sets the initial percentage of infected agents based on scenario parameters.
        for agent in agents:
            if (
                agent.health_state == 0
                and random.random() < self.scenario.initial_infected_percentage
            ):
                agent.health_state = 1

    def agents_interaction(self, agents: "AgentList[CovidAgent]") -> None:
        # Simulates random interactions where infected agents can spread the virus.
        for agent in agents:
            if agent.health_state == 1:
                # Randomly meet another agent
                # Note: Melodie's AgentList.random_sample returns a list
                other_agent = agents.random_sample(1)[0]
                if other_agent.health_state == 0:
                    if random.random() < self.scenario.infection_prob:
                        other_agent.health_state = 1

    def agents_recover(self, agents: "AgentList[CovidAgent]") -> None:
        # Triggers the recovery process for all infected agents.
        for agent in agents:
            agent.health_state_update(self.scenario.recovery_prob)

    def update_population_stats(self, agents: "AgentList[CovidAgent]") -> None:
        # Aggregates agent states into environment-level population counts.
        self.num_susceptible = 0
        self.num_infected = 0
        self.num_recovered = 0

        for agent in agents:
            if agent.health_state == 0:
                self.num_susceptible += 1
            elif agent.health_state == 1:
                self.num_infected += 1
            elif agent.health_state == 2:
                self.num_recovered += 1
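To get a feel for how ``infection_prob`` drives the quantity being calibrated, the environment logic can be approximated without Melodie. The function below is a simplified plain-Python replica under stated assumptions: agents are a list of integer states, a random partner is drawn with ``randrange`` (so an agent may meet itself), and the parameter values in the loop are made up.

```python
import random


def simulate_final_infected_ratio(agent_num: int, initial_infected_percentage: float,
                                  infection_prob: float, recovery_prob: float,
                                  period_num: int, seed: int = 0) -> float:
    """Return 1 - susceptible / total at the end of a simplified run."""
    rng = random.Random(seed)
    # 0 = susceptible, 1 = infected, 2 = recovered
    states = [1 if rng.random() < initial_infected_percentage else 0
              for _ in range(agent_num)]
    for _ in range(period_num):
        # Interaction: each infected agent meets one randomly chosen agent.
        for i in range(agent_num):
            if states[i] == 1:
                other = rng.randrange(agent_num)
                if states[other] == 0 and rng.random() < infection_prob:
                    states[other] = 1
        # Recovery: each infected agent recovers with probability recovery_prob.
        for i in range(agent_num):
            if states[i] == 1 and rng.random() < recovery_prob:
                states[i] = 2
    return 1 - states.count(0) / agent_num


# Hypothetical parameter values, chosen only to show the trend.
for prob in (0.0, 0.1, 0.3, 0.6):
    ratio = simulate_final_infected_ratio(1000, 0.05, prob, 0.1, 50)
    print(f"infection_prob={prob:.1f} -> final infected ratio {ratio:.3f}")
```

The calibrator automates this kind of sweep: instead of trying a few hand-picked values, it searches for the ``infection_prob`` whose final infected ratio is closest to the target.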
Agent Behavior
Identical to the base model. Defined in core/agent.py.
import random
from Melodie import Agent


class CovidAgent(Agent):
    def setup(self) -> None:
        """
        Initializes the agent's state.
        `health_state`: 0 = susceptible, 1 = infected, 2 = recovered.
        """
        self.health_state: int = 0  # All agents start as susceptible.

    def health_state_update(self, recovery_prob: float) -> None:
        """
        Agent-level logic for transitioning from infected to recovered.
        """
        if self.health_state == 1 and random.random() < recovery_prob:
            self.health_state = 2
Data Collection Setup
Identical to the base model. Defined in core/data_collector.py.
from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self) -> None:
        """
        Registers which properties of agents and the environment should be recorded.

        In this minimal example we record:
        - micro-level results: each agent's ``health_state``;
        - macro-level results: population counts on the environment
          (``num_susceptible``, ``num_infected``, ``num_recovered``).
        """
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("num_susceptible")
        self.add_environment_property("num_infected")
        self.add_environment_property("num_recovered")