CovidContagionCalibrator

This example demonstrates how to use Melodie’s Calibrator module. It extends the base covid_contagion model by using a genetic algorithm (GA) to find the value of the infection_prob parameter that results in a target final infected ratio.

Calibrator: Project Structure

examples/covid_contagion_calibrator
├── core/
│   ├── agent.py
│   ├── calibrator.py
│   ├── data_collector.py
│   ├── environment.py
│   ├── model.py
│   └── scenario.py
├── data/
│   ├── CalibratorScenarios.csv
│   ├── CalibratorParamsScenarios.csv
│   └── output/
└── main.py

Calibrator: Key Changes

The primary changes for the calibrator example are:

  • A ``calibrator.py`` Module: This new file defines the CovidCalibrator class, which inherits from Melodie.Calibrator.

  • Calibrator Scenarios: The input data now includes CalibratorScenarios.csv and CalibratorParamsScenarios.csv to control the calibration process. CalibratorScenarios.csv is similar to SimulatorScenarios but does not use a run_num column.

  • Path Modification in ``main.py``: main.py adds an entry to sys.path. This is needed because, in the default process mode, the Calibrator runs simulations in parallel with multiprocessing. Worker processes start in a fresh interpreter and must be able to find and import the project’s modules (e.g., from examples.covid_contagion_calibrator.core…). Adjusting the path makes these modules importable when you run the examples directly from a clone of the Melodie repository, where the examples folder is not part of an installed package.
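The main.py script itself is not reproduced on this page. As a minimal sketch, assuming main.py sits at examples/covid_contagion_calibrator/main.py inside a repository clone, the path fix could look like this. The helper name ensure_repo_root_on_path is hypothetical.

```python
import sys
from pathlib import Path


def ensure_repo_root_on_path(main_file: str) -> str:
    """Hypothetical helper: put the repository root at the front of sys.path.

    Assumes main_file is <repo>/examples/covid_contagion_calibrator/main.py, so
    parents[0] is the example folder, parents[1] is examples/, parents[2] is <repo>.
    """
    repo_root = str(Path(main_file).resolve().parents[2])
    # Prepend so `import examples...` resolves to the repository copy first;
    # skip if already present to avoid duplicate entries.
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root
```

In main.py you would call ensure_repo_root_on_path(__file__) at module level, before any `from examples...` import. With the spawn start method, child processes re-import the main module, and multiprocessing also passes the parent's sys.path to them, so the workers see the same entry.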

Calibrator: GA Concepts

The primary goal of the Calibrator is to automatically tune one or more model parameters to match observed, real-world data. It treats this as an optimization problem: finding the set of parameters that minimizes the “distance” between the model’s output and a specified target.

Core Idea: Minimizing Distance

  • Target: You define a target outcome. In this example, the target is for 80% of the population to have been infected (currently infected or recovered) by the end of the simulation. This target is hard-coded in the distance function.

  • Parameters: You specify which scenario parameter(s) the Calibrator should adjust. Here, it’s the infection_prob.

  • Distance Function: You must implement a distance(model) method. This function is the core of the calibration. It runs after a simulation is complete, calculates a metric from the model’s final state, and returns a single float value representing how far off that metric is from the target. The Calibrator’s goal is to make this value as close to zero as possible.
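To make "how far off" concrete, here is a worked sketch of the squared-error distance used in this example. The arithmetic matches the calibrator's distance method shown later on this page, but the inputs are passed as plain numbers instead of a Melodie model. The population size and susceptible counts are hypothetical.

```python
TARGET_INFECTED_RATIO = 0.8  # the target hard-coded in this example


def infected_ratio(num_susceptible: int, agent_num: int) -> float:
    # Anyone no longer susceptible has been infected at some point (infected or recovered).
    return 1 - num_susceptible / agent_num


def distance(num_susceptible: int, agent_num: int) -> float:
    # Squared error: never negative, and zero only when the target is hit exactly.
    return (infected_ratio(num_susceptible, agent_num) - TARGET_INFECTED_RATIO) ** 2


# Two hypothetical end-of-run outcomes for a population of 1000 agents.
run_a = distance(num_susceptible=300, agent_num=1000)  # ratio 0.70 -> about 0.01
run_b = distance(num_susceptible=150, agent_num=1000)  # ratio 0.85 -> about 0.0025
```

Run b overshoots the target by 0.05 and run a undershoots it by 0.10. Squaring ranks run b as the better fit regardless of direction, and it penalizes larger misses disproportionately.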

Genetic Algorithm (GA) for Optimization

Melodie uses a genetic algorithm to search the parameter space efficiently.

  • Chromosome: A single set of the parameters being calibrated (in this case, just a single value for infection_prob) is treated as a “chromosome.”

  • Population: The GA starts with a “population” of these chromosomes, i.e., many different random values for infection_prob.

  • Fitness: For each chromosome, the model is run, and the distance function is calculated. This distance is the fitness score (where lower is better).

  • Evolution: The GA then proceeds through “generations.” In each generation, it selects the best-performing chromosomes (those that produced the smallest distance), “breeds” them (crossover), and introduces random changes (mutation) to create a new population of parameter sets to test. Repeating this over many generations tends to move the population toward the parameter set that best fits the target.
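The loop described above can be sketched as a textbook binary GA. This is a generic illustration, not Melodie's internal implementation. The specific operators (truncation selection of the better half, single-point crossover, per-bit flip mutation, and keeping the best chromosome unchanged) are assumptions chosen for brevity. A cheap closed-form function stands in for "run the model and compute the distance."

```python
import random
from typing import Callable, List, Tuple

Bits = List[int]  # one chromosome: a fixed-length list of 0/1 genes


def decode(bits: Bits, low: float, high: float) -> float:
    # Binary -> integer -> [0, 1] -> [low, high]
    k = int("".join(str(b) for b in bits), 2)
    return low + (high - low) * k / (2 ** len(bits) - 1)


def run_ga(
    distance: Callable[[float], float],
    low: float,
    high: float,
    code_length: int = 10,
    population_size: int = 20,
    generation_num: int = 30,
    mutation_prob: float = 0.02,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (best parameter value, its distance) found by a simple GA."""
    rng = random.Random(seed)
    # Population: random chromosomes, i.e. random candidate parameter values.
    population = [[rng.randint(0, 1) for _ in range(code_length)] for _ in range(population_size)]
    best_value, best_dist = low, float("inf")
    for _ in range(generation_num):
        # Fitness: evaluate every chromosome; lower distance is better.
        scored = sorted(
            ((distance(decode(c, low, high)), c) for c in population),
            key=lambda pair: pair[0],
        )
        if scored[0][0] < best_dist:
            best_dist, best_value = scored[0][0], decode(scored[0][1], low, high)
        # Selection: the better half of the population becomes the parent pool.
        parents = [c for _, c in scored[: max(2, population_size // 2)]]
        # Elitism: the best chromosome survives unchanged.
        next_population = [list(scored[0][1])]
        while len(next_population) < population_size:
            mother, father = rng.sample(parents, 2)
            # Crossover: splice two parents at a random cut point.
            cut = rng.randint(1, code_length - 1)
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit independently with probability mutation_prob.
            child = [1 - b if rng.random() < mutation_prob else b for b in child]
            next_population.append(child)
        population = next_population
    return best_value, best_dist


# Toy stand-in for a simulation: the "ideal" parameter is 0.37.
best_value, best_dist = run_ga(lambda x: (x - 0.37) ** 2, 0.0, 1.0)
```

In the real Calibrator each distance evaluation is a full model run, so population size and generation count directly drive the runtime.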

Parameter Encoding: From Float to Binary

A key aspect of the genetic algorithm is how it represents continuous parameters like infection_prob (a float between 0.0 and 1.0). The GA does not work with the float values directly; instead, it converts them into a binary string of a fixed length. This process is called discretization.

  • Encoding: The range of a parameter (e.g., from infection_prob_min to infection_prob_max) is mapped to a sequence of integers, which are then represented as binary numbers. For instance, a float value is converted into a binary string like 01101.

  • GA Operations: All genetic operations, such as crossover (swapping parts of two binary strings) and mutation (flipping a random bit, e.g., 01101 -> 01111), are performed on these binary representations.

  • Decoding: When a simulation needs to be run, the binary string is decoded back into its corresponding float value. This is done in three steps:
    1. The binary string is converted to its decimal integer representation.

    2. This integer is normalized to a float between 0.0 and 1.0 by dividing it by the maximum possible integer for that bit length, which is $2^{L} - 1$ for a string of $L$ bits.

    3. This normalized float is linearly mapped to the parameter’s defined range (e.g., from infection_prob_min to infection_prob_max).

  • Precision: The strategy_param_code_length parameter in the .csv file determines the length of this binary string. A longer string allows for a more fine-grained representation of the parameter space (more steps between the min and max bounds), thus offering higher precision at the cost of a larger search space.
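Written as a formula, a string of $L$ bits with integer value $k$ decodes to $\text{min} + \frac{k}{2^{L} - 1}(\text{max} - \text{min})$. The sketch below follows the three decoding steps above. The encode direction, which rounds to the nearest representable value, is an assumption for illustration, since Melodie's exact rounding rule is not described here.

```python
def decode(bits: str, low: float, high: float) -> float:
    """Decode a binary string into a float in [low, high] using the three steps above."""
    # Step 1: binary string -> decimal integer, e.g. "01101" -> 13.
    k = int(bits, 2)
    # Step 2: normalize to [0.0, 1.0]; the largest L-bit integer is 2**L - 1.
    u = k / (2 ** len(bits) - 1)
    # Step 3: map linearly onto the parameter's range.
    return low + u * (high - low)


def encode(value: float, low: float, high: float, length: int) -> str:
    """Hypothetical inverse: snap value to the nearest representable grid point."""
    max_int = 2 ** length - 1
    k = round((value - low) / (high - low) * max_int)
    k = min(max(k, 0), max_int)  # clamp values outside [low, high]
    return format(k, f"0{length}b")


def step_size(low: float, high: float, length: int) -> float:
    """Gap between neighboring representable values; shrinks as length grows."""
    return (high - low) / (2 ** length - 1)
```

With a 5-bit code over the range 0.0 to 1.0, "01101" decodes to 13/31 ≈ 0.419, and neighboring values are 1/31 ≈ 0.032 apart. A 10-bit code narrows that gap to 1/1023 ≈ 0.001.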

Parameter Configuration (`CalibratorParamsScenarios.csv`)

This file controls the behavior of the genetic algorithm:

  • id: Unique identifier for a set of GA parameters.

  • path_num: How many independent calibration processes (paths) to run. Each path is a complete run of the GA from a random start. Comparing the paths helps show whether the result is robust or only a local minimum.

  • generation_num: The number of generations the GA will run for in each path.

  • strategy_population: The size of the population in each generation (how many different parameter sets are tested).

  • mutation_prob: The probability of a random mutation occurring during breeding.

  • strategy_param_code_length: The precision of the parameters, defined by the bit-length of the chromosome.

  • infection_prob_min, infection_prob_max: The lower and upper bounds for the infection_prob parameter search space.
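A row of this file can be sanity-checked before a long run. In the sketch below, the field names mirror the columns listed above, while the GAParams class, its methods, and the example values are hypothetical. The run count is an upper bound that assumes every chromosome is simulated once per generation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GAParams:
    # Field names mirror the CalibratorParamsScenarios.csv columns described above.
    id: int
    path_num: int
    generation_num: int
    strategy_population: int
    mutation_prob: float
    strategy_param_code_length: int
    infection_prob_min: float
    infection_prob_max: float

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means the row looks sane."""
        issues = []
        if not 0.0 <= self.mutation_prob <= 1.0:
            issues.append("mutation_prob must lie in [0, 1]")
        if not self.infection_prob_min < self.infection_prob_max:
            issues.append("infection_prob_min must be below infection_prob_max")
        counts = (self.path_num, self.generation_num, self.strategy_population,
                  self.strategy_param_code_length)
        if min(counts) < 1:
            issues.append("counts and code length must be positive")
        return issues

    def step_size(self) -> float:
        # Spacing between adjacent representable infection_prob values.
        return (self.infection_prob_max - self.infection_prob_min) / (
            2 ** self.strategy_param_code_length - 1
        )

    def max_model_runs(self) -> int:
        # Upper bound, assuming each chromosome is simulated once per generation.
        return self.path_num * self.generation_num * self.strategy_population


# Hypothetical row, not the values shipped with the example.
example = GAParams(id=0, path_num=2, generation_num=20, strategy_population=20,
                   mutation_prob=0.02, strategy_param_code_length=10,
                   infection_prob_min=0.0, infection_prob_max=1.0)
```

For this hypothetical row, the budget is at most 2 × 20 × 20 = 800 simulations, with infection_prob resolved in steps of 1/1023.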

Calibrator: Input Data

In this example, the distance is calculated based on the model’s state at the end of the simulation. If you need to use data from all periods to calculate the target metric, you can compute and store the required value in an environment property throughout the simulation, ensuring it holds the final desired value at the last period.
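As a sketch of that pattern, the plain-Python stand-in below replaces CovidEnvironment so that it runs without Melodie. Its update_population_stats takes a list of health states rather than an AgentList. The running metrics peak_infected and mean_infected are hypothetical examples of values that summarize every period.

```python
class TrackingEnvironment:
    """Plain-Python stand-in for CovidEnvironment (hypothetical; no Melodie import)."""

    def __init__(self) -> None:
        self.num_susceptible = 0
        self.num_infected = 0
        self.num_recovered = 0
        # Hypothetical cross-period metrics, updated every period.
        self.peak_infected = 0
        self.infected_sum = 0
        self.periods = 0

    def update_population_stats(self, health_states: list) -> None:
        # Same counting as the example: 0 = susceptible, 1 = infected, 2 = recovered.
        self.num_susceptible = sum(1 for s in health_states if s == 0)
        self.num_infected = sum(1 for s in health_states if s == 1)
        self.num_recovered = sum(1 for s in health_states if s == 2)
        # Fold this period into the running metrics so that, after the last
        # period, they summarize the whole simulation.
        self.peak_infected = max(self.peak_infected, self.num_infected)
        self.infected_sum += self.num_infected
        self.periods += 1

    @property
    def mean_infected(self) -> float:
        return self.infected_sum / self.periods if self.periods else 0.0


env = TrackingEnvironment()
for states in ([0, 1, 1, 0], [1, 1, 1, 2], [2, 2, 1, 2]):  # three hypothetical periods
    env.update_population_stats(states)
```

In the real model, the extra lines would go at the end of CovidEnvironment.update_population_stats. A distance method could then read, for example, model.environment.peak_infected.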

Calibrator: Running the Model

You can run the calibrator using the main script:

python examples/covid_contagion_calibrator/main.py

This will execute the genetic algorithm, running multiple simulations in parallel to find the optimal infection_prob. The results, including the progression of parameters and distances across generations, are saved to the data/output folder.

Parallel Execution Mode

The Calibrator supports two parallelization modes, controlled by the parallel_mode parameter when creating the calibrator instance:

  • ``parallel_mode="process"`` (default): Uses subprocess-based parallelism via multiprocessing. This is the traditional approach, works on all supported Python versions, and is recommended for most use cases.

  • ``parallel_mode="thread"``: Uses thread-based parallelism via ThreadPoolExecutor. This mode is recommended for free-threaded (No-GIL) builds of Python 3.13+, where it can outperform process mode by avoiding the overhead of process creation and data serialization. On older Python versions, and on standard builds that keep the GIL, this mode still runs, but the Global Interpreter Lock (GIL) can prevent simulations from executing truly in parallel.

You can specify the mode when creating the calibrator:

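from examples.covid_contagion_calibrator.core.calibrator import CovidCalibrator
from examples.covid_contagion_calibrator.core.model import CovidModel
from examples.covid_contagion_calibrator.core.scenario import CovidScenario

# `cfg` is the Melodie Config object set up in main.py.
calibrator = CovidCalibrator(
    config=cfg,
    model_cls=CovidModel,
    scenario_cls=CovidScenario,
    processors=8,
    parallel_mode="thread",  # or "process" (default)
)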
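To pick the mode automatically, a small helper can check whether the running interpreter actually has the GIL disabled. This is a sketch, and the helper names are hypothetical rather than part of Melodie. It relies on sys._is_gil_enabled(), which exists only on Python 3.13 and later. Because it is underscore-prefixed, it is looked up defensively.

```python
import sys


def gil_disabled() -> bool:
    # sys._is_gil_enabled() was added in Python 3.13; earlier versions always run with the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    return is_gil_enabled is not None and not is_gil_enabled()


def choose_parallel_mode() -> str:
    # Threads only pay off for CPU-bound simulations when the GIL is actually off;
    # otherwise fall back to the default process mode.
    return "thread" if gil_disabled() else "process"
```

The result can be passed straight through as parallel_mode=choose_parallel_mode(). The check looks at the runtime state rather than the build flag, because a free-threaded build can still re-enable the GIL at startup.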

Calibrator: Code

This section shows the key code implementation for the calibrator model. Files that are identical to the base covid_contagion model are noted.

Calibrator Definition

Defined in core/calibrator.py.

from Melodie import Calibrator

from examples.covid_contagion_calibrator.core.model import CovidModel
from examples.covid_contagion_calibrator.core.scenario import CovidScenario


class CovidCalibrator(Calibrator):
    """
    Simple calibrator that tunes `infection_prob` so the final infected ratio matches a target.

    In this example, the target infected ratio is hardcoded to 0.8 inside the `distance` method.
    The calibrator minimizes the squared difference between the actual infected ratio and this target.
    """

    scenario_cls: type[CovidScenario]
    model_cls: type[CovidModel]

    def setup(self) -> None:
        """
        Set up the calibrator by defining which properties to tune and which to record.
        """
        # Calibrate the `infection_prob` parameter in the scenario.
        # This tells the Genetic Algorithm (GA) to optimize this specific attribute.
        self.add_scenario_calibrating_property("infection_prob")

        # Record environment-level data for analysis.
        # 'num_susceptible' will be recorded in `Result_Calibrator_Environment.csv`,
        # allowing us to verify the calibration result.
        self.add_environment_property("num_susceptible")

    def distance(self, model: CovidModel) -> float:
        """
        Calculates the distance (error) between the model's result and the target.
        The GA minimizes this value.
        """
        env = model.environment

        # Calculate the ratio of agents who were ever infected (infected or recovered):
        # 1 - (susceptible / total)
        infected_ratio = 1 - env.num_susceptible / env.scenario.agent_num

        # Return the squared error.
        # Target is hardcoded as 0.8 (80% infection rate).
        return (infected_ratio - 0.8) ** 2

Scenario Definition

Defined in core/scenario.py.

from Melodie import Scenario


class CovidScenario(Scenario):
    """
    Defines the parameters for each simulation run of the virus contagion model.
    It inherits from the base `Melodie.Scenario`.

    Special Melodie attributes (automatically populated from the scenarios table,
    `CalibratorScenarios` in this example):
    - `id`: A unique identifier for the scenario.
    - `run_num`: How many times to repeat this scenario (for stochastic models). Defaults to 1.
      `CalibratorScenarios` does not use a `run_num` column.
    - `period_num`: How many simulation steps (periods) for each run. Defaults to 0.
    """

    def setup(self) -> None:
        """
        Declares custom scenario parameters for type hinting and clarity.
        These are automatically populated from columns in the scenarios table.
        """
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.infection_prob: float = 0.0
        self.recovery_prob: float = 0.0

    def load_data(self) -> None:
        """
        This method is automatically called by Melodie after scenario parameters are set.
        It is the recommended place to load all static input dataframes, making them
        accessible via `self.scenario.*` from `Model`, `Agent`, and `Environment`.
        """
        self.health_states = self.load_dataframe("ID_HealthState.csv")

Model Structure

Identical to the base model. Defined in core/model.py.

from typing import TYPE_CHECKING

from Melodie import Model
from .agent import CovidAgent
from .environment import CovidEnvironment
from .data_collector import CovidDataCollector
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidModel(Model):
    scenario: "CovidScenario"
    agents: "AgentList[CovidAgent]"
    environment: CovidEnvironment
    data_collector: CovidDataCollector

    def create(self) -> None:
        """
        This method is called once at the beginning of the simulation
        to create the core components of the model.
        """
        self.agents = self.create_agent_list(CovidAgent)
        self.environment = self.create_environment(CovidEnvironment)
        self.data_collector = self.create_data_collector(CovidDataCollector)

    def setup(self) -> None:
        """
        This method is called once after `create` to set up the initial state
        of the model components.
        """
        self.agents.setup_agents(self.scenario.agent_num)
        self.environment.setup_infection(self.agents)

    def run(self) -> None:
        """
        This method defines the main simulation loop that runs for `scenario.period_num` periods.
        """
        for t in self.iterator(self.scenario.period_num):
            self.environment.agents_interaction(self.agents)
            self.environment.agents_recover(self.agents)
            self.environment.update_population_stats(self.agents)
            self.data_collector.collect(t)
        self.data_collector.save()

Environment Logic

Identical to the base model. Defined in core/environment.py.

import random
from typing import TYPE_CHECKING

from Melodie import Environment
from .agent import CovidAgent
from .scenario import CovidScenario

if TYPE_CHECKING:
    from Melodie import AgentList


class CovidEnvironment(Environment):
    scenario: "CovidScenario"

    def setup(self) -> None:
        # Macro-level counters for population statistics.
        # These are updated each period by `update_population_stats` and recorded by the DataCollector.
        self.num_susceptible: int = 0
        self.num_infected: int = 0
        self.num_recovered: int = 0

    def setup_infection(self, agents: "AgentList[CovidAgent]") -> None:
        # Sets the initial percentage of infected agents based on scenario parameters.
        for agent in agents:
            if (
                agent.health_state == 0
                and random.random() < self.scenario.initial_infected_percentage
            ):
                agent.health_state = 1

    def agents_interaction(self, agents: "AgentList[CovidAgent]") -> None:
        # Simulates random interactions where infected agents can spread the virus.
        for agent in agents:
            if agent.health_state == 1:
                # Randomly meet another agent
                # Note: Melodie's AgentList.random_sample returns a list
                other_agent = agents.random_sample(1)[0]
                if other_agent.health_state == 0:
                    if random.random() < self.scenario.infection_prob:
                        other_agent.health_state = 1

    def agents_recover(self, agents: "AgentList[CovidAgent]") -> None:
        # Triggers the recovery process for all infected agents.
        for agent in agents:
            agent.health_state_update(self.scenario.recovery_prob)

    def update_population_stats(self, agents: "AgentList[CovidAgent]") -> None:
        # Aggregates agent states into environment-level population counts.
        self.num_susceptible = 0
        self.num_infected = 0
        self.num_recovered = 0

        for agent in agents:
            if agent.health_state == 0:
                self.num_susceptible += 1
            elif agent.health_state == 1:
                self.num_infected += 1
            elif agent.health_state == 2:
                self.num_recovered += 1

Agent Behavior

Identical to the base model. Defined in core/agent.py.

import random
from Melodie import Agent


class CovidAgent(Agent):
    def setup(self) -> None:
        """
        Initializes the agent's state.
        `health_state`: 0 = susceptible, 1 = infected, 2 = recovered.
        """
        self.health_state: int = 0  # All agents start as susceptible.

    def health_state_update(self, recovery_prob: float) -> None:
        """
        Agent-level logic for transitioning from infected to recovered.
        """
        if self.health_state == 1 and random.random() < recovery_prob:
            self.health_state = 2

Data Collection Setup

Identical to the base model. Defined in core/data_collector.py.

from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self) -> None:
        """
        Registers which properties of agents and the environment should be recorded.

        In this minimal example we record:
        - micro-level results: each agent's ``health_state``;
        - macro-level results: population counts on the environment
          (``num_susceptible``, ``num_infected``, ``num_recovered``).
        """
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("num_susceptible")
        self.add_environment_property("num_infected")
        self.add_environment_property("num_recovered")