Tutorial

This tutorial provides a detailed explanation of an example model developed with Melodie: CovidContagion, which models the contagion process of Covid-19 in a population of agents. You can find the code of the model in this repo.

We make the following assumptions in the model:

  • Each agent has two attributes: health_state and age_group.

  • We consider four health states numbered from 0 to 3, meaning “not infected”, “infected”, “recovered”, and “dead”, respectively.

  • We consider two age groups numbered from 0 to 1, meaning “young” and “old”, respectively. A young person has a higher probability of recovering from infection than an old person.

  • A “not infected” person can be infected by an “infected” person. The probability is an exogenous parameter infection_prob. When 10% of the people are infected, we assume a “not infected” person has a probability of 0.1 of coming into contact with an “infected” person, so the total infection probability is 0.1 \(\times\) infection_prob.
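As a quick check of the last assumption, the arithmetic can be sketched in a few lines of Python. The function name effective_infection_prob and the value 0.3 for infection_prob are illustrative assumptions and do not come from the model's code.

```python
def effective_infection_prob(infected_share: float, infection_prob: float) -> float:
    """Probability that a "not infected" agent becomes infected in one period.

    infected_share: fraction of the population currently infected,
                    used as the contact probability.
    infection_prob: exogenous probability of infection given a contact.
    """
    return infected_share * infection_prob


# 10% of the population infected, hypothetical infection_prob of 0.3.
p = effective_infection_prob(0.1, 0.3)
print(round(p, 4))
```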

With these assumptions, this CovidContagion model is a minimal example of Melodie, but it shows a clear project structure and the use of the most important modules.

Project Structure

The full structure of the project is as below, including the produced database and figures.

CovidContagion
├── data
│   ├── input
│   │   ├── SimulatorScenarios.xlsx
│   │   ├── ID_HealthState.xlsx
│   │   ├── ID_AgeGroup.xlsx
│   │   └── Parameter_AgeGroup_TransitionProb.xlsx
│   └── output
│       ├── CovidContagion.sqlite
│       ├── PopulationInfection_S0R0.png
│       └── PopulationInfection_S1R0.png
├── source
│   ├── agent.py
│   ├── environment.py
│   ├── data_collector.py
│   ├── data_info.py
│   ├── data_loader.py
│   ├── scenario.py
│   ├── model.py
│   └── analyzer.py
├── config.py
├── run_simulator.py
├── run_analyzer.py
└── readme.md

In config.py, you can define how the input and output files are organized.

config.py
import os
from Melodie import Config

config = Config(
    project_name="CovidContagion",
    project_root=os.path.dirname(__file__),
    input_folder="data/input",
    output_folder="data/output"
)

If the config.project_name attribute is changed, the name of CovidContagion.sqlite changes accordingly.
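To make the relationship between the config values and the output file concrete, here is a minimal sketch in plain Python. It assumes, based on the project tree above, that the database is named after project_name with a .sqlite extension and placed in the output folder. The helper expected_database_path is hypothetical and is not part of Melodie.

```python
import os


def expected_database_path(project_root: str, output_folder: str, project_name: str) -> str:
    """Build the path where the result database is expected.

    Assumption: the file is called "<project_name>.sqlite" and lives in the
    output folder, as suggested by the project tree in this tutorial.
    """
    return os.path.join(project_root, output_folder, project_name + ".sqlite")


path = expected_database_path("CovidContagion", "data/output", "CovidContagion")
print(path)
```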

Agent

The CovidAgent class inherits from the Agent class provided by Melodie. In Line 6, CovidAgent.setup overrides the Agent.setup function from Melodie and will be called automatically when the agent objects are set up.

agent.py
1from Melodie import Agent
2
3
4class CovidAgent(Agent):
5
6    def setup(self):
7        self.health_state: int = 0
8        self.age_group: int = 0

The attributes of the agent should be defined in this setup function. The values assigned here are only placeholders, as they will be overwritten when the agents are initialized later.

Scenario

As introduced in the Introduction section, scenario contains all the input data that is needed to run the model, and can be accessed by the model, the environment, the data_collector, and each agent. All the data are stored in dataframes, which are

  • First, registered in the data_info.py;

  • Second, generated or loaded in the data_loader.py.

Generate agent_params

To initialize the two attributes of all the agents, a dataframe agent_params is first registered in the data_info.py and then generated in the data_loader.py. Each row of this dataframe contains the values of health_state and age_group to initialize one agent.

The figure below shows the first 19 rows of agent_params.

_images/agent_params.png

In the file data_info.py, agent_params is registered as an instance of the DataFrameInfo class.

data_info.py
import sqlalchemy

from Melodie import DataFrameInfo


agent_params = DataFrameInfo(
    df_name="Parameter_AgentParams",
    columns={
        "id_scenario": sqlalchemy.Integer(),  # id of each scenario
        "id": sqlalchemy.Integer(),  # id of each agent
        "health_state": sqlalchemy.Integer(),
        "age_group": sqlalchemy.Integer(),
    },
)

As shown, agent_params includes an id_scenario column. This column is needed when the initialization of agents’ attributes depends on the scenario. Melodie supports batching scenario runs and can automatically select the right part of agent_params for each scenario and initialize the agents.

This CovidContagion model is exactly such a case. The values of agents’ health_state and age_group rely on two parameters of the scenario: initial_infected_percentage and young_percentage.

So, we need to write how agent_params is generated based on the scenario object. This is done in the data_loader.py file, as shown below, in Lines 35-47.

data_loader.py
 1from typing import TYPE_CHECKING, Dict, Any
 2
 3import numpy as np
 4
 5from Melodie import DataLoader
 6from source import data_info
 7
 8if TYPE_CHECKING:
 9    from source.scenario import CovidScenario
10
11
12class CovidDataLoader(DataLoader):
13
14    def setup(self):
15        self.load_dataframe(data_info.simulator_scenarios)
16        self.load_dataframe(data_info.id_health_state)
17        self.load_dataframe(data_info.id_age_group)
18        self.load_dataframe(data_info.transition_prob)
19        self.generate_agent_dataframe()
20
21    @staticmethod
22    def init_health_state(scenario: "CovidScenario"):
23        state = 0
24        if np.random.uniform(0, 1) <= scenario.initial_infected_percentage:
25            state = 1
26        return state
27
28    @staticmethod
29    def init_age_group(scenario: "CovidScenario"):
30        age_group = 0
31        if np.random.uniform(0, 1) > scenario.young_percentage:
32            age_group = 1
33        return age_group
34
35    def generate_agent_dataframe(self):
36        with self.dataframe_generator(
37            data_info.agent_params, lambda scenario: scenario.agent_num
38        ) as g:
39
40            def generator_func(scenario: "CovidScenario") -> Dict[str, Any]:
41                return {
42                    "id": g.increment(),
43                    "health_state": self.init_health_state(scenario),
44                    "age_group": self.init_age_group(scenario)
45                }
46
47            g.set_row_generator(generator_func)

To generate agent_params, Melodie provides the dataframe_generator (Lines 36-38), which works with three pieces of information:

  • data_info.agent_params (Line 37), which contains the registered information about agent_params.

  • lambda scenario: scenario.agent_num (Line 37), which gives the number of agents in each scenario. Based on this, the g.increment function (Line 42) provided by the dataframe_generator generates the id for all the agents.

  • generator_func (Line 47), which takes the scenario object as its parameter and returns a dictionary, i.e., one row in agent_params.

The generate_agent_dataframe function is called in CovidDataLoader.setup (Line 19), so it is also called automatically by Melodie. Please note that the whole agent_params dataframe is generated by the data_loader for all the scenarios before any of them is run.
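The following plain-Python sketch imitates what the generated table looks like and how the part belonging to one scenario can be selected by id_scenario. It does not use Melodie. The scenario values, the function names generate_agent_params and select_for_scenario, and the choice to restart agent ids at 0 in each scenario are assumptions made for illustration.

```python
import random
from typing import Dict, List

# Hypothetical scenario table with two scenarios.
scenarios = [
    {"id": 0, "agent_num": 5, "initial_infected_percentage": 0.2, "young_percentage": 0.8},
    {"id": 1, "agent_num": 3, "initial_infected_percentage": 0.5, "young_percentage": 0.4},
]


def generate_agent_params(scenarios: List[dict], seed: int = 0) -> List[Dict[str, int]]:
    """Create one row per agent for every scenario, before any scenario is run."""
    rng = random.Random(seed)
    rows = []
    for scenario in scenarios:
        for agent_id in range(scenario["agent_num"]):
            # Same comparisons as init_health_state and init_age_group.
            infected = rng.uniform(0, 1) <= scenario["initial_infected_percentage"]
            old = rng.uniform(0, 1) > scenario["young_percentage"]
            rows.append({
                "id_scenario": scenario["id"],
                "id": agent_id,
                "health_state": 1 if infected else 0,
                "age_group": 1 if old else 0,
            })
    return rows


def select_for_scenario(rows: List[Dict[str, int]], id_scenario: int) -> List[Dict[str, int]]:
    """Pick the rows that belong to one scenario."""
    return [row for row in rows if row["id_scenario"] == id_scenario]


agent_params = generate_agent_params(scenarios)
print(len(agent_params))                          # 8 rows in total (5 + 3)
print(len(select_for_scenario(agent_params, 1)))  # 3 rows for scenario 1
```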

Load simulator_scenarios

In Lines 15-18 of data_loader.py, the other input dataframes are loaded into the model. Taking simulator_scenarios as an example, it includes the parameters to initialize a scenario object. Before being loaded, it also needs to be registered in the data_info.py file.

data_info.py
 1import sqlalchemy
 2
 3from Melodie import DataFrameInfo
 4
 5
 6simulator_scenarios = DataFrameInfo(
 7    df_name="simulator_scenarios",
 8    file_name="SimulatorScenarios.xlsx",
 9    columns={
10        "id": sqlalchemy.Integer(),
11        "run_num": sqlalchemy.Integer(),
12        "period_num": sqlalchemy.Integer(),
13        "agent_num": sqlalchemy.Integer(),
14        "initial_infected_percentage": sqlalchemy.Float(),
15        "young_percentage": sqlalchemy.Float(),
16        "infection_prob": sqlalchemy.Float(),
17    },
18)

The figure shows the content of simulator_scenarios.

_images/simulator_scenarios.png

Please note that,

  • First, since simulator_scenarios is “loaded” rather than “generated”, the file_name attribute must be set to the name of the Excel file in the input folder (Line 8), so that Melodie can find the file. In addition, the df_name attribute must be “simulator_scenarios” so that it can be recognized by Melodie.

  • Second, since Melodie supports batching the scenario runs, simulator_scenarios can contain multiple rows for different scenarios. Besides, each scenario has a default attribute run_num: Melodie will run the model with this scenario run_num times, so that the model uncertainty can be evaluated afterwards.

  • Third, the column names in the Excel file must be exactly the same as the scenario attributes defined in the CovidScenario.setup function below; otherwise an error will be raised.

  • Fourth, the attributes id and run_num can be ignored when defining the CovidScenario.setup function, because they are already included in the Melodie.Scenario class.

  • Finally, if the initialization of agents’ attributes is not scenario-dependent, you can also “load” a dataframe instead of generating one.

scenario.py
from Melodie import Scenario
from source import data_info


class CovidScenario(Scenario):

    def setup(self):
        self.period_num: int = 0
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.young_percentage: float = 0.0
        self.infection_prob: float = 0.0
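The requirement that column names match scenario attributes can be illustrated with a small stand-in written in plain Python. SketchScenario and load_row are hypothetical names, the row values are made up, and the error type is an assumption; this is not how Melodie implements the loading.

```python
class SketchScenario:
    """Stand-in for a scenario class; not the Melodie implementation."""

    def __init__(self):
        # id and run_num are assumed to be provided by the base class.
        self.id = 0
        self.run_num = 1
        self.setup()

    def setup(self):
        self.period_num = 0
        self.agent_num = 0
        self.initial_infected_percentage = 0.0
        self.young_percentage = 0.0
        self.infection_prob = 0.0


def load_row(scenario: SketchScenario, row: dict) -> SketchScenario:
    """Copy each column of a scenario row onto the attribute with the same name."""
    for column, value in row.items():
        if not hasattr(scenario, column):
            raise AttributeError(f"Column '{column}' has no matching scenario attribute")
        setattr(scenario, column, value)
    return scenario


# Hypothetical row of simulator_scenarios.
row = {"id": 0, "run_num": 1, "period_num": 30, "agent_num": 1000,
       "initial_infected_percentage": 0.05, "young_percentage": 0.7,
       "infection_prob": 0.1}
scenario = load_row(SketchScenario(), row)
print(scenario.agent_num)  # 1000
```

A misspelled column such as agent_number would raise an error in this sketch, which mirrors the third note above.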

Finally, as introduced in the Modelling Manager section and shown below, the CovidScenario and CovidDataLoader classes are passed to the simulator when it is constructed. Melodie then initializes all the scenarios defined in the simulator_scenarios dataframe automatically and runs the model with these scenarios one by one.

run_simulator.py
from Melodie import Simulator
from config import config
from source.model import CovidModel
from source.scenario import CovidScenario
from source.data_loader import CovidDataLoader

if __name__ == "__main__":
    simulator = Simulator(
        config=config,
        model_cls=CovidModel,
        scenario_cls=CovidScenario,
        data_loader_cls=CovidDataLoader
    )
    simulator.run()
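Conceptually, running a batch of scenarios amounts to two nested loops: one over the scenarios and one over the runs of each scenario. The sketch below only lists the resulting (id_scenario, id_run) pairs. The function plan_runs and the scenario rows are hypothetical, and the real scheduling inside Melodie may differ.

```python
from typing import List, Tuple


def plan_runs(scenario_rows: List[dict]) -> List[Tuple[int, int]]:
    """List the (id_scenario, id_run) pairs in the order they would be simulated."""
    plan = []
    for row in scenario_rows:
        for id_run in range(row["run_num"]):
            plan.append((row["id"], id_run))
    return plan


# Hypothetical scenario table with two scenarios.
rows = [{"id": 0, "run_num": 2}, {"id": 1, "run_num": 1}]
print(plan_runs(rows))  # [(0, 0), (0, 1), (1, 0)]
```

The output figure names in the project tree, such as PopulationInfection_S0R0.png, appear to follow the same scenario and run indexing.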

Model

After defining the CovidAgent and CovidScenario classes, registering and loading/generating their dataframes, and initializing the scenario object by Melodie, we are now finally ready to initialize all the agents, i.e. their health_state and age_group. This is done in the CovidModel class.

As shown below, the two functions CovidModel.create and CovidModel.setup are inherited from Melodie.Model. In Line 18, agents: "AgentList[CovidAgent]" is created by create_agent_list. The agents’ parameters are then initialized in Lines 23-26 with the AgentList.setup_agents function provided by Melodie. Note that the initialized scenario is already available to the model as one of its attributes.

model.py
 1from typing import TYPE_CHECKING
 2
 3from Melodie import Model
 4from source import data_info
 5from source.agent import CovidAgent
 6from source.data_collector import CovidDataCollector
 7from source.environment import CovidEnvironment
 8from source.scenario import CovidScenario
 9
10if TYPE_CHECKING:
11    from Melodie import AgentList
12
13
14class CovidModel(Model):
15    scenario: "CovidScenario"
16
17    def create(self):
18        self.agents: "AgentList[CovidAgent]" = self.create_agent_list(CovidAgent)
19        self.environment = self.create_environment(CovidEnvironment)
20        self.data_collector = self.create_data_collector(CovidDataCollector)
21
22    def setup(self):
23        self.agents.setup_agents(
24            agents_num=self.scenario.agent_num,
25            params_df=self.scenario.get_dataframe(data_info.agent_params),
26        )
27
28    def run(self):
29        for period in self.iterator(self.scenario.period_num):
30            self.environment.agents_infection(self.agents)
31            self.environment.agents_health_state_transition(self.agents)
32            self.environment.calc_population_infection_state(self.agents)
33            self.data_collector.collect(period)
34        self.data_collector.save()

Besides, in Line 19-20, environment and data_collector are also created. But, without their own parameters, they don’t have to be initialized in the setup function. Why? In brief, because in an ABM, only the agents have micro-level attributes that cannot be easily carried by scenario.

Finally, the CovidModel.run function (Line 28) describes the timeline of the simulation. It is called automatically when simulator.run is executed (see run_simulator.py above). In each period,

  • first, the environment, the coordinator of the agents’ decision-making and interaction process, “asks” the agents to infect each other;

  • second, the environment “asks” the agents to update their health states;

  • third, the environment calculates the infection state of the whole population;

  • fourth, the data_collector records the attributes’ values of the environment and the agents.

Finally, after simulating all the periods, the data_collector will save everything into the database.
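The order of the steps can be made explicit with a small stand-alone sketch. The stub classes below only log which step was requested and are not Melodie classes; their names are hypothetical.

```python
from typing import List


class StubEnvironment:
    """Stand-in for the environment; it only logs which step was requested."""

    def __init__(self, log: List[str]):
        self.log = log

    def agents_infection(self):
        self.log.append("agents_infection")

    def agents_health_state_transition(self):
        self.log.append("agents_health_state_transition")

    def calc_population_infection_state(self):
        self.log.append("calc_population_infection_state")


class StubDataCollector:
    """Stand-in for the data collector; it logs collect and save calls."""

    def __init__(self, log: List[str]):
        self.log = log

    def collect(self, period: int):
        self.log.append(f"collect({period})")

    def save(self):
        self.log.append("save")


def run(period_num: int) -> List[str]:
    """Replay the timeline of CovidModel.run and return the logged steps."""
    log: List[str] = []
    environment = StubEnvironment(log)
    data_collector = StubDataCollector(log)
    for period in range(period_num):
        environment.agents_infection()
        environment.agents_health_state_transition()
        environment.calc_population_infection_state()
        data_collector.collect(period)
    data_collector.save()
    return log


steps = run(period_num=2)
print(len(steps))  # 9: four steps per period, plus one final save
print(steps[-1])   # save
```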

Environment

The CovidEnvironment class is defined as below.

In the setup function (Line 10), four attributes are defined to save the number of agents in each health state. As shown, they are updated in the calc_population_infection_state function in each period (Line 27).

Similar to the cases in the CovidAgent and CovidScenario classes, the CovidEnvironment.setup function will also be automatically called by running CovidModel.create_environment. But, the four attributes are (macro-level) variables, not parameters. So, they are not initialized with exogenous input.

environment.py
 1from Melodie import Environment
 2from Melodie import AgentList
 3from source.agent import CovidAgent
 4from source.scenario import CovidScenario
 5
 6
 7class CovidEnvironment(Environment):
 8    scenario: "CovidScenario"
 9
10    def setup(self):
11        self.s0 = 0
12        self.s1 = 0
13        self.s2 = 0
14        self.s3 = 0
15
16    def agents_infection(self, agents: "AgentList[CovidAgent]"):
17        infection_prob = (self.s1 / self.scenario.agent_num) * self.scenario.infection_prob
18        for agent in agents:
19            if agent.health_state == 0:
20                agent.infection(infection_prob)
21
22    @staticmethod
23    def agents_health_state_transition(agents: "AgentList[CovidAgent]"):
24        for agent in agents:
25            agent.health_state_transition()
26
27    def calc_population_infection_state(self, agents: "AgentList[CovidAgent]"):
28        self.setup()
29        for agent in agents:
30            if agent.health_state == 0:
31                self.s0 += 1
32            elif agent.health_state == 1:
33                self.s1 += 1
34            elif agent.health_state == 2:
35                self.s2 += 1
36            else:
37                self.s3 += 1

As shown in the agents_infection function, the environment has access to scenario and can get necessary data.
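The two calculations performed by the environment can be checked in isolation. The sketch below counts health states with collections.Counter, which is an alternative to the if/elif chain above rather than the code used in the example, and then computes the infection probability in the same way as agents_infection. The population and the value 0.5 for infection_prob are made up.

```python
from collections import Counter
from typing import Dict, List


def count_health_states(health_states: List[int]) -> Dict[str, int]:
    """Count agents per health state and return the keys s0 to s3."""
    counts = Counter(health_states)
    return {f"s{state}": counts.get(state, 0) for state in range(4)}


def population_infection_prob(s1: int, agent_num: int, infection_prob: float) -> float:
    """Share of infected agents times the exogenous infection probability."""
    return (s1 / agent_num) * infection_prob


states = [0, 0, 1, 2, 1, 3, 0, 0, 0, 0]  # hypothetical population of 10 agents
counts = count_health_states(states)
print(counts)  # {'s0': 6, 's1': 2, 's2': 1, 's3': 1}
print(population_infection_prob(counts["s1"], len(states), 0.5))  # 0.1
```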

Besides, as introduced in the Melodie Framework section, the environment coordinates the agents’ decision-making and interaction processes. This is why, in the model.run function, the functions of environment are called instead of the agents being called directly.

So, corresponding to the functions agents_infection and agents_health_state_transition in the CovidEnvironment, we need to define the infection and health_state_transition functions in the CovidAgent class as below.

agent.py
import random

from Melodie import Agent


class CovidAgent(Agent):

    def setup(self):
        self.health_state: int = 0
        self.age_group: int = 0

    def infection(self, infection_prob: float):
        # Become infected with the probability passed in by the environment.
        if random.uniform(0, 1) <= infection_prob:
            self.health_state = 1

    def health_state_transition(self):
        # Only infected agents (state 1) can change their state.
        if self.health_state == 1:
            transition_probs: dict = self.scenario.get_transition_probs(self.age_group)
            rand = random.uniform(0, 1)
            if rand <= transition_probs["s1_s1"]:
                pass  # stay infected
            elif rand <= transition_probs["s1_s1"] + transition_probs["s1_s2"]:
                self.health_state = 2  # recovered
            else:
                self.health_state = 3  # dead

As shown in the health_state_transition function, the agent also has access to scenario and can get necessary data.
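The state transition uses a single uniform draw that is compared against cumulative thresholds. The stand-alone sketch below isolates that logic so that it can be tested with fixed draws. The function name next_health_state, the optional rand argument, and the probabilities are assumptions for illustration.

```python
import random
from typing import Dict, Optional


def next_health_state(probs: Dict[str, float], rand: Optional[float] = None) -> int:
    """Return the next state of an infected agent from a single uniform draw.

    The interval [0, 1] is split into three consecutive segments whose lengths
    are the probabilities of staying infected (1), recovering (2), and dying (3).
    """
    if rand is None:
        rand = random.uniform(0, 1)
    if rand <= probs["s1_s1"]:
        return 1
    if rand <= probs["s1_s1"] + probs["s1_s2"]:
        return 2
    return 3


# Hypothetical probabilities for one age group; they sum to 1.
probs = {"s1_s1": 0.5, "s1_s2": 0.4, "s1_s3": 0.1}
print(next_health_state(probs, rand=0.30))  # 1: stays infected
print(next_health_state(probs, rand=0.75))  # 2: recovers
print(next_health_state(probs, rand=0.95))  # 3: dies
```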

In turn, the CovidScenario class needs to prepare the data in a structure that is easy to use, as shown in the function setup_transition_probs below (Line 14). Besides, Melodie.Scenario has a function get_dataframe to read registered and loaded dataframes from the database (Line 15). The data_info.transition_prob refers to the input table shown below.

_images/transition_probs.png

The corresponding code in the CovidScenario class is as follows.

scenario.py
 1from Melodie import Scenario
 2from source import data_info
 3
 4
 5class CovidScenario(Scenario):
 6
 7    def setup(self):
 8        self.period_num: int = 0
 9        self.agent_num: int = 0
10        self.initial_infected_percentage: float = 0.0
11        self.young_percentage: float = 0.0
12        self.infection_prob: float = 0.0
13
14    def setup_transition_probs(self):
15        df = self.get_dataframe(data_info.transition_prob)
16        self.transition_probs = {
17            0: {
18                "s1_s1": df.at[0, "prob_s1_s1"],
19                "s1_s2": df.at[0, "prob_s1_s2"],
20                "s1_s3": df.at[0, "prob_s1_s3"],
21            },
22            1: {
23                "s1_s1": df.at[1, "prob_s1_s1"],
24                "s1_s2": df.at[1, "prob_s1_s2"],
25                "s1_s3": df.at[1, "prob_s1_s3"],
26            }
27        }
28
29    def get_transition_probs(self, id_age_group: int):
30        return self.transition_probs[id_age_group]
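The same nested dictionary can also be built with a loop, which scales to more age groups and allows a sanity check that each row sums to 1. The sketch below uses a list of dictionaries in place of the dataframe so that it runs without pandas. The numbers are illustrative only and are not taken from the example project, and build_transition_probs is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical content of the transition probability table, one row per age group.
rows: List[Dict[str, float]] = [
    {"prob_s1_s1": 0.5, "prob_s1_s2": 0.45, "prob_s1_s3": 0.05},  # age group 0
    {"prob_s1_s1": 0.5, "prob_s1_s2": 0.3, "prob_s1_s3": 0.2},    # age group 1
]


def build_transition_probs(rows: List[Dict[str, float]]) -> Dict[int, Dict[str, float]]:
    """Map each age group (the row index) to its transition probabilities."""
    transition_probs = {}
    for age_group, row in enumerate(rows):
        probs = {
            "s1_s1": row["prob_s1_s1"],
            "s1_s2": row["prob_s1_s2"],
            "s1_s3": row["prob_s1_s3"],
        }
        # The three outcomes are exhaustive, so they should sum to 1.
        if abs(sum(probs.values()) - 1.0) > 1e-9:
            raise ValueError(f"Probabilities for age group {age_group} do not sum to 1")
        transition_probs[age_group] = probs
    return transition_probs


transition_probs = build_transition_probs(rows)
print(transition_probs[1]["s1_s2"])  # 0.3
```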

In summary, the idea of the Scenario class in the Melodie framework is

  • to use it as the channel for other objects accessing input data;

  • to easily iterate through a batch of scenarios.

If you recall the Scenario Cluster introduced in the Melodie Framework section, the Scenario and DataLoader classes focus on formatting, importing, and delivering the input data to the model. DataFrameInfo and MatrixInfo are just pre-defined data structures that store information about the input data, so that the functions of Scenario and DataLoader can work with the data easily.

DataCollector

Finally, to collect all the micro- and macro-level results stored by the agents and the environment and save them into the database, the CovidDataCollector class is defined as below.

data_collector.py
from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self):
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("s0")
        self.add_environment_property("s1")
        self.add_environment_property("s2")
        self.add_environment_property("s3")

The two functions add_agent_property and add_environment_property are provided by Melodie. For add_agent_property, we should also pass in the name of the agent list, so the data_collector knows which agent list to look at. In some ABMs, there can be multiple agent lists (e.g., wolves, sheep, etc.).

With the data_collector, the results will be saved in the CovidContagion.sqlite file in a pre-defined format. The macro-level results are indexed with id_scenario, id_run, and period. The micro-level results are further indexed with the id of agents. After running the model, you can find two tables in CovidContagion.sqlite named environment_result and agents_result.

Example of environment_result:

_images/environment_result.png

Example of agents_result:

_images/agents_result.png
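Since the results are stored in a SQLite file, they can be read with Python's built-in sqlite3 module. The sketch below builds a small in-memory stand-in for the environment_result table so that it runs on its own. The table name and index columns follow the description above, while the exact column layout and the numbers are assumptions.

```python
import sqlite3

# In-memory stand-in for CovidContagion.sqlite (assumed layout, made-up values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE environment_result "
    "(id_scenario INTEGER, id_run INTEGER, period INTEGER, "
    "s0 INTEGER, s1 INTEGER, s2 INTEGER, s3 INTEGER)"
)
conn.executemany(
    "INSERT INTO environment_result VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0, 0, 0, 90, 10, 0, 0),
        (0, 0, 1, 85, 12, 2, 1),
        (0, 0, 2, 80, 13, 5, 2),
    ],
)

# Read the number of infected agents per period for scenario 0, run 0.
cursor = conn.execute(
    "SELECT period, s1 FROM environment_result "
    "WHERE id_scenario = ? AND id_run = ? ORDER BY period",
    (0, 0),
)
infected_by_period = cursor.fetchall()
conn.close()
print(infected_by_period)  # [(0, 10), (1, 12), (2, 13)]
```

To query real results, connect to the .sqlite file in the data/output folder instead of ":memory:" and skip the table creation.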

In the example project, we also prepared a simple analyzer.py file that produces two figures from the results, showing the number of agents in each health_state. The two figures will be saved in the data/output folder together with the .sqlite file. Similarly, users can define other post-processing functions in the analyzer.py file.

Since it is mainly based on other packages instead of Melodie, we won’t introduce the details here. Here is an example of the results from the model.

_images/population_infection.png

Last Words

If the Melodie Framework section was too brief to follow, I hope this tutorial can give you a clearer picture of (1) why the modules are organized into those clusters, and (2) how they fit together.

As said before, for simplicity, not all the modules are used in this example, but it does show a clear structure of an ABM developed with Melodie. You can find more examples using other modules in the Model Gallery section.

So, that’s it :)

We really hope this tutorial is clear and useful and, most importantly, sparks your interest in joining the ABM community! If you have any questions, please don’t hesitate to contact us!