Tutorial
This tutorial provides a detailed explanation for an example model developed with Melodie: CovidContagion, which models the contagion process of Covid-19 in a population of agents. You can find the code of the model in this repo.
We make the following assumptions in the model:
Each agent has two attributes: health_state and age_group.
We consider four health states numbered from 0 to 3, meaning “not infected”, “infected”, “recovered”, and “dead”, respectively.
We consider two age groups numbered 0 and 1, meaning “young” and “old”, respectively. A young person has a higher probability of recovering from infection than an old person.
A “not infected” person can be infected by an “infected” person. The probability is an exogenous parameter infection_prob. When 10% of the people are infected, we assume a “not infected” person has a 0.1 probability of coming into contact with an “infected” person, so the total infection probability is 0.1 \(\times\) infection_prob.
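As a quick worked example of the last assumption, the sketch below computes the total infection probability as the share of infected people multiplied by infection_prob. The function name total_infection_prob and the value 0.2 are illustrative assumptions, not part of the model code.

```python
def total_infection_prob(infected_share: float, infection_prob: float) -> float:
    """Probability that one "not infected" person becomes infected in a period."""
    # Chance of meeting an infected person times chance of transmission on contact.
    return infected_share * infection_prob


# With 10% of the population infected and a hypothetical infection_prob of 0.2:
print(round(total_infection_prob(0.1, 0.2), 4))  # 0.02
```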
With these assumptions, the CovidContagion model is a minimal example of Melodie, but it shows a clear project structure and the use of the most important modules.
Project Structure
The full structure of the project is as below, including the produced database and figures.
CovidContagion
├── data
│   ├── input
│   │   ├── SimulatorScenarios.xlsx
│   │   ├── ID_HealthState.xlsx
│   │   ├── ID_AgeGroup.xlsx
│   │   └── Parameter_AgeGroup_TransitionProb.xlsx
│   └── output
│       ├── CovidContagion.sqlite
│       ├── PopulationInfection_S0R0.png
│       └── PopulationInfection_S1R0.png
├── source
│   ├── agent.py
│   ├── environment.py
│   ├── data_collector.py
│   ├── data_info.py
│   ├── data_loader.py
│   ├── scenario.py
│   ├── model.py
│   └── analyzer.py
├── config.py
├── run_simulator.py
├── run_analyzer.py
└── readme.md
In config.py, you can define how the input and output files are organized.
import os
from Melodie import Config

config = Config(
    project_name="CovidContagion",
    project_root=os.path.dirname(__file__),
    input_folder="data/input",
    output_folder="data/output"
)
If config.project_name is set to a different value, the name of the CovidContagion.sqlite file changes accordingly.
Agent
To create the CovidAgent class, Melodie provides the Agent class that can be inherited.
In Line 6, CovidAgent.setup overrides the Agent.setup function from Melodie and will be automatically called when setting up the agent objects.
1  from Melodie import Agent
2
3
4  class CovidAgent(Agent):
5
6      def setup(self):
7          self.health_state: int = 0
8          self.age_group: int = 0
The attributes of the agent should be defined in this setup function.
The values assigned here are only placeholders, as they will be overwritten when the agents are initialized later.
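To see why the placeholder values do not matter, here is a minimal plain-Python sketch of the pattern, written without Melodie. SketchAgent and apply_params are hypothetical stand-ins. How Melodie performs this step internally is not shown in this tutorial.

```python
class SketchAgent:
    """Hypothetical stand-in for an agent; not a Melodie class."""

    def __init__(self, agent_id: int):
        self.id = agent_id
        self.setup()

    def setup(self):
        # Placeholder values: they only declare which attributes exist.
        self.health_state: int = 0
        self.age_group: int = 0


def apply_params(agent: SketchAgent, row: dict) -> None:
    """Overwrite the placeholders with the values from one parameter row."""
    for name, value in row.items():
        if name != "id":
            setattr(agent, name, value)


agent = SketchAgent(agent_id=3)
apply_params(agent, {"id": 3, "health_state": 1, "age_group": 1})
print(agent.health_state, agent.age_group)  # 1 1
```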
Scenario
As introduced in the Introduction section, scenario contains all the input data that is needed to run the model, and can be accessed by the model, the environment, the data_collector, and each agent.
All the data are stored in dataframes, which are
First, registered in data_info.py;
Second, generated or loaded in data_loader.py.
Generate agent_params
To initialize the two attributes of all the agents, a dataframe agent_params is first registered in data_info.py and then generated in data_loader.py.
Each row of this dataframe contains the values of health_state and age_group to initialize one agent.
The figure below shows the first 19 rows of agent_params.
In the file data_info.py, agent_params is registered as an instance of the DataFrameInfo class.
import sqlalchemy

from Melodie import DataFrameInfo


agent_params = DataFrameInfo(
    df_name="Parameter_AgentParams",
    columns={
        "id_scenario": sqlalchemy.Integer(),  # id of each scenario
        "id": sqlalchemy.Integer(),  # id of each agent
        "health_state": sqlalchemy.Integer(),
        "age_group": sqlalchemy.Integer(),
    },
)
As shown, agent_params includes an id_scenario column.
This is for cases where the initialization of the agents’ attributes depends on the scenario.
Melodie supports batching scenario runs and can automatically select the right part of agent_params for each scenario and initialize the agents.
The CovidContagion model is exactly such a case.
The values of the agents’ health_state and age_group rely on two parameters of the scenario: initial_infected_percentage and young_percentage.
So, we need to write how agent_params is generated based on the scenario object.
This is done in the data_loader.py file, as shown below, in Lines 35-47.
 1  from typing import TYPE_CHECKING, Dict, Any
 2
 3  import numpy as np
 4
 5  from Melodie import DataLoader
 6  from source import data_info
 7
 8  if TYPE_CHECKING:
 9      from source.scenario import CovidScenario
10
11
12  class CovidDataLoader(DataLoader):
13
14      def setup(self):
15          self.load_dataframe(data_info.simulator_scenarios)
16          self.load_dataframe(data_info.id_health_state)
17          self.load_dataframe(data_info.id_age_group)
18          self.load_dataframe(data_info.transition_prob)
19          self.generate_agent_dataframe()
20
21      @staticmethod
22      def init_health_state(scenario: "CovidScenario"):
23          state = 0
24          if np.random.uniform(0, 1) <= scenario.initial_infected_percentage:
25              state = 1
26          return state
27
28      @staticmethod
29      def init_age_group(scenario: "CovidScenario"):
30          age_group = 0
31          if np.random.uniform(0, 1) > scenario.young_percentage:
32              age_group = 1
33          return age_group
34
35      def generate_agent_dataframe(self):
36          with self.dataframe_generator(
37              data_info.agent_params, lambda scenario: scenario.agent_num
38          ) as g:
39
40              def generator_func(scenario: "CovidScenario") -> Dict[str, Any]:
41                  return {
42                      "id": g.increment(),
43                      "health_state": self.init_health_state(scenario),
44                      "age_group": self.init_age_group(scenario)
45                  }
46
47              g.set_row_generator(generator_func)
To generate agent_params, Melodie provides the dataframe_generator (Lines 36-38), which works with three inputs:
data_info.agent_params (Line 37), which contains the information of agent_params.
lambda scenario: scenario.agent_num (Line 37), which gives the number of agents in each scenario. Based on it, the g.increment function provided by the dataframe_generator generates the id for all the agents (Line 42).
generator_func (registered in Line 47), which takes the scenario object as the parameter and returns a dictionary, i.e., one row in agent_params.
The generate_agent_dataframe function is called in CovidDataLoader.setup (Line 19), so it is also called automatically by Melodie.
Please note that the whole agent_params dataframe is generated by the data_loader for all the scenarios before running any of them.
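The following plain-Python sketch mimics what the generated agent_params table contains: one row per agent for every scenario, built before any simulation runs. It does not use Melodie. The two scenario dictionaries and their values are made up for illustration, and the random module stands in for the NumPy calls in data_loader.py.

```python
import random
from typing import Any, Dict, List

# Hypothetical scenarios; the keys mirror the scenario attributes used above.
scenarios = [
    {"id": 0, "agent_num": 5, "initial_infected_percentage": 0.2, "young_percentage": 0.6},
    {"id": 1, "agent_num": 3, "initial_infected_percentage": 0.5, "young_percentage": 0.4},
]


def generate_agent_params(scenarios: List[Dict[str, Any]], seed: int = 0) -> List[Dict[str, int]]:
    rng = random.Random(seed)
    rows = []
    for scenario in scenarios:
        # Assumption in this sketch: agent ids restart at 0 for every scenario.
        for agent_id in range(scenario["agent_num"]):
            infected = rng.random() <= scenario["initial_infected_percentage"]
            old = rng.random() > scenario["young_percentage"]
            rows.append({
                "id_scenario": scenario["id"],
                "id": agent_id,
                "health_state": 1 if infected else 0,
                "age_group": 1 if old else 0,
            })
    return rows


agent_params = generate_agent_params(scenarios)
for row in agent_params:
    print(row)
```

Selecting the rows for one scenario is then a simple filter on id_scenario, which is the role the id_scenario column plays in the real table.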
Load simulator_scenarios
In Lines 15-18 of data_loader.py, the other input dataframes are loaded into the model.
Taking simulator_scenarios as an example, it includes the parameters to initialize a scenario object.
Before being loaded, it also needs to be registered in the data_info.py file.
 1  import sqlalchemy
 2
 3  from Melodie import DataFrameInfo
 4
 5
 6  simulator_scenarios = DataFrameInfo(
 7      df_name="simulator_scenarios",
 8      file_name="SimulatorScenarios.xlsx",
 9      columns={
10          "id": sqlalchemy.Integer(),
11          "run_num": sqlalchemy.Integer(),
12          "period_num": sqlalchemy.Integer(),
13          "agent_num": sqlalchemy.Integer(),
14          "initial_infected_percentage": sqlalchemy.Float(),
15          "young_percentage": sqlalchemy.Float(),
16          "infection_prob": sqlalchemy.Float(),
17      },
18  )
The figure shows the content of simulator_scenarios.
Please note that,
First, since simulator_scenarios is “loaded” rather than “generated”, the file_name attribute must be set to the name of the Excel file in the input folder (Line 8), so that Melodie can find the file. In addition, the df_name attribute must be “simulator_scenarios” so that it can be recognized by Melodie.
Second, since Melodie supports batching scenario runs, simulator_scenarios can contain multiple rows for different scenarios. Besides, for each scenario, there is also a default attribute run_num, which means Melodie will run the model with this scenario run_num times, so that the model uncertainty can be evaluated afterwards.
Third, the column names in the Excel file must be exactly the same as the scenario attributes defined in the CovidScenario.setup function below, or an error will be raised.
Fourth, the attributes id and run_num can be omitted when defining the CovidScenario.setup function, because they are already included in the Melodie.Scenario class.
Finally, if the initialization of agents’ attributes is not scenario-dependent, you can also “load” a dataframe instead of generating one.
from Melodie import Scenario
from source import data_info


class CovidScenario(Scenario):

    def setup(self):
        self.period_num: int = 0
        self.agent_num: int = 0
        self.initial_infected_percentage: float = 0.0
        self.young_percentage: float = 0.0
        self.infection_prob: float = 0.0
Finally, as introduced in the Modelling Manager section and shown below, the CovidScenario and CovidDataLoader class variables are used to construct the simulator.
So, Melodie will automatically initialize all the scenarios defined in the simulator_scenarios dataframe.
Then, the model will be run with these scenarios one by one.
from Melodie import Simulator
from config import config
from source.model import CovidModel
from source.scenario import CovidScenario
from source.data_loader import CovidDataLoader

if __name__ == "__main__":
    simulator = Simulator(
        config=config,
        model_cls=CovidModel,
        scenario_cls=CovidScenario,
        data_loader_cls=CovidDataLoader
    )
    simulator.run()
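Conceptually, batching means that every scenario row is run run_num times. The loop below is only a plain-Python sketch of that idea, with made-up scenario rows and a placeholder run_model function. It is not how Simulator.run is implemented.

```python
from typing import Dict, List, Tuple

# Made-up rows in the shape of simulator_scenarios (only the columns needed here).
scenario_rows = [
    {"id": 0, "run_num": 2, "period_num": 10},
    {"id": 1, "run_num": 3, "period_num": 10},
]


def run_model(scenario: Dict[str, int], id_run: int) -> Tuple[int, int]:
    """Placeholder for one model run; returns the (id_scenario, id_run) pair."""
    return scenario["id"], id_run


def run_batch(rows: List[Dict[str, int]]) -> List[Tuple[int, int]]:
    completed = []
    for scenario in rows:  # scenarios are run one by one
        for id_run in range(scenario["run_num"]):  # each is repeated run_num times
            completed.append(run_model(scenario, id_run))
    return completed


print(run_batch(scenario_rows))
# [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2)]
```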
Model
After defining the CovidAgent and CovidScenario classes, registering and loading or generating their dataframes, and letting Melodie initialize the scenario object, we are finally ready to initialize all the agents, i.e., their health_state and age_group.
This is done in the CovidModel class.
As shown below, the two functions CovidModel.create and CovidModel.setup are inherited from Melodie.Model.
In Line 18, agents: "AgentList[CovidAgent]" is created by create_agent_list, then the agents’ parameters are initialized in Lines 23-26 with the AgentList.setup_agents function in Melodie.
As shown, the initialized scenario is already available to the model as one of its attributes.
 1  from typing import TYPE_CHECKING
 2
 3  from Melodie import Model
 4  from source import data_info
 5  from source.agent import CovidAgent
 6  from source.data_collector import CovidDataCollector
 7  from source.environment import CovidEnvironment
 8  from source.scenario import CovidScenario
 9
10  if TYPE_CHECKING:
11      from Melodie import AgentList
12
13
14  class CovidModel(Model):
15      scenario: "CovidScenario"
16
17      def create(self):
18          self.agents: "AgentList[CovidAgent]" = self.create_agent_list(CovidAgent)
19          self.environment = self.create_environment(CovidEnvironment)
20          self.data_collector = self.create_data_collector(CovidDataCollector)
21
22      def setup(self):
23          self.agents.setup_agents(
24              agents_num=self.scenario.agent_num,
25              params_df=self.scenario.get_dataframe(data_info.agent_params),
26          )
27
28      def run(self):
29          for period in self.iterator(self.scenario.period_num):
30              self.environment.agents_infection(self.agents)
31              self.environment.agents_health_state_transition(self.agents)
32              self.environment.calc_population_infection_state(self.agents)
33              self.data_collector.collect(period)
34          self.data_collector.save()
Besides, in Lines 19-20, environment and data_collector are also created.
Since they have no parameters of their own, they do not have to be initialized in the setup function. Why?
In brief, because in an ABM, only the agents have micro-level attributes that cannot easily be carried by scenario.
Finally, the CovidModel.run function (Line 28) describes the timeline of the simulation.
It is called automatically when simulator.run is executed, as shown above.
In each period,
first, the environment, the coordinator of the agents’ decision-making and interaction process, “asks” the agents to infect each other;
second, the environment “asks” the agents to update their health states;
third, the environment calculates the infection state of the whole population;
fourth, the data_collector records the attribute values of the environment and the agents.
Finally, after simulating all the periods, the data_collector will save everything into the database.
Environment
The CovidEnvironment class is defined as below.
In the setup function (Line 10), four attributes are defined to store the number of agents in each health state.
As shown, they are updated in the calc_population_infection_state function in each period (Line 27).
As with the CovidAgent and CovidScenario classes, the CovidEnvironment.setup function is called automatically, in this case when CovidModel.create_environment runs.
However, the four attributes are (macro-level) variables, not parameters, so they are not initialized with exogenous input.
 1  from Melodie import Environment
 2  from Melodie import AgentList
 3  from source.agent import CovidAgent
 4  from source.scenario import CovidScenario
 5
 6
 7  class CovidEnvironment(Environment):
 8      scenario: "CovidScenario"
 9
10      def setup(self):
11          self.s0 = 0
12          self.s1 = 0
13          self.s2 = 0
14          self.s3 = 0
15
16      def agents_infection(self, agents: "AgentList[CovidAgent]"):
17          infection_prob = (self.s1 / self.scenario.agent_num) * self.scenario.infection_prob
18          for agent in agents:
19              if agent.health_state == 0:
20                  agent.infection(infection_prob)
21
22      @staticmethod
23      def agents_health_state_transition(agents: "AgentList[CovidAgent]"):
24          for agent in agents:
25              agent.health_state_transition()
26
27      def calc_population_infection_state(self, agents: "AgentList[CovidAgent]"):
28          self.setup()
29          for agent in agents:
30              if agent.health_state == 0:
31                  self.s0 += 1
32              elif agent.health_state == 1:
33                  self.s1 += 1
34              elif agent.health_state == 2:
35                  self.s2 += 1
36              else:
37                  self.s3 += 1
As shown in the agents_infection function, the environment has access to scenario and can get necessary data.
Besides, as introduced in the Melodie Framework section, the environment coordinates the agents’ decision-making and interaction processes.
This is why, in the model.run function, the functions of environment are called instead of the agents being called directly.
So, corresponding to the functions agents_infection and agents_health_state_transition in CovidEnvironment, we need to define the infection and health_state_transition functions in the CovidAgent class as below.
import random

from Melodie import Agent


class CovidAgent(Agent):

    def setup(self):
        self.health_state: int = 0
        self.age_group: int = 0

    def infection(self, infection_prob: float):
        if random.uniform(0, 1) <= infection_prob:
            self.health_state = 1

    def health_state_transition(self):
        if self.health_state == 1:
            transition_probs: dict = self.scenario.get_transition_probs(self.age_group)
            rand = random.uniform(0, 1)
            if rand <= transition_probs["s1_s1"]:
                pass
            elif transition_probs["s1_s1"] < rand <= transition_probs["s1_s1"] + transition_probs["s1_s2"]:
                self.health_state = 2
            else:
                self.health_state = 3
As shown in the health_state_transition function, the agent also has access to scenario and can get necessary data.
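The branching in health_state_transition is a draw from a discrete distribution using cumulative thresholds. The sketch below isolates that logic so it can be checked with fixed random numbers. The function next_health_state and the probabilities are illustrative assumptions, not values from the input table.

```python
from typing import Dict


def next_health_state(rand: float, probs: Dict[str, float]) -> int:
    """Map a uniform draw in [0, 1] to the next state of an infected agent."""
    stay = probs["s1_s1"]
    recover = stay + probs["s1_s2"]
    if rand <= stay:
        return 1  # remains infected
    if rand <= recover:
        return 2  # recovers
    return 3  # dies


# Hypothetical probabilities that sum to 1.
probs = {"s1_s1": 0.5, "s1_s2": 0.4, "s1_s3": 0.1}
print([next_health_state(r, probs) for r in (0.25, 0.75, 0.95)])  # [1, 2, 3]
```

Note that "s1_s3" is never read explicitly: it is the remaining probability mass, which is why the three probabilities of each age group should sum to 1.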
In turn, the CovidScenario class needs to prepare the data in a structure that is easy to use, as shown in the function setup_transition_probs below (Line 14).
Besides, Melodie.Scenario has a function get_dataframe to read registered and loaded dataframes from the database (Line 15).
The data_info.transition_prob refers to an input table as below.
The corresponding code in the CovidScenario class is as follows.
 1  from Melodie import Scenario
 2  from source import data_info
 3
 4
 5  class CovidScenario(Scenario):
 6
 7      def setup(self):
 8          self.period_num: int = 0
 9          self.agent_num: int = 0
10          self.initial_infected_percentage: float = 0.0
11          self.young_percentage: float = 0.0
12          self.infection_prob: float = 0.0
13
14      def setup_transition_probs(self):
15          df = self.get_dataframe(data_info.transition_prob)
16          self.transition_probs = {
17              0: {
18                  "s1_s1": df.at[0, "prob_s1_s1"],
19                  "s1_s2": df.at[0, "prob_s1_s2"],
20                  "s1_s3": df.at[0, "prob_s1_s3"],
21              },
22              1: {
23                  "s1_s1": df.at[1, "prob_s1_s1"],
24                  "s1_s2": df.at[1, "prob_s1_s2"],
25                  "s1_s3": df.at[1, "prob_s1_s3"],
26              }
27          }
28
29      def get_transition_probs(self, id_age_group: int):
30          return self.transition_probs[id_age_group]
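The nested dictionary could also be built with a loop, which avoids repeating the column names for each age group. The sketch below uses a list of dictionaries as a stand-in for the dataframe, so it runs without Melodie or pandas. The probability values are invented, the helper build_transition_probs is hypothetical, and the column names follow the prob_s1_* names used above.

```python
from typing import Dict, List

# Stand-in for the transition probability table: one row per age group.
# The numbers are invented for illustration.
rows = [
    {"prob_s1_s1": 0.5, "prob_s1_s2": 0.45, "prob_s1_s3": 0.05},  # age group 0 (young)
    {"prob_s1_s1": 0.5, "prob_s1_s2": 0.30, "prob_s1_s3": 0.20},  # age group 1 (old)
]


def build_transition_probs(rows: List[Dict[str, float]]) -> Dict[int, Dict[str, float]]:
    transitions = ("s1_s1", "s1_s2", "s1_s3")
    # Row position is used as the age group id, as in the df.at[0, ...] / df.at[1, ...] lookups.
    return {
        age_group: {key: row[f"prob_{key}"] for key in transitions}
        for age_group, row in enumerate(rows)
    }


transition_probs = build_transition_probs(rows)
print(transition_probs[1]["s1_s3"])  # 0.2
```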
In summary, the idea of the Scenario class in the Melodie framework is
to use it as the channel through which other objects access input data;
to easily iterate through a batch of scenarios.
If you recall the Scenario Cluster introduced in the Melodie Framework section, the Scenario and DataLoader classes focus on formatting, importing, and delivering the input data to the model.
DataFrameInfo and MatrixInfo are just pre-defined data structures that store the information of the input data, so that the functions of Scenario and DataLoader can work with the data easily.
DataCollector
Finally, to collect all the micro- and macro-level results stored by the agents and the environment and save them into the database, the CovidDataCollector class is defined as below.
from Melodie import DataCollector


class CovidDataCollector(DataCollector):
    def setup(self):
        self.add_agent_property("agents", "health_state")
        self.add_environment_property("s0")
        self.add_environment_property("s1")
        self.add_environment_property("s2")
        self.add_environment_property("s3")
The two functions add_agent_property and add_environment_property are provided by Melodie.
For add_agent_property, we should also pass in the name of the agent list, so the data_collector knows which agent list to look at.
In some ABMs, there can be multiple agent lists (e.g., wolves, sheep, etc.).
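The idea behind passing the agent list name can be sketched in plain Python: the collector looks up the list by name on the model and reads the registered property from each agent. SketchCollector and everything else in this block are hypothetical stand-ins, not Melodie's implementation.

```python
from types import SimpleNamespace


class SketchCollector:
    """Hypothetical stand-in for a data collector; not a Melodie class."""

    def __init__(self, model):
        self.model = model
        self.agent_properties = []  # (agent list name, property name) pairs
        self.records = []

    def add_agent_property(self, list_name: str, prop: str) -> None:
        self.agent_properties.append((list_name, prop))

    def collect(self, period: int) -> None:
        for list_name, prop in self.agent_properties:
            # The list name tells the collector which container to read from.
            for agent in getattr(self.model, list_name):
                self.records.append(
                    {"period": period, "id": agent.id, prop: getattr(agent, prop)}
                )


model = SimpleNamespace(
    agents=[SimpleNamespace(id=0, health_state=0), SimpleNamespace(id=1, health_state=1)]
)
collector = SketchCollector(model)
collector.add_agent_property("agents", "health_state")
collector.collect(period=0)
print(collector.records)
```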
With the data_collector, the results will be saved in the CovidContagion.sqlite file in a pre-defined format.
The macro-level results are indexed with id_scenario, id_run, and period.
The micro-level results are further indexed with the id of agents.
After running the model, you can find two tables in CovidContagion.sqlite named environment_result and agents_result.
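Once the results are in the database, they can be read with Python's built-in sqlite3 module. The sketch below builds a small in-memory table so that it runs on its own. The table name and index columns follow the description above, the s0 to s3 columns are assumed to match the collected environment properties, and the values are invented. In a real project you would connect to data/output/CovidContagion.sqlite instead.

```python
import sqlite3

# In-memory stand-in for the output database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE environment_result "
    "(id_scenario INTEGER, id_run INTEGER, period INTEGER, "
    "s0 INTEGER, s1 INTEGER, s2 INTEGER, s3 INTEGER)"
)
conn.executemany(
    "INSERT INTO environment_result VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0, 0, 0, 90, 10, 0, 0),  # invented values
        (0, 0, 1, 85, 12, 2, 1),
        (1, 0, 0, 80, 20, 0, 0),
    ],
)

# Number of infected agents per period for scenario 0, run 0.
rows = conn.execute(
    "SELECT period, s1 FROM environment_result "
    "WHERE id_scenario = ? AND id_run = ? ORDER BY period",
    (0, 0),
).fetchall()
print(rows)  # [(0, 10), (1, 12)]
conn.close()
```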
Example of environment_result:
Example of agents_result:
In the example project, we also prepared a simple analyzer.py file that produces two figures based on the results, showing the number of agents in each health_state.
The two figures will be saved in the data/output folder together with the .sqlite file.
Similarly, users can also define other post-processing functions in the analyzer.py file.
Since it is mainly based on other packages instead of Melodie, we won’t introduce the details here.
Here is an example of the results from the model.
Last Words
If the Melodie Framework section was too brief to follow, I hope this tutorial can give you a clearer picture of (1) why the modules are organized into those clusters, and (2) how they fit together.
As said before, for simplicity, not all the modules are used in this example, but it does show a clear structure of an ABM developed with Melodie.
You can find more examples using other modules in the Model Gallery section.
So, that’s it :)
We really hope this tutorial is clear and useful and, most importantly, sparks your interest in joining the ABM community! If you have any questions, please don’t hesitate to contact us!