Converging Initiatives: Geospatial Insights into Community Health, Agriculture, and WASH
This notebook presents an analysis of geospatial data from a project focusing on three critical thematic areas: Water, Sanitation, and Hygiene (WASH), health, and agriculture. The primary objective is to identify regions where the project's activities are converging and performing well, as well as areas that require further improvement. By mapping these activities across the thematic areas, the analysis aims to provide insights into the project's overall effectiveness and inform strategic interventions.
This project was part of my work during Cohort 1 of the Analytics for a Better World Fellowship (ABW) in 2022. ABW emphasizes the "art of the feasible," equipping individuals from non-profits with the tools and techniques needed to implement data-driven solutions and guide organizations toward making informed decisions. This experience sparked my passion for geospatial data analysis.
The data used in this analysis is a small sample from a real-world project I worked on in Zambia. The geometry files used for mapping are publicly available and can be accessed through a simple online search.
The implementation outlined below assumes the project focuses on three key activities centered around health facilities: forming nutrition support groups, promoting improved agricultural practices, and drilling new boreholes in health facility catchment communities.
The nutrition support groups aim to educate community members about the importance of proper feeding practices for children under five. Improved agricultural activities help households maintain gardens using climate-smart techniques. The produce from these gardens supplements household nutrition, and any surplus can be sold to generate income, which may be invested in small businesses. Profits from these businesses can then be used to purchase other nutrient-rich foods, such as poultry and dairy.
Finally, the drilling of new boreholes ensures that communities have access to clean drinking water, helping to prevent waterborne diseases like diarrhea, which can result from consuming contaminated water.
# Import libraries (standard library first, then third-party packages)
import io
import json
import re
import sys
import branca.colormap as cm
import folium
import folium.features
import folium.plugins as plugins
import geopandas as gpd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from folium import FeatureGroup
# Import the necessary libraries
import geopandas as gpd
# Specify the path to the GeoJSON file
# GeoJSON is a format for encoding a variety of geographic data structures using JavaScript Object Notation (JSON).
# It is commonly used to represent geographical features along with their associated non-spatial attributes.
district_geo = "ZMB_adm.json"
# Read the GeoJSON file into a GeoDataFrame
# This will allow us to work with the geographic data in a structured format using GeoPandas.
# Wrap the read in a try-except block so a missing or malformed file produces a clear message.
# The exception is re-raised because the cells below cannot run without geoJSON_df.
try:
    geoJSON_df = gpd.read_file(district_geo)
except Exception as e:
    print(f"An error occurred: {e}")
    raise
# Display the first few rows of the GeoDataFrame to inspect the data
# This helps us verify that the data has been loaded correctly and gives us an initial look at the structure of the data.
geoJSON_df.head()
| | id | ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | ID_2 | NAME_2 | TYPE_2 | ENGTYPE_2 | NL_NAME_2 | VARNAME_2 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | 255 | ZMB | Zambia | 1 | Central | 1 | Chibombo | District | District | | | POLYGON ((28.57138 -15.16938, 28.56549 -15.168... |
| 1 | None | 255 | ZMB | Zambia | 1 | Central | 2 | Kabwe | District | District | | | POLYGON ((28.16377 -14.61242, 28.17037 -14.608... |
| 2 | None | 255 | ZMB | Zambia | 1 | Central | 3 | Kapiri Mposhi | District | District | | | POLYGON ((27.10981 -14.39602, 27.11125 -14.377... |
| 3 | None | 255 | ZMB | Zambia | 1 | Central | 4 | Mkushi | District | District | | | POLYGON ((28.81862 -13.61394, 28.83084 -13.596... |
| 4 | None | 255 | ZMB | Zambia | 1 | Central | 5 | Mumbwa | District | District | | | POLYGON ((27.76539 -15.63296, 27.75987 -15.633... |
# Rename the column containing the name of the district to make it easier to remember
# This changes the column name from "NAME_2" to "district" for better readability and easier reference in subsequent analyses.
geoJSON_df = geoJSON_df.rename(columns={"NAME_2": "district"})
# Check the first few rows to confirm that the renaming was successful
geoJSON_df.head()
| | id | ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | ID_2 | district | TYPE_2 | ENGTYPE_2 | NL_NAME_2 | VARNAME_2 | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | 255 | ZMB | Zambia | 1 | Central | 1 | Chibombo | District | District | | | POLYGON ((28.57138 -15.16938, 28.56549 -15.168... |
| 1 | None | 255 | ZMB | Zambia | 1 | Central | 2 | Kabwe | District | District | | | POLYGON ((28.16377 -14.61242, 28.17037 -14.608... |
| 2 | None | 255 | ZMB | Zambia | 1 | Central | 3 | Kapiri Mposhi | District | District | | | POLYGON ((27.10981 -14.39602, 27.11125 -14.377... |
| 3 | None | 255 | ZMB | Zambia | 1 | Central | 4 | Mkushi | District | District | | | POLYGON ((28.81862 -13.61394, 28.83084 -13.596... |
| 4 | None | 255 | ZMB | Zambia | 1 | Central | 5 | Mumbwa | District | District | | | POLYGON ((27.76539 -15.63296, 27.75987 -15.633... |
# Import the necessary libraries
import pandas as pd
import numpy as np
# Import indicators and health facility data
# The na_values parameter specifies additional strings to recognize as NA/NaN.
# The delimiter is set to ',' as the CSV files are comma-separated.
indicators_df = pd.read_csv("indicators.csv", na_values="NA", delimiter=',', header=0, index_col=False)
df_facilities = pd.read_csv("catchment_areas.csv", na_values="NA", delimiter=',', header=0, index_col=False)
# Replace all zero values with NaN to ensure they do not affect average computations
# This is useful for handling cases where zero values may represent missing data rather than actual zero counts.
# Column abbreviations:
# msg_groups - Mother Support Groups (number)
# improved_techs - Improved Agricultural Technologies (number of households practicing)
# new_boreholes - New Boreholes (number)
indicators_df["msg_groups"] = indicators_df["msg_groups"].replace(0, np.nan)
indicators_df["improved_techs"] = indicators_df["improved_techs"].replace(0, np.nan)
indicators_df["new_boreholes"] = indicators_df["new_boreholes"].replace(0, np.nan)
# Display summary statistics of the DataFrame
# This provides an overview of the central tendency, dispersion, and shape of the dataset’s distribution.
indicators_df.describe()
| | msg_groups | improved_techs | new_boreholes | latitude | longitude |
|---|---|---|---|---|---|
| count | 13.000000 | 13.000000 | 11.000000 | 13.000000 | 13.000000 |
| mean | 3205.769231 | 1450.000000 | 8.727273 | -11.948680 | 29.067516 |
| std | 3639.478400 | 916.518139 | 6.724447 | 2.226680 | 1.358950 |
| min | 393.000000 | 36.000000 | 1.000000 | -14.996166 | 26.605040 |
| 25% | 849.000000 | 947.000000 | 3.000000 | -14.134750 | 28.258268 |
| 50% | 1265.000000 | 1091.000000 | 8.000000 | -11.478774 | 28.777910 |
| 75% | 5341.000000 | 1822.000000 | 13.000000 | -10.439146 | 29.927670 |
| max | 12804.000000 | 3664.000000 | 23.000000 | -8.806687 | 31.683861 |
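The effect of the zero-to-NaN replacement on summary statistics can be seen on a small toy column. The sketch below uses made-up values, not project data, and only illustrates why `describe()` can report a lower `count` for a column such as `new_boreholes`: pandas skips NaN entries when computing both `count` and `mean`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values (not taken from indicators.csv)
toy = pd.DataFrame({"new_boreholes": [0, 4, 8, 0]})

# Zeros kept: they pull the average down
mean_with_zeros = toy["new_boreholes"].mean()

# Zeros treated as missing: NaN rows are skipped by mean() and count()
as_missing = toy["new_boreholes"].replace(0, np.nan)
mean_without_zeros = as_missing.mean()
count_non_missing = as_missing.count()

print(mean_with_zeros, mean_without_zeros, count_non_missing)
```

Whether a zero means "no activity" or "not reported" is a judgment about the data collection process, so this replacement is only appropriate when zeros are known to stand for missing reports.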
indicators_df.head()
| | district | msg_groups | improved_techs | new_boreholes | latitude | longitude |
|---|---|---|---|---|---|---|
| 0 | Chibombo | 6560 | 2450 | 1.0 | -14.834949 | 28.036740 |
| 1 | Kabwe | 849 | 36 | 23.0 | -14.470852 | 28.352683 |
| 2 | Kapiri Mposhi | 5351 | 1091 | 15.0 | -14.134750 | 28.097150 |
| 3 | Kaputa | 393 | 1822 | NaN | -8.806687 | 29.928590 |
| 4 | Kasama | 1265 | 2113 | 13.0 | -10.439146 | 30.974062 |
# Rename columns in the df_facilities DataFrame
# This changes the column names for better readability and consistency.
df_facilities = df_facilities.rename(columns={
"Country": "country",
"District": "district",
"Province": "province"
})
# Display the first 10 rows of the DataFrame to verify the renaming
df_facilities.head(10)
| | code | name | id | country | province | district | longitude | latitude | district_populations | new_boreholes | mother_support_groups | community_gardens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kasama_army | Army Clinic | RzLkbnx9fyD | Zambia | Northern | Kasama | 31.183730 | -10.20515 | 306462 | 0.0 | 37.0 | 0.0 |
| 1 | bruneli | Bruneli Health Post | EbzBDFuHPvM | Zambia | Central | Kabwe | 28.632452 | -14.36700 | 234055 | 2.0 | 12.0 | 0.0 |
| 2 | bulambo | Bulambo Health Post | fmWhauKbHg2 | Zambia | Northern | Luwingu | 30.089080 | -10.96715 | 179554 | 0.0 | 6.0 | NaN |
| 3 | bulangililo | Bulangililo Urban Health Centre | SIES0o5kucs | Zambia | Copperbelt | Kitwe | 28.246390 | -12.77889 | 738320 | 0.0 | 1020.0 | 0.0 |
| 4 | bulungu | Bulungu/Mumbwa Health Centre | VRSnCaCXNxW | Zambia | Central | Mumbwa | 27.064960 | -14.98257 | 242480 | 0.0 | 304.0 | 14.0 |
| 5 | buntungwa | Buntungwa Urban Health Centre | jC3fHhoSEuZ | Zambia | Luapula | Mansa | 28.880730 | -11.22584 | 253414 | NaN | 161.0 | NaN |
| 6 | bwacha | Bwacha Urban Health Centre | frjoZHqGJ9e | Zambia | Central | Kabwe | 28.440660 | -14.40797 | 234055 | NaN | 184.0 | NaN |
| 7 | mansa_central | Central Urban Health Centre | k43pfm7F9YJ | Zambia | Luapula | Mansa | 28.949570 | -10.94079 | 253414 | NaN | 127.0 | NaN |
| 8 | chabilikila | Chabilikila Rural Health Centre | cLEnAtHGNM9 | Zambia | Luapula | Nchelenge | 28.706110 | -9.54308 | 203432 | 0.0 | 48.0 | 74.0 |
| 9 | chalele | Chalele Health Facility | gdXqK7FzYoK | Zambia | Northern | Mbala | 31.379060 | -9.26570 | 268774 | 0.0 | 6.0 | 0.0 |
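# Merge the district boundaries with the indicators on the shared "district" column
# The default inner join keeps only districts that appear in both DataFrames.
geoData = pd.merge(geoJSON_df, indicators_df, on="district")
# Display the first 5 rows of the DataFrame to verify the merge
geoData.head()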
| | id | ID_0 | ISO | NAME_0 | ID_1 | NAME_1 | ID_2 | district | TYPE_2 | ENGTYPE_2 | NL_NAME_2 | VARNAME_2 | geometry | msg_groups | improved_techs | new_boreholes | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | None | 255 | ZMB | Zambia | 1 | Central | 1 | Chibombo | District | District | | | POLYGON ((28.57138 -15.16938, 28.56549 -15.168... | 6560 | 2450 | 1.0 | -14.834949 | 28.036740 |
| 1 | None | 255 | ZMB | Zambia | 1 | Central | 2 | Kabwe | District | District | | | POLYGON ((28.16377 -14.61242, 28.17037 -14.608... | 849 | 36 | 23.0 | -14.470852 | 28.352683 |
| 2 | None | 255 | ZMB | Zambia | 1 | Central | 3 | Kapiri Mposhi | District | District | | | POLYGON ((27.10981 -14.39602, 27.11125 -14.377... | 5351 | 1091 | 15.0 | -14.134750 | 28.097150 |
| 3 | None | 255 | ZMB | Zambia | 1 | Central | 5 | Mumbwa | District | District | | | POLYGON ((27.76539 -15.63296, 27.75987 -15.633... | 5341 | 947 | 8.0 | -14.996166 | 26.605040 |
| 4 | None | 255 | ZMB | Zambia | 2 | Copperbelt | 10 | Kitwe | District | District | | | POLYGON ((28.48323 -12.75668, 28.48079 -12.754... | 12804 | 1580 | NaN | -12.789445 | 28.258268 |
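Because the merge is an inner join, districts present in only one of the two DataFrames are dropped silently (Mkushi, for example, appears in the boundary data but not in the merged result). One way to list the unmatched districts is an outer merge with `indicator=True`. The sketch below uses small toy frames as stand-ins for `geoJSON_df` and `indicators_df`; the frame contents are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-ins for the boundary and indicator tables (illustrative only)
boundaries = pd.DataFrame({"district": ["Chibombo", "Kabwe", "Mkushi"]})
indicators = pd.DataFrame({
    "district": ["Chibombo", "Kabwe", "Kaputa"],
    "msg_groups": [6560, 849, 393],
})

# An outer merge keeps every district; the _merge column records its origin
check = boundaries.merge(indicators, on="district", how="outer", indicator=True)

only_in_boundaries = check.loc[check["_merge"] == "left_only", "district"].tolist()
only_in_indicators = check.loc[check["_merge"] == "right_only", "district"].tolist()

print("No indicator data:", only_in_boundaries)
print("No boundary polygon:", only_in_indicators)
```

Districts listed under "No boundary polygon" are the ones to investigate first, since they usually point to a spelling difference rather than a genuinely absent district.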
# Extract latitude and longitude columns from df_facilities
# This creates a DataFrame containing only the latitude and longitude columns.
locations = df_facilities[['latitude', 'longitude']]
# Convert the DataFrame to a list of lists
# Each inner list represents a location with latitude and longitude values.
facility_locationlist = locations.values.tolist()
# Display the length of the list of locations
# This shows the total number of locations in the list.
print(len(facility_locationlist))
# Access and display the 8th entry in the list (index 7, as indexing starts at 0)
# This shows the latitude and longitude of the 8th facility.
print(facility_locationlist[7])
411
[-10.94079, 28.94957]
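Before passing this list to Folium, it can help to screen out coordinates that are missing or that have latitude and longitude swapped, since Folium generally rejects NaN locations. The following is a minimal sketch: the helper `is_plottable` and the bounding box are hypothetical, and the bounds are a rough approximation of Zambia's extent rather than an official figure.

```python
import math

# Rough bounding box for Zambia (approximate, assumed for illustration)
ZAMBIA_BOUNDS = {"lat": (-19.0, -8.0), "lon": (21.0, 34.0)}

def is_plottable(location, bounds=ZAMBIA_BOUNDS):
    """Return True if [lat, lon] is present and falls inside the bounding box."""
    lat, lon = location
    if math.isnan(lat) or math.isnan(lon):
        return False
    lat_ok = bounds["lat"][0] <= lat <= bounds["lat"][1]
    lon_ok = bounds["lon"][0] <= lon <= bounds["lon"][1]
    return lat_ok and lon_ok

# One valid point, one with a missing latitude, one with lat/lon swapped
sample = [[-10.94079, 28.94957], [float("nan"), 28.4], [28.94957, -10.94079]]
clean = [loc for loc in sample if is_plottable(loc)]
print(clean)
```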
# Filter df_facilities to include only rows where 'new_boreholes' is greater than or equal to 1
# This creates a new DataFrame df_boreholes that contains facilities with at least one new borehole.
df_boreholes = df_facilities[df_facilities['new_boreholes'] >= 1]
# Inspect the resulting DataFrame
df_boreholes.head()
| | code | name | id | country | province | district | longitude | latitude | district_populations | new_boreholes | mother_support_groups | community_gardens |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | bruneli | Bruneli Health Post | EbzBDFuHPvM | Zambia | Central | Kabwe | 28.632452 | -14.367000 | 234055 | 2.0 | 12.0 | 0.0 |
| 13 | chandamukulu | Chandamukulu Rural Health Centre | aaSlxWUsCC7 | Zambia | Northern | Kasama | 31.126944 | -10.820000 | 306462 | 1.0 | 7.0 | NaN |
| 15 | chankalamu | Chankalamu Health Post | SFjORf1oNL9 | Zambia | Central | Kabwe | 28.446314 | -14.541684 | 234055 | 3.0 | 1.0 | NaN |
| 16 | chankomo | Chankomo Rural Health Centre | Uq7cPxwf4aU | Zambia | Central | Kapiri-Mposhi | 29.026560 | -13.905630 | 301722 | 1.0 | 122.0 | NaN |
| 40 | chindwin_camp | Chindwin Camp Urban Health Centre | o5Hz9Axrf2H | Zambia | Central | Kabwe | 28.615370 | -14.336400 | 234055 | 4.0 | 4.0 | NaN |
# Count how many facilities meet the condition
len(df_boreholes)
48
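Note that the facility data spells one district "Kapiri-Mposhi", while the boundary and indicator tables use "Kapiri Mposhi". Any later join on `district` would treat these as different districts. A minimal sketch of one possible fix is shown below; `normalize_district` is a hypothetical helper, and replacing hyphens with spaces is an assumption that should be checked against the full list of district names.

```python
def normalize_district(name: str) -> str:
    """Replace hyphens with spaces and collapse repeated whitespace."""
    return " ".join(name.replace("-", " ").split())

# Spellings taken from the tables above, plus one with stray whitespace
raw_names = ["Kapiri-Mposhi", "Kapiri Mposhi", "  Kabwe "]
normalized = [normalize_district(n) for n in raw_names]
print(normalized)
```

On the real data, the same function could be applied with `df_facilities["district"].map(normalize_district)` before grouping or merging.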
def scaled_feature(i, column, mx=100, mn=10, data=indicators_df):
    """Scale one value of a DataFrame column to the range [mn, mx].

    Parameters:
        i (int): The position of the row whose value is to be scaled.
        column (str): The name of the column to be scaled.
        mx (float): The maximum value of the scaled range (default is 100).
        mn (float): The minimum value of the scaled range (default is 10).
        data (pd.DataFrame): The DataFrame containing the data (default is indicators_df).

    Returns:
        float: The scaled value.
    """
    # Extract the value from the specified row and column
    value = data.iloc[i][column]
    # Calculate the minimum and maximum of the column
    d_max, d_min = data[column].max(), data[column].min()
    # Check if d_max is equal to d_min to avoid division by zero
    if d_max == d_min:
        raise ValueError(f"Column '{column}' has the same min and max values. Scaling cannot be performed.")
    # Scale the value to the range [0, 1]
    scaled = (value - d_min) / (d_max - d_min)
    # Scale the value to the specified range [mn, mx]
    return float(scaled * (mx - mn) + mn)
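The function applies min-max scaling: a value x is mapped to mn + (x - min) / (max - min) * (mx - mn), so the column minimum lands on `mn` and the column maximum on `mx`. The standalone sketch below reproduces that arithmetic on a plain list so the endpoints are easy to verify; `min_max_scale` is a hypothetical name, and the three inputs are the minimum, median, and maximum of `improved_techs` from the summary statistics.

```python
def min_max_scale(values, mn=10.0, mx=100.0):
    """Scale every value in a list to the range [mn, mx]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("All values are equal; scaling is undefined.")
    return [mn + (v - lo) / (hi - lo) * (mx - mn) for v in values]

# Minimum, median, and maximum of improved_techs, scaled to marker radii 5..20
radii = min_max_scale([36, 1091, 3664], mn=5, mx=20)
print(radii)
```

Scaling to a bounded range keeps the smallest district visible on the map while preventing the largest one from covering its neighbours.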
Create a choropleth map using Folium to visualize the number of Nutrition Support Groups across different districts in Zambia, including markers for improved agricultural technologies, new boreholes, and health facilities.
import geopandas as gpd
import folium
import folium.plugins as plugins
import branca.colormap as cm
import pandas as pd
# Set the CRS for geoJSON_df to WGS 84 (latitude/longitude), which Folium expects
geoJSON_df = geoJSON_df.set_crs("EPSG:4326", allow_override=True)
# Aggregate data to get the total number of boreholes, facilities, and community gardens per district
boreholes_per_district = df_boreholes.groupby('district')['new_boreholes'].sum().reset_index()
facilities_per_district = df_facilities.groupby('district')['name'].count().reset_index()
community_gardens_per_district = df_facilities.groupby('district')['community_gardens'].sum().reset_index()
facilities_per_district.rename(columns={"name": "total_facilities"}, inplace=True)
# Merge aggregated data with geoJSON_df
geoJSON_df = geoJSON_df.merge(indicators_df[['district', 'msg_groups']], on='district', how='left')
geoJSON_df = geoJSON_df.merge(boreholes_per_district, on='district', how='left')
geoJSON_df = geoJSON_df.merge(community_gardens_per_district, on='district', how='left')
geoJSON_df = geoJSON_df.merge(facilities_per_district, on='district', how='left')
# Create the Folium map centered on Zambia
m = folium.Map(location=[-13.1, 27], zoom_start=6.3, width='100%', height='100%', control_scale=True, tiles='CartoDB Positron')
# Choropleth layer with hover functionality
choropleth = folium.Choropleth(
    geo_data=geoJSON_df,
    data=indicators_df,
    columns=['district', 'msg_groups'],
    key_on="feature.properties.district",
    fill_color="YlGnBu",
    fill_opacity=0.7,
    line_opacity=0.2,
    bins=5,
    legend_name="# of Nutrition Support Groups",
    name="Nutrition Support Groups Density",
    highlight=True
).add_to(m)
# Add hover functionality with GeoJsonTooltip
folium.GeoJson(
    geoJSON_df,
    style_function=lambda feature: {
        'fillColor': '#ffffff00',
        'color': '#000000',
        'weight': 0.1,
        'dashArray': '5, 5',
        'fillOpacity': 0,
    },
    tooltip=folium.GeoJsonTooltip(
        fields=['district', 'total_facilities', 'msg_groups', 'new_boreholes', 'community_gardens'],
        aliases=['District: ', 'Health Facilities: ', 'Nutrition Support Groups: ', 'New Boreholes: ', 'Community Gardens: '],
        localize=True,
        sticky=False,
        labels=True,
        style="""
            background-color: #F0EFEF;
            border: 1px solid black;
            border-radius: 3px;
            box-shadow: 3px;
        """,
        max_width=300,
    )
).add_to(choropleth)
# Improved Agricultural Technologies Layer
# Circle radius is scaled to the number of households practicing improved techniques
group0 = folium.FeatureGroup(name='<span style="color: #007580;">Improved Agricultural Technologies</span>')
for i in range(len(indicators_df)):
    folium.CircleMarker(
        location=[indicators_df.iloc[i]['latitude'], indicators_df.iloc[i]['longitude']],
        popup="Improved Agricultural Technologies " + str(indicators_df.iloc[i]['district']) + ' ' + str(indicators_df.iloc[i]['improved_techs']),
        radius=scaled_feature(i, 'improved_techs', mn=5, mx=20),
        color='#007580',
        fill=True,
        fill_color='#007580'
    ).add_to(group0)
m.add_child(group0)
# New Boreholes Layer
# Circle color encodes the number of new boreholes at each facility
colormap = cm.LinearColormap(colors=['orange', 'blue', 'red'], vmin=0, vmax=5)
for i in range(len(df_boreholes)):
    folium.Circle(
        location=[df_boreholes.iloc[i]['latitude'], df_boreholes.iloc[i]['longitude']],
        radius=20,
        fill=True,
        color=colormap(df_boreholes.iloc[i]['new_boreholes']),
        popup="New Boreholes " + str(df_boreholes.iloc[i]['new_boreholes']),
        fill_opacity=0.5
    ).add_to(m)
m.add_child(colormap)
colormap.caption = '# of Newly Installed Boreholes'
# Health Facilities Layer with Hover Functionality
# Markers are clustered so dense areas stay readable at low zoom levels
group2 = folium.FeatureGroup(name='Health Facilities')
marker_cluster = folium.plugins.MarkerCluster().add_to(group2)
for point in range(len(facility_locationlist)):
    folium.Marker(
        location=facility_locationlist[point],
        popup=folium.Popup(df_facilities['name'][point], max_width=200),
        tooltip=folium.Tooltip(f"Facility: {df_facilities['name'][point]}")
    ).add_to(marker_cluster)
group2.add_to(m)
# Add layer control
folium.LayerControl().add_to(m)
# Display the map
m
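One detail of the per-district aggregation in the cell above is worth checking: by default, a pandas grouped `sum()` returns 0 for a district whose values are all NaN, so the tooltip would show "0 community gardens" where the data is actually missing. Passing `min_count=1` keeps such districts as NaN. The sketch below uses toy rows modelled on the facility table (both Mansa facilities have NaN for `community_gardens`); it is an illustration, not a change the original analysis is known to have made.

```python
import numpy as np
import pandas as pd

# Toy rows modelled on the facility table (illustrative only)
toy = pd.DataFrame({
    "district": ["Mansa", "Mansa", "Kabwe", "Kabwe"],
    "community_gardens": [np.nan, np.nan, 0.0, 14.0],
})

# Default behaviour: an all-NaN group sums to 0.0
default_sum = toy.groupby("district")["community_gardens"].sum()

# min_count=1 requires at least one non-missing value, otherwise the result is NaN
strict_sum = toy.groupby("district")["community_gardens"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```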
Summary
The map visualizes health, agriculture, and WASH indicators across 13 districts in Zambia, providing insight into the distribution of various interventions. The base layer is a choropleth map that color-codes districts by the number of Nutrition Support Groups, with darker shades indicating higher numbers. Central and Copperbelt provinces, particularly Kitwe district, have the highest concentration of support groups.
Clustered markers show a higher density of health facilities in Northern and Central provinces, while circle markers represent other key indicators: orange and blue circles for newly installed boreholes, predominantly in Northern Zambia, and teal circles, sized by the number of households, for the adoption of improved agricultural technologies.
The Northern and Luapula provinces show significant development efforts across all sectors, suggesting a faster rate of convergence compared to Copperbelt and Central provinces. To ensure uniform intervention coverage across all districts, it is recommended to revisit the implementation strategy and consider adjustments that will promote equitable progress across the country.
Conclusion
This GeoPandas and Folium project visualized the spatial distribution of key health, agriculture, and WASH indicators across 13 districts in Zambia. The map revealed significant regional disparities, with Northern and Luapula provinces showing higher levels of intervention across multiple sectors. This spatial analysis highlights the need for targeted strategies to achieve uniform development across all districts, ensuring that no region is left behind in the pursuit of convergence and sustainable growth.