Converging Initiatives: Geospatial Insights into Community Health, Agriculture, and WASH

This notebook presents an analysis of geospatial data from a project focusing on three critical thematic areas: Water, Sanitation, and Hygiene (WASH), health, and agriculture. The primary objective is to identify regions where the project's activities are converging and performing well, as well as areas that require further improvement. By mapping these activities across the thematic areas, the analysis aims to provide insights into the project's overall effectiveness and inform strategic interventions.

This project was part of my work during Cohort 1 of the Analytics for a Better World Fellowship (ABW) in 2022. ABW emphasizes the "art of the feasible," equipping individuals from non-profits with the tools and techniques needed to implement data-driven solutions and guide organizations toward making informed decisions. This experience sparked my passion for geospatial data analysis.

The data used in this analysis is a small sample from a real-world project I worked on in Zambia. The geometry files used for mapping are publicly available and can be accessed through a simple online search.

The implementation outlined below assumes the project focuses on three key activities centered around health facilities: forming nutrition support groups, promoting improved agricultural practices, and drilling new boreholes in health facility catchment communities.

The nutrition support groups aim to educate community members about the importance of proper feeding practices for children under five. Improved agricultural activities help households maintain gardens using climate-smart techniques. The produce from these gardens supplements household nutrition, and any surplus can be sold to generate income, which may be invested in small businesses. Profits from these businesses can then be used to purchase other nutrient-rich foods, such as poultry and dairy.

Finally, the drilling of new boreholes ensures that communities have access to clean drinking water, helping to prevent waterborne diseases like diarrhea, which can result from consuming contaminated water.

In [55]:

# Import libraries
        
        # Standard library
        import io
        import json
        import re
        import sys
        
        # Third-party
        import branca.colormap as cm
        import folium
        import folium.features
        import folium.plugins as plugins
        import geopandas as gpd
        import matplotlib
        import matplotlib.pyplot as plt
        import numpy as np
        import pandas as pd
        from folium import FeatureGroup

In [56]:

# Import the necessary libraries
        import geopandas as gpd
        
        # Specify the path to the GeoJSON file
        # GeoJSON is a format for encoding a variety of geographic data structures using JavaScript Object Notation (JSON).
        # It is commonly used to represent geographical features along with their associated non-spatial attributes.
        district_geo = "ZMB_adm.json"
        
        # Read the GeoJSON file into a GeoDataFrame
        # This will allow us to work with the geographic data in a structured format using GeoPandas.
        # The try-except block reports a missing or malformed file, then re-raises the error
        # so that later cells do not run against an undefined geoJSON_df.
        try:
            geoJSON_df = gpd.read_file(district_geo)
        except Exception as e:
            print(f"An error occurred while reading {district_geo}: {e}")
            raise
        
        # Display the first few rows of the GeoDataFrame to inspect the data
        # This helps us verify that the data has been loaded correctly and gives us an initial look at the structure of the data.
        geoJSON_df.head()

Out[56]:

	id	ID_0	ISO	NAME_0	ID_1	NAME_1	ID_2	NAME_2	TYPE_2	ENGTYPE_2	geometry
0	None	255	ZMB	Zambia	1	Central	1	Chibombo	District	District	POLYGON ((28.57138 -15.16938, 28.56549 -15.168...
1	None	255	ZMB	Zambia	1	Central	2	Kabwe	District	District	POLYGON ((28.16377 -14.61242, 28.17037 -14.608...
2	None	255	ZMB	Zambia	1	Central	3	Kapiri Mposhi	District	District	POLYGON ((27.10981 -14.39602, 27.11125 -14.377...
3	None	255	ZMB	Zambia	1	Central	4	Mkushi	District	District	POLYGON ((28.81862 -13.61394, 28.83084 -13.596...
4	None	255	ZMB	Zambia	1	Central	5	Mumbwa	District	District	POLYGON ((27.76539 -15.63296, 27.75987 -15.633...
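
For readers new to the format, here is a minimal sketch of how a GeoJSON FeatureCollection is laid out, using only the standard library. The single feature and its coordinates are hand-written for illustration and are not taken from ZMB_adm.json; only the property names mirror the columns shown above.

```python
import json

# A hypothetical FeatureCollection with one district.
# Coordinates are [longitude, latitude] pairs, and a polygon ring
# ends on the same point where it starts.
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"NAME_1": "Central", "NAME_2": "Chibombo"},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[28.5, -15.2], [28.6, -15.2], [28.6, -15.1], [28.5, -15.2]]]
      }
    }
  ]
}
"""

collection = json.loads(geojson_text)
feature = collection["features"][0]

# Non-spatial attributes live under "properties"; GeoPandas turns each
# key into a column and each feature into a row.
district_name = feature["properties"]["NAME_2"]
first_point = feature["geometry"]["coordinates"][0][0]

print(district_name)  # Chibombo
print(first_point)    # [28.5, -15.2]
```

The same nesting is what a key such as "feature.properties.district" refers to when a mapping library looks up a district name inside each feature.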

In [57]:

# Rename the column containing the name of the district to make it easier to remember
        # This changes the column name from "NAME_2" to "district" for better readability and easier reference in subsequent analyses.
        geoJSON_df = geoJSON_df.rename(columns={"NAME_2": "district"})

In [58]:

# Check the first few rows to confirm that the renaming was successful
        geoJSON_df.head()

Out[58]:

	id	ID_0	ISO	NAME_0	ID_1	NAME_1	ID_2	district	TYPE_2	ENGTYPE_2	geometry
0	None	255	ZMB	Zambia	1	Central	1	Chibombo	District	District	POLYGON ((28.57138 -15.16938, 28.56549 -15.168...
1	None	255	ZMB	Zambia	1	Central	2	Kabwe	District	District	POLYGON ((28.16377 -14.61242, 28.17037 -14.608...
2	None	255	ZMB	Zambia	1	Central	3	Kapiri Mposhi	District	District	POLYGON ((27.10981 -14.39602, 27.11125 -14.377...
3	None	255	ZMB	Zambia	1	Central	4	Mkushi	District	District	POLYGON ((28.81862 -13.61394, 28.83084 -13.596...
4	None	255	ZMB	Zambia	1	Central	5	Mumbwa	District	District	POLYGON ((27.76539 -15.63296, 27.75987 -15.633...

In [59]:

# Import the necessary libraries
        import pandas as pd
        import numpy as np
        
        # Import indicators and health facility data
        # The na_values parameter specifies additional strings to recognize as NA/NaN.
        # The delimiter is set to ',' as the CSV files are comma-separated.
        indicators_df = pd.read_csv("indicators.csv", na_values="NA", delimiter=',', header=0, index_col=False)
        df_facilities = pd.read_csv("catchment_areas.csv", na_values="NA", delimiter=',', header=0, index_col=False)
        
        # Replace all zero values with NaN to ensure they do not affect average computations
        # This is useful for handling cases where zero values may represent missing data rather than actual zero counts.
        # Column abbreviations:
        # msg_groups - Mother Support Groups (number)
        # improved_techs - Improved Agricultural Technologies (number of households practicing)
        # new_boreholes - New Boreholes (number)
        indicators_df["msg_groups"] = indicators_df["msg_groups"].replace(0, np.nan)
        indicators_df["improved_techs"] = indicators_df["improved_techs"].replace(0, np.nan)
        indicators_df["new_boreholes"] = indicators_df["new_boreholes"].replace(0, np.nan)
        
        # Display summary statistics of the DataFrame
        # This provides an overview of the central tendency, dispersion, and shape of the dataset’s distribution.
        indicators_df.describe()

Out[59]:

	msg_groups	improved_techs	new_boreholes	latitude	longitude
count	13.000000	13.000000	11.000000	13.000000	13.000000
mean	3205.769231	1450.000000	8.727273	-11.948680	29.067516
std	3639.478400	916.518139	6.724447	2.226680	1.358950
min	393.000000	36.000000	1.000000	-14.996166	26.605040
25%	849.000000	947.000000	3.000000	-14.134750	28.258268
50%	1265.000000	1091.000000	8.000000	-11.478774	28.777910
75%	5341.000000	1822.000000	13.000000	-10.439146	29.927670
max	12804.000000	3664.000000	23.000000	-8.806687	31.683861
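
To illustrate why zeros are replaced before averaging, the sketch below compares the two means on a small, made-up series. The four counts are hypothetical and are not values from indicators.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical borehole counts for four districts; the zero stands for
# "not reported" rather than a true count of zero.
counts = pd.Series([4, 0, 8, 12], name="new_boreholes")

mean_with_zero = counts.mean()                        # zero is averaged in
mean_without_zero = counts.replace(0, np.nan).mean()  # NaN is skipped

print(mean_with_zero)     # 6.0
print(mean_without_zero)  # 8.0
```

pandas skips NaN in `mean()` and `count()` by default, which is consistent with the `new_boreholes` count of 11 in the summary above while the other columns report 13.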

In [60]:

indicators_df.head()

Out[60]:

	district	msg_groups	improved_techs	new_boreholes	latitude	longitude
0	Chibombo	6560	2450	1.0	-14.834949	28.036740
1	Kabwe	849	36	23.0	-14.470852	28.352683
2	Kapiri Mposhi	5351	1091	15.0	-14.134750	28.097150
3	Kaputa	393	1822	NaN	-8.806687	29.928590
4	Kasama	1265	2113	13.0	-10.439146	30.974062

In [61]:

# Rename columns in the df_facilities DataFrame
        # This changes the column names for better readability and consistency.
        df_facilities = df_facilities.rename(columns={
            "Country": "country",
            "District": "district",
            "Province": "province"
        })
        
        # Display the first 10 rows of the DataFrame to verify the renaming
        df_facilities.head(10)

Out[61]:

	code	name	id	country	province	district	longitude	latitude	district_populations	new_boreholes	mother_support_groups	community_gardens
0	kasama_army	Army Clinic	RzLkbnx9fyD	Zambia	Northern	Kasama	31.183730	-10.20515	306462	0.0	37.0	0.0
1	bruneli	Bruneli Health Post	EbzBDFuHPvM	Zambia	Central	Kabwe	28.632452	-14.36700	234055	2.0	12.0	0.0
2	bulambo	Bulambo Health Post	fmWhauKbHg2	Zambia	Northern	Luwingu	30.089080	-10.96715	179554	0.0	6.0	NaN
3	bulangililo	Bulangililo Urban Health Centre	SIES0o5kucs	Zambia	Copperbelt	Kitwe	28.246390	-12.77889	738320	0.0	1020.0	0.0
4	bulungu	Bulungu/Mumbwa Health Centre	VRSnCaCXNxW	Zambia	Central	Mumbwa	27.064960	-14.98257	242480	0.0	304.0	14.0
5	buntungwa	Buntungwa Urban Health Centre	jC3fHhoSEuZ	Zambia	Luapula	Mansa	28.880730	-11.22584	253414	NaN	161.0	NaN
6	bwacha	Bwacha Urban Health Centre	frjoZHqGJ9e	Zambia	Central	Kabwe	28.440660	-14.40797	234055	NaN	184.0	NaN
7	mansa_central	Central Urban Health Centre	k43pfm7F9YJ	Zambia	Luapula	Mansa	28.949570	-10.94079	253414	NaN	127.0	NaN
8	chabilikila	Chabilikila Rural Health Centre	cLEnAtHGNM9	Zambia	Luapula	Nchelenge	28.706110	-9.54308	203432	0.0	48.0	74.0
9	chalele	Chalele Health Facility	gdXqK7FzYoK	Zambia	Northern	Mbala	31.379060	-9.26570	268774	0.0	6.0	0.0

In [62]:

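# Join the indicator values onto the district polygons by district name.
        # pd.merge defaults to an inner join, so only districts present in both tables are kept.
        geoData = pd.merge(geoJSON_df, indicators_df, on="district")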

In [63]:

# Display the first 5 rows of the merged DataFrame to verify the merge
        geoData.head()

Out[63]:

	id	ID_0	ISO	NAME_0	ID_1	NAME_1	ID_2	district	TYPE_2	ENGTYPE_2	geometry	msg_groups	improved_techs	new_boreholes	latitude	longitude
0	None	255	ZMB	Zambia	1	Central	1	Chibombo	District	District	POLYGON ((28.57138 -15.16938, 28.56549 -15.168...	6560	2450	1.0	-14.834949	28.036740
1	None	255	ZMB	Zambia	1	Central	2	Kabwe	District	District	POLYGON ((28.16377 -14.61242, 28.17037 -14.608...	849	36	23.0	-14.470852	28.352683
2	None	255	ZMB	Zambia	1	Central	3	Kapiri Mposhi	District	District	POLYGON ((27.10981 -14.39602, 27.11125 -14.377...	5351	1091	15.0	-14.134750	28.097150
3	None	255	ZMB	Zambia	1	Central	5	Mumbwa	District	District	POLYGON ((27.76539 -15.63296, 27.75987 -15.633...	5341	947	8.0	-14.996166	26.605040
4	None	255	ZMB	Zambia	2	Copperbelt	10	Kitwe	District	District	POLYGON ((28.48323 -12.75668, 28.48079 -12.754...	12804	1580	NaN	-12.789445	28.258268

In [64]:

# Extract latitude and longitude columns from df_facilities
        # This creates a DataFrame containing only the latitude and longitude columns.
        locations = df_facilities[['latitude', 'longitude']]
        
        # Convert the DataFrame to a list of lists
        # Each inner list represents a location with latitude and longitude values.
        facility_locationlist = locations.values.tolist()
        
        # Display the length of the list of locations
        # This shows the total number of locations in the list.
        print(len(facility_locationlist))
        
        # Access and display the 8th entry in the list (index 7, as indexing starts at 0)
        # This shows the latitude and longitude of the 8th facility.
        print(facility_locationlist[7])

411
        [-10.94079, 28.94957]
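
Coordinate order matters here: Folium expects [latitude, longitude], whereas GeoJSON geometries store [longitude, latitude]. The sketch below rebuilds two rows of the facility table shown earlier (longitude column first) to show that selecting columns by name fixes the order.

```python
import pandas as pd

# Two facilities from the table above; columns are stored longitude-first.
facilities = pd.DataFrame({
    "name": ["Army Clinic", "Bruneli Health Post"],
    "longitude": [31.183730, 28.632452],
    "latitude": [-10.20515, -14.36700],
})

# Selecting columns by name sets the output order regardless of how
# the columns are arranged in the source table.
lat_lon = facilities[["latitude", "longitude"]].values.tolist()

print(lat_lon[0])  # [-10.20515, 31.18373]
```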

In [65]:

# Filter df_facilities to include only rows where 'new_boreholes' is greater than or equal to 1
        # This creates a new DataFrame df_boreholes that contains facilities with at least one new borehole.
        df_boreholes = df_facilities[df_facilities['new_boreholes'] >= 1]
        
        # Inspect the resulting DataFrame
        df_boreholes.head()

Out[65]:

	code	name	id	country	province	district	longitude	latitude	district_populations	new_boreholes	mother_support_groups	community_gardens
1	bruneli	Bruneli Health Post	EbzBDFuHPvM	Zambia	Central	Kabwe	28.632452	-14.367000	234055	2.0	12.0	0.0
13	chandamukulu	Chandamukulu Rural Health Centre	aaSlxWUsCC7	Zambia	Northern	Kasama	31.126944	-10.820000	306462	1.0	7.0	NaN
15	chankalamu	Chankalamu Health Post	SFjORf1oNL9	Zambia	Central	Kabwe	28.446314	-14.541684	234055	3.0	1.0	NaN
16	chankomo	Chankomo Rural Health Centre	Uq7cPxwf4aU	Zambia	Central	Kapiri-Mposhi	29.026560	-13.905630	301722	1.0	122.0	NaN
40	chindwin_camp	Chindwin Camp Urban Health Centre	o5Hz9Axrf2H	Zambia	Central	Kabwe	28.615370	-14.336400	234055	4.0	4.0	NaN
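
Note that the output above spells one district "Kapiri-Mposhi", while the district polygons use "Kapiri Mposhi". Joins on the district column compare exact strings, so rows with the hyphenated spelling would not line up with their polygon. The sketch below shows one way to detect and repair such mismatches; the two small tables and the borehole totals are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-ins: district names from the polygon table and
# per-district borehole totals from the facility table.
polygons = pd.DataFrame({"district": ["Chibombo", "Kabwe", "Kapiri Mposhi"]})
boreholes = pd.DataFrame({
    "district": ["Kabwe", "Kapiri-Mposhi"],  # hyphenated spelling
    "new_boreholes": [9.0, 1.0],
})

# how="outer" keeps rows from both sides; indicator=True adds a "_merge"
# column that records which side each row came from.
check = polygons.merge(boreholes, on="district", how="outer", indicator=True)
only_in_boreholes = check.loc[check["_merge"] == "right_only", "district"].tolist()
print(only_in_boreholes)  # ['Kapiri-Mposhi']

# One possible fix: map the known variant onto the polygon spelling.
boreholes["district"] = boreholes["district"].replace({"Kapiri-Mposhi": "Kapiri Mposhi"})
fixed = polygons.merge(boreholes, on="district", how="left")
print(fixed["new_boreholes"].tolist())  # [nan, 9.0, 1.0]
```

An explicit mapping is used rather than a blanket character replacement, since other district names may legitimately contain hyphens.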

In [66]:

# Count how many facilities meet the condition
        len(df_boreholes)

Out[66]:

In [67]:

def scaled_feature(i, column, mx=100, mn=10, data=indicators_df):
            """Scale one value from a DataFrame column to the range [mn, mx].
        
            Parameters:
                i (int): Position of the row whose value is to be scaled.
                column (str): Name of the column to be scaled.
                mx (float): Maximum value of the scaled range (default is 100).
                mn (float): Minimum value of the scaled range (default is 10).
                data (pd.DataFrame): DataFrame containing the data (default is indicators_df).
        
            Returns:
                float: The scaled value.
            """
         
            # Extract the value from the specified row and column
            value = data.iloc[i][column]
        
            # Calculate the minimum and maximum of the column
            d_max, d_min = data[column].max(), data[column].min()
        
            # Check if d_max is equal to d_min to avoid division by zero
            if d_max == d_min:
                raise ValueError(f"Column '{column}' has the same min and max values. Scaling cannot be performed.")
        
            # Scale the value to the range [0, 1]
            scaled = (value - d_min) / (d_max - d_min)
        
            # Scale the value to the specified range [mn, mx]
            return float(scaled * (mx - mn) + mn)
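
The function above applies min-max scaling, scaled = (x - min) / (max - min) * (mx - mn) + mn, one row at a time. As a sketch of the same formula applied to a whole column at once, the helper below (a hypothetical `scale_column`, not part of the notebook) uses the `improved_techs` values of the first five districts printed earlier.

```python
import pandas as pd


def scale_column(values: pd.Series, mn: float = 10, mx: float = 100) -> pd.Series:
    """Min-max scale a whole column to the range [mn, mx]."""
    d_min, d_max = values.min(), values.max()
    if d_max == d_min:
        raise ValueError("Column has a single distinct value; cannot scale.")
    return (values - d_min) / (d_max - d_min) * (mx - mn) + mn


# improved_techs for Chibombo, Kabwe, Kapiri Mposhi, Kaputa, Kasama
improved_techs = pd.Series([2450, 36, 1091, 1822, 2113])
radii = scale_column(improved_techs, mn=5, mx=20)

print(radii.min(), radii.max())  # 5.0 20.0
```

The smallest value always maps to `mn` and the largest to `mx`, so marker radii stay within a readable pixel range whatever the raw counts are.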

Create a choropleth map using Folium to visualize the number of Nutrition Support Groups across different districts in Zambia, including markers for improved agricultural technologies, new boreholes, and health facilities.

In [68]:

import geopandas as gpd
        import folium
        import folium.plugins as plugins
        import branca.colormap as cm
        import pandas as pd
        
        # Set the CRS for geoJSON_df (EPSG:4326, i.e. WGS 84 latitude/longitude)
        geoJSON_df.crs = "EPSG:4326"
        
        # Aggregate data to get total number of boreholes and facilities per district
        boreholes_per_district = df_boreholes.groupby('district')['new_boreholes'].sum().reset_index()
        facilities_per_district = df_facilities.groupby('district')['name'].count().reset_index()
        community_gardens_per_district = df_facilities.groupby('district')['community_gardens'].sum().reset_index()
        facilities_per_district.rename(columns={"name": "total_facilities"}, inplace=True)
        
        # Merge aggregated data with geoJSON_df
        geoJSON_df = geoJSON_df.merge(indicators_df[['district', 'msg_groups']], on='district', how='left')
        geoJSON_df = geoJSON_df.merge(boreholes_per_district, on='district', how='left')
        geoJSON_df = geoJSON_df.merge(community_gardens_per_district, on='district', how='left')
        geoJSON_df = geoJSON_df.merge(facilities_per_district, on='district', how='left')
        
        # Create the Folium map centered on Zambia
        m = folium.Map(location=[-13.1, 27], zoom_start=6.3, width='100%', height='100%', control_scale=True, tiles='CartoDB Positron')
        
        # Choropleth layer with hover functionality
        choropleth = folium.Choropleth(
            geo_data=geoJSON_df,
            data=indicators_df,
            columns=['district', 'msg_groups'],
            key_on="feature.properties.district",
            fill_color="YlGnBu",
            fill_opacity=0.7,
            line_opacity=0.2,
            bins=5,
            legend_name="# of Nutrition Support Groups",
            name="Nutrition Support Groups Density",
            highlight=True
        ).add_to(m)
        
        # Add hover functionality with GeoJsonTooltip
        folium.GeoJson(
            geoJSON_df,
            style_function=lambda feature: {
                'fillColor': '#ffffff00',
                'color': '#000000',
                'weight': 0.1,
                'dashArray': '5, 5',
                'fillOpacity': 0,
            },
            tooltip=folium.GeoJsonTooltip(
                fields=['district', 'total_facilities', 'msg_groups', 'new_boreholes', 'community_gardens'],
                aliases=['District: ',  'Health Facilities: ', 'Nutrition Support Groups: ', 'New Boreholes: ','Community Gardens: '],
                localize=True,
                sticky=False,
                labels=True,
                style="""
                    background-color: #F0EFEF;
                    border: 1px solid black;
                    border-radius: 3px;
                    box-shadow: 3px;
                """,
                max_width=300,
            )
        ).add_to(choropleth)
        
        # Improved Agricultural Technologies Layer
        group0 = folium.FeatureGroup(name='<span style="color: #007580;">Improved Agricultural Technologies</span>')
        for i in range(len(indicators_df)):
            folium.CircleMarker(
                location=[indicators_df.iloc[i]['latitude'], indicators_df.iloc[i]['longitude']],
                popup="Improved Agricultural Technologies " + str(indicators_df.iloc[i]['district']) + ' ' + str(indicators_df.iloc[i]['improved_techs']),
                radius=scaled_feature(i, 'improved_techs', mn=5, mx=20),
                color='#007580',
                fill=True,
                fill_color='#007580'
            ).add_to(group0)
        m.add_child(group0)
        
        # New Boreholes Layer
        # folium.Circle radii are in meters (CircleMarker radii are in pixels).
        colormap = cm.LinearColormap(colors=['orange', 'blue', 'red'], vmin=0, vmax=5)
        colormap.caption = '# of Newly Installed Boreholes'
        group1 = folium.FeatureGroup(name='New Boreholes')
        for i in range(len(df_boreholes)):
            folium.Circle(
                location=[df_boreholes.iloc[i]['latitude'], df_boreholes.iloc[i]['longitude']],
                radius=20,
                fill=True,
                color=colormap(df_boreholes.iloc[i]['new_boreholes']),
                popup="New Boreholes " + str(df_boreholes.iloc[i]['new_boreholes']),
                fill_opacity=0.5
            ).add_to(group1)
        m.add_child(group1)
        m.add_child(colormap)
        
        # Health Facilities Layer with Hover Functionality
        group2 = folium.FeatureGroup(name='Health Facilities')
        marker_cluster = folium.plugins.MarkerCluster().add_to(group2)
        
        for point in range(len(facility_locationlist)):
            folium.Marker(
                location=facility_locationlist[point],
                popup=folium.Popup(df_facilities['name'][point], max_width=200),
                tooltip=folium.Tooltip(f"Facility: {df_facilities['name'][point]}")
            ).add_to(marker_cluster)
        
        group2.add_to(m)
        
        # Add layer control
        folium.LayerControl().add_to(m)
        
        # Display the map
        m

Out[68]:

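[Interactive Folium map of Zambia: choropleth of Nutrition Support Groups by district, with layers for improved agricultural technologies, new boreholes, and health facilities]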
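
One detail of the aggregation step in the map cell is worth a sketch: by default, a grouped `sum()` returns 0 for a group whose values are all missing, so a district with no recorded community gardens would appear as 0 in the tooltip rather than as missing. The rows below are hypothetical, loosely modeled on the facility table, where the Mansa rows shown have no garden counts.

```python
import numpy as np
import pandas as pd

# Hypothetical facility rows: Mansa has no recorded garden counts at all.
facilities = pd.DataFrame({
    "district": ["Kabwe", "Kabwe", "Mansa", "Mansa"],
    "community_gardens": [0.0, 14.0, np.nan, np.nan],
})

grouped = facilities.groupby("district")["community_gardens"]

default_sum = grouped.sum()            # all-NaN group becomes 0.0
strict_sum = grouped.sum(min_count=1)  # all-NaN group stays NaN

print(default_sum["Mansa"])  # 0.0
print(strict_sum["Mansa"])   # nan
```

Whether 0 or a missing value is the right display depends on what a blank entry means in the source data, so this is a choice to make deliberately rather than a correction.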

Summary

The map visualizes health, agriculture, and WASH indicators across 13 districts in Zambia, providing insight into how the interventions are distributed. The base layer is a choropleth map that color-codes districts by the number of Nutrition Support Groups, with darker shades indicating higher numbers. Central and Copperbelt provinces, particularly Kitwe district, have the highest concentration of support groups.

Clustered markers show a higher density of health facilities in Northern and Central provinces, while circle markers represent the other key indicators: orange and blue circles for newly installed boreholes, predominantly in Northern Zambia, and teal circles, sized by the number of households, for the adoption of improved agricultural technologies.

The Northern and Luapula provinces show significant development efforts across all sectors, suggesting a faster rate of convergence compared to Copperbelt and Central provinces. To ensure uniform intervention coverage across all districts, it is recommended to revisit the implementation strategy and consider adjustments that will promote equitable progress across the country.
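
Convergence is judged visually in this notebook. As a rough complement, the sketch below flags districts where all three activities are recorded, using the first five rows of indicators_df printed earlier. Treating "all three indicators present" as convergence is an assumption made for illustration, not the project's definition.

```python
import numpy as np
import pandas as pd

# First five rows of indicators_df as printed earlier in the notebook.
indicators = pd.DataFrame({
    "district": ["Chibombo", "Kabwe", "Kapiri Mposhi", "Kaputa", "Kasama"],
    "msg_groups": [6560, 849, 5351, 393, 1265],
    "improved_techs": [2450, 36, 1091, 1822, 2113],
    "new_boreholes": [1.0, 23.0, 15.0, np.nan, 13.0],
})

activity_cols = ["msg_groups", "improved_techs", "new_boreholes"]

# Assumed definition: count how many of the three activities are
# recorded (non-missing) in each district.
indicators["activities_present"] = indicators[activity_cols].notna().sum(axis=1)
indicators["converging"] = indicators["activities_present"] == len(activity_cols)

print(indicators.loc[~indicators["converging"], "district"].tolist())  # ['Kaputa']
```

A stricter variant could require each indicator to exceed a threshold, such as the column median, before counting it as present.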

Conclusion

This GeoPandas and Folium project visualized the spatial distribution of key health, agriculture, and WASH indicators across 13 districts in Zambia. The map revealed significant regional disparities, with Northern and Luapula provinces showing higher levels of intervention across multiple sectors. This spatial analysis highlights the need for targeted strategies to achieve uniform development across all districts, ensuring that no region is left behind in the pursuit of convergence and sustainable growth.