Geospatial Data in Python - Interactive Visualization
Geospatial data is data about objects, events, or phenomena that have a location on the surface of the earth. Geospatial data combines location information (usually coordinates on the earth) and attribute information (the characteristics of the object, event, or phenomena concerned).
Kristin Stock, Hans Guesgen, in Automating Open Source Intelligence, 2016
Geospatial data is the core component of Spatial Data Science (SDS), a subset of Data Science. Location, distance and spatial interactions are the core aspects of SDS, and they are handled with specialized methods and software to analyze, visualize and learn from spatial data.
In this tutorial, you'll learn how to work with geospatial data and visualize it on an interactive Leaflet map using Python and the Folium library.
Folium is a powerful library that combines the strength of Python in data processing with the strength of Leaflet.js in mapping. Its ease of use allows you to create interactive maps and populate them with data in just a few lines of code.
Requirements
To start, let's get the tools ready!
I'm using the latest versions at the time of writing.
Python: 3.9
Folium: 0.12.1
JupyterLab: 3.2.5
You can use Google Colab or Kaggle Kernels, where Folium is already installed for you. If you're using JupyterLab locally, you can install Folium with the following command:
pip install folium
Getting the data
In this workshop, we're going to work with hospital locations in the US, using a dataset from the HIFLD open data portal published under a public domain license.
- Source: HIFLD | Download from here or here.
- Content: Hospitals in the USA (locations and other information).
- Records: 7596.
- Last update: 8th Dec 2020.
- License: Public Domain.
The source data is available in a variety of formats (PDFs, tables, web pages, etc.) and includes a wide variety of useful features. For the purpose of this workshop, we'll use the .csv format and focus on only a few columns:
ADDRESS: string
STATE: string
TYPE: string
STATUS: string ("OPEN" or "CLOSED")
POPULATION: integer
LATITUDE: decimal
LONGITUDE: decimal
The pairs (LATITUDE, LONGITUDE) are used to place the locations on the map, while other columns like STATE, TYPE and STATUS are used for filtering. Finally, ADDRESS and POPULATION are used as metadata for customizing the markers on the map. (If you're practicing with the code in this lab on your own, you can include more columns to customize the map further.)
Now, let's start coding!
Let's first define some useful constants: the list of targeted column names WORKING_COLS, the file path FILE_PATH, and the state we're working on, STATE.
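# Libraries used throughout the tutorial (MarkerCluster is needed for the clustering section)
import pandas as pd
import folium
from folium.plugins import MarkerCluster
FILE_PATH = "__PATH__TO__CSV__FILE__"
WORKING_COLS = ["ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", "LONGITUDE"]
STATE = "CA"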
Then, load the data and keep only the columns listed above for the given state.
hosp_df = pd.read_csv(FILE_PATH)
hosp_df = hosp_df.loc[hosp_df["STATE"] == STATE, WORKING_COLS]
You can preview what the data looks like after loading with hosp_df.head().
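To try the loading and filtering step without downloading the dataset, here is a minimal sketch that runs the same .loc filter on a tiny in-memory CSV. The rows are made-up sample values (not HIFLD records), io.StringIO stands in for FILE_PATH, and the result goes into a separate sample_df so it does not overwrite hosp_df.

```python
import io

import pandas as pd

WORKING_COLS = ["ADDRESS", "STATE", "TYPE", "STATUS", "POPULATION", "LATITUDE", "LONGITUDE"]
STATE = "CA"

# Hypothetical sample rows; the real file has many more columns and records.
csv_text = """NAME,ADDRESS,STATE,TYPE,STATUS,POPULATION,LATITUDE,LONGITUDE
Hospital A,1 Main St,CA,GENERAL ACUTE CARE,OPEN,120,34.05,-118.24
Hospital B,2 Oak Ave,CA,PSYCHIATRIC,CLOSED,-999,37.77,-122.42
Hospital C,3 Elm Rd,NV,GENERAL ACUTE CARE,OPEN,80,36.17,-115.14
"""

sample_df = pd.read_csv(io.StringIO(csv_text))
# The boolean mask selects the rows for STATE; the list selects (and orders) the columns.
sample_df = sample_df.loc[sample_df["STATE"] == STATE, WORKING_COLS]
print(sample_df)
```

The NAME column and the Nevada row are dropped, which is exactly what the two lines above do to the full dataset.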
Before we start using the data, let's explore it and see if we need to clean it or to apply some preprocessing.
Missing values
First, we can check for missing values (NaN) with hosp_df.isna().sum().sum(), which gives 0, meaning there are no missing values.
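The double .sum() collapses everything into a single total. If that total were non-zero, a per-column breakdown shows where the gaps are. A small sketch on a made-up frame with one missing latitude; dropping rows without coordinates is one possible choice, since such rows cannot be placed on a map.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing LATITUDE value.
sample_df = pd.DataFrame({
    "ADDRESS": ["1 Main St", "2 Oak Ave", "3 Elm Rd"],
    "LATITUDE": [34.05, np.nan, 36.17],
    "LONGITUDE": [-118.24, -122.42, -115.14],
})

per_column = sample_df.isna().sum()   # number of NaNs in each column
total_missing = int(per_column.sum())  # same value as sample_df.isna().sum().sum()
print(per_column)
print("total missing:", total_missing)

# Keep only rows that have both coordinates.
with_coords = sample_df.dropna(subset=["LATITUDE", "LONGITUDE"])
```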
Numeric values
To check the consistency of numeric values, we can compute some useful statistics on numerical features like POPULATION, LATITUDE and LONGITUDE using the dataframe's describe method, hosp_df.describe().
It's noticeable that the POPULATION column has some negative values, since min=-999. A population cannot be negative, so these values must be fixed either by setting them to 0 or by dropping the rows that contain them.
hosp_df = hosp_df[hosp_df["POPULATION"] >= 0]
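Both fixes mentioned above can be compared side by side on a toy column (values are made up). Series.clip(lower=0) keeps every row and raises negatives to 0, while the boolean filter used in the tutorial removes those rows.

```python
import pandas as pd

sample_df = pd.DataFrame({"POPULATION": [120, -999, 80, 0]})  # hypothetical values

# Option 1: keep every row, replace negative values with 0.
clipped = sample_df.assign(POPULATION=sample_df["POPULATION"].clip(lower=0))

# Option 2: drop rows with negative values (the approach taken above).
dropped = sample_df[sample_df["POPULATION"] >= 0]

print(clipped["POPULATION"].tolist())  # [120, 0, 80, 0]
print(dropped["POPULATION"].tolist())  # [120, 80, 0]
```

Clipping keeps the hospital on the map, but with the get_radius function used later in the bubble map it would get a radius of 0; dropping removes it entirely.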
Finite values
The STATUS column contains two unique values, "CLOSED" and "OPEN", which can be verified with hosp_df["STATUS"].unique().
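Since STATUS later decides which marker icon is drawn, it can help to count each label and fail early on anything unexpected. A sketch with hypothetical values; the allowed set mirrors the two labels listed above.

```python
import pandas as pd

status = pd.Series(["OPEN", "OPEN", "CLOSED", "OPEN"])  # hypothetical STATUS values

counts = status.value_counts()
print(counts)

# Any label outside the expected pair would otherwise silently get the "closed" icon.
allowed = {"OPEN", "CLOSED"}
unexpected = set(status.unique()) - allowed
assert not unexpected, f"Unexpected STATUS values: {unexpected}"
```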
Plotting interactive maps
Folium provides several useful features to plot and customize interactive maps. We'll start by plotting a basic map and then customize it with our data and more advanced features.
Plotting a basic map with Folium
Folium provides the class folium.Map(), which takes a location parameter as a list containing one latitude/longitude pair and generates a map around the given location. To automatically center the generated map on our data, we can pass the mean of the latitude and longitude values:
m = folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
m  # displays the map when it is the last line of a notebook cell
The generated map is interactive: you can zoom in and out using the buttons in the top-left corner or the mouse wheel.
Adding tiles to a map
The default tileset in Folium is OpenStreetMap. For different representations, we can add layers with other tiles such as Stamen Terrain, Stamen Watercolor, CartoDB Positron, and more; each tileset highlights different features of a map.
It's possible to add multiple tile layers to a single map in Folium using the class folium.TileLayer, and to switch between them interactively using a layer control panel, folium.LayerControl.
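m = folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=6)
# Extra base layers, using folium's built-in tileset names (folium 0.12.1).
# Stamen tiles have since moved to Stadia Maps, so newer setups may need another provider.
folium.TileLayer('cartodbdark_matter').add_to(m)
folium.TileLayer('cartodbpositron').add_to(m)
folium.TileLayer('Stamen Terrain').add_to(m)
folium.TileLayer('Stamen Toner').add_to(m)
folium.TileLayer('Stamen Water Color').add_to(m)
# Panel for switching between the layers added above
folium.LayerControl().add_to(m)
m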
The folium.LayerControl provides an icon in the top-right corner that pops up a radio group for switching between different layers.
Adding markers to a map
Markers are important in an interactive map for pinpointing locations. Folium provides the folium.Marker class to create a marker at a given location, which can then be added to a map.
Basic Markers
To plot all the points in our data, we can call the dataframe's apply method with axis=1 and a lambda function that creates a marker for each row and adds it to the map.
m = folium.Map(
    location=[hosp_df["LATITUDE"].mean(), hosp_df["LONGITUDE"].mean()],
    zoom_start=8)
# axis=1 passes one row at a time to the lambda
hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']]
    ).add_to(m),
    axis=1)
m
Customized markers
To customize the markers, more parameters can be passed to folium.Marker:
- popup: folium.Popup or str (can be HTML), the content displayed when clicking on a marker.
- tooltip: folium.Tooltip or str (can be HTML), the content displayed when hovering over a marker.
- icon: folium.CustomIcon, folium.Icon or folium.DivIcon, the Icon plugin used to render the marker.
The popup and tooltip content can be customized with either plain formatted text or an HTML block. It's also possible to change the appearance of the marker by passing one of the classes folium.CustomIcon, folium.Icon or folium.DivIcon to the icon parameter. The folium.Icon class takes an icon name along with the provider's prefix ("fa" or "glyphicon", which is the default), a marker color (color, one of a fixed set of names such as "red" or "black") and an optional color for the glyph itself (icon_color, which also accepts codes like "#2ecc71"). The list of glyphicons can be found here.
m = folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)
def get_icon(status):
    # Green heart on a black marker for open hospitals, red "off" icon otherwise
    if status == "OPEN":
        return folium.Icon(icon='heart',
                           color='black',
                           icon_color='#2ecc71')
    else:
        return folium.Icon(icon='glyphicon-off',
                           color='red')
hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
    ).add_to(m),
    axis=1)
m
Bubble map
To represent numeric values on a map, we can plot circles of different sizes by binding the circle radius to its value in the dataset, in our case, we're representing the covered population by each center with a circle of radius proportional to its POPULATION
value.
The folium.CircleMarker class takes a radius parameter (in pixels) in addition to several inherited parameters for customizing its appearance:
- radius: number - Radius of the circle marker, in pixels. (default: 10)
- stroke: boolean - Whether to draw stroke along the path or not. (default: True)
- color: str - Stroke color.
- weight: number - The width of the stroke in pixels. (default: 3)
- opacity: number - Stroke opacity from 0 to 1.0. (default: 1.0)
- fill: boolean - Whether to fill the path with color. (default: True)
- fill_color: str - Fill color. Defaults to the value of the color parameter.
- fill_opacity: number - Fill opacity. (default: 0.2)
PS. For simplicity, I'm just multiplying the population values by a factor of 1/20. A more reliable approach would be to map the population values onto a fixed radius range.
m = folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)
def get_radius(pop):
    # Simple scaling: 1 pixel of radius per 20 people
    return int(pop / 20)
hosp_df.apply(
    lambda row: folium.CircleMarker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        radius=get_radius(row['POPULATION']),
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        stroke=True,
        weight=1,
        color="#3186cc",
        fill=True,
        fill_color="#3186cc",
        opacity=0.9,
        fill_opacity=0.3,
    ).add_to(m),
    axis=1)
m
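The note above suggests mapping population onto a fixed radius range instead of dividing by 20. A minimal sketch of linear min-max scaling; the function name scale_radius and the 3 to 30 pixel range are arbitrary choices for illustration.

```python
import pandas as pd


def scale_radius(values: pd.Series, r_min: float = 3.0, r_max: float = 30.0) -> pd.Series:
    """Linearly map values onto [r_min, r_max] pixels."""
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All values equal: avoid dividing by zero and use the middle of the range.
        return pd.Series((r_min + r_max) / 2, index=values.index)
    return r_min + (values - lo) / (hi - lo) * (r_max - r_min)


population = pd.Series([0, 50, 100])  # hypothetical POPULATION values
print(scale_radius(population).tolist())  # [3.0, 16.5, 30.0]
```

The result could be stored as a new column, e.g. hosp_df["RADIUS"] = scale_radius(hosp_df["POPULATION"]), and read inside the lambda as radius=row['RADIUS']. A common variant scales the square root of the values instead, because the perceived size of a circle follows its area, which grows with the square of the radius.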
Marker Clusters
When working on a dense map, marker clusters help avoid the clutter caused by many nearby markers overlapping each other.
Folium provides an easy way to set up marker clusters: instead of adding markers directly to the map, you add them to a folium.plugins.MarkerCluster instance, which is then added to the map.
m = folium.Map(
    location=[hosp_df['LATITUDE'].mean(), hosp_df['LONGITUDE'].mean()],
    zoom_start=8)
# Markers are added to the cluster instead of directly to the map
cluster = MarkerCluster(name="Hospitals")
def get_icon(status):
    if status == "OPEN":
        return folium.Icon(icon='heart',
                           color='black',
                           icon_color='#2ecc71')
    else:
        return folium.Icon(icon='glyphicon-off',
                           color='red')
hosp_df.apply(
    lambda row: folium.Marker(
        location=[row['LATITUDE'], row['LONGITUDE']],
        popup=row['ADDRESS'],
        tooltip='<h5>Click here for more info</h5>',
        icon=get_icon(row['STATUS']),
    ).add_to(cluster),
    axis=1)
cluster.add_to(m)
m
When you hover over a cluster, it shows the bounds of the area covered by that cluster. This default behavior can be disabled by setting the showCoverageOnHover option to False:
cluster = MarkerCluster(name="Hospitals", options={"showCoverageOnHover": False})
Finally!
Folium provides many more options for visualizing your geospatial data on interactive maps, so it's worth reading the documentation and getting hands-on experience for your future projects.
You can find the Jupyter notebook here to reproduce the results in this tutorial.