Title: | An Implementation of Crime Analysis Methods |
---|---|
Description: | An implementation of functions for the analysis of crime incident or records management system data. The package implements analysis algorithms scaled for city or regional crime analysis units. The package provides functions for kernel density estimation for crime heat maps, geocoding using the 'Google Maps' API, identification of repeat crime incidents, spatio-temporal map comparison across time intervals, time series analysis (forecasting and decomposition), detection of optimal parameters for the identification of near repeat incidents, and near repeat analysis with crime network linkage. |
Authors: | Jamie Spaulding and Keith Morris |
Maintainer: | Jamie Spaulding <[email protected]> |
License: | GPL-3 |
Version: | 0.5.0 |
Built: | 2025-03-10 05:51:40 UTC |
Source: | https://github.com/jsspaulding/rcrimeanalysis |
A sample dataset of crime incidents in Chicago, IL from 2017-2019.
crimes
A data frame with 25000 rows and 22 variables.
Unique identifier for the record.
The Chicago Police Department Records Division Number, which is unique to the incident.
Date when the incident occurred.
Partially redacted address where the incident occurred.
Illinois Uniform Crime Reporting (IUCR) code (directly linked to primary_type and description).
The primary description of the IUCR code.
The secondary description of the IUCR code, a subcategory of the primary description.
Description of the location where the incident occurred.
Indicates whether an arrest was made.
Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
Indicates the police beat where the incident occurred.
Indicates the police district where the incident occurred.
The ward (City Council district) where the incident occurred.
Indicates the community area where the incident occurred.
Indicates the National Incident-Based Reporting System (NIBRS) crime classification.
X coordinate of the incident location (State Plane Illinois East NAD 1983 projection).
Y coordinate of the incident location (State Plane Illinois East NAD 1983 projection).
Year the incident occurred.
Date and time the record was last updated.
The latitude of the location where the incident occurred.
The longitude of the location where the incident occurred.
Concatenation of latitude and longitude.
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
Geocodes a location (determines latitude and longitude from a physical address) using the Google Maps API. The Google Maps API requires registered credentials (Google Cloud Platform); see the ggmap package at https://github.com/dkahle/ggmap for details. By using this function you agree to the Google Maps API Terms of Service at https://cloud.google.com/maps-platform/terms/.
geocode_address(location)
location |
a character vector of physical addresses (e.g. 1600 University Ave., Morgantown, WV) |
Returns a two column matrix with the latitude and longitude of each location queried.
Jamie Spaulding, Keith Morris
library(ggmap) # needed to register Google Cloud credentials
register_google("**Google Cloud Credentials Here**")
addresses <- c("Milan Puskar Stadium, Morgantown, WV", "Woodburn Hall, Morgantown, WV")
geocode_address(addresses)
This function identifies crime incidents that occur at the same location and returns them as a list of data frames, where each data frame contains the RMS data for the repeat incidents at one location. The data is based on the Chicago Police Department RMS structure.
id_repeat(data)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
A list where each data frame contains repeat crime incidents for a given location.
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
crimes <- head(crimes, n = 1000)
out <- id_repeat(crimes)
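The idea of grouping incidents by shared location can be illustrated in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the `incidents` data frame and its `id` and `block` columns are hypothetical stand-ins for RMS data.

```r
# Hypothetical incident records; the column names are assumptions
incidents <- data.frame(
  id    = 1:6,
  block = c("001XX N STATE ST", "002XX W LAKE ST", "001XX N STATE ST",
            "003XX S WABASH AVE", "002XX W LAKE ST", "001XX N STATE ST"),
  stringsAsFactors = FALSE
)

# Group the records by location, then keep locations with more than one record
by_location <- split(incidents, incidents$block)
repeats <- Filter(function(df) nrow(df) > 1, by_location)

names(repeats)                       # locations with repeat incidents
vapply(repeats, nrow, integer(1))    # number of incidents at each one
```

The result has the same shape as the value described above: a list with one data frame per repeat location.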
This function calculates and compares the kernel density estimate (heat maps) of crime incident locations from two given intervals. The function returns a net difference raster which illustrates net changes between the spatial crime distributions across the specified intervals.
kde_int_comp(data, start1, end1, start2, end2)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
start1 |
Beginning date for the first interval of comparison |
end1 |
Final date for the first interval of comparison |
start2 |
Beginning date for the second interval of comparison |
end2 |
Final date for the second interval of comparison |
Returns a shiny.tag.list object which contains three leaflet widgets: a widget with the calculated KDE from interval 1, a widget with the calculated KDE from interval 2, and a widget with a raster of the net differences between the KDE (heat maps) of each specified interval.
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
int_out <- kde_int_comp(crimes, start1 = "1/1/2017", end1 = "3/1/2017",
                        start2 = "1/1/2018", end2 = "3/1/2018")
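The net difference between two interval heat maps can be sketched with `MASS::kde2d()`. This is an illustration only and may differ from how the package computes its rasters: the synthetic `pts` data, the helper `in_interval()`, the grid size, and the default bandwidth are all assumptions.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R distributions

set.seed(1)
# Synthetic incidents: hypothetical coordinates and dates
pts <- data.frame(
  x    = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  y    = c(rnorm(100, mean = 0), rnorm(100, mean = 2)),
  date = rep(as.Date(c("2017-02-01", "2018-02-01")), each = 100)
)

# Hypothetical helper: select the incidents that fall inside one interval
in_interval <- function(d, start, end) {
  d[d$date >= as.Date(start, format = "%m/%d/%Y") &
    d$date <= as.Date(end, format = "%m/%d/%Y"), ]
}
int1 <- in_interval(pts, "1/1/2017", "3/1/2017")
int2 <- in_interval(pts, "1/1/2018", "3/1/2018")

# Estimate both densities on one shared grid so the cells line up
lims <- c(range(pts$x), range(pts$y))
kde1 <- kde2d(int1$x, int1$y, n = 50, lims = lims)
kde2 <- kde2d(int2$x, int2$y, n = 50, lims = lims)

# Positive cells: density rose in interval 2; negative cells: it fell
net_diff <- kde2$z - kde1$z
```

Sharing the grid limits between the two estimates is what makes the cell-by-cell subtraction meaningful.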
This function computes a kernel density estimate of crime incident locations and returns a 'Leaflet' map of the incidents. The data is based on the Chicago Police Department RMS structure and populates pop-up windows with the incident location for each incident.
kde_map(data, pts = NULL)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
pts |
Either true or false. Dictates whether the incident points will be plotted on the map widget. If |
A Leaflet map with three layers: an 'ESRI' base-map, all crime incidents plotted (with incident info pop-up windows), and a kernel density estimate of those points.
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
crimes <- head(crimes, 1000)
library('leaflet') # needed to install basemap providers
kde_map(crimes)
This function performs near repeat analysis for a set of incident locations. The user specifies distance and time thresholds which are utilized to search all other incidents and find other near repeat incidents. From this an adjacency matrix is created for incidents which are related under the thresholds. The adjacency matrix is then used to create an igraph graph which illustrates potentially related or linked incidents (under the near repeat thresholds).
near_repeat_analysis(data, epsg, dist_thresh = NULL, time_thresh = NULL, tz = NULL)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
epsg |
The EPSG Geodetic Parameter code for the area being considered. The EPSG code is used for identifying projections and performing coordinate transformations. If needed, the EPSG for an area can be found at https://spatialreference.org. |
dist_thresh |
The spatial distance (in meters) which defines a near repeat incident. By default this value is set to 1000 meters. |
time_thresh |
The temporal distance (in days) which defines a near repeat incident. By default this value is set to 7 days. |
tz |
Time zone of the area being examined. By default, the time zone of the system is used. For more information about time zones within R, see https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/timezones. |
Returns a list of all near repeat series identified within the input data as igraph graph objects. This list can be used to generate plots of each series and to discern the near repeat linkages between the crime incidents.
Jamie Spaulding, Keith Morris
data(crimes)
nr_data <- head(crimes, n = 1000) # truncate dataset for near repeat analysis
out <- near_repeat_analysis(data = nr_data, tz = "America/Chicago", epsg = "32616")
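The thresholding step that produces the adjacency matrix can be sketched in base R. This is a simplified illustration, not the package's code: the `inc` data frame is hypothetical, and its coordinates are assumed to be already projected to meters.

```r
# Hypothetical incidents: projected coordinates in meters and occurrence dates
inc <- data.frame(
  x    = c(0, 300, 5000, 5200, 9000),
  y    = c(0, 400, 5000, 5100, 9000),
  date = as.Date(c("2017-01-01", "2017-01-03", "2017-01-10",
                   "2017-02-20", "2017-01-02"))
)
dist_thresh <- 1000  # meters
time_thresh <- 7     # days

# Pairwise spatial and temporal separation between all incidents
d_space <- as.matrix(dist(inc[, c("x", "y")]))
d_time  <- as.matrix(dist(as.numeric(inc$date)))

# Two incidents are linked when both separations fall within the thresholds
adj <- (d_space <= dist_thresh) & (d_time <= time_thresh)
diag(adj) <- FALSE   # an incident is not a near repeat of itself
adj <- adj * 1L      # logical to 0/1 adjacency matrix

which(adj == 1, arr.ind = TRUE)  # index pairs of linked incidents
```

A matrix of this form could then be passed to `igraph::graph_from_adjacency_matrix()` (with `mode = "undirected"`) to obtain a graph of linked incidents, assuming the igraph package is installed.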
This function evaluates a given set of crime incidents to recommend parameters for near repeat analysis. A series of time and distance parameters is tested in a full factorial design against the set of incident locations to determine the frequency of occurrence under each parameter combination. The results of the full factorial assessment are then modeled through interpolation, and the second derivative is calculated to determine the inflection point. The inflection point represents the change in the frequency of detected near repeat incidents. The inflection point is determined for both the time and distance domains.
near_repeat_eval(data, epsg, tz = NULL)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
epsg |
The EPSG Geodetic Parameter code for the area being considered. The EPSG code is used for identifying projections and performing coordinate transformations. If needed, the EPSG for an area can be found at https://spatialreference.org. |
tz |
Time zone of the area being examined. By default, the time zone of the system is used. For more information about time zones within R, see https://www.rdocumentation.org/packages/base/versions/3.6.1/topics/timezones. |
Returns a data frame with one row and two columns, distance and time, giving the optimal near repeat parameter for each. Distance is given in meters and time in days.
Jamie Spaulding, Keith Morris
data(crimes)
nr_dat <- subset(crimes, crimes$primary_type == "BURGLARY")
pars <- near_repeat_eval(data = nr_dat, tz = "America/Chicago", epsg = "32616")
pars
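The full factorial assessment described for this function can be sketched in base R. This is a rough illustration on synthetic data, not the package's procedure: the candidate threshold values are arbitrary assumptions, and a plain second difference stands in for the interpolation and second derivative step.

```r
set.seed(42)
# Synthetic incidents (hypothetical): projected coordinates in meters, day of occurrence
n   <- 60
x   <- runif(n, 0, 3000)
y   <- runif(n, 0, 3000)
day <- sample(0:60, n, replace = TRUE)

d_space <- as.vector(dist(cbind(x, y)))  # pairwise distances in meters
d_time  <- as.vector(dist(day))          # pairwise gaps in days

# Full factorial grid of candidate thresholds (values chosen arbitrarily)
grid <- expand.grid(distance = seq(250, 2000, by = 250), time = 1:14)

# Frequency of incident pairs linked under each parameter combination
grid$pairs <- mapply(function(d, tm) sum(d_space <= d & d_time <= tm),
                     grid$distance, grid$time)

# Second difference of the frequency along the distance axis, time fixed at 7 days
at_week     <- grid[grid$time == 7, ]
second_diff <- diff(at_week$pairs, differences = 2)
```

Where the second difference changes sign, the growth in linked pairs switches between accelerating and decelerating, which is the kind of inflection the description refers to.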
This function transforms daily crime count data and plots the resultant components of a time series which has been decomposed into seasonal, trend, and irregular components using Loess smoothing. Holt-Winters exponential smoothing is also performed for improved trend resolution, since the data are in a daily format.
ts_daily_decomp(data, start)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
start |
Start date for the time series being analyzed. The format is as follows: c('year', 'month', 'day'). See example below for reference. |
Returns an object of class "stl" with the following components:
time.series: a multiple time series with columns seasonal, trend and remainder.
weights: the final robust weights (all one if fitting is not done robustly).
call: the matched call.
win: integer (length 3 vector) with the spans used for the "s", "t", and "l" smoothers.
deg: integer (length 3) vector with the polynomial degrees for these smoothers.
jump: integer (length 3) vector with the 'jumps' (skips) used for these smoothers.
inner: number of inner iterations
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
test <- ts_daily_decomp(data = crimes, start = c(2017, 1, 1))
plot(test)
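A Loess decomposition of daily counts can be sketched with `stats::stl()` and `stats::HoltWinters()`. This is an illustration on synthetic data, not the package's internals: the weekly seasonal period and the level-only smoothing are assumptions made for the sketch.

```r
set.seed(7)
# Synthetic daily incident counts for two years with a weekly cycle (hypothetical data)
days   <- seq(as.Date("2017-01-01"), by = "day", length.out = 730)
counts <- rpois(730, lambda = 20 + 5 * sin(2 * pi * seq_along(days) / 7))

# Treat one week as the seasonal period; this frequency is an assumption
daily_ts <- ts(counts, frequency = 7)

# Loess-based decomposition into seasonal, trend, and remainder components
fit <- stl(daily_ts, s.window = "periodic")
head(fit$time.series)

# Exponential smoothing of the level only, as a rough smoothed trend
hw <- HoltWinters(daily_ts, beta = FALSE, gamma = FALSE)
```

Calling `plot(fit)` draws the data and the three components in stacked panels, which matches the "stl" object structure listed above.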
This function transforms traditional crime data into a time series and forecasts future incident counts over a specified duration. The forecast is computed using simple exponential smoothing with additive errors. The function returns a plot of the time series, the trend, and the upper and lower prediction limits for the forecast.
ts_forecast(data, start, duration = NULL)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
start |
Start date for the time series being analyzed. The format is as follows: c('year', 'month', 'day'). See example below for reference. |
duration |
Number of days for the forecast. If |
Returns a plot of the time series entered (black), a forecast over the specified duration (blue), the exponentially smoothed trend for both the input data (red) and forecast (orange), and the upper and lower bounds for the prediction interval (grey).
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
ts_forecast(crimes, start = c(2017, 1, 1))
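Simple exponential smoothing with prediction bounds can be sketched in base R. Here `stats::HoltWinters()` with the trend and seasonal terms switched off is used as a stand-in; the package may rely on a different routine, and the synthetic counts and the 30-day horizon are assumptions.

```r
set.seed(11)
# Synthetic daily incident counts (hypothetical data)
counts   <- rpois(365, lambda = 25)
daily_ts <- ts(counts, frequency = 365, start = c(2017, 1))

# Simple exponential smoothing: level only, no trend or seasonal term
fit <- HoltWinters(daily_ts, beta = FALSE, gamma = FALSE)

# Forecast 30 days ahead with 95% prediction bounds
fc <- predict(fit, n.ahead = 30, prediction.interval = TRUE, level = 0.95)
head(fc)  # columns: fit (forecast), upr and lwr (prediction bounds)
```

The bounds widen as the horizon grows, since uncertainty accumulates with each step ahead.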
This function transforms traditional crime data and plots the resultant components of a time series which has been decomposed into seasonal, trend and irregular components using Loess smoothing.
ts_month_decomp(data, start)
data |
Data frame of crime or RMS data. See provided Chicago Data Portal example for reference |
start |
The year in which the time series data starts. The time series is assumed to consist solely of monthly count data. |
Returns an object of class "stl" with the following components:
time.series: a multiple time series with columns seasonal, trend and remainder.
weights: the final robust weights (all one if fitting is not done robustly).
call: the matched call.
win: integer (length 3 vector) with the spans used for the "s", "t", and "l" smoothers.
deg: integer (length 3) vector with the polynomial degrees for these smoothers.
jump: integer (length 3) vector with the 'jumps' (skips) used for these smoothers.
inner: number of inner iterations
Jamie Spaulding, Keith Morris
# Using provided dataset from Chicago Data Portal:
data(crimes)
test <- ts_month_decomp(crimes, 2017)
plot(test)
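Aggregating incident dates to monthly counts before decomposition can be sketched in base R. This is a minimal illustration on synthetic dates, not the package's code; the variable names are hypothetical.

```r
set.seed(3)
# Synthetic incident dates spread over three years (hypothetical data)
all_days <- seq(as.Date("2017-01-01"), as.Date("2019-12-31"), by = "day")
dates    <- sample(all_days, 5000, replace = TRUE)

# Count incidents per calendar month; fixed levels keep months in order
month_key <- factor(format(dates, "%Y-%m"),
                    levels = unique(format(all_days, "%Y-%m")))
monthly <- as.vector(table(month_key))

# Monthly series starting January 2017, then Loess decomposition
monthly_ts <- ts(monthly, start = c(2017, 1), frequency = 12)
fit <- stl(monthly_ts, s.window = "periodic")
```

`stl()` needs more than two full seasonal periods, so a monthly series should cover more than two years of data.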