How close am I to an Amazon, Wal-Mart or Target Warehouse? Visualizing geospatial Information in R.

This post was originally published at ODSC. Check out their curated blog posts!

 

In the last post you saw the “Traveling Santa Problem” which attempts to efficiently route Santa to all US warehouses from Amazon, Wal-Mart and Target.  Now that Santa has made his deliveries I want to know where to go to pick up my gift.  My wish list has a ticket to ODSC West and an Amazon Echo on it so I am hoping he dropped it off on his “Traveling Salesperson Problem” route shown below.

Santa’s route brought him close to my home in Boston to drop off my ODSC ticket and Echo!  Learn how to calculate an efficient route between a set of points, in this case warehouses, in the last post.

Set Up

If you have read any of my other blog posts you have probably seen these packages before.  Just in case I will run through them quickly.  The ggmap library is used to grab information from Google Maps API and the outputs can be used in ggplot2 visualizations.  So I also use ggplot2 and a helper package called ggthemes for predefined aesthetics.  The last two packages, sp and leaflet are less typical but very useful for geospatial data.  The sp library provides functions for create spatial data frame objects and manipulating them.  The leaflet package is a wrapper for the JavaScript leaflet package and will create an interactive map of warehouses.

library(ggmap)

library(ggplot2)

library(ggthemes)

library(sp)

library(leaflet)

I specify the decimals points should print out to the 15th decimal point.  This is helpful when examining latitude and longitude coordinates.

options(digits=15)

Get the Data & Geocode Your Location

Grab the CSV file here.  Once you have it downloaded, put it into your working directory.  In setwd, enter the folder path and change the slashes if you are on Windows. When I was first starting out I always forgot to switch my slashes!  One of my favorite functions is dir.  This will return any files in the folder that meet a pattern condition.    Once the files are printed to your console simply copy/paste the file you want to load into the read.csv function.  Often this is faster than manually typing a file name.  After the warehouses object is created you need to change the vector classes so here is an example of nesting as.character and as.numeric on $sq_ft.  This will change the vector from a factor to text then to a number.  Withoutas.character the factor is changed to numeric in a sequence in which is appears in the data set.

setwd('C:/Users/someUser/Desktop/Folder')

dir(pattern='*.csv')

warehouses<-read.csv('warehouse_stats_w_latlon.csv')

warehouses$sq_ft<-as.numeric(as.character(warehouses$sq_ft))

I live in Boston, Massachusetts so I need to get the exact latitude and longitude combination.  Luckily  the geocode function from ggmap accepts a text location to search the google maps API.  Keep in mind there is a 2500 geocoding limit per day so try not to geocode a large vector or you could try another service like Bing Maps (25000 daily limit).  To see the returned object call loc.coords in your console. The object is a small data frame with lon and lat columns and the corresponding values for “Boston, Ma.”

my.loc<-'boston, ma'

loc.coords<-geocode(my.loc)

loc.coords

Since I manually collected the warehouse information some of it was hard to come by and verify.  Sometimes I had to enter NA but this can cause problems in some functions.  To remove lat/lon combinations with NA use subset.  Subset accepts the data frame to be subset and a logical condition.  In this example the logical expression uses complete.cases applied to the $lat vector.  This function will return a logical vector TRUE or FALSE is the row is complete (has no missing or NA values).    So complete.latlon is a subsection of the original data frame where $lat values are complete, containing a number.

complete.latlon<-subset(warehouses, complete.cases(warehouses$lat)==T)

Find my Gift at a Warehouse! Closest Point to My Location

Using SpatialPoints you can tell R that the $lat and $lon vectors are not independent but in fact coordinate pairs.  The function accepts a data frame of coordinate pairs and then projection and coordinate system information.  The projection is specified as longlat and the coordinate system is WGS84.  These are standard for most maps but there are other map projections you may need to specify in other instances.

warehouses.coords <- SpatialPoints(coords = data.frame(lon=complete.latlon$lon,                                          

lat=complete.latlon$lat), proj4string=CRS("+proj=longlat +datum=WGS84"))

Next I need to create another SpatialPoints object so the distances can be calculated between the two objects.  I am only concerned with my location, Boston Ma, but you could construct a larger data frame for multiple point calculations.  This code follows the same pattern as warehouses.coords.

my.coords <- SpatialPoints(coords = data.frame(loc.coords),

                 proj4string=CRS("+proj=longlat +datum=WGS84"))

Calculating the distance between each warehouse and Boston MA is easy using spDists.  In this code I am adding a new column to the original complete data called $distance.  The spDists function accepts the first coordinate pair object, warehouses.coords then the second my.coords.  Since the second only contains a single value, it is recycled for each warehouse.  The last parameter specifies if the coordinate pairs are part of the longlat projection.  If longlat=F then Euclidean distance is returned, like you probably did in geometry class.  Here longlat=T so the “great circles” distance is calculated.  This is a truer measure of distance to points on the spherical Earth.  In the end the $distance vector represents the kilometer distance between my location, Boston, and each individual amazon, Wal-Mart or Target warehouse.

complete.latlon$distance <- spDists(warehouses.coords, my.coords, longlat=T)

The next code quickly identifies the warehouse with the minimum distance to my location.  This is done with which.min while indexing the rows of complete.latlon.  which.min returns the row position of the minimum value in the vector.  This integer can be used to index the rows within square brackets.  I created closest.warehouse and call it in the console to get the information for the single closest location.

closest.warehouse<-complete.latlon[which.min(complete.latlon$distance),]

closest.warehouse

I need to go pick up my ODSC ticket and Echo at an Amazon Food Warehouse at 201 Beacham Street in Everett MA.  If I had a team of flying reindeers this location would only be 3.8 kilometers from my location.

Unfortunately, I have to take a car so I need routing information.  The ggmap library provides a method for accessing the Google route API with route.  To get turn by turn directions I apply as.character to the $location of the single closest warehouse.   In this example 201 Beacham St Everett MA will be passed to route.

warehouse.address<-as.character(closest.warehouse$location)

To get the route data frame with directions pass in a starting location as text such as “Boston, MA” and an ending location, “201 Beacham St Everett MA.”  The returned data frame has meters, kilometers, miles, duration and lat/lon turning points for my ride.

route.df<-route(my.loc, warehouse.address, structure = 'route')

A great way to quickly get a map is with qmap.  This is similar to the ggplot2 shortcut function qplot.  In this code pass in the map location, my.loc and a zoom integer.  Then I add a geom_path layer.  Keep in mind geom_path plots line segments in the order they appear in the data frame.  In contrast geom_line will plot them in the order of the X axis.  You can see the different by changing the function in the layer.  The geom_path needs the route data frame with aesthetics for x and y.  Then I added a thick red line so I could see the route on the map easily.

qmap(my.loc, zoom = 13) +  

  geom_path(data = route.df, aes(= lon, y = lat),  colour = "red", size = 1.5, lineend = "round")

The route from my location to the nearest Amazon warehouse for my gifts!

Next I will want some idea of what the building looks like.  One way to find that is by calling a satellite image of the destination from Google Maps.  The way ggmaps works is to first get the base layer map.  Here I am creating a base called ind.warehouse.  Using get_googlemap I specify a center of the map as the lat/lon pair for the warehouse.  Next the scale for pixel resolution and most importantly maptype are declared.  I want an actual picture so I pass in “satellite” but you could use “terrain”, “roadmap”, or “hybrid” for other types.  Then I specify 17 as the zoom integer so the image is close up.  The last parameter creates a small data frame with coordinates for a marker at 201 Beacham Street.  In the end this function will retrieve the appropriate map tile to use as my base layer.

ind.warehouse <- get_googlemap(center = c(lon = closest.warehouse$lon, lat =closest.warehouse$lat),

scale = 2, maptype = 'satellite', zoom=17,

markers = data.frame(lon = closest.warehouse$lon,

lat = closest.warehouse$lat))

Since I just want to plot the picture I simply call it within ggmap.

ggmap(ind.warehouse, extent = 'device')

Not surprisingly the Amazon Warehouse is a large building…containing my gifts!  

Explore all Amazon, Wal-Mart & Target Warehouses with LEAFLET

Leaflet is an open source JavaScript library for creating interactive maps.  It is lightweight and can be rendered on a mobile browser so it is useful.  R has a package called leaflet that creates the HTML with JavaScript for these maps.  To learn more about leaflet check out the project’s webpage.

The leaflet package authors use the %>%, “pipe,” operator to pass objects to functions.  This is useful to create clear and concise code.  However, if you are just learning this can be difficult to understand.  Instead I create a map following the same linear construction as you have seen in this post but you will want to get used to using %>% as you become more familiar with the leaflet functions.

To begin pass in the complete lat/lon records to leaflet.  The leaflet function cannot handle NA values.  This function creates an HTMl “widget” that will ultimately render my map.  The dynamic.map HTML widget is just HTML without “tiles.”  The tiles layer of the map contains the background map images.

dynamic.map<-leaflet(complete.latlon)

dynamic.map<- addTiles(dynamic.map)

Calling dynamic.map in R Studio will open up the Viewer image of the Earth projected flat.  I need to add one more layer containing the warehouse information so it can be plotted on top of the tiles.  Before plotting the map with markers I want to color code markers by retailer.  For example, orange dots will represent Amazon, blues will be Wal-Mart or Sam’s Club and red will be Target warehouse locations.  So I first create a color palette mapping to the unique factor levels of $retailer.  To do so I use colorFactor, declare my colors and then the domain containing the data for assignment.

 

Now that you have the data set, check out the rest of this post at ODSC!

This entry was posted in blog posts, ODSC and tagged , . Bookmark the permalink.