Provides access to the US Census Bureau batch endpoints for locations and
geographies. The function implements iteration and optional parallelization
in order to geocode datasets larger than the API limit of 1,000 and more
efficiently than sending 10,000 per request. It also supports multiple outputs,
including (optionally, if sf is installed) sf class objects.
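The iteration mentioned above can be pictured with a small base R sketch. This is not the package's internal code: split_batches() is a hypothetical helper, and the batch size of 1,000 is simply the figure quoted in the description.

```r
# Illustrative only: split a data.frame into batches of at most `size` rows.
# `split_batches` is a hypothetical helper and is not part of censusxy.
split_batches <- function(df, size = 1000L) {
  if (nrow(df) == 0L) return(list())
  # Rows 1..size get batch 1, the next `size` rows get batch 2, and so on.
  batch_id <- ceiling(seq_len(nrow(df)) / size)
  unname(split(df, batch_id))
}

# Made-up input: 2,500 identical address rows.
addresses <- data.frame(
  id = seq_len(2500L),
  street = "1500 Cass Ave",
  stringsAsFactors = FALSE
)

batches <- split_batches(addresses)
batch_sizes <- vapply(batches, nrow, integer(1))
# batch_sizes holds 1000, 1000, 500: three requests instead of one oversized one
```

Each element of batches could then be sent as its own request and the results bound back together, which is the general idea behind iterating over a large dataset.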
cxy_geocode(
  .data,
  id = NULL,
  street,
  city = NULL,
  state = NULL,
  zip = NULL,
  return = "locations",
  benchmark = "Public_AR_Current",
  vintage = NULL,
  timeout = 30,
  parallel = 1,
  class = "dataframe",
  output = "simple"
)
.data: data.frame containing columns with structured address data
id: Optional String - Name of column containing unique ID
street: String - Name of column containing street address
city: Optional String - Name of column containing city
state: Optional String - Name of column containing state
zip: Optional String - Name of column containing zip code
return: One of 'locations' or 'geographies' denoting returned information from the API. If you would like Census geography data, you must specify a valid vintage for your benchmark.
benchmark: Optional Census benchmark to geocode against. To obtain current valid benchmarks, use the cxy_benchmarks() function.
vintage: Optional Census vintage to geocode against. You may use the cxy_vintages() function to obtain valid vintages.
timeout: Numeric, in minutes, how long until request times out
parallel: Integer, number of cores to use; set it greater than one if parallel requests are desired. All operating systems now use a SOCK cluster, and the dependencies are no longer suggested packages; they are installed by default. This value may not exceed the number of cores the system reports as available. If it is larger, the maximum number of available cores will be used.
class: One of 'dataframe' or 'sf' denoting the output class. 'sf' will only return matched addresses.
output: One of 'simple' or 'full' denoting the returned columns. 'simple' returns just coordinates.
A data.frame or sf object containing geocoded results
Parallel requests are supported across platforms. All operating systems use a SOCK cluster. You may not specify more cores than the system reports as available; if you do, the maximum number of available cores is used.
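A hedged sketch of requesting parallel workers while staying within the core count the system reports. It assumes censusxy is installed and the Census API is reachable; the choice of two workers and 100 rows is arbitrary, and the guard around the call is there only so the snippet does nothing when the package is missing.

```r
# Ask for 2 workers, but never more than the machine reports.
# parallel::detectCores() can return NA, so fall back to a single core.
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else min(2L, available)

# Assumption: censusxy is installed and the Census API is reachable.
if (requireNamespace("censusxy", quietly = TRUE)) {
  x <- censusxy::stl_homicides[1:100, ]
  result <- censusxy::cxy_geocode(
    x,
    street = "street_address",
    city = "city",
    state = "state",
    zip = "postal_code",
    parallel = n_cores
  )
}
```

Capping the value yourself keeps the script explicit about how many workers it intends to start, rather than relying on the function to reduce the number.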
# load the package and example data
library(censusxy)
x <- stl_homicides[1:10, ]
# geocode
cxy_geocode(x, street = 'street_address', city = 'city', state = 'state', zip = 'postal_code',
            return = 'locations', class = 'dataframe', output = 'simple')
#> street_address year date state postal_code city
#> 9 5738 Terry Ave 2008 01/12/2008 12:37 MO NA St. Louis
#> 7 5356 Page Blvd 2008 01/17/2008 04:00 MO NA St. Louis
#> 10 5826 Roosevelt Pl 2008 01/20/2008 21:19 MO NA St. Louis
#> 4 3859 Ohio Ave 2008 01/21/2008 17:38 MO NA St. Louis
#> 5 4100 Saint Louis Ave 2008 01/30/2008 15:34 MO NA St. Louis
#> 3 2418 N Euclid Ave 2008 01/30/2008 19:19 MO NA St. Louis
#> 2 1646 S 39th St 2008 02/04/2008 17:45 MO NA St. Louis
#> 8 5617 Enright Ave 2008 02/09/2008 17:30 MO NA St. Louis
#> 6 5001 N Kingshighway Blvd 2008 02/09/2008 22:59 MO NA St. Louis
#> 1 1500 Cass Ave 2008 02/11/2008 21:50 MO NA St. Louis
#> cxy_lon cxy_lat
#> 9 -90.27411 38.67785
#> 7 -90.27331 38.66157
#> 10 -90.27592 38.67959
#> 4 -90.22941 38.58544
#> 5 -90.23151 38.66029
#> 3 -90.25564 38.66613
#> 2 -90.24485 38.61837
#> 8 -90.28238 38.65493
#> 6 -90.24467 38.68689
#> 1 -90.19742 38.64184
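Because return = 'geographies' requires a vintage, a second example may help. This is a sketch under assumptions: it presumes censusxy is installed and the API is reachable, and the vintage name 'Current_Current' is assumed to be valid for the default benchmark. Check the result of cxy_vintages() before relying on it. No output is shown because the returned columns depend on the API response.

```r
# Assumed values: confirm the vintage against cxy_vintages() for your benchmark.
benchmark <- "Public_AR_Current"
vintage <- "Current_Current"

if (requireNamespace("censusxy", quietly = TRUE)) {
  library(censusxy)

  # list the vintages the API currently accepts for this benchmark
  print(cxy_vintages(benchmark))

  # geocode and request Census geography data alongside the coordinates
  x <- stl_homicides[1:10, ]
  geo <- cxy_geocode(x, street = 'street_address', city = 'city', state = 'state',
                     zip = 'postal_code', return = 'geographies',
                     benchmark = benchmark, vintage = vintage,
                     class = 'dataframe', output = 'full')
}
```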