Provides access to the US Census Bureau batch endpoints for locations and geographies. The function implements iteration and optional parallelization in order to geocode datasets larger than the API limit of 10,000 addresses, and to do so more efficiently than sending 10,000 addresses per request. It also supports multiple output classes, including sf objects when the sf package is installed.
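The iteration described above amounts to splitting the input into batches and sending each batch as its own request. The following is a minimal base R sketch of that splitting step only. It is not the package's internal code: `split_batches()` and the `addresses` data are hypothetical, and the batch size of 1,000 is an illustrative value.

```r
# Hypothetical helper, not a censusxy function: split a data.frame into
# consecutive batches of at most `size` rows.
split_batches <- function(df, size = 1000) {
  if (nrow(df) == 0) {
    return(list())
  }
  # Rows 1..size fall in group 1, the next `size` rows in group 2, and so on
  groups <- ceiling(seq_len(nrow(df)) / size)
  unname(split(df, groups))
}

# Made-up address data, for illustration only
addresses <- data.frame(
  id = 1:2500,
  street = paste(1:2500, "Main St")
)

batches <- split_batches(addresses, size = 1000)

# Three batches with 1000, 1000 and 500 rows
vapply(batches, nrow, integer(1))
```

Each element of `batches` could then be geocoded separately and the results combined with `do.call(rbind, ...)`.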

cxy_geocode(
  .data,
  id = NULL,
  street,
  city = NULL,
  state = NULL,
  zip = NULL,
  return = "locations",
  benchmark = "Public_AR_Current",
  vintage = NULL,
  timeout = 30,
  parallel = 1,
  class = "dataframe",
  output = "simple"
)

Arguments

.data

data.frame containing columns with structured address data

id

Optional String - Name of column containing unique ID

street

String - Name of column containing street address

city

Optional String - Name of column containing city

state

Optional String - Name of column containing state

zip

Optional String - Name of column containing zip code

return

One of 'locations' or 'geographies' denoting returned information from the API. If you would like Census geography data, you must specify a valid vintage for your benchmark.

benchmark

Optional Census benchmark to geocode against. To obtain current valid benchmarks, use the cxy_benchmarks() function.

vintage

Optional Census vintage to geocode against. You may use the cxy_vintages() function to obtain valid vintages.

timeout

Numeric - Number of minutes before a request times out

parallel

Integer - Number of cores to use; set a value greater than one if parallel requests are desired. All operating systems now use a SOCK cluster, and the dependencies are no longer suggested packages. Instead, they are installed by default. Note that this value may not exceed the number of cores the system reports as available. If it is larger, the maximum number of available cores will be used.

class

One of 'dataframe' or 'sf' denoting the output class. 'sf' will only return matched addresses.

output

One of 'simple' or 'full' denoting the returned columns. 'simple' adds only the coordinate columns (cxy_lon and cxy_lat) to the input data.

Value

A data.frame or sf object containing geocoded results

Details

Parallel requests are supported across platforms. As noted under the parallel argument, all operating systems now use a SOCK cluster. You may not specify more cores than the system reports as available; if you do, the maximum number of available cores is used.
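The limit on the number of cores can be illustrated with a short base R sketch. It assumes the available core count is the one reported by `parallel::detectCores()`; `cap_cores()` is a hypothetical helper written for this example and is not how the package is confirmed to perform the check.

```r
# Hypothetical helper, not a censusxy function: reduce a requested core
# count to what the system reports as available.
cap_cores <- function(requested) {
  available <- parallel::detectCores()
  # detectCores() may return NA on some platforms; fall back to one core
  if (is.na(available)) {
    available <- 1L
  }
  # Never return fewer than one core or more than are available
  as.integer(max(1L, min(requested, available)))
}

cap_cores(1)     # a single core, i.e. no parallel requests
cap_cores(1000)  # reduced to the number of cores on this machine
```

The value returned by `cap_cores()` could be passed as the `parallel` argument, although the function already applies the same limit itself.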

Examples

# load data
x <- stl_homicides[1:10,]

# geocode
cxy_geocode(x, street = 'street_address', city = 'city', state = 'state', zip = 'postal_code',
   return = 'locations', class = 'dataframe', output = 'simple')
#>              street_address year             date state postal_code      city
#> 9            5738 Terry Ave 2008 01/12/2008 12:37    MO          NA St. Louis
#> 7            5356 Page Blvd 2008 01/17/2008 04:00    MO          NA St. Louis
#> 10        5826 Roosevelt Pl 2008 01/20/2008 21:19    MO          NA St. Louis
#> 4             3859 Ohio Ave 2008 01/21/2008 17:38    MO          NA St. Louis
#> 5      4100 Saint Louis Ave 2008 01/30/2008 15:34    MO          NA St. Louis
#> 3         2418 N Euclid Ave 2008 01/30/2008 19:19    MO          NA St. Louis
#> 2            1646 S 39th St 2008 02/04/2008 17:45    MO          NA St. Louis
#> 8          5617 Enright Ave 2008 02/09/2008 17:30    MO          NA St. Louis
#> 6  5001 N Kingshighway Blvd 2008 02/09/2008 22:59    MO          NA St. Louis
#> 1             1500 Cass Ave 2008 02/11/2008 21:50    MO          NA St. Louis
#>      cxy_lon  cxy_lat
#> 9  -90.27411 38.67785
#> 7  -90.27331 38.66157
#> 10 -90.27592 38.67959
#> 4  -90.22941 38.58544
#> 5  -90.23151 38.66029
#> 3  -90.25564 38.66613
#> 2  -90.24485 38.61837
#> 8  -90.28238 38.65493
#> 6  -90.24467 38.68689
#> 1  -90.19742 38.64184