5.2 JSON data

Data provided via web APIs is often made available in JSON (JavaScript Object Notation) format, a simple, human-readable text format for storing hierarchical data.

There are a few R packages that parse JSON data, but jsonlite is our package of choice. Like readr and sf, jsonlite can read data directly from a URL.

Data for the following examples is sourced from the Australian Bureau of Meteorology’s Latest Weather Observations for Melbourne Airport, made available directly from their website in JSON format.

# Bureau of Meteorology - Latest Weather Observations for Melbourne Airport
bom_url <- "http://www.bom.gov.au/fwo/IDV60901/IDV60901.94866.json"

bom_data <- jsonlite::fromJSON(bom_url)

jsonlite has an intuitive mapping from JSON data types to R. The object returned from the BoM site is a list containing various pieces of metadata alongside our data of interest.
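
To make that mapping concrete, here is a minimal sketch that parses a small, made-up JSON string rather than the BoM feed. The field names (station, readings, obs) are invented for illustration. It assumes jsonlite's default simplification, under which a JSON object becomes a named list, an array of primitives becomes an atomic vector (with null turned into NA), and an array of objects becomes a data frame.

```r
library(jsonlite)

# Hypothetical JSON document, invented for this example (not BoM data)
json_txt <- '{
  "station": "Example Station",
  "readings": [1.5, 2.0, null],
  "obs": [
    {"time": "09:00", "temp": 18.2},
    {"time": "09:30", "temp": 19.0}
  ]
}'

parsed <- fromJSON(json_txt)

class(parsed)           # JSON object -> named list
parsed$station          # JSON string -> character vector of length 1
parsed$readings         # JSON array of numbers -> numeric vector, null -> NA
class(parsed$obs)       # JSON array of objects -> data.frame
parsed$obs$temp         # each key becomes a column
```

The BoM response follows the same pattern on a larger scale: the outer object becomes a list, and each array of records inside it becomes a data frame.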

Check out the original URL to see the JSON data that has been mapped to this R structure.

str(bom_data)
#> List of 1
#>  $ observations:List of 3
#>   ..$ notice:'data.frame':   1 obs. of  4 variables:
#>   .. ..$ copyright     : chr "Copyright Commonwealth of Australia 2020, Bureau of Meteorology. For more information see: http://www.bom.gov.a"| __truncated__
#>   .. ..$ copyright_url : chr "http://www.bom.gov.au/other/copyright.shtml"
#>   .. ..$ disclaimer_url: chr "http://www.bom.gov.au/other/disclaimer.shtml"
#>   .. ..$ feedback_url  : chr "http://www.bom.gov.au/other/feedback"
#>   ..$ header:'data.frame':   1 obs. of  8 variables:
#>   .. ..$ refresh_message: chr "Issued at  2:11 pm EDT Tuesday  1 December 2020"
#>   .. ..$ ID             : chr "IDV60901"
#>   .. ..$ main_ID        : chr "IDV60900"
#>   .. ..$ name           : chr "Melbourne Airport"
#>   .. ..$ state_time_zone: chr "VIC"
#>   .. ..$ time_zone      : chr "EDT"
#>   .. ..$ product_name   : chr "Capital City Observations"
#>   .. ..$ state          : chr "Victoria"
#>   ..$ data  :'data.frame':   161 obs. of  35 variables:
#>   .. ..$ sort_order          : int [1:161] 0 1 2 3 4 5 6 7 8 9 ...
#>   .. ..$ wmo                 : int [1:161] 94866 94866 94866 94866 94866 94866 94866 94866 94866 94866 ...
#>   .. ..$ name                : chr [1:161] "Melbourne Airport" "Melbourne Airport" "Melbourne Airport" "Melbourne Airport" ...
#>   .. ..$ history_product     : chr [1:161] "IDV60901" "IDV60901" "IDV60901" "IDV60901" ...
#>   .. ..$ local_date_time     : chr [1:161] "01/02:00pm" "01/01:30pm" "01/01:00pm" "01/12:30pm" ...
#>   .. ..$ local_date_time_full: chr [1:161] "20201201140000" "20201201133000" "20201201130000" "20201201123000" ...
#>   .. ..$ aifstime_utc        : chr [1:161] "20201201030000" "20201201023000" "20201201020000" "20201201013000" ...
#>   .. ..$ lat                 : num [1:161] -37.7 -37.7 -37.7 -37.7 -37.7 -37.7 -37.7 -37.7 -37.7 -37.7 ...
#>   .. ..$ lon                 : num [1:161] 145 145 145 145 145 ...
#>   .. ..$ apparent_t          : num [1:161] 18.8 17.3 15.3 14 14.1 14.2 13.9 15.6 17.5 19.4 ...
#>   .. ..$ cloud               : chr [1:161] "Mostly clear" "Mostly clear" "Mostly clear" "Cloudy" ...
#>   .. ..$ cloud_base_m        : int [1:161] 600 600 600 3300 3300 2610 2500 1110 1110 2340 ...
#>   .. ..$ cloud_oktas         : int [1:161] 1 1 1 8 8 3 8 1 1 3 ...
#>   .. ..$ cloud_type_id       : int [1:161] 6 6 6 NA NA NA 35 6 6 NA ...
#>   .. ..$ cloud_type          : chr [1:161] "Stratocumulus" "Stratocumulus" "Stratocumulus" "-" ...
#>   .. ..$ delta_t             : num [1:161] 4.5 4 3.2 2.7 2.8 3.3 3.6 7.5 7.4 7.7 ...
#>   .. ..$ gust_kmh            : int [1:161] 33 33 39 39 37 52 52 59 50 59 ...
#>   .. ..$ gust_kt             : int [1:161] 18 18 21 21 20 28 28 32 27 32 ...
#>   .. ..$ air_temp            : num [1:161] 21.7 20.4 19.4 18.5 18.6 19.3 19.5 23.1 24.2 25 ...
#>   .. ..$ dewpt               : num [1:161] 14 13.5 13.9 13.8 13.7 13.6 13.2 9.3 11.1 11.5 ...
#>   .. ..$ press               : num [1:161] 1005 1006 1005 1004 1004 ...
#>   .. ..$ press_qnh           : num [1:161] 1006 1006 1005 1005 1005 ...
#>   .. ..$ press_msl           : num [1:161] 1005 1006 1005 1004 1004 ...
#>   .. ..$ press_tend          : chr [1:161] "-" "-" "-" "-" ...
#>   .. ..$ rain_trace          : chr [1:161] "0.0" "0.0" "0.0" "0.0" ...
#>   .. ..$ rel_hum             : int [1:161] 61 64 70 74 73 69 67 41 43 42 ...
#>   .. ..$ sea_state           : chr [1:161] "-" "-" "-" "-" ...
#>   .. ..$ swell_dir_worded    : chr [1:161] "-" "-" "-" "-" ...
#>   .. ..$ swell_height        : logi [1:161] NA NA NA NA NA NA ...
#>   .. ..$ swell_period        : logi [1:161] NA NA NA NA NA NA ...
#>   .. ..$ vis_km              : chr [1:161] "10" "10" "10" "10" ...
#>   .. ..$ weather             : chr [1:161] "-" "-" "-" "Showers" ...
#>   .. ..$ wind_dir            : chr [1:161] "N" "NNW" "N" "NNE" ...
#>   .. ..$ wind_spd_kmh        : int [1:161] 22 22 28 30 30 33 35 39 37 32 ...
#>   .. ..$ wind_spd_kt         : int [1:161] 12 12 15 16 16 18 19 21 20 17 ...

JSON data is often hierarchically structured or nested in this way, and you’ll need to work your way through the structure to get to the data you need.

class(bom_data)
#> [1] "list"
names(bom_data)
#> [1] "observations"
names(bom_data$observations)
#> [1] "notice" "header" "data"

class(bom_data$observations$data)
#> [1] "data.frame"
bom_data$observations$data %>% as_tibble()
#> # A tibble: 161 x 35
#>    sort_order   wmo name  history_product local_date_time local_date_time…
#>         <int> <int> <chr> <chr>           <chr>           <chr>           
#>  1          0 94866 Melb… IDV60901        01/02:00pm      20201201140000  
#>  2          1 94866 Melb… IDV60901        01/01:30pm      20201201133000  
#>  3          2 94866 Melb… IDV60901        01/01:00pm      20201201130000  
#>  4          3 94866 Melb… IDV60901        01/12:30pm      20201201123000  
#>  5          4 94866 Melb… IDV60901        01/12:27pm      20201201122700  
#>  6          5 94866 Melb… IDV60901        01/12:03pm      20201201120300  
#>  7          6 94866 Melb… IDV60901        01/12:00pm      20201201120000  
#>  8          7 94866 Melb… IDV60901        01/11:37am      20201201113700  
#>  9          8 94866 Melb… IDV60901        01/11:30am      20201201113000  
#> 10          9 94866 Melb… IDV60901        01/11:14am      20201201111400  
#> # … with 151 more rows, and 29 more variables: aifstime_utc <chr>, lat <dbl>,
#> #   lon <dbl>, apparent_t <dbl>, cloud <chr>, cloud_base_m <int>,
#> #   cloud_oktas <int>, cloud_type_id <int>, cloud_type <chr>, delta_t <dbl>,
#> #   gust_kmh <int>, gust_kt <int>, air_temp <dbl>, dewpt <dbl>, press <dbl>,
#> #   press_qnh <dbl>, press_msl <dbl>, press_tend <chr>, rain_trace <chr>,
#> #   rel_hum <int>, sea_state <chr>, swell_dir_worded <chr>, swell_height <lgl>,
#> #   swell_period <lgl>, vis_km <chr>, weather <chr>, wind_dir <chr>,
#> #   wind_spd_kmh <int>, wind_spd_kt <int>
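
The output above shows that several columns arrive as character even though they hold dates or numbers (local_date_time_full, rain_trace), and that "-" is used where a value is missing (cloud_type). The following is one possible clean-up, sketched in base R on a small stand-in data frame whose values are copied from the first rows shown above. The name obs and the choice of the "Australia/Melbourne" time zone are assumptions made for this example, and the real feed may contain values that need more careful handling.

```r
# Stand-in for three columns of bom_data$observations$data,
# using the first four values shown in the str() output
obs <- data.frame(
  local_date_time_full = c("20201201140000", "20201201133000",
                           "20201201130000", "20201201123000"),
  cloud_type = c("Stratocumulus", "Stratocumulus", "Stratocumulus", "-"),
  rain_trace = c("0.0", "0.0", "0.0", "0.0"),
  stringsAsFactors = FALSE
)

# Treat the "-" placeholder as a missing value
obs$cloud_type[obs$cloud_type == "-"] <- NA

# Convert numeric text to numbers (non-numeric text would become NA with a warning)
obs$rain_trace <- as.numeric(obs$rain_trace)

# Parse the compact timestamp as local time (time zone is an assumption)
obs$local_date_time_full <- as.POSIXct(
  obs$local_date_time_full,
  format = "%Y%m%d%H%M%S",
  tz = "Australia/Melbourne"
)

str(obs)
```

The same steps could be applied to the full data frame by replacing obs with bom_data$observations$data.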

Just for fun, here’s the apparent temperature as a line graph.

library(ggplot2)
bom_data$observations$data %>%
  ggplot(aes(x = lubridate::as_datetime(local_date_time_full), y = apparent_t)) +
  geom_line() +
  theme_minimal() +
  theme(legend.position = "bottom",
        axis.title = element_blank(),
        legend.title = element_blank()) +
  labs(title = "Melbourne Airport, Apparent Temperature (°C)",
       caption = "Source: Bureau of Meteorology")