Introduction to PHEindicatormethods

Introduction

This vignette introduces the following functions from the PHEindicatormethods package and provides basic sample code to demonstrate their execution. The code included is based on the code provided within the ‘examples’ section of the function documentation. This vignette does not explain the methods applied in detail but these can (optionally) be output alongside the statistics or for a more detailed explanation, please see the references section of the function documentation.

The following packages must be installed and loaded if not already available

library(PHEindicatormethods)
library(dplyr)

Package functions

This vignette covers the following core functions available within PHEindicatormethods:

Function	Type	Description
phe_proportion	Non-aggregate	Performs a calculation on each row of data (unless data is grouped)
phe_rate	Non-aggregate	Performs a calculation on each row of data (unless data is grouped)

phe_mean	Aggregate	Performs a calculation on each grouping set

calculate_dsr	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRatio	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs
calculate_ISRate	Aggregate, standardised	Performs a calculation on each grouping set and requires additional reference inputs

Other functions are introduced in separate vignettes.

Non-aggregate functions

Create some test data for the non-aggregate functions

The following code chunk creates a data frame containing observed number of events and populations for 4 geographical areas over 2 time periods that is used later to demonstrate the PHEindicatormethods package functions:

df <- data.frame(
        area = rep(c("Area1","Area2","Area3","Area4"), 2),
        year = rep(2015:2016, each = 4),
        obs = sample(100, 2 * 4, replace = TRUE),
        pop = sample(100:200, 2 * 4, replace = TRUE))
df
#>    area year obs pop
#> 1 Area1 2015  81 196
#> 2 Area2 2015  96 117
#> 3 Area3 2015  99 101
#> 4 Area4 2015  59 169
#> 5 Area1 2016  41 131
#> 6 Area2 2016  67 162
#> 7 Area3 2016  53 156
#> 8 Area4 2016  57 183

Execute phe_proportion and phe_rate

INPUT: The phe_proportion and phe_rate functions take a single data frame as input with columns representing the numerators and denominators for the statistic. Any other columns present will be retained in the output.

OUTPUT: The functions output the original data frame with additional columns appended. By default the additional columns are the proportion or rate, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The functions also accept additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

Here are some example code chunks to demonstrate these two functions and the arguments that can optionally be specified

# default proportion
phe_proportion(df, obs, pop)
#>    area year obs pop     value   lowercl   uppercl confidence       statistic
#> 1 Area1 2015  81 196 0.4132653 0.3466405 0.4832246        95% proportion of 1
#> 2 Area2 2015  96 117 0.8205128 0.7411469 0.8795010        95% proportion of 1
#> 3 Area3 2015  99 101 0.9801980 0.9306538 0.9945527        95% proportion of 1
#> 4 Area4 2015  59 169 0.3491124 0.2813212 0.4236107        95% proportion of 1
#> 5 Area1 2016  41 131 0.3129771 0.2398571 0.3967532        95% proportion of 1
#> 6 Area2 2016  67 162 0.4135802 0.3406029 0.4905612        95% proportion of 1
#> 7 Area3 2016  53 156 0.3397436 0.2700705 0.4171195        95% proportion of 1
#> 8 Area4 2016  57 183 0.3114754 0.2488361 0.3818668        95% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify confidence level for proportion
phe_proportion(df, obs, pop, confidence = 99.8)
#>    area year obs pop     value   lowercl   uppercl confidence       statistic
#> 1 Area1 2015  81 196 0.4132653 0.3110811 0.5235087      99.8% proportion of 1
#> 2 Area2 2015  96 117 0.8205128 0.6881684 0.9044849      99.8% proportion of 1
#> 3 Area3 2015  99 101 0.9801980 0.8804309 0.9970039      99.8% proportion of 1
#> 4 Area4 2015  59 169 0.3491124 0.2466454 0.4677196      99.8% proportion of 1
#> 5 Area1 2016  41 131 0.3129771 0.2041482 0.4472203      99.8% proportion of 1
#> 6 Area2 2016  67 162 0.4135802 0.3020982 0.5346836      99.8% proportion of 1
#> 7 Area3 2016  53 156 0.3397436 0.2348608 0.4631148      99.8% proportion of 1
#> 8 Area4 2016  57 183 0.3114754 0.2172708 0.4243798      99.8% proportion of 1
#>   method
#> 1 Wilson
#> 2 Wilson
#> 3 Wilson
#> 4 Wilson
#> 5 Wilson
#> 6 Wilson
#> 7 Wilson
#> 8 Wilson

# specify multiplier to output proportions as percentages
phe_proportion(df, obs, pop, multiplier = 100)
#>    area year obs pop    value  lowercl  uppercl confidence  statistic method
#> 1 Area1 2015  81 196 41.32653 34.66405 48.32246        95% percentage Wilson
#> 2 Area2 2015  96 117 82.05128 74.11469 87.95010        95% percentage Wilson
#> 3 Area3 2015  99 101 98.01980 93.06538 99.45527        95% percentage Wilson
#> 4 Area4 2015  59 169 34.91124 28.13212 42.36107        95% percentage Wilson
#> 5 Area1 2016  41 131 31.29771 23.98571 39.67532        95% percentage Wilson
#> 6 Area2 2016  67 162 41.35802 34.06029 49.05612        95% percentage Wilson
#> 7 Area3 2016  53 156 33.97436 27.00705 41.71195        95% percentage Wilson
#> 8 Area4 2016  57 183 31.14754 24.88361 38.18668        95% percentage Wilson

# specify multiplier for proportion, confidence level and remove metadata columns
phe_proportion(df, obs, pop, confidence = 99.8, multiplier = 100, type = "standard")
#>    area year obs pop    value  lowercl  uppercl
#> 1 Area1 2015  81 196 41.32653 31.10811 52.35087
#> 2 Area2 2015  96 117 82.05128 68.81684 90.44849
#> 3 Area3 2015  99 101 98.01980 88.04309 99.70039
#> 4 Area4 2015  59 169 34.91124 24.66454 46.77196
#> 5 Area1 2016  41 131 31.29771 20.41482 44.72203
#> 6 Area2 2016  67 162 41.35802 30.20982 53.46836
#> 7 Area3 2016  53 156 33.97436 23.48608 46.31148
#> 8 Area4 2016  57 183 31.14754 21.72708 42.43798

# default rate
phe_rate(df, obs, pop)
#>    area year obs pop    value  lowercl   uppercl confidence       statistic
#> 1 Area1 2015  81 196 41326.53 32818.14  51365.73        95% rate per 100000
#> 2 Area2 2015  96 117 82051.28 66460.15 100199.65        95% rate per 100000
#> 3 Area3 2015  99 101 98019.80 79663.78 119336.73        95% rate per 100000
#> 4 Area4 2015  59 169 34911.24 26574.52  45033.75        95% rate per 100000
#> 5 Area1 2016  41 131 31297.71 22457.31  42459.98        95% rate per 100000
#> 6 Area2 2016  67 162 41358.02 32050.42  52524.03        95% rate per 100000
#> 7 Area3 2016  53 156 33974.36 25447.36  44440.18        95% rate per 100000
#> 8 Area4 2016  57 183 31147.54 23589.40  40355.99        95% rate per 100000
#>   method
#> 1  Byars
#> 2  Byars
#> 3  Byars
#> 4  Byars
#> 5  Byars
#> 6  Byars
#> 7  Byars
#> 8  Byars

# specify multiplier for rate and confidence level
phe_rate(df, obs, pop, confidence = 99.8, multiplier = 100)
#>    area year obs pop    value  lowercl   uppercl confidence    statistic method
#> 1 Area1 2015  81 196 41.32653 28.56564  57.58885      99.8% rate per 100  Byars
#> 2 Area2 2015  96 117 82.05128 58.57015 111.38729      99.8% rate per 100  Byars
#> 3 Area3 2015  99 101 98.01980 70.35481 132.46502      99.8% rate per 100  Byars
#> 4 Area4 2015  59 169 34.91124 22.51774  51.38103      99.8% rate per 100  Byars
#> 5 Area1 2016  41 131 31.29771 18.31404  49.56651      99.8% rate per 100  Byars
#> 6 Area2 2016  67 162 41.35802 27.46961  59.49212      99.8% rate per 100  Byars
#> 7 Area3 2016  53 156 33.97436 21.34022  51.03067      99.8% rate per 100  Byars
#> 8 Area4 2016  57 183 31.14754 19.92331  46.13788      99.8% rate per 100  Byars

# specify multiplier for rate, confidence level and remove metadata columns
phe_rate(df, obs, pop, type = "standard", confidence = 99.8, multiplier = 100)
#>    area year obs pop    value  lowercl   uppercl
#> 1 Area1 2015  81 196 41.32653 28.56564  57.58885
#> 2 Area2 2015  96 117 82.05128 58.57015 111.38729
#> 3 Area3 2015  99 101 98.01980 70.35481 132.46502
#> 4 Area4 2015  59 169 34.91124 22.51774  51.38103
#> 5 Area1 2016  41 131 31.29771 18.31404  49.56651
#> 6 Area2 2016  67 162 41.35802 27.46961  59.49212
#> 7 Area3 2016  53 156 33.97436 21.34022  51.03067
#> 8 Area4 2016  57 183 31.14754 19.92331  46.13788

These functions can also return aggregate data if the input dataframes are grouped:

# default proportion - grouped
df %>%
  group_by(year) %>%
  phe_proportion(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop value lowercl uppercl confidence statistic       method
#>   <int> <int> <int> <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   335   583 0.575   0.534   0.614 95%        proportion of 1 Wilson
#> 2  2016   218   632 0.345   0.309   0.383 95%        proportion of 1 Wilson

# default rate - grouped
df %>%
  group_by(year) %>%
  phe_rate(obs, pop)
#> # A tibble: 2 × 9
#> # Groups:   year [2]
#>    year   obs   pop  value lowercl uppercl confidence statistic       method
#>   <int> <int> <int>  <dbl>   <dbl>   <dbl> <chr>      <chr>           <chr> 
#> 1  2015   335   583 57461.  51472.  63956. 95%        rate per 100000 Byars 
#> 2  2016   218   632 34494.  30066.  39389. 95%        rate per 100000 Byars

Aggregate functions

The remaining functions aggregate the rows in the input data frame to produce a single statistic. It is also possible to calculate multiple statistics in a single execution of these functions if the input data frame is grouped - for example by indicator ID, geographic area or time period (or all three). The output contains only the grouping variables and the values calculated by the function - any additional unused columns provided in the input data frame will not be retained in the output.

The df test data generated earlier can be used to demonstrate phe_mean:

Execute phe_mean

INPUT: The phe_mean function take a single data frame as input with a column representing the numbers to be averaged.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values (if applicable), the mean, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence and a reduced level of detail to be output.

Here are some example code chunks to demonstrate the phe_mean function and the arguments that can optionally be specified

# default mean
phe_mean(df,obs)
#>   value_sum value_count    stdev  value  lowercl  uppercl confidence statistic
#> 1       553           8 20.91095 69.125 51.64301 86.60699        95%      mean
#>                     method
#> 1 Student's t-distribution

# multiple means in a single execution with 99.8% confidence
df %>%
    group_by(year) %>%
        phe_mean(obs, confidence = 0.998)
#> # A tibble: 2 × 10
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl confidence statistic
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl> <chr>      <chr>    
#> 1  2015       335           4  18.3  83.8  -9.62     177. 99.8%      mean     
#> 2  2016       218           4  10.8  54.5  -0.428    109. 99.8%      mean     
#> # ℹ 1 more variable: method <chr>

# multiple means in a single execution with 99.8% confidence and data-only output
df %>%
    group_by(year) %>%
        phe_mean(obs, type = "standard", confidence = 0.998)
#> # A tibble: 2 × 7
#> # Groups:   year [2]
#>    year value_sum value_count stdev value lowercl uppercl
#>   <int>     <int>       <int> <dbl> <dbl>   <dbl>   <dbl>
#> 1  2015       335           4  18.3  83.8  -9.62     177.
#> 2  2016       218           4  10.8  54.5  -0.428    109.

Standardised Aggregate functions

Create some test data for the standardised aggregate functions

The following code chunk creates a data frame containing observed number of events and populations by age band for 4 areas, 5 time periods and 2 sexes:

df_std <- data.frame(
            area = rep(c("Area1", "Area2", "Area3", "Area4"), each = 19 * 2 * 5),
            year = rep(2006:2010, each = 19 * 2),
            sex = rep(rep(c("Male", "Female"), each = 19), 5),
            ageband = rep(c(0, 5,10,15,20,25,30,35,40,45,
                           50,55,60,65,70,75,80,85,90), times = 10),
            obs = sample(200, 19 * 2 * 5 * 4, replace = TRUE),
            pop = sample(10000:20000, 19 * 2 * 5 * 4, replace = TRUE))
head(df_std)
#>    area year  sex ageband obs   pop
#> 1 Area1 2006 Male       0  96 10836
#> 2 Area1 2006 Male       5 184 19544
#> 3 Area1 2006 Male      10 117 10665
#> 4 Area1 2006 Male      15  36 17725
#> 5 Area1 2006 Male      20  49 19554
#> 6 Area1 2006 Male      25 109 15477

Execute calculate_dsr

INPUT: The minimum input requirement for the calculate_dsr function is a single data frame with columns representing the numerators and denominators and standard populations for each standardisation category. The standard populations must be appended to the input data frame by the user prior to execution of the function. The 2013 European Standard Population is provided within the package in vector form (esp2013), which you can join to your dataset. Alternative standard populations can be used but must be provided by the user.

OUTPUT: By default, the function outputs one row per grouping set containing the grouping variable values, the total count, the total population, the dsr, the lower 95% confidence limit, the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output. It is also possible to calculate CIs when we can’t assume events are independent - further details can be found in the DSR vignette.

Here are some example code chunks to demonstrate the calculate_dsr function and the arguments that can optionally be specified


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1735    300742  602.    572.    634. 95%       
#>  2 Area1  2006 Male          1768    294272  648.    615.    681. 95%       
#>  3 Area1  2007 Female        1914    290838  639.    608.    672. 95%       
#>  4 Area1  2007 Male          1792    277328  673.    639.    708. 95%       
#>  5 Area1  2008 Female        1793    282462  626.    595.    659. 95%       
#>  6 Area1  2008 Male          1714    294401  620.    588.    652. 95%       
#>  7 Area1  2009 Female        1790    297043  612.    582.    643. 95%       
#>  8 Area1  2009 Male          2079    282031  781.    745.    817. 95%       
#>  9 Area1  2010 Female        1749    257541  766.    729.    805. 95%       
#> 10 Area1  2010 Male          1861    281951  659.    626.    692. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>


# Append the standard populations to the data frame
# calculate separate dsrs for each area, year and sex and drop metadata fields from output
df_std %>%
    mutate(refpop = rep(esp2013, 40)) %>%
    group_by(area, year, sex) %>%
    calculate_dsr(obs,pop, stdpop = refpop, type = "standard")
#> # A tibble: 40 × 8
#>    area   year sex    total_count total_pop value lowercl uppercl
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female        1735    300742  602.    572.    634.
#>  2 Area1  2006 Male          1768    294272  648.    615.    681.
#>  3 Area1  2007 Female        1914    290838  639.    608.    672.
#>  4 Area1  2007 Male          1792    277328  673.    639.    708.
#>  5 Area1  2008 Female        1793    282462  626.    595.    659.
#>  6 Area1  2008 Male          1714    294401  620.    588.    652.
#>  7 Area1  2009 Female        1790    297043  612.    582.    643.
#>  8 Area1  2009 Male          2079    282031  781.    745.    817.
#>  9 Area1  2010 Female        1749    257541  766.    729.    805.
#> 10 Area1  2010 Male          1861    281951  659.    626.    692.
#> # ℹ 30 more rows

# calculate for under 75s by filtering out records for 75+ from input data frame and standard population
df_std %>%
  filter(ageband <= 70) %>%
  mutate(refpop = rep(esp2013[1:15], 40)) %>%
  group_by(area, year, sex) %>%
  calculate_dsr(obs, pop, stdpop = refpop)
#> # A tibble: 40 × 11
#>    area   year sex    total_count total_pop value lowercl uppercl confidence
#>    <chr> <int> <chr>        <int>     <int> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female        1332    238968  592.    560.    626. 95%       
#>  2 Area1  2006 Male          1409    229066  648.    614.    684. 95%       
#>  3 Area1  2007 Female        1364    224357  627.    593.    662. 95%       
#>  4 Area1  2007 Male          1403    208289  689.    652.    727. 95%       
#>  5 Area1  2008 Female        1280    224528  603.    570.    638. 95%       
#>  6 Area1  2008 Male          1354    221915  628.    594.    664. 95%       
#>  7 Area1  2009 Female        1421    233909  605.    573.    638. 95%       
#>  8 Area1  2009 Male          1700    223635  794.    756.    834. 95%       
#>  9 Area1  2010 Female        1441    199244  776.    736.    818. 95%       
#> 10 Area1  2010 Male          1475    216067  678.    643.    715. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

Execute calculate_ISRatio and calculate_ISRate

INPUT: These functions take a single data frame as input, with columns representing the numerators and denominators for each standardisation category, plus reference numerators and denominators for each standardisation category.

The reference data can either be provided in a separate data frame/vectors or as columns within the input data frame:

reference data provided as a data frame or as vectors - the data frame/vectors and the input data frame must both contain rows for the same standardisation categories, and both must be sorted, within each grouping set, by these standardisation categories in the same order.
reference data provided as columns within the input data frame - the reference numerators and denominators can be appended to the input data frame prior to execution of the function - if the data is grouped to generate multiple indirectly standardised rates or ratios then the reference data will need to be repeated and appended to the data rows for every grouping set.

OUTPUT: By default, the functions output one row per grouping set containing the grouping variable values, the observed and expected counts, the reference rate (ISRate only), the indirectly standardised rate or ratio, the lower 95% confidence limit, and the upper 95% confidence limit, the confidence level, the statistic name and the method.

OPTIONS: If reference data are being provided as columns within the input data frame then the user must specify this as the function expects vectors by default. The function also accepts additional arguments to specify the level of confidence, the multiplier and a reduced level of detail to be output.

The following code chunk creates a data frame containing the reference data - this example uses the all area data for persons in the baseline year:

df_ref <- df_std %>%
    filter(year == 2006) %>%
    group_by(ageband) %>%
    summarise(obs = sum(obs),
              pop = sum(pop),
              .groups = "drop_last")
    
head(df_ref)
#> # A tibble: 6 × 3
#>   ageband   obs    pop
#>     <dbl> <int>  <int>
#> 1       0  1021 109521
#> 2       5   929 126748
#> 3      10   940 121874
#> 4      15   608 118633
#> 5      20   867 117643
#> 6      25   939 128424

Here are some example code chunks to demonstrate the calculate_ISRatio function and the arguments that can optionally be specified

# calculate separate smrs for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 11
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl confidence
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Female     1735    1996. 0.869   0.829   0.911 95%       
#>  2 Area1  2006 Male       1768    1943. 0.910   0.868   0.953 95%       
#>  3 Area1  2007 Female     1914    1922. 0.996   0.952   1.04  95%       
#>  4 Area1  2007 Male       1792    1832. 0.978   0.933   1.02  95%       
#>  5 Area1  2008 Female     1793    1884. 0.952   0.908   0.997 95%       
#>  6 Area1  2008 Male       1714    1962. 0.874   0.833   0.916 95%       
#>  7 Area1  2009 Female     1790    1990. 0.900   0.858   0.942 95%       
#>  8 Area1  2009 Male       2079    1873. 1.11    1.06    1.16  95%       
#>  9 Area1  2010 Female     1749    1687. 1.04    0.989   1.09  95%       
#> 10 Area1  2010 Male       1861    1884. 0.988   0.943   1.03  95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same smrs by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRatio(obs, pop, refobs, refpop, refpoptype = "field",
                      type = "standard")
#> # A tibble: 40 × 8
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1735    1996. 0.869   0.829   0.911
#>  2 Area1  2006 Male       1768    1943. 0.910   0.868   0.953
#>  3 Area1  2007 Female     1914    1922. 0.996   0.952   1.04 
#>  4 Area1  2007 Male       1792    1832. 0.978   0.933   1.02 
#>  5 Area1  2008 Female     1793    1884. 0.952   0.908   0.997
#>  6 Area1  2008 Male       1714    1962. 0.874   0.833   0.916
#>  7 Area1  2009 Female     1790    1990. 0.900   0.858   0.942
#>  8 Area1  2009 Male       2079    1873. 1.11    1.06    1.16 
#>  9 Area1  2010 Female     1749    1687. 1.04    0.989   1.09 
#> 10 Area1  2010 Male       1861    1884. 0.988   0.943   1.03 
#> # ℹ 30 more rows

The calculate_ISRate function works exactly the same way but instead of expressing the result as a ratio of the observed and expected rates the result is expressed as a rate and the reference rate is also provided. Here are some examples:

# calculate separate indirectly standardised rates for each area, year and sex
# standardised against the all-year, all-sex, all-area reference data
df_std %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, df_ref$obs, df_ref$pop)
#> # A tibble: 40 × 12
#> # Groups:   area, year, sex [40]
#>    area   year sex   observed expected ref_rate value lowercl uppercl confidence
#>    <chr> <int> <chr>    <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl> <chr>     
#>  1 Area1  2006 Fema…     1735    1996.     662.  575.    549.    603. 95%       
#>  2 Area1  2006 Male      1768    1943.     662.  602.    575.    631. 95%       
#>  3 Area1  2007 Fema…     1914    1922.     662.  659.    630.    690. 95%       
#>  4 Area1  2007 Male      1792    1832.     662.  648.    618.    678. 95%       
#>  5 Area1  2008 Fema…     1793    1884.     662.  630.    601.    660. 95%       
#>  6 Area1  2008 Male      1714    1962.     662.  579.    552.    607. 95%       
#>  7 Area1  2009 Fema…     1790    1990.     662.  596.    568.    624. 95%       
#>  8 Area1  2009 Male      2079    1873.     662.  735.    704.    768. 95%       
#>  9 Area1  2010 Fema…     1749    1687.     662.  686.    655.    719. 95%       
#> 10 Area1  2010 Male      1861    1884.     662.  654.    625.    685. 95%       
#> # ℹ 30 more rows
#> # ℹ 2 more variables: statistic <chr>, method <chr>

# calculate the same indirectly standardised rates by appending the reference data to the data frame
# and drop metadata columns from output
df_std %>%
    mutate(refobs = rep(df_ref$obs,40),
           refpop = rep(df_ref$pop,40)) %>%
    group_by(area, year, sex) %>%
    calculate_ISRate(obs, pop, refobs, refpop, refpoptype = "field",
                     type = "standard")
#> # A tibble: 40 × 9
#> # Groups:   area, year, sex [40]
#>    area   year sex    observed expected ref_rate value lowercl uppercl
#>    <chr> <int> <chr>     <int>    <dbl>    <dbl> <dbl>   <dbl>   <dbl>
#>  1 Area1  2006 Female     1735    1996.     662.  575.    549.    603.
#>  2 Area1  2006 Male       1768    1943.     662.  602.    575.    631.
#>  3 Area1  2007 Female     1914    1922.     662.  659.    630.    690.
#>  4 Area1  2007 Male       1792    1832.     662.  648.    618.    678.
#>  5 Area1  2008 Female     1793    1884.     662.  630.    601.    660.
#>  6 Area1  2008 Male       1714    1962.     662.  579.    552.    607.
#>  7 Area1  2009 Female     1790    1990.     662.  596.    568.    624.
#>  8 Area1  2009 Male       2079    1873.     662.  735.    704.    768.
#>  9 Area1  2010 Female     1749    1687.     662.  686.    655.    719.
#> 10 Area1  2010 Male       1861    1884.     662.  654.    625.    685.
#> # ℹ 30 more rows