Predicting Mortality Risk after a Fall in Older Adults using Health Care Spending Patterns: A population-based cohort study

Author

Alexandros Katsiferis

Abstract

Objective: to develop a prognostic model of 1-year mortality for individuals aged 65+ presenting at the emergency department (ED) with a fall based on health care spending patterns to guide clinical decision making.

Design: population-based cohort study (n = 35,997) included with a fall in 2013 and followed one year.

Methods: health care spending indicators (DIORs) 2-years before admission were evaluated as potential predictors, along with age, sex, and other clinical and sociodemographic covariates. Multivariable logistic regression models were developed and internally validated (10-fold cross-validation). Performance was assessed via discrimination (area under the receiver operating characteristics curve), Brier Scores, calibration, and decision curve analysis.

Results: the AUC of age and sex for mortality was 72.5% [95% confidence interval 71.8 to 73.2]. The best model included age, sex, number of medications and health care spending DIORs. It exhibited high discrimination (AUC: 81.1 [80.5 to 81.6]), good calibration and potential clinical benefit for various threshold probabilities. Overall, health care spending patterns improved predictive accuracy the most while also exhibiting superior performance and clinical benefit.

Conclusions: patterns of health care spending have the potential to significantly improve assessments on who is at high risk of dying following admission to the ED with a fall. The proposed methodology can assist in predicting the prognosis of fallers, emphasizing the added predictive value of longitudinal health-related information next to clinical and sociodemographic predictors.

All analyses have been conducted using the servers at Statistics Denmark (https://www.dst.dk/en)

Load the required packages

Show the code
library(DSTora) # Access to database
library(ROracle)# Access to database
library(tidyverse) # Data Handling
library(rms) # Modelling Strategies
library(riskRegression) # Performance Measures
library(lubridate) # Date Handling
library(janitor) # Mainly for tables
library(Publish) # Table creation
library(theft) # DIORs extraction
library(dcurves) # Decision curves
library(tidymodels) # Machine learning models
library(finetune) # Hyperparameter optimization of ML models
library(patchwork) # Multiple plots
library(performance) # Check multicollinearity of model

The data consist of individuals 65+years of age who have experienced falls during 2013, recorded in the emergency department. The sample size consists of 41146 individuals. Those were followed for a maximum of 3 years.

List of variables included:

  • Sex

  • Age at first fall

  • Admittance date in ED for 2013 fall

  • Admittance date for repeat fall

  • Time to repeat fall (maximum of 3 year follow-up period)

  • Event indicator for repeat fall

  • Date of Death: If there is

  • Time to death

  • Indicator for death within the 3 year period

  • Number of unique medications before the first fall

  • Educational Level (in ordered categories)

  • Income

  • Weekly health care expenditures:

    • Hospitalization-related expenditures

    • Prescription drugs expenditures

    • Primary care expenditures

    • Home care expenditures

    • Residential care expenditures

We need some transformations to our variables (such as renaming, for example):

Show the code
# Rename the Danish variable names into English and clean it up a bit

falls1 <- falls |> 
  select(PERSON_ID, SEX, Age = 'ALDER_UF', First_Admit_Date = 'INDDTO_UF',
         Second_Admit_Date = 'INDDTO_UR', Time_To_Refall = "time_repfall",
         Event_Refall = "event_repfall", Date_Of_Death = "DODDATO",
         Death_Time = "time_death", Death_Event = "event_death",
         Medications = "N_MED", Education = "ED_LEVEL", Income = "INCOME")

# Make education as factor

falls1$Education <- as.factor(falls1$Education)

# Gender variable in falls dataset (make it as a factor from binary)

falls1 <- falls1 |> 
  mutate(Gender = if_else(SEX == 0, 'Females', 'Males'))

falls1$Gender <- as.factor(falls1$Gender)

Merging data from the registers (falls information + demographics + healthcare expenditures)

Show the code
# Construct the dataset with the variables that we need
# We need ID, Date of death, Date of birth, Gender, and Bereavement Data

pop_data <- pop_data |>  
  select(ID ='PERSON_ID',Date_Of_Death = 'DODDATO', Date_Of_Birth = 'FOED_DAG', 
         Sex = 'KOEN',Bereavement_Date = 'BEREAVEMENT_DATE')

# Let's rename the ID variable as PERSON_ID

pop_data <- pop_data |> 
  rename(PERSON_ID = "ID")

# Let's create a new variable of gender and make it as factor

pop_data <- pop_data |> 
  mutate(Gender = if_else(Sex == 1, 'Males', 'Females'))

pop_data$Gender <- as.factor(pop_data$Gender)

# We need to merge the two data-sets now
# First we keep only the neccessary variables from the population data

pop1 <- pop_data |> 
  select(PERSON_ID, Bereavement_Date,Date_Of_Birth)

Expenditures database and demographics/falls merging

A few things should be taken into account:

  • There are people who were 65 years of age at 2013 when they had their first fall.

  • The expenditure data are available for all Danish individuals 65years of age or older at the start of 2011.

  • That means that in order to merge the expenditures with the rest of the variables, we need those who experienced fall when they were at least 67 at 2013. That way we monitor medical spending from 2011 to 2013 for those individuals (meaning they were 65 at that point)

Let’s start by first merging the sociodemographics with the falls

Show the code
# We filter for people 67 or older

falls_67 <- falls1 |> 
  filter(Age >= 67) 

# Now let's merge these two dataframes (pop1 and falls_67)
# pop1 = sociodemographic characteristics

pop_merged <- falls_67 |>
  inner_join(pop1, by = 'PERSON_ID')

Polishing-up the variables describing dates:

Show the code
# Now we need to make the dates a bit more transparent, without the hour

pop_merged$First_Admit_Date <- strftime(pop_merged$First_Admit_Date, format = '%Y-%m-%d')

pop_merged$First_Admit_Date <- as_date(pop_merged$First_Admit_Date)


pop_merged$Second_Admit_Date <- strftime(pop_merged$Second_Admit_Date, format = '%Y-%m-%d')

pop_merged$Second_Admit_Date <- as_date(pop_merged$Second_Admit_Date)


pop_merged$Date_Of_Death <- strftime(pop_merged$Date_Of_Death, format = '%Y-%m-%d')

pop_merged$Date_Of_Death <- as_date(pop_merged$Date_Of_Death)


pop_merged$Bereavement_Date <- strftime(pop_merged$Bereavement_Date, format = '%Y-%m-%d')

pop_merged$Bereavement_Date <- as_date(pop_merged$Bereavement_Date)


pop_merged$Date_Of_Birth <- strftime(pop_merged$Date_Of_Birth, format = '%Y-%m-%d')

pop_merged$Date_Of_Birth <- as_date(pop_merged$Date_Of_Birth)

Adding the bereavement variable in our current dataframe:

Show the code
# Creation of a bereavement status variable

pop_merged <- pop_merged |> 
  mutate(Bereavement = if_else(Bereavement_Date > First_Admit_Date | 
                                 is.na(Bereavement_Date), 'Non_Bereaved', 'Bereaved'))

pop_merged$Bereavement <- as.factor(pop_merged$Bereavement)

pop_merged$Bereavement <- relevel(pop_merged$Bereavement,ref = 'Non_Bereaved')

Merging with the health care expenditures data

  • These are available for individuals at least 65years of age by 01/01/2011

  • They are available for weekly periods

  • They cover different types of expenditures

  • They are highly representative of the expenditures in the Danish population of older adults

Show the code
# Hospital costs

hospital_costs <- tbl(conn,'costs_drg') |>  
  collect()

# Prescription Drugs:

prescription_drugs <- tbl(conn, 'costs_lmdb21') |>  
  collect()

# Home Care Costs:

home_care <- tbl(conn, 'costs_home_care21') |>  
  collect()

# Residential Care Costs:

resid_care <- tbl(conn, 'costs_residential21') |> 
  collect()

# Primary Care Costs:

primary_care <-  tbl(conn, 'costs_sssy20') |> 
  collect()

# Hospital Costs (Inpatient):

inpatient_costs <- hospital_costs |>  
  filter(SOURCE == 'DRGHEL') |> 
  select(-SOURCE)

# Hospital Costs(Outpatient):
outpatient_costs <- hospital_costs |> 
  filter(SOURCE == 'DRGAMB') |> 
  select(-SOURCE)

# Let's explore the expenditures data using summaries

 #summary(outpatient_costs$COST)
 #summary(inpatient_costs$COST)
 #summary(prescription_drugs$COST)
 #summary(home_care$COST)
 #summary(resid_care$COST)
 #summary(primary_care$COST)
 #summary(outpatient_costs$COST)


# We will impute the NAs with zeros, since estimation could not be performed. 
# But we have values for other kinds of costs for those individuals
# NA values mean that the person did not spend on that specific week

inpatient_costs <- inpatient_costs |>  
  replace_na(list(COST = 0))


# Mistakenly we also have negative values for primary care costs
# This will also be imputed with 0

primary_care <- primary_care |>  
  mutate(COST = ifelse(COST < 0 , 0, COST))

# Now we should merge the costs for individuals

merged_costs <- bind_rows(prescription_drugs, home_care, resid_care, 
                          primary_care,inpatient_costs,outpatient_costs)

# We sum all the costs per person and time 

merged_costs_test <- merged_costs |>  
  group_by(PERSON_ID, TIME) |> 
  mutate(Total_Costs = sum(COST)) |>  
  ungroup()

# The dataframe has duplicated time points, we do not need that. We will just keep one time point

merged_costs_test <- merged_costs_test |>  
  group_by(PERSON_ID) |> 
  filter(!duplicated(TIME)) |> 
  ungroup()

# Let's also arrange the time and remove the COST column

merged_costs_test <- merged_costs_test |> 
  arrange(TIME) |>  
  select(-COST)

Now we are ready to merge the expenditures data with the socio/falls dataframe

Show the code
df <- pop_merged |>  
  inner_join(merged_costs_test, by = 'PERSON_ID')

# We will replace the NA dates of death (NA in that case means they are still alive)
# We will just replace the NA with a random date after the follow-up '2022-01-01'.

df <- df |> 
  replace_na(list(Date_Of_Death = as_date('2022-01-01')))


# I would like to investigate two years of expenditures before the first fall to predict mortality or second fall in the year after

# That means that we do not need the ammount of expenditures after the fall

# Let's specify the date of two years before the first fall so we can filter them

df$Two_Years_Before_Fall <- df$First_Admit_Date - 2*365.3333

# Let's create a date variable corresponding to the TIME variable (Dates corresponding to week numbers)

Dates <-seq(as_date('2011/01/01'),as_date('2021/12/31'),by = 'week')

Dates <- as.data.frame(Dates)

Dates <- Dates |> 
  mutate(TIME = seq(0,573,1))

# Now merge the dates into the original falls data-set
df_new <- df |> 
  inner_join(Dates, by = 'TIME')


# Now let's just fill the missing weeks

df_fill <- tidyr::complete(df_new,PERSON_ID,Dates = seq(as.Date('2011-01-01'), as.Date('2021-12-31'),by = 'week'), fill = list(Total_Costs = 0 ))


# Fill the NA values of different columns due to the filling

df_fill <- df_fill |>  
  group_by(PERSON_ID) |> 
  tidyr::fill(Date_Of_Death, Date_Of_Birth, Gender, Bereavement_Date, 
              Age, Two_Years_Before_Fall, Income,
              Education, Bereavement, Death_Event, 
              Time_To_Refall, Second_Admit_Date, Event_Refall, SEX, Medications,
              First_Admit_Date, Death_Time,
              .direction = 'downup') |> 
  ungroup()

df_fill <- df_fill |> 
  select(-TIME) |> 
  filter(Dates >= Two_Years_Before_Fall & Dates <= First_Admit_Date)

df_fill <- df_fill |>  
  group_by(PERSON_ID) |> 
  mutate(Weeks = row_number() - 1) |> 
  ungroup()

# Now we have to arrange the expenditures into a wide format

full_wide <- df_fill |>
  select(PERSON_ID, Gender, Age, Income, 
         Education, Bereavement, Death_Event, Time_To_Refall, Event_Refall, 
         Medications, Death_Time, Total_Costs, Weeks) |> 
  pivot_wider(names_from = Weeks, values_from = Total_Costs, names_prefix = 'Week_')

Now we have our final dataframe called full_wide.

  • This dataframe does not hold information about survival status after first fall.

  • It does hold information regarding sociodemographics, medication use, and weekly expenditures.

  • We need to implement some more details to our dataset

Our main analysis is:

  • Estimation of 1-year mortality risk in the fallers

Modelling/Prediction of mortality

  • We will use Logistic Regression with restricted cubic splines and interactions to predict the aforementioned endpoints

  • We perform internal validation using 10-fold cross-validation

  • We will create several models:

    • Base model: Age and Gender as predictors

    • Basic model: Simple model + Income + Bereavement Status + Education + Number of Medications

    • Model with DIORs : Age, Gender, Number of unique medications + Summary Statistics of Health-care expenditures

  • We will assess calibration (calibration plots, Brier score) and discrimination (AUC) in cross-validated samples, as well as perform a decision curve analysis.

Mortality Analysis

Show the code
full_wide_year <- df_fill |>
  select(Gender, Age, Income, Education, Bereavement,
         Medications, Total_Costs, Weeks, Death_Time) |> 
  pivot_wider(names_from = Weeks, values_from = Total_Costs, names_prefix = 'Week_')

# Create a new variable specifying survival status at the end of the first year after refall

full_wide_year <- full_wide_year |> 
  mutate(Survival1 = if_else(Death_Time <= 365, 'Dead', 'Alive'))

# Make it as a factor

full_wide_year$Survival1 <- as.factor(full_wide_year$Survival1)

# This code chunk below puts the Survival1(status at the end of the first year) after the general Survival(status at the end of the follow-up) and then deletes the latter from the data frame

full_wide_year <- full_wide_year |> 
  relocate(Survival1,.after = Survival) |> 
  select(-Death_Time)

Let’s create some data splits in case we need to do a single training/test split

Show the code
# We keep the seed at 234 for reproducibility

set.seed(234)

split5 <- initial_split(full_wide_year)

train5 <- training(split5) # training set

test5 <- testing(split5) # testing set

Set up the datasets for prediction of recurrent falls

Show the code
# We create our dataframe

full_wide_year_refall <- df_fill |>
  select(Event_Refall, Gender, Age, Income, Education, Bereavement,
         Medications, Total_Costs, Weeks, Time_To_Refall) |> 
  pivot_wider(names_from = Weeks, values_from = Total_Costs, names_prefix = 'Week_')


# Create a new variable of falls within a year after first one

full_wide_year_refall <- full_wide_year_refall |> 
  mutate(Event_Refall1 = if_else(Time_To_Refall <= 365 & Event_Refall == 1, 
                                 'Fall', 'No Fall'))

# We make it as factor

full_wide_year_refall$Event_Refall1 <- as.factor(full_wide_year_refall$Event_Refall1)

# Change the reference level so as prediction are given for the fall event

full_wide_year_refall$Event_Refall1 <- relevel(full_wide_year_refall$Event_Refall1, ref = 'No Fall')

full_wide_year_refall <- full_wide_year_refall |> 
  relocate(Event_Refall1,.after = Event_Refall) |> 
  select(-Event_Refall,-Time_To_Refall)

Let’s create data splits in case we need them (recurrent fall)

Show the code
# Use seed 234 to match the mortality predictions

set.seed(234)

split6 <- initial_split(full_wide_year_refall)

train6 <- training(split6)

test6 <- testing(split6)

Creation of train/ test split

Show the code
table_df <- bind_rows(train5, test5, .id = "dataset")

table_df <- table_df |> 
  mutate(Dataset = if_else(dataset == 1, 'Development Set', 'Validation Set'))

table_df$Dataset <- as.factor(table_df$Dataset)

table_df <- table_df |> 
  relocate(Dataset,.after = dataset)

table_df1 <- table_df |> 
  rowwise() |> 
  mutate(Average_Spending = mean(c_across(Week_0:Week_104),na.rm = T))

# Now we need to have a column for the recurrent falls status

table_df_extra <- bind_rows(train6, test6, .id = "dataset")

# Here we do it
table_df1$Event_Refall1 <- table_df_extra$Event_Refall1

# Convert the expenditures into Euros

table_df1$Average_Expenditures <- table_df1$Average_Spending * 1000 * 0.13

# Select only the relevant variables
table_df1 <- table_df1 |> 
  mutate(Composite = if_else(
    Survival1 == "Dead" | Event_Refall1 == "Fall", "Composite Yes", "Composite No")) |> 
  select(Dataset, Survival1, Event_Refall1, Composite, Gender, Age, Income, Medications, 
         Education, Bereavement, Average_Spending, Average_Expenditures)

DIORs preprocessing (Mortality 1-year)

  • We will create a model that uses summary statistics of health care expenditures instead of the raw expenditures.

  • We will do it for the mortality analysis now

  • We will first have to extract the summary statistics and create a new dataframe

Show the code
# Week 104 is not available for all individuals, so we will not include it for the calculation of DIORs.

df_fill1 <- df_fill |> 
  filter(Weeks != 104)

# We calculate the features for each person
feature_matrix <- calculate_features(data = df_fill,id_var = 'PERSON_ID',
                                     time_var = 'Weeks',
                                     values_var = 'Total_Costs',
                                     catch24 = T,seed = 234)

# We normalize them to be on the same scale

normed <- normalise_feature_frame(feature_matrix, 
                                  names_var = "names", 
                                  values_var = "values",
                                  method = "RobustSigmoid")

# Use the wide format

normed_wide <- normed |> 
  pivot_wider(names_from = names,values_from = values)

normed_wide <- normed_wide |> 
  select(-method) |> 
  rename(PERSON_ID = 'id')

###############################
#### Get the PERSON IDs #######
###############################

id_df <- df_fill |>
  select(PERSON_ID,Event_Refall, Gender, Age, Income,
         Total_Costs, Weeks, Time_To_Refall) |>    
  pivot_wider(names_from = Weeks, 
              values_from = Total_Costs, 
              names_prefix = 'Week_')

set.seed(234)

id_split <- initial_split(id_df)

id_train <- training(id_split)

id_test <- testing(id_split)

training_ids <- id_train |> 
  select(PERSON_ID)

testing_ids <- id_test |> 
  select(PERSON_ID)

# Now I get all the person ids for the individuals

all_ids <- bind_rows(training_ids, testing_ids)

# Put them into table_df1 

table_df1$PERSON_ID <- all_ids$PERSON_ID

###########################################################

overall_df1 <- table_df1 |> 
  inner_join(normed_wide,by = 'PERSON_ID')

ov <- overall_df1[match(id_df$PERSON_ID,overall_df1$PERSON_ID),]

# We only keep the variables we need

ov <- ov |> 
  select(-PERSON_ID, -Average_Spending, 
         -Event_Refall1,-Dataset)

# Let's remove the observations for whom some DIORs could not be computed

ov_omit <- na.omit(ov)

Sample Size Calculations (Not shown in the draft, but sample size sufficient)

Show the code
progn_sampling <- pmsampsize(type = "b", parameters = 50, cstatistic = 0.70, prevalence = 0.16, seed = 234)

Logistic Regression Modelling

Show the code
dd = datadist(ov_omit)

# Specify the full model and fit it to the whole dataset

logistic_simple <- lrm(Survival1 ~ 
  rcs(Age,4)* rcs(DN_Mean,4) + Gender +  
  rcs(Medications,4) + 
  rcs(CO_trev_1_num,4) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,4) + 
  rcs(SB_MotifThree_quantile_hh,4) + 
  rcs(SB_BinaryStats_mean_longstretch1,4),
  data = ov_omit, 
  x= T, 
  y=T,tol = 1e-12)

# Specify a model with a smaller ammount of knots

logistic_simple_3knots <-  lrm(Survival1 ~ 
  rcs(Age,3)* rcs(DN_Mean,3) + Gender +  
  rcs(Medications,3) + 
  rcs(CO_trev_1_num,3) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,3) + 
  rcs(SB_MotifThree_quantile_hh,3) + 
  rcs(SB_BinaryStats_mean_longstretch1,3),
  data = ov_omit, 
  x= T, 
  y=T,tol = 1e-12)


# Specify a model with a bigger amount of knots

logistic_simple_5knots <-  lrm(Survival1 ~ 
  rcs(Age,3)* rcs(DN_Mean,3) + Gender +  
  rcs(Medications,3) + 
  rcs(CO_trev_1_num,3) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,3) + 
  rcs(SB_MotifThree_quantile_hh,3) + 
  rcs(SB_BinaryStats_mean_longstretch1,3),
  data = ov_omit, 
  x= T, 
  y=T,tol = 1e-12)


# Specify a model with only age and sex

logistic_simplest <- lrm(Survival1 ~ rcs(Age,4)* Gender,
  data = ov_omit, 
  x= T, 
  y=T,tol = 1e-12)


# Specify a basic model with age, gender, sociodemographics

logistic_basic <- lrm(Survival1 ~ 
  rcs(Age,4)*Gender +  
  rcs(Medications,4) + 
  rcs(Income,4) +
  Education + 
  Bereavement,
  data = ov_omit, 
  x= T, 
  y=T,tol = 1e-12)


# Assess performance in 10 fold cross validation

logist_score <- Score(
  object = list("Full Model (Including Expenditures)" = logistic_simple,
                "Age, Gender plus Sociodemographic variables" = logistic_basic,
                "Age and Gender" = logistic_simplest,
                "3 Knots" = logistic_simple_3knots,
                "5 Knots" = logistic_simple_5knots),
                      formula = Survival1 ~ 1,
                      se.fit = T, metrics = c("auc","brier"), summary = "ipa",
                      plots = "cal",data = ov_omit,split.method = "cv10",
  seed = 234,ncpus = 20)

# The summary of the model

summary(logist_score)

Create a summary of the model (coefficients): Supplementary Material

Show the code
print(logistic_simple)

Explainability-plots for our analysis (Figure 2)

Show the code
# Partial effects 

p1 <- ggplot(Predict(logistic_simple,Age,fun = plogis),adj.subtitle = F,addlayer = theme_minimal(), xlab = "Age at admittance in ED after fall", ylab = "Predicted 1-year mortality risk")

p2 <- ggplot(Predict(logistic_simple, Medications, fun = plogis), adj.subtitle = F,addlayer = theme_minimal(), xlab = "Number of unique prescribed medications within three years before fall")

p3 <- ggplot(Predict(logistic_simple, Gender, fun = plogis), adj.subtitle = F, addlayer = theme_minimal())

p4 <- ggplot(Predict(logistic_simple, CO_trev_1_num, fun = plogis ), adj.subtitle = F, addlayer = theme_minimal(), xlab = "Successive differences in health care spending")

p5 <- ggplot(Predict(logistic_simple,   DN_OutlierInclude_p_001_mdrmd, fun = plogis), 
             adj.subtitle = F, addlayer = theme_minimal(), xlab = "Timing of extreme health care spending")

p6 <- ggplot(Predict(logistic_simple, SB_MotifThree_quantile_hh, fun = plogis ), 
             adj.subtitle = F, addlayer = theme_minimal(), xlab = "Quantile entropy of health care spending")

p7 <- ggplot(Predict(logistic_simple, SB_BinaryStats_mean_longstretch1, fun = plogis), 
             adj.subtitle = F, addlayer = theme_minimal(), xlab = "Long stretches of extreme health care spending")

p8 <- ggplot(Predict(logistic_simple, DN_Mean, fun = plogis), adj.subtitle = F, addlayer = theme_minimal(), xlab = "Two-years average of weekly health care spending")


(p1 + p2) / (p3 + p4) / (p5 + p6) / (p7 + p8)

Some interactions (Not included in the paper)

Show the code
pred_inter_age <- Predict(logistic_simple, "Age", "DN_Mean", fun = plogis)

pred_inter_meds <- Predict(logistic_simple, "Age", "Medications", fun = plogis)

pred_inter_meds <- Predict(logistic_simple, "DN_Mean", "DN_OutlierInclude_p_001_mdrmd",
                           fun = plogis)

# Interactions of variables on mortality risk

###############################
inter1 <- bplot(pred_inter_age, yhat ~ Age + DN_Mean, lfun = levelplot,
      ylab = "Average Weekly Expenditures", zlab = "Pr(Dying)\n", 
      xlab = "Age in ED after first fall",adj.subtitle = FALSE)


inter2 <- bplot(pred_inter_meds, yhat ~ Age + Medications, lfun = levelplot,
      ylab = "Number of Medications until first fall", 
      zlab = "Pr(Dying)\n", xlab = "Age in ED after first fall",adj.subtitle = FALSE)


inter3 <- bplot(pred_inter_meds, yhat ~ DN_Mean + DN_OutlierInclude_p_001_mdrmd , 
                lfun = levelplot, 
                ylab = "Timing of extreme expenditures", 
                zlab = "Pr(Dying)\n", xlab = "Average Weekly Expenditures",
                adj.subtitle = FALSE)

################################

# And here is the plot

cowplot::plot_grid(inter1, inter2, inter3,nrow = 3)


# Interactions of Average Healthcare Expenditures with other variables

#############################

inter4 <- ggplot(Predict(logistic_simple, DN_Mean, Age = c(70,80,90,100),fun = plogis),
                 ylab = "1-year estimated mortality risk", 
                 xlab = "Average weekly expenditures", adj.subtitle = F) + 
  theme_minimal()


inter5 <- ggplot(Predict(logistic_simple, DN_Mean, 
               DN_OutlierInclude_p_001_mdrmd = c(0.1,0.9),fun = plogis),
                ylab = "1-year estimated mortality risk", 
                xlab = "Average weekly expenditures", 
                adj.subtitle = F) + theme_minimal()


inter6 <- ggplot(Predict(logistic_simple,Age,Gender,fun = plogis),adj.subtitle = F,
                 addlayer = theme_minimal(), 
                 xlab = "Age at admittance in ED after fall")

inter7 <- ggplot(Predict(logistic_simple, DN_Mean, Gender,fun = plogis),
                 xlab = "Average weekly expenditures",
                 adj.subtitle = F) +
                 theme_minimal()

################################

inter4 / inter5 / inter6 / inter7

Create a more advanced basic model (With secondary predictors)

Show the code
ov_new <- overall_df1[match(id_df$PERSON_ID,overall_df1$PERSON_ID),]

# We only keep the variables we need

ov_new <- ov_new |> 
  select(-Average_Spending, 
         -Event_Refall1,-Dataset)

# Let's remove the observations for whom some DIORs could not be computed

ov_omit_new <- na.omit(ov_new)

# Let's add hospitalization, hip fracture, heart issues medication, previous falls

hf <- falls |> select(PERSON_ID, HFRAC, HOSP, HEART_MED, pre_falls_1, pre_falls_cat_1, pre_falls_3, pre_falls_cat_3)

ov_omit_new <- ov_omit_new |> left_join(hf,by = "PERSON_ID")


# Change the binary indicators into factors

ov_omit_new <- ov_omit_new |> 
  mutate(HFRAC = if_else(HFRAC == 0, 
                         "Not admitted with a hip fracture", "Admitted with a hip fracture"),
         HOSP = if_else(HOSP == 0 , "Not hospitalized after admittance", "Hospitalized after admittance"),
         HEART_MED = if_else(HEART_MED == 0, "Not taking medication for heart complications","Taking Medication for heart complications"))

# Convert them into factors 

ov_omit_new$HFRAC <- as.factor(ov_omit_new$HFRAC)
ov_omit_new$HOSP <- as.factor(ov_omit_new$HOSP)
ov_omit_new$HEART_MED <- as.factor(ov_omit_new$HEART_MED)

# Also the education should be a factor

ov_omit_new <- ov_omit_new |> 
  mutate(Education = case_when(Education == 0 ~ "Lowest Education Level",
                               Education == 1 ~ "Middle Education Level",
                               Education == 2 ~ "Highest Education Level"))

ov_omit_new$Education <- as.factor(ov_omit_new$Education)

ov_omit_new$Education <-relevel(ov_omit_new$Education, ref = "Lowest Education Level")

# Negative income set to zero

ov_omit_new <- ov_omit_new |> 
  mutate(Income = if_else(Income < 0 , 0, Income))

# Add some more sociodemographic variables

# Number of children

n_child <- tbl(conn, 'number_of_children') |>  
  collect() # Number of offsprings

children <- n_child %>% 
  replace_na(list(N_MOTHER = 0, N_FATHER = 0)) %>% 
  group_by(PERSON_ID) %>% 
  mutate(Children = sum(N_MOTHER + N_FATHER)) %>% 
  ungroup() %>% 
  select(PERSON_ID,Children)

# Number of comorbidities

multimorb <- tbl(conn, 'multimorbidity') |> 
  collect() # Number of multimorbidities

morb_new <- multimorb |> 
  group_by(PERSON_ID) |>  
  summarise(Total_Number = sum(n())) |> 
  ungroup()

# Immigration Status

immigration_data <- tbl(conn, 'pop21') |>  
  collect()

immigration_data <- immigration_data |> 
  select(PERSON_ID, IE_TYPE)


# Put them into the full dataset

ov_omit_newest <- ov_omit_new |> 
  left_join(morb_new, by = "PERSON_ID") |> 
  replace_na(list(Total_Number = 0))

ov_omit_newest <- ov_omit_newest |> 
  left_join(children, by = "PERSON_ID") |> 
  replace_na(list(Children = 0))

ov_omit_newest <- ov_omit_newest |> 
  left_join(immigration_data, by = "PERSON_ID") |> 
  replace_na(list(IE_TYPE = 0))


# Create some categories for number of children and immigration status

ov_omit_newest <- ov_omit_newest |> 
  mutate(Children_F = case_when(
  Children == 0 ~ 'Zero',
  Children == 1 ~ 'One',
  Children == 2 ~ 'Two',
  Children == 3 ~ 'Three',
  Children >= 4 ~ '4 or more')) |> 
  mutate(Immigration_Status = case_when(IE_TYPE == 1 ~ 'Danish',
                                        IE_TYPE == 2 ~ 'Immigrants',
                                        IE_TYPE == 3 ~ 'Descendants'))


ov_omit_newest$Children_F <- as.factor(ov_omit_newest$Children_F)
ov_omit_newest$Immigration_Status <- as.factor(ov_omit_newest$Immigration_Status)


# Specify datadist

dd = datadist(ov_omit_newest)

options(datadist = "dd")


# Now let's create a basic logistic regression model and compare with our previous basic one

# Basic model + secondary predictors

logistic_basic_advanced <- lrm(Survival1 ~ 
  rcs(Age,4)*(Gender) +  
  rcs(Medications,4) + 
  rcs(Income,4) * Gender +
  Education +
  HFRAC +
  Gender*HEART_MED +
  Gender*pre_falls_3 +
  Bereavement +
  Immigration_Status +
  Children_F,
  data = ov_omit_newest, 
  x= T, 
  y=T,
  tol = 1e-15)

# Final model + secondary predictors

logistic_simple_more_covs <-  lrm(Survival1 ~ 
  rcs(Age,4)* (rcs(DN_Mean,4) + Gender) +  
  rcs(Medications,4) + 
  rcs(CO_trev_1_num,4) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,4) + 
  rcs(SB_MotifThree_quantile_hh,4) + 
  rcs(SB_BinaryStats_mean_longstretch1,4) +
  rcs(Income,4) * Gender +
  Education +
  HFRAC +
  HEART_MED +
  Children_F +
  Immigration_Status +
  pre_falls_3,
  data = ov_omit_newest, 
  x= T, 
  y=T,
  tol = 1e-15)


# Only intercept (empirical prevalence)

logistic_intercept <- glm(Survival1 ~ 1, data = ov_omit_newest, x = T, y = T,family = "binomial")

# Assess performance in 10 fold cross validation

logist_score_new <- Score(
  object = list("Full Model (Including Expenditures)" = logistic_simple,
                "Age, Gender plus Sociodemographic variables" = logistic_basic,
                "Age and Gender" = logistic_simplest,
                "Only Intercept" = logistic_intercept,
                "Advanced_Basic" = logistic_basic_advanced,
                "3 Knots" = logistic_simple_3knots,
                "5 Knots" = logistic_simple_5knots,
                "Full Model plus more" = logistic_simple_more_covs),
                      formula = Survival1 ~ 1,
                      se.fit = T, metrics = c("auc","brier"), summary = "ipa",
                      plots = "cal",data = ov_omit_newest,split.method = "cv10",
  seed = 234,ncpus = 20)

summary(logist_score_new)


# Calibration of three main models (Supplementary Figure)

cali <- plotCalibration(logist_score_new, 
                models =c(
                  "Full Model (Including Expenditures)", 
                  "Advanced_Basic", 
                  "Age and Gender"),
                round = F,
                xlab = "Predicted 1-year mortality risk",
                ylab = "Observed 1-year mortality proportion",rug = F,legend = T,auc.in.legend = T, brier.in.legend = F,cex = 0.4)

Univariable associations of each predictor of the final model (Odds Ratios)

Show the code
# Univariable associations

summary(lrm(Survival1 ~ rcs(Age,4), data = ov_omit))

summary(lrm(Survival1 ~ Gender, data = ov_omit))

summary(lrm(Survival1 ~ rcs(DN_Mean,4), data = ov_omit))

summary(lrm(Survival1 ~ rcs(Medications,4), data = ov_omit))

summary(lrm(Survival1 ~ rcs(CO_trev_1_num,4), data = ov_omit))

summary(lrm(Survival1 ~ rcs(DN_OutlierInclude_p_001_mdrmd,4), data = ov_omit))

summary(lrm(Survival1 ~ rcs(SB_MotifThree_quantile_hh,4), data = ov_omit))

summary(lrm(Survival1 ~ rcs(SB_BinaryStats_mean_longstretch1,4), data = ov_omit))

Table 1 of Paper

Show the code
summary(utable(Survival1 ~  Gender + Age + HFRAC + HOSP + HEART_MED + 
                 Children_F + Immigration_Status + pre_falls_3 + pre_falls_cat_3 +
                 Q(Income) + Medications + Education + Bereavement + 
                 Q(Average_Expenditures) + CO_trev_1_num + 
                 DN_OutlierInclude_p_001_mdrmd + 
                 SB_MotifThree_quantile_hh + 
                 SB_BinaryStats_mean_longstretch1,
                 data = ov_omit_newest,show.totals = T))

DCA with cross-validated predictions (Figure 3)

Show the code
# Updated DCA (Using cross-validated predictions)

cv_preds <- logist_score_new$Calibration$plotframe

cv_preds_full_model <- cv_preds |> 
  filter(model == "Full Model (Including Expenditures)") |> 
  select(-model) |> 
  arrange(ID) |> 
  rename(Predictions_full_model = "risk")

cv_preds_simplest_model <- cv_preds |> 
  filter(model == "Age and Gender") |> 
  select(-model) |> 
  arrange(ID) |> 
  rename(Predictions_simplest_model = "risk")

cv_preds_basic_model <- cv_preds |> 
  filter(model == "Advanced_Basic") |> 
  select(-model) |> 
  arrange(ID) |> 
  rename(Predictions_basic_model = "risk")


dc_logistic$Predictions_simplest_mortality <- cv_preds_simplest_model$Predictions_simplest_model

dc_logistic$Predictions_fullsociodems_mortality <- cv_preds_basic_model$Predictions_basic_model

dc_logistic$Predictions_full_mortality <- cv_preds_full_model$Predictions_full_model


plot_curves_new <- dca(Survival1 ~ Predictions_full_mortality + 
                     Predictions_simplest_mortality + 
                     Predictions_fullsociodems_mortality,
    data = dc_logistic, 
    thresholds = seq(0,0.6,0.05),
    label = list(Predictions_full_mortality = "Model with DIORs",
                 Predictions_simplest_mortality = "Model with Age and Sex",
                 Predictions_fullsociodems_mortality = "Model with Age, Sex, Clinical and Socio-demographic predictors")) |> 
  plot(smooth = TRUE)

plot_curves_new + 
  theme(legend.position = "top", axis.text = element_text(size = 12),
        legend.text = element_text(size = 11))

Recurrent fall analysis (Secondary Analysis)

Show the code
ov_ref <- overall_df1[match(id_df$PERSON_ID,overall_df1$PERSON_ID),]

# We only keep the variables we need (refall episode)

ov_ref <- ov_ref |> 
  select(-PERSON_ID, -Average_Spending, 
         -Survival1,-Dataset)

# Let's remove the observations for whom some DIORs could not be computed

ov_omit_ref <- na.omit(ov_ref)

# Let's put the refall variable in the dataframe with all the predictors

ov_omit_newest$Event_Refall1 <- ov_omit_ref$Event_Refall1

ov_omit_newest <- ov_omit_newest |> 
  relocate(Event_Refall1, .after = Survival1)

Modelling of recurrent falls (Secondary Analysis)

Show the code
# Specify a model with only age and sex

logistic_simplest_refall <- lrm(Event_Refall1 ~ rcs(Age,4)* Gender,
  data = ov_omit_newest, 
  x= T, 
  y=T,tol = 1e-12)

# Only intercept model

logistic_intercept_refall <- glm(Event_Refall1 ~ 1,data = ov_omit_newest, family = "binomial", x = T, y = T)

# Specify a basic model with age, sex, sociodemographics

logistic_basic_refall <- lrm(Event_Refall1 ~ 
  rcs(Age,4)*Gender +  
  rcs(Medications,4) + 
  rcs(Income,4) +
  Education + 
  Bereavement,
  data = ov_omit_newest, 
  x= T, 
  y=T,tol = 1e-12)

# Basic model + secondary predictors

logistic_basic_advanced_refall <- lrm(Event_Refall1 ~ 
  rcs(Age,4)*(Gender) +  
  rcs(Medications,4) + 
  rcs(Income,4) * Gender +
  Education +
  HFRAC +
  Gender*HEART_MED +
  Gender*pre_falls_3 +
  Bereavement +
  Immigration_Status +
  Children_F,
  data = ov_omit_newest, 
  x= T, 
  y=T,tol = 1e-15)

# Final model + secondary predictors

logistic_full_refall <- lrm(Event_Refall1 ~ 
  rcs(Age,4)* (rcs(DN_Mean,4) + Gender) +  
  rcs(Medications,4) + 
  rcs(CO_trev_1_num,4) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,4) + 
  rcs(SB_MotifThree_quantile_hh,4) + 
  rcs(SB_BinaryStats_mean_longstretch1,4) +
  rcs(Income,4) * Gender +
  Education +
  HFRAC +
  HEART_MED +
  Children_F +
  Immigration_Status +
  pre_falls_3,
  data = ov_omit_newest, 
  x= T, 
  y=T,tol = 1e-15)

# Final model (no secondary predictors)

logistic_diors_nosec <- lrm(Event_Refall1 ~ 
  rcs(Age,4)* rcs(DN_Mean,4) + Gender +  
  rcs(Medications,4) + 
  rcs(CO_trev_1_num,4) + 
  rcs(DN_OutlierInclude_p_001_mdrmd,4) + 
  rcs(SB_MotifThree_quantile_hh,4) + 
  rcs(SB_BinaryStats_mean_longstretch1,4),
  data = ov_omit_newest, 
  x= T, 
  y=T,tol = 1e-12)

refall_score <- Score(
  object = list("Full Model (Including Expenditures)" = logistic_full_refall,
                "Age and Gender" = logistic_simplest_refall,
                "Basic Model" = logistic_basic_refall,
                "Advance Basic Model" = logistic_basic_advanced_refall,
                "Intercept Only Model" = logistic_intercept_refall,
                "Full Model/No Secondary" = logistic_diors_nosec),
                      formula = Event_Refall1 ~ 1,
                      se.fit = T, metrics = c("auc","brier"), summary = "ipa",
                      plots = "cal",data = ov_omit_newest,split.method = "cv10",
  seed = 234,ncpus = 20)

Figure 1 of paper: Distribution of Risks

Show the code
# Distribution of Risks from 3 mains models

dc_logistic1 <- dc_logistic

dc_logistic1 <- dc_logistic1 |> 
  pivot_longer(names_to = "Risk_Model",values_to = "Risk_Predictions",cols =c ("Predictions_full_mortality", "Predictions_fullsociodems_mortality", "Predictions_simplest_mortality"))

dc_logistic1 |> 
  ggplot(aes(x = Risk_Predictions, fill = Risk_Model)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  see::scale_fill_see_d(labels = c("Model with DIORs",
                                   "Model with Age, Sex, Clinical and Socio-demographic Characteristics", "Model with Age and Sex")) +
  theme(legend.position = "top",legend.text = element_text(size = 11),
        axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        axis.text.x = element_text(size = 11),
        axis.text.y = element_text(size = 11)) +
  xlab("1-year mortality risk predictions") +
  ylab(NULL)

Variance Inflation Factors (Final model with DIORs)

Show the code
performance::check_collinearity(logistic_simple)