Live Webinar with Dr. Charles McCulloch

Analytic Strategies for the OAI Data

May 11, 2026

0:00:02.240,0:00:07.120
Today I want to talk about analytic

0:00:03.679,0:00:10.160
strategies for the OAI data. Let's see. I

0:00:07.120,0:00:11.920
got to say being recorded. Okay. So

0:00:10.160,0:00:14.400
here's my outline. I'll start off with

0:00:11.920,0:00:15.920
introduction and some examples uh and

0:00:14.400,0:00:18.080
talk about some sort of general

0:00:15.920,0:00:19.760
considerations that I always go through

0:00:18.080,0:00:23.439
when I'm thinking about analyzing data

0:00:19.760,0:00:25.439
from OAI. Um, and a key component of

0:00:23.439,0:00:27.519
dealing with this data is that almost

0:00:25.439,0:00:29.119
always you have to deal with uh

0:00:27.519,0:00:31.760
accommodating correlations, you know,

0:00:29.119,0:00:34.719
either longitudinally over time between

0:00:31.760,0:00:36.640
knees, or regions within knees, um, and

0:00:34.719,0:00:40.200
that's important to get the proper

0:00:36.640,0:00:40.200
statistical analysis.

0:00:40.239,0:00:45.520
All right. So when I think about

0:00:42.559,0:00:47.680
analyzing a data set I always start by

0:00:45.520,0:00:49.600
thinking about the nature of the outcome

0:00:47.680,0:00:52.079
variable. Is it a binary outcome

0:00:49.600,0:00:53.680
variable in which case I might be guided

0:00:52.079,0:00:55.199
to using something like logistic

0:00:53.680,0:00:57.600
regression like presence of

0:00:55.199,0:01:00.719
osteophytes? Uh, and then I

0:00:57.600,0:01:03.120
would characterize the associations

0:01:00.719,0:01:05.920
using things like odds ratios or areas

0:01:03.120,0:01:07.600
under the ROC curve. If I have a numeric

0:01:05.920,0:01:09.520
outcome, I'm typically going to be

0:01:07.600,0:01:12.240
thinking of things like linear

0:01:09.520,0:01:14.960
regression. So things like WOMAC pain I

0:01:12.240,0:01:16.960
might treat as just numeric

0:01:14.960,0:01:19.200
um and I'd fit linear regression type

0:01:16.960,0:01:21.439
models. But you can also have other

0:01:19.200,0:01:24.640
types of models, like time-to-event

0:01:21.439,0:01:26.799
models. Um time until knee replacement

0:01:24.640,0:01:29.759
for example that you would handle with a

0:01:26.799,0:01:32.320
Cox model or a pooled logistic regression.

0:01:29.759,0:01:34.479
And somewhat less uh commonly you might

0:01:32.320,0:01:38.640
encounter count outcomes, in which

0:01:34.479,0:01:40.640
case I might use Poisson regression.

0:01:38.640,0:01:42.720
And of course, any of these methods that

0:01:40.640,0:01:45.439
you're going to use need to be modified

0:01:42.720,0:01:46.960
to be able to accommodate, um, clustered

0:01:45.439,0:01:51.040
data or repeated measures or

0:01:46.960,0:01:53.119
longitudinal measures over time.

0:01:51.040,0:01:55.360
So let me just think of some examples

0:01:53.119,0:01:58.000
and walk through you know what sort of

0:01:55.360,0:02:00.960
considerations we need to incorporate.

0:01:58.000,0:02:03.680
Um so for example suppose quality of

0:02:00.960,0:02:05.759
life as measured by the KOOS scale: is

0:02:03.680,0:02:08.640
that related to somebody's body mass

0:02:05.759,0:02:10.160
index at baseline? Um, that's just a

0:02:08.640,0:02:12.319
cross-sectional analysis. There's

0:02:10.160,0:02:15.520
nothing longitudinal or clustered about

0:02:12.319,0:02:17.360
it. Um, example two, is the difference

0:02:15.520,0:02:19.360
between men and women in the WOMAC pain

0:02:17.360,0:02:22.319
score the same for those with and

0:02:19.360,0:02:25.520
without symptomatic knee osteoarthritis

0:02:22.319,0:02:27.680
at baseline? That's um at baseline, so

0:02:25.520,0:02:30.000
it's not longitudinal over time, but

0:02:27.680,0:02:32.879
it's still clustered because we've got a

0:02:30.000,0:02:35.440
WOMAC pain score for each knee for each

0:02:32.879,0:02:37.440
person.

0:02:35.440,0:02:39.840
Um, here's another example of a

0:02:37.440,0:02:42.000
clustered data analysis: is the presence

0:02:39.840,0:02:45.200
of osteophytes at baseline predicted by

0:02:42.000,0:02:48.480
knee pain. Again, we've got measurements

0:02:45.200,0:02:50.800
that are separate for each knee.

0:02:48.480,0:02:53.120
Um, a lot of times people want to use

0:02:50.800,0:02:55.120
the OAI data to answer questions about

0:02:53.120,0:02:58.400
changes over time. One of the strengths

0:02:55.120,0:03:01.040
of the data set is the longitudinal

0:02:58.400,0:03:03.360
follow-up for participants. For example,

0:03:01.040,0:03:05.120
is the 18-month change in WOMAC pain

0:03:03.360,0:03:09.879
score the same or different for those

0:03:05.120,0:03:09.879
with symptomatic knee OA at baseline?

0:03:10.000,0:03:14.959
So going back to example one: is KOOS

0:03:12.879,0:03:16.800
quality of life related to baseline BMI.

0:03:14.959,0:03:19.760
I might start by looking at something

0:03:16.800,0:03:22.159
graphical like a scatter plot and we see

0:03:19.760,0:03:24.720
you know monotonically decreasing

0:03:22.159,0:03:26.959
relationship of quality of life with

0:03:24.720,0:03:28.319
baseline BMI

0:03:26.959,0:03:31.360
and just handle this with linear

0:03:28.319,0:03:32.959
regression. uh it's not clustered data.

0:03:31.360,0:03:35.040
Um and we might get a regression

0:03:32.959,0:03:38.319
coefficient. The coefficient is minus

0:03:35.040,0:03:41.840
one with a standard error of 0.09 and

0:03:38.319,0:03:43.440
some very tiny p-value. Um, this is

0:03:41.840,0:03:46.159
just a standard linear regression

0:03:43.440,0:03:48.720
problem. Um and there's no clustered or

0:03:46.159,0:03:51.519
longitudinal data here. Very simple to

0:03:48.720,0:03:53.920
analyze.

0:03:51.519,0:03:59.840
All right. But what do we do when

0:03:53.920,0:04:01.920
we do have analyses or data sets where

0:03:59.840,0:04:04.159
we have to accommodate the clustering or

0:04:01.920,0:04:07.120
the repeated measures.

0:04:04.159,0:04:09.280
If we don't use specific analysis

0:04:07.120,0:04:12.000
methods that can incorporate this

0:04:09.280,0:04:13.840
correlation, standard errors, p-values, and

0:04:12.000,0:04:16.560
confidence intervals can be incorrect

0:04:13.840,0:04:18.560
sometimes grossly so and I'll show you

0:04:16.560,0:04:20.160
some examples in a minute. Uh and

0:04:18.560,0:04:22.400
unfortunately it's not possible to

0:04:20.160,0:04:23.759
predict you know okay some people would

0:04:22.400,0:04:25.120
like to say something like, well, I'm

0:04:23.759,0:04:26.720
just doing a linear regression it's

0:04:25.120,0:04:28.880
clustered data this is probably

0:04:26.720,0:04:31.120
conservative or it's liberal it's not

0:04:28.880,0:04:34.080
possible to predict which way the proper

0:04:31.120,0:04:36.639
analysis will go compared to a

0:04:34.080,0:04:38.880
simplistic analysis that ignores the

0:04:36.639,0:04:41.759
correlations or the longitudinal nature

0:04:38.880,0:04:43.840
of the data.

0:04:41.759,0:04:46.880
All right. So to give you a little bit

0:04:43.840,0:04:51.520
of intuition though for a between person

0:04:46.880,0:04:54.479
predictor so say body mass index the

0:04:51.520,0:04:56.800
proper clustered data analysis, with the outcome measured

0:04:54.479,0:04:56.800
on two knees, will usually have

0:04:56.800,0:05:01.759
larger standard errors than a simplistic

0:04:59.360,0:05:04.320
analysis. The intuition there is that

0:05:01.759,0:05:06.320
for between-person predictors, an analysis

0:05:04.320,0:05:08.160
that assumes the knees are independent

0:05:06.320,0:05:10.800
will overrepresent the information

0:05:08.160,0:05:12.639
content. Each knee does not contribute

0:05:10.800,0:05:14.639
an independent and new piece of

0:05:12.639,0:05:16.960
information as if it was a knee measured

0:05:14.639,0:05:19.600
on a new person. There's some redundancy

0:05:16.960,0:05:21.680
of the information there. On the other

0:05:19.600,0:05:24.560
hand, if we're looking at a within-person

0:05:21.680,0:05:27.360
predictor like a knee specific predictor

0:05:24.560,0:05:30.720
like the WOMAC knee pain for each knee as

0:05:27.360,0:05:32.320
the predictor, the proper clustered data

0:05:30.720,0:05:34.000
analysis method will usually have

0:05:32.320,0:05:36.000
smaller standard errors. And the

0:05:34.000,0:05:38.400
intuition there is that using each

0:05:36.000,0:05:41.120
person as their own control increases

0:05:38.400,0:05:42.960
efficiency. Looking within a person at

0:05:41.120,0:05:45.120
differences between the knees is

0:05:42.960,0:05:47.120
typically more precise than if you

0:05:45.120,0:05:50.080
collected the information on different

0:05:47.120,0:05:52.160
people's knees.
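
A minimal sketch of that intuition in numbers, assuming each person contributes m knees with equal variance and a common within-person correlation rho (an exchangeable structure). The function names and the value of rho are illustrative assumptions, not figures from the talk. For a person-level predictor, the true variance is roughly 1 + (m - 1) * rho times what an independence analysis reports; for a contrast between two knees of the same person, the variance is 1 - rho times what it would be if the knees came from different people.

```python
def between_person_inflation(m, rho):
    """Variance ratio (true / independence-based) for a person-level predictor
    when each person contributes m equally correlated measurements."""
    return 1 + (m - 1) * rho


def within_person_ratio(rho):
    """Variance ratio (paired knees / knees from different people) for a
    difference between two knees, given within-person correlation rho."""
    return 1 - rho


rho = 0.5  # hypothetical correlation between a person's two knees
print(between_person_inflation(2, rho))  # naive variance is too small by this factor
print(within_person_ratio(rho))          # pairing shrinks the variance by this factor
```

With rho near zero both ratios approach 1, which is why ignoring the clustering matters more the more alike a person's two knees are.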

0:05:50.080,0:05:54.800
All right, here's an example. Um, this

0:05:52.160,0:05:57.039
is going back to example two. Is there a

0:05:54.800,0:06:00.000
sex by baseline symptomatic

0:05:57.039,0:06:03.919
osteoarthritis interaction for the WOMAC

0:06:00.000,0:06:06.880
pain score? So,

0:06:03.919,0:06:10.319
if we're comparing men and women

0:06:06.880,0:06:13.919
as to what their WOMAC pain scores are when

0:06:10.319,0:06:16.479
they do or don't have uh knee arthritis,

0:06:13.919,0:06:18.080
um are the differences the same? So, if

0:06:16.479,0:06:21.840
I'm looking here at the bottom part of

0:06:18.080,0:06:24.400
this slide, um, for people with no

0:06:21.840,0:06:26.479
knee OA, males and females have very

0:06:24.400,0:06:30.720
similar average pain scores. Difference

0:06:26.479,0:06:33.680
is 0.05. But for people with knee OA, uh,

0:06:30.720,0:06:35.120
women have a higher pain score. Um, and

0:06:33.680,0:06:37.840
you know, maybe that's statistically

0:06:35.120,0:06:40.319
significant. If we incorrectly assume

0:06:37.840,0:06:42.080
independence, um, the estimated

0:06:40.319,0:06:45.840
difference, it's really just the 0.91

0:06:42.080,0:06:49.840
subtract off the 0.05. Um, has a p value

0:06:45.840,0:06:52.400
of 0.1. If we do the proper

0:06:49.840,0:06:54.639
analysis accounting for the correlation,

0:06:52.400,0:06:56.240
same estimate, different standard error,

0:06:54.639,0:06:57.840
and we now get a p-value that's an

0:06:56.240,0:06:59.039
order of magnitude bigger.

0:06:57.840,0:07:00.800
Qualitatively, they're both

0:06:59.039,0:07:03.360
statistically significant, but it's easy

0:07:00.800,0:07:04.960
to envision situations where the

0:07:03.360,0:07:08.160
case is a little more borderline when

0:07:04.960,0:07:10.240
one would be fairly highly statistically

0:07:08.160,0:07:12.400
significant, the other one wouldn't be.

0:07:10.240,0:07:14.479
Um, so you know, but even in this case,

0:07:12.400,0:07:16.000
of course, the p values are uh quite

0:07:14.479,0:07:17.599
different.

0:07:16.000,0:07:21.039
All

0:07:17.599,0:07:24.080
right. So the

0:07:21.039,0:07:26.479
methods for accommodating clustered data

0:07:24.080,0:07:28.720
or longitudinal data or data on change

0:07:26.479,0:07:31.280
or repeated measures um are all the

0:07:28.720,0:07:33.360
same. They're all methods that

0:07:31.280,0:07:35.360
accommodate the correlation between the

0:07:33.360,0:07:37.120
data whether it's due to clustering

0:07:35.360,0:07:38.800
whether it's due to longitudinal changes

0:07:37.120,0:07:40.720
over time or whether it's due to

0:07:38.800,0:07:42.800
repeated measures. And there are two

0:07:40.720,0:07:45.280
primary approaches and those are what

0:07:42.800,0:07:48.000
are called mixed models or generalized

0:07:45.280,0:07:50.240
estimating equations. Uh, the names for

0:07:48.000,0:07:54.199
both are a little esoteric uh but not

0:07:50.240,0:07:54.199
really to worry about

0:07:54.240,0:08:00.720
These are commonly available in all the

0:07:57.280,0:08:02.560
major packages. In SAS, mixed models are

0:08:00.720,0:08:04.319
the PROC MIXED

0:08:02.560,0:08:08.800
and GLIMMIX

0:08:04.319,0:08:11.199
uh, routines. In Stata, it's mixed, meglm for mixed

0:08:08.800,0:08:11.199
effects generalized linear models, melogit

0:08:11.199,0:08:17.440
for, uh, binary outcomes. In R, lmer,

0:08:14.560,0:08:19.280
glmer, and nlme.

0:08:17.440,0:08:23.360
And the generalized estimating equations

0:08:19.280,0:08:26.639
approach: in SAS it's called PROC GEE, in

0:08:23.360,0:08:31.120
Stata it's called xtgee, and in R there are

0:08:26.639,0:08:33.120
two major packages, uh, geepack and glmgee.

0:08:31.120,0:08:37.800
I slightly prefer the second one, but both

0:08:33.120,0:08:37.800
are very reasonable uh packages.

0:08:38.240,0:08:43.599
All right so what would we want out of a

0:08:40.560,0:08:46.640
method for dealing with longitudinal or

0:08:43.599,0:08:48.720
cluster data? Um,

0:08:46.640,0:08:49.600
oh, sorry,

0:08:48.720,0:08:49.600
a little sidetrack here. Um,

0:08:51.200,0:08:56.320
longitudinal data or clustering is only an

0:08:54.000,0:08:59.040
issue when that clustering or

0:08:56.320,0:09:01.600
longitudinal nature of the data set is

0:08:59.040,0:09:04.160
for the outcome variable. It doesn't

0:09:01.600,0:09:05.680
really matter if it's for the predictor.

0:09:04.160,0:09:08.560
For example, suppose you're interested

0:09:05.680,0:09:11.600
in whether days missed from work is

0:09:08.560,0:09:14.240
predicted by knee pain. So days missed

0:09:11.600,0:09:15.920
from work is at the person level. And if

0:09:14.240,0:09:18.560
we're just doing this at say the

0:09:15.920,0:09:21.360
baseline, um we only have one

0:09:18.560,0:09:25.360
measurement for each person even though

0:09:21.360,0:09:28.480
the predictor knee pain is clustered.

0:09:25.360,0:09:29.920
It's measured on each knee. Um so this

0:09:28.480,0:09:31.360
does not really have repeated measures

0:09:29.920,0:09:34.240
on the outcome. And we can deal with

0:09:31.360,0:09:36.160
this with the standard statistical

0:09:34.240,0:09:37.680
methods. So we can accommodate this

0:09:36.160,0:09:40.080
either by including both the left and

0:09:37.680,0:09:42.959
right knee predictors as predictors in

0:09:40.080,0:09:45.680
the model, or by calculating

0:09:42.959,0:09:48.000
some summary measure for example average

0:09:45.680,0:09:49.920
knee pain. And if we're worried about oh

0:09:48.000,0:09:52.160
what if somebody has asymmetric knee

0:09:49.920,0:09:53.920
pain maybe that does something you could

0:09:52.160,0:09:55.200
have two predictors one being the

0:09:53.920,0:09:58.720
average and the other being the

0:09:55.200,0:10:01.120
difference between the two.
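
As a small sketch of that last suggestion: a person-level outcome needs person-level predictors, so the two knee pain scores can be collapsed into an average and a difference. The function name is hypothetical, and taking the absolute difference (so that it measures asymmetry regardless of side) is an assumption; a signed left-minus-right difference would be equally reasonable.

```python
def knee_pain_summaries(left, right):
    """Collapse two knee-level pain scores into person-level predictors:
    the average pain and the (absolute) left-right difference."""
    average = (left + right) / 2
    difference = abs(left - right)  # assumption: asymmetry, ignoring which side
    return average, difference


# Made-up (left, right) pain scores for three people.
people = [(4, 2), (0, 0), (1, 7)]
for left, right in people:
    print(knee_pain_summaries(left, right))
```

Both summaries can then enter an ordinary regression for days missed from work, since there is only one outcome per person.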

0:09:58.720,0:10:04.320
All right. So, what would we want out of

0:10:01.120,0:10:06.959
an analysis method to accommodate

0:10:04.320,0:10:08.560
clustered or longitudinal data? Um, we'd

0:10:06.959,0:10:10.880
want it to be able to accommodate a

0:10:08.560,0:10:14.959
variety of different outcome types. Key

0:10:10.880,0:10:16.800
ones being binary or numeric. We'd want

0:10:14.959,0:10:20.240
it to be able to accommodate clustering

0:10:16.800,0:10:21.600
by knee, person, longitudinal over time,

0:10:20.240,0:10:24.640
perhaps even, you know, different

0:10:21.600,0:10:26.560
regions of interest within a knee. um

0:10:24.640,0:10:28.800
and we would not necessarily want to

0:10:26.560,0:10:31.680
spend a lot of time modeling the

0:10:28.800,0:10:33.920
correlation. Usually the correlation is

0:10:31.680,0:10:36.480
sort of a nuisance factor. We need to

0:10:33.920,0:10:40.959
accommodate it, but we're not really

0:10:36.480,0:10:42.399
that um curious about it. Um I'll I'll

0:10:40.959,0:10:44.720
talk about the differences in the

0:10:42.399,0:10:47.920
methods in a second, but I would say,

0:10:44.720,0:10:49.680
you know, in 95% of the examples I run

0:10:47.920,0:10:52.000
across, people are mostly interested in

0:10:49.680,0:10:53.680
the associations. the correlation needs

0:10:52.000,0:10:57.040
to be accommodated for the reasons that

0:10:53.680,0:10:59.760
I uh noted and I'll mention in in a

0:10:57.040,0:11:03.200
couple of other examples but not of

0:10:59.760,0:11:05.440
primary interest. So often it's nice not

0:11:03.200,0:11:08.079
to be able to have to spend a lot of

0:11:05.440,0:11:10.640
time modeling the correlation.

0:11:08.079,0:11:13.200
So that leads me to recommending as sort

0:11:10.640,0:11:15.519
of a base analysis strategy generalized

0:11:13.200,0:11:19.040
estimating equations. It works with many

0:11:15.519,0:11:21.279
different types of outcomes. um it

0:11:19.040,0:11:23.920
utilizes and it's important to turn on

0:11:21.279,0:11:26.160
for example, in Stata it's not the default,

0:11:23.920,0:11:28.800
so you you want to turn on this robust

0:11:26.160,0:11:31.600
variance estimate. Um, it obviates the need

0:11:28.800,0:11:35.360
to model correlation structure so the

0:11:31.600,0:11:38.079
basic idea is that it uses the empirical

0:11:35.360,0:11:39.519
results your data set itself to estimate

0:11:38.079,0:11:42.480
and accommodate the correlation

0:11:39.519,0:11:45.200
structure um so it's not very dependent

0:11:42.480,0:11:47.839
upon what correlation structure you use

0:11:45.200,0:11:49.760
as sort of a working model to get your

0:11:47.839,0:11:51.600
estimates.

0:11:49.760,0:11:53.279
It works well as long as you don't have

0:11:51.600,0:11:55.440
too many repeated measures per subject

0:11:53.279,0:11:58.079
and you have a large number of subjects.

0:11:55.440,0:12:00.160
So for many analyses in the OAI data

0:11:58.079,0:12:02.800
set, you're going to be able to use, you

0:12:00.160,0:12:05.279
know, hundreds if not thousands of

0:12:02.800,0:12:08.000
participants. So you probably have a

0:12:05.279,0:12:10.160
large number of of participants,

0:12:08.000,0:12:12.079
subjects, and not that many observations

0:12:10.160,0:12:14.639
per person. Even if you use, you know,

0:12:12.079,0:12:16.639
full longitudinal data in a couple knees

0:12:14.639,0:12:18.880
or even even a small number of regions

0:12:16.639,0:12:21.839
within a knee, it doesn't add up to that

0:12:18.880,0:12:23.760
many repeated measurements.

0:12:21.839,0:12:25.440
So, this is ideal for analyses that

0:12:23.760,0:12:28.320
incorporate multiple knees and time

0:12:25.440,0:12:31.360
periods. It can get less good if you're

0:12:28.320,0:12:34.639
using just a small subset of the OAI data

0:12:31.360,0:12:37.440
set and you're using like 10 different

0:12:34.639,0:12:39.440
time points, two knees, and five regions

0:12:37.440,0:12:43.200
within a knee. then you have lots of

0:12:39.440,0:12:48.120
observations per person. Um and these

0:12:43.200,0:12:48.120
sort of methods can then work less well.

0:12:48.480,0:12:54.079
Um, both these methods, GEE or mixed

0:12:51.600,0:12:56.800
models can accommodate unbalanced data.

0:12:54.079,0:12:58.720
Some subjects contribute one knee while

0:12:56.800,0:13:00.880
others contribute two or they don't have

0:12:58.720,0:13:05.440
all the same amount of data which is

0:13:00.880,0:13:08.800
invariably the case in a data set like this.

0:13:05.440,0:13:10.800
Okay. One big proviso that might cause

0:13:08.800,0:13:14.000
you to think about using mixed models

0:13:10.800,0:13:16.480
instead. Um, always be wary of the

0:13:14.000,0:13:17.680
genesis of missing data. If the fact

0:13:16.480,0:13:19.760
that the data are missing is

0:13:17.680,0:13:22.000
informative, i.e. those with missing

0:13:19.760,0:13:24.399
visits are in extreme pain, but you

0:13:22.000,0:13:26.720
don't get to see them because they miss

0:13:24.399,0:13:28.399
their visits, then virtually no standard

0:13:26.720,0:13:30.560
statistical method will get the right

0:13:28.399,0:13:32.320
answer.

0:13:30.560,0:13:34.480
um with considerable missing data

0:13:32.320,0:13:36.240
especially due to dropout common if

0:13:34.480,0:13:39.200
you're analyzing later time periods from

0:13:36.240,0:13:42.320
the osteoarthritis initiative consider

0:13:39.200,0:13:44.800
using or comparing, you know, GEE might be

0:13:42.320,0:13:47.600
your first analysis but compare it to a

0:13:44.800,0:13:50.399
mixed model analysis, because mixed

0:13:47.600,0:13:53.120
modeling accommodates and it is

0:13:50.399,0:13:55.920
sometimes slightly less biased with

0:13:53.120,0:13:58.079
informative missing data um or

0:13:55.920,0:14:00.320
accommodate missing data with something

0:13:58.079,0:14:02.959
like inverse probability weighting. Here's

0:14:00.320,0:14:05.839
here's a reference to that methodology.

0:14:02.959,0:14:08.399
Um, and/or use empirical standard errors

0:14:05.839,0:14:10.399
with mixed models. So, use the mixed

0:14:08.399,0:14:12.240
models to get your estimates, but then

0:14:10.399,0:14:15.519
use the data to estimate what the

0:14:12.240,0:14:18.079
correlation structure might be.

0:14:15.519,0:14:19.760
All right, here's going back to example

0:14:18.079,0:14:22.880
two and comparing these different

0:14:19.760,0:14:25.600
analysis methods again. Is there a sex

0:14:22.880,0:14:27.920
by baseline symptomatic OA interaction

0:14:25.600,0:14:30.720
for the WOMAC pain score? Assuming

0:14:27.920,0:14:33.519
independence, that's the incorrect analysis.

0:14:30.720,0:14:35.839
GEE with the robust option turned on. Mixed

0:14:33.519,0:14:37.920
model, plain. Mixed model with the

0:14:35.839,0:14:40.399
empirical standard errors turned on. The

0:14:37.920,0:14:43.760
estimates are all the same. The

0:14:40.399,0:14:46.800
standard errors vary by a fair amount.

0:14:43.760,0:14:48.959
Um you can see the p value is is pretty

0:14:46.800,0:14:52.639
drastically too small for assuming

0:14:48.959,0:14:56.320
independence. The GEE method, um, pretty

0:14:52.639,0:14:58.720
dependable here, and, um, mixed model is a

0:14:56.320,0:15:00.320
little off maybe because I haven't

0:14:58.720,0:15:02.480
modeled the correlation structure

0:15:00.320,0:15:04.639
correctly in this simple mixed model

0:15:02.480,0:15:06.480
analysis. When I turn on the empirical

0:15:04.639,0:15:08.720
standard errors it makes the correction

0:15:06.480,0:15:12.279
and and gets the standard errors uh

0:15:08.720,0:15:12.279
probably correct

0:15:13.279,0:15:18.600
um that was SAS I'm sorry

0:15:18.639,0:15:23.440
Um, SAS, um, GEE or GENMOD with the

0:15:21.360,0:15:25.519
repeated option.

0:15:23.440,0:15:28.240
SAS, um, in MIXED you can turn on the

0:15:25.519,0:15:31.440
empirical option. In Stata you can do

0:15:28.240,0:15:33.600
cluster or or cluster robust uh standard

0:15:31.440,0:15:35.519
errors added to almost any command

0:15:33.600,0:15:38.519
including the mixed models mixed model

0:15:35.519,0:15:38.519
commands.

0:15:38.800,0:15:42.240
All right, here's a here's another

0:15:40.399,0:15:45.120
example. Does pain predict presence of

0:15:42.240,0:15:48.000
osteophytes at baseline?

0:15:45.120,0:15:49.759
um you can fit a logistic regression

0:15:48.000,0:15:52.639
model

0:15:49.759,0:15:54.800
that accounts for clustering using GEE

0:15:52.639,0:15:57.519
and get answers like the odds of an

0:15:54.800,0:16:00.240
osteophyte increased by 12.5% with

0:15:57.519,0:16:02.720
each increase in pain score. So you can

0:16:00.240,0:16:05.279
fit logistic regression type models and

0:16:02.720,0:16:06.959
accommodate the correlation. You know as

0:16:05.279,0:16:08.560
I mentioned earlier of course we're

0:16:06.959,0:16:11.040
going to have interpretations with

0:16:08.560,0:16:13.680
logistic models that are odds ratios and

0:16:11.040,0:16:15.440
things like areas under the ROC curve.

0:16:13.680,0:16:19.720
Um, but you can still do that accounting

0:16:15.440,0:16:19.720
for clustering by subject.
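
As a brief illustration of how a statement like "the odds increased by 12.5%" comes out of a logistic model: the coefficient is on the log-odds scale, so exponentiating it gives the odds ratio. The function name is hypothetical, and the coefficient is back-calculated from the 12.5% figure purely for illustration; it is not a fitted value from the OAI data.

```python
import math


def percent_change_in_odds(beta, units=1):
    """Convert a logistic regression coefficient (log odds per unit of the
    predictor) into the percent change in odds for a given number of units."""
    return (math.exp(beta * units) - 1) * 100


beta = math.log(1.125)  # illustrative: corresponds to an odds ratio of 1.125
print(round(percent_change_in_odds(beta), 1))           # per 1-point increase
print(round(percent_change_in_odds(beta, units=2), 1))  # per 2-point increase
```

The clustering correction changes the standard error and confidence interval around the odds ratio, not how the coefficient is interpreted.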

0:16:20.240,0:16:23.920
There are a couple of additional

0:16:22.079,0:16:26.320
considerations to think about when

0:16:23.920,0:16:28.079
you're analyzing longitudinal data.

0:16:26.320,0:16:30.240
Namely, you have to include a time

0:16:28.079,0:16:34.240
variable in your analysis because that's

0:16:30.240,0:16:38.399
what captures the changes over time.

0:16:34.240,0:16:40.320
um inclusion of time or visit um

0:16:38.399,0:16:42.320
interactions with baseline predictors

0:16:40.320,0:16:44.240
will allow you to say whether or not

0:16:42.320,0:16:46.800
baseline predictors are associated with

0:16:44.240,0:16:49.680
change over time. So remember what

0:16:46.800,0:16:51.279
interactions do. Interactions say: is the

0:16:49.680,0:16:53.279
effect the same or different depending

0:16:51.279,0:16:55.040
on another variable. So if we're

0:16:53.279,0:16:58.000
interested in whether change over time

0:16:55.040,0:17:00.880
is different, our time variable like

0:16:58.000,0:17:02.639
visit is capturing change over time. If

0:17:00.880,0:17:04.559
we want to know if that differs by some

0:17:02.639,0:17:07.679
baseline characteristic, we need to

0:17:04.559,0:17:09.760
include an interaction term.

0:17:07.679,0:17:13.039
It's different if we have a time varying

0:17:09.760,0:17:16.000
predictor. So MRI findings at sequential

0:17:13.039,0:17:17.919
visits um we can just include those in

0:17:16.000,0:17:20.880
the model if we have a variable that

0:17:17.919,0:17:22.880
changes over time. Um but if we have

0:17:20.880,0:17:27.039
something that's fixed um then we

0:17:22.880,0:17:28.960
include an interaction with time.
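
A minimal sketch of what that interaction term estimates in the simplest setting, assuming two groups, two visits, and a model containing group, visit, and their product. In that saturated model the interaction coefficient equals the difference between the two groups' mean changes, which can be computed directly from cell means. The function name, coding, and data are made up for illustration.

```python
from collections import defaultdict


def difference_in_changes(records):
    """records: iterable of (group, visit, y), with group and visit coded 0/1.
    Returns (mean change in group 1) - (mean change in group 0), which is
    what the group-by-visit interaction estimates in a saturated model."""
    totals = defaultdict(lambda: [0.0, 0])  # (group, visit) -> [sum, count]
    for group, visit, y in records:
        cell = totals[(group, visit)]
        cell[0] += y
        cell[1] += 1
    mean = {key: s / n for key, (s, n) in totals.items()}
    change_1 = mean[(1, 1)] - mean[(1, 0)]
    change_0 = mean[(0, 1)] - mean[(0, 0)]
    return change_1 - change_0


# Made-up pain scores: (baseline group, visit, score).
records = [
    (0, 0, 2.0), (0, 0, 4.0), (0, 1, 2.0), (0, 1, 3.0),
    (1, 0, 6.0), (1, 0, 8.0), (1, 1, 5.0), (1, 1, 7.0),
]
print(difference_in_changes(records))
```

A GEE or mixed model fit gives the same kind of estimate together with a standard error that respects the repeated measures on each person.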

0:17:27.039,0:17:30.960
Another sort of key consideration with

0:17:28.960,0:17:34.080
longitudinal data is you may want to use

0:17:30.960,0:17:36.559
lag variables. Um if you use uh a

0:17:34.080,0:17:38.960
predictor that's from a previous visit

0:17:36.559,0:17:42.320
um that can help you establish either

0:17:38.960,0:17:44.000
whether it's prognostic and um because

0:17:42.320,0:17:46.320
of time precedence it may help

0:17:44.000,0:17:49.360
strengthen the inference of causation.

0:17:46.320,0:17:51.200
It doesn't prove causation, but um we

0:17:49.360,0:17:53.760
can never basically prove causation from

0:17:51.200,0:17:56.080
observational studies like OAI, but it

0:17:53.760,0:17:58.640
can help strengthen that inference a

0:17:56.080,0:18:00.160
little bit.

0:17:58.640,0:18:02.240
All right, so here's another example

0:18:00.160,0:18:04.320
going back to example three. Does

0:18:02.240,0:18:06.720
18-month change in WOMAC pain depend on

0:18:04.320,0:18:08.640
baseline symptomatic knee OA? So I would

0:18:06.720,0:18:11.919
include an interaction term here because

0:18:08.640,0:18:14.240
it's the baseline symptomatic knee

0:18:11.919,0:18:16.880
osteoarthritis.

0:18:14.240,0:18:18.480
Um here are the comparisons.

0:18:16.880,0:18:22.480
Independence

0:18:18.480,0:18:25.440
GEE robust, mixed with a random effect of

0:18:22.480,0:18:27.760
person, empirical. So again, the

0:18:25.440,0:18:30.960
coefficients are all the same. Standard

0:18:27.760,0:18:33.200
errors differ a bit. Um here we do see

0:18:30.960,0:18:34.799
that we have borderline statistically

0:18:33.200,0:18:37.360
significant p values when we do the

0:18:34.799,0:18:40.080
proper analysis. We have a p-value

0:18:37.360,0:18:42.080
that's threefold higher, um,

0:18:40.080,0:18:44.320
when we incorrectly assume independence.

0:18:42.080,0:18:45.919
So again emphasizing the point that

0:18:44.320,0:18:48.480
these

0:18:45.919,0:18:50.320
proper analyses can be either more

0:18:48.480,0:18:52.000
liberal or more conservative compared to

0:18:50.320,0:18:54.799
the naive analysis. You can't predict

0:18:52.000,0:18:58.160
which direction they're going to go um

0:18:54.799,0:19:00.880
when you do the proper analysis.

0:18:58.160,0:19:02.720
All right. So what about a lot of people

0:19:00.880,0:19:05.200
a lot of times people ask me can I just

0:19:02.720,0:19:06.640
analyze change scores? Um, that's an

0:19:05.200,0:19:08.320
excellent and a simple method when there

0:19:06.640,0:19:10.480
are only two time points. So there's

0:19:08.320,0:19:12.000
only one change, but it's not as

0:19:10.480,0:19:14.000
attractive with either multiple time

0:19:12.000,0:19:15.840
points or imbalanced data. You can get

0:19:14.000,0:19:18.080
loss of efficiency even if you only have

0:19:15.840,0:19:20.320
two time points. You can get small gains

0:19:18.080,0:19:22.880
in efficiency by including them in one

0:19:20.320,0:19:24.960
of these methods. Um, and if you do

0:19:22.880,0:19:26.480
analyze change scores, in some

0:19:24.960,0:19:29.840
literatures, it's pretty common to

0:19:26.480,0:19:32.160
analyze change scores and also adjust for

0:19:29.840,0:19:34.559
the baseline value. I don't recommend

0:19:32.160,0:19:36.880
that. Um, and that will usually create

0:19:34.559,0:19:40.480
biased estimates of change. I'll show

0:19:36.880,0:19:44.080
you in in a second how that works out.

0:19:40.480,0:19:49.039
All right. So, here's just a simple data

0:19:44.080,0:19:52.640
table. Here's the WOMAC knee pain score, um,

0:19:49.039,0:19:54.320
at baseline and visit 12 months divided

0:19:52.640,0:19:57.440
by whether or not they had symptomatic

0:19:54.320,0:20:00.160
knee OA at baseline.

0:19:57.440,0:20:02.720
So, at baseline, those with uh

0:20:00.160,0:20:05.520
symptomatic knee OA, not surprisingly, had

0:20:02.720,0:20:07.200
much higher pain scores. uh but they

0:20:05.520,0:20:12.400
actually dropped a little bit by 12

0:20:07.200,0:20:15.520
months, um, whereas, uh, we also see

0:20:12.400,0:20:17.919
a small slightly smaller drop um in

0:20:15.520,0:20:20.480
those who didn't have symptomatic knee OA

0:20:17.919,0:20:23.520
at baseline. All right, what does a formal

0:20:20.480,0:20:26.160
analysis of this look like? If I use a

0:20:23.520,0:20:28.559
longitudinal analysis the difference in

0:20:26.160,0:20:31.200
the change baseline to 12 months between

0:20:28.559,0:20:34.320
the OA and non-OA groups is about

0:20:31.200,0:20:34.320
0.27, with a standard error of 0.13, p-value

0:20:34.320,0:20:38.880
that's just slightly statistically

0:20:36.400,0:20:41.200
significant 0.045.

0:20:38.880,0:20:45.600
If I use a simple change score analysis,

0:20:41.200,0:20:47.679
I get basically exactly the same answer.

0:20:45.600,0:20:51.200
If I adjust for the baseline value in

0:20:47.679,0:20:54.000
addition, it gives a difference of 0.42.

0:20:51.200,0:20:56.720
And notice, you know, this

0:20:54.000,0:20:56.720
0.27, you know, that's basically just the

0:20:56.720,0:21:01.600
difference in the two changes, which is

0:20:59.360,0:21:04.400
kind of what we'd expect. you know the

0:21:01.600,0:21:05.600
difference in the yes group is a

0:21:04.400,0:21:09.679
little bit bigger than the difference in

0:21:05.600,0:21:11.520
the no group by about 0.27.

0:21:09.679,0:21:13.600
Um if I adjust for baseline the

0:21:11.520,0:21:16.240
estimated difference is 0.42 with a p

0:21:13.600,0:21:19.200
value that's about zero. The

0:21:16.240,0:21:20.480
adjusted analysis is not answering the

0:21:19.200,0:21:22.159
same question. It's not answering

0:21:20.480,0:21:23.919
whether the change in time is different

0:21:22.159,0:21:26.159
between the two groups. So I don't

0:21:23.919,0:21:28.080
recommend this as a standard analysis.

0:21:26.159,0:21:30.799
There's an interesting uh article some

0:21:28.080,0:21:32.880
years back now by Maria Glymour which

0:21:30.799,0:21:36.000
shows that adjusting change score

0:21:32.880,0:21:39.200
analyses, uh, for baseline

0:21:36.000,0:21:41.120
values almost never answers a reasonable

0:21:39.200,0:21:43.360
causal question. So you probably don't

0:21:41.120,0:21:44.960
want to be doing it.

0:21:43.360,0:21:46.960
All right, that's the end of my

0:21:44.960,0:21:49.679
presentation. I have a few minutes to be

0:21:46.960,0:21:52.480
able to answer questions. Um there's my

0:21:49.679,0:21:55.360
contact information and next up in this

0:21:52.480,0:21:58.360
seminar series is uh Grace Lo on June

0:21:55.360,0:21:58.360
8th.

0:21:58.480,0:22:03.280
Thank you so much, Dr. McCulloch, for that

0:22:00.400,0:22:05.280
really insightful presentation and yeah

0:22:03.280,0:22:07.919
as you just said we are now open for

0:22:05.280,0:22:10.799
Q&A. So if anyone has any questions feel

0:22:07.919,0:22:13.919
free to type that into the chat. Um I

0:22:10.799,0:22:16.559
have a question for you here. Um so what

0:22:13.919,0:22:18.799
is the role of trajectory analysis to

0:22:16.559,0:22:21.360
identify trajectories that represent

0:22:18.799,0:22:22.799
data over time versus using a model with

0:22:21.360,0:22:25.120
repeated measures?

0:22:22.799,0:22:27.440
Yeah, so a trajectory analysis well I

0:22:25.120,0:22:29.520
mean, is a broad term. Most

0:22:27.440,0:22:33.520
trajectory analyses that I know of are

0:22:29.520,0:22:35.679
actually um simple uh specific uh

0:22:33.520,0:22:37.600
examples of these sort of methods. So

0:22:35.679,0:22:39.679
typically with trajectory analyses

0:22:37.600,0:22:42.159
you're fitting some sort of smooth curve

0:22:39.679,0:22:45.120
over time. So it's a question of how you

0:22:42.159,0:22:47.520
treat the time variable or visit. So in

0:22:45.120,0:22:51.679
the simplest sort of trajectory analyses

0:22:47.520,0:22:54.240
you um fit like linear and/or quadratic

0:22:51.679,0:22:57.120
curves usually separately by group and

0:22:54.240,0:22:59.760
compare the groups. There are

0:22:57.120,0:23:02.960
also sort of more exploratory methods of

0:22:59.760,0:23:05.760
trajectory analysis that may attempt to

0:23:02.960,0:23:07.120
group people into trajectories. I would

0:23:05.760,0:23:08.880
call those group-based trajectory

0:23:07.120,0:23:10.720
analyses. Those are a little bit

0:23:08.880,0:23:14.240
different because they typically look

0:23:10.720,0:23:16.240
for latent um categories that underlie

0:23:14.240,0:23:18.799
the groupings. So that may be what

0:23:16.240,0:23:20.720
you're talking about. But again,

0:23:18.799,0:23:23.440
the basic statistical methods that

0:23:20.720,0:23:26.720
underlie those methods are either GEE or

0:23:23.440,0:23:29.679
mixed model analyses.

0:23:26.720,0:23:32.559
Thank you. Um oh, we just got another

0:23:29.679,0:23:35.200
question. Uh says, "Thank you for your

0:23:32.559,0:23:37.360
presentation. For the change analysis

0:23:35.200,0:23:40.640
with the baseline predictor is it enough

0:23:37.360,0:23:42.640
to look at the interaction term or

0:23:40.640,0:23:43.760
should we also look at the main effect

0:23:42.640,0:23:46.480
term?"

0:23:43.760,0:23:48.640
Yeah. So um when you're interested in

0:23:46.480,0:23:51.120
whether a baseline predictor is

0:23:48.640,0:23:52.960
associated with change over time that

0:23:51.120,0:23:54.880
primary question is answered by the

0:23:52.960,0:23:57.440
interaction term. So no you don't need

0:23:54.880,0:23:59.919
to look at the main effect to

0:23:57.440,0:24:02.799
answer that question. Of course, you

0:23:59.919,0:24:05.520
know, almost any estimate in a

0:24:02.799,0:24:08.080
regression model tells you something. If

0:24:05.520,0:24:11.760
you've coded your time variable so that

0:24:08.080,0:24:13.760
time zero is baseline, then the main

0:24:11.760,0:24:15.440
effect is the comparison at baseline. So

0:24:13.760,0:24:17.360
that still might be of interest, but it

0:24:15.440,0:24:20.000
doesn't answer the question whether the

0:24:17.360,0:24:23.480
change over time is related to the

0:24:20.000,0:24:23.480
baseline predictor.
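
As a small illustration of that answer, the sketch below works out the coefficients of a model with group, time, and their interaction from four cell means. The means are hypothetical, the function name is made up for this example, and the arithmetic assumes a simple two-group, two-visit layout with complete data, where the model reproduces the cell means exactly. It shows that the interaction is the difference in change regardless of how time is coded, while the group main effect is the baseline comparison only when baseline is coded as time zero.

```python
def saturated_coefficients(cell_means, t_baseline, t_followup):
    """Coefficients of y = b0 + b1*group + b2*time + b3*group*time
    that reproduce the four cell means exactly.

    cell_means maps (group, visit) to a mean, visit being "baseline" or "followup".
    t_baseline and t_followup are the numeric codes used for the two visits.
    """
    span = t_followup - t_baseline
    change0 = cell_means[(0, "followup")] - cell_means[(0, "baseline")]
    change1 = cell_means[(1, "followup")] - cell_means[(1, "baseline")]
    b3 = (change1 - change0) / span  # difference in change between groups
    b2 = change0 / span              # change per unit time in group 0
    b0 = cell_means[(0, "baseline")] - b2 * t_baseline
    # Group difference at the point where the time code equals zero.
    b1 = (cell_means[(1, "baseline")] - cell_means[(0, "baseline")]) - b3 * t_baseline
    return {"intercept": b0, "group": b1, "time": b2, "group_x_time": b3}


# Hypothetical cell means: group 0 = non-OA, group 1 = OA.
means = {
    (0, "baseline"): 2.0, (0, "followup"): 2.5,
    (1, "baseline"): 4.0, (1, "followup"): 4.77,
}

zero_at_baseline = saturated_coefficients(means, t_baseline=0, t_followup=1)
zero_at_followup = saturated_coefficients(means, t_baseline=-1, t_followup=0)

for label, coefs in [("time 0 = baseline", zero_at_baseline),
                     ("time 0 = follow-up", zero_at_followup)]:
    print(label, {name: round(value, 2) for name, value in coefs.items()})
```

With time zero at baseline the group coefficient is the baseline gap between groups; recoding so that time zero is the follow-up visit turns it into the follow-up gap, while the interaction coefficient is unchanged.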

0:24:26.320,0:24:31.520
Um, another question, are there analytic

0:24:29.520,0:24:33.440
approaches that we need to consider when

0:24:31.520,0:24:36.000
combining data from different imaging

0:24:33.440,0:24:39.200
projects?

0:24:36.000,0:24:40.799
Yes. Um, so

0:24:39.200,0:24:46.720
um sometimes the different imaging

0:24:40.799,0:24:49.279
projects um were conducted with

0:24:46.720,0:24:51.840
um not straightforward like randomly

0:24:49.279,0:24:54.240
sampled designs. Sometimes some of the

0:24:51.840,0:24:56.559
imaging projects especially early on

0:24:54.240,0:24:58.720
when um reading the images was

0:24:56.559,0:25:00.880
much more expensive (now it's gotten a

0:24:58.720,0:25:02.720
lot more automated), were like case

0:25:00.880,0:25:06.960
control designs and sometimes that can

0:25:02.720,0:25:09.039
introduce bias into the analyses. So

0:25:06.960,0:25:12.400
it's it's almost always a good idea to

0:25:09.039,0:25:15.760
do a little bit of descriptive um work

0:25:12.400,0:25:17.600
um perhaps by considering estimates from

0:25:15.760,0:25:20.080
different subsets of the data, making sure

0:25:17.600,0:25:23.600
that they're relatively consistent

0:25:20.080,0:25:26.400
before combining them. Um, and to look

0:25:23.600,0:25:29.279
out for um, subsets that might have been

0:25:26.400,0:25:31.520
conducted with somewhat unusual designs

0:25:29.279,0:25:36.840
like, as I said, with a case-control or

0:25:31.520,0:25:36.840
maybe even a case-cohort um design.
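
One way to do that kind of descriptive check is sketched below: compute the same group difference separately within each imaging project and look at the estimates side by side before pooling. The project names, the data values, and the function name are all hypothetical, and the standard error is the simple unequal-variance formula for two independent groups, which ignores the within-person correlation discussed earlier in the talk.

```python
from math import sqrt
from statistics import mean, variance


def difference_by_subset(records):
    """records: iterable of (subset, group, value) with group coded 0 or 1.

    Returns {subset: (difference in means, standard error)}, group 1 minus group 0.
    """
    by_subset = {}
    for subset, group, value in records:
        by_subset.setdefault(subset, {0: [], 1: []})[group].append(value)

    results = {}
    for subset, groups in by_subset.items():
        g0, g1 = groups[0], groups[1]
        diff = mean(g1) - mean(g0)
        se = sqrt(variance(g1) / len(g1) + variance(g0) / len(g0))
        results[subset] = (diff, se)
    return results


# Made-up measurements from two hypothetical imaging projects.
records = [
    ("project_A", 0, 2.1), ("project_A", 0, 1.9), ("project_A", 0, 2.3),
    ("project_A", 1, 2.6), ("project_A", 1, 2.4), ("project_A", 1, 2.8),
    ("project_B", 0, 2.0), ("project_B", 0, 2.2), ("project_B", 0, 1.8),
    ("project_B", 1, 3.4), ("project_B", 1, 3.0), ("project_B", 1, 3.2),
]

results = difference_by_subset(records)
for subset, (diff, se) in sorted(results.items()):
    print(f"{subset}: difference = {diff:.2f}, SE = {se:.2f}")
```

A subset whose estimate sits far from the others is a prompt to go back and check how that project selected its participants, for example whether it used case-control sampling. It is a descriptive screen, not a formal test.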

0:25:38.080,0:25:43.840
Great. Um, another question, amazing

0:25:41.120,0:25:46.000
presentation. Quick question: in your

0:25:43.840,0:25:48.080
example, what are the advantages and

0:25:46.000,0:25:50.000
disadvantages of using right and left

0:25:48.080,0:25:52.720
knee pain separately as predictors

0:25:50.000,0:25:55.039
versus using an average pain measure as

0:25:52.720,0:25:57.440
a predictor? If using separate knees,

0:25:55.039,0:25:58.720
you are adjusting for pain in each side,

0:25:57.440,0:26:00.480
how is that considered in the

0:25:58.720,0:26:02.960
interpretation?

0:26:00.480,0:26:06.080
Yeah. So, let's say that knee pain

0:26:02.960,0:26:08.960
is the primary predictor of interest um

0:26:06.080,0:26:10.559
for your study. So, you know,

0:26:08.960,0:26:12.960
it's important to think ahead. I mean,

0:26:10.559,0:26:15.360
why would you want to say

0:26:12.960,0:26:17.679
left knee pain is predictive of such and

0:26:15.360,0:26:19.360
such? You know, it it would probably not

0:26:17.679,0:26:23.600
really be a very sensible way to

0:26:19.360,0:26:25.679
characterize results. Um, and so

0:26:23.600,0:26:27.279
I I might then gravitate to using

0:26:25.679,0:26:30.000
something like, okay, average knee pain

0:26:27.279,0:26:31.600
is probably what's going to be causing

0:26:30.000,0:26:34.960
people to I think the example was

0:26:31.600,0:26:36.880
missing days of work. Um, but you might

0:26:34.960,0:26:39.440
be able to make an argument that

0:26:36.880,0:26:42.000
asymmetry in knee pain, having one knee

0:26:39.440,0:26:44.320
very painful compared to the other knee

0:26:42.000,0:26:46.320
might be important as well. So that's

0:26:44.320,0:26:49.200
why, you know, I might gravitate to

0:26:46.320,0:26:51.120
doing that sort of a pre-summary of the

0:26:49.200,0:26:52.799
predictors and put them in the model,

0:26:51.120,0:26:54.880
checking just to make sure

0:26:52.799,0:26:56.799
that asymmetric knee pain wasn't

0:26:54.880,0:26:59.360
important. And then I could simplify

0:26:56.799,0:27:01.919
down to just say, look, we incorporated

0:26:59.360,0:27:03.760
knee pain in both knees. What we found

0:27:01.919,0:27:05.520
was that really it was the average knee

0:27:03.760,0:27:07.600
pain that was important. Here's the

0:27:05.520,0:27:09.120
interpretation. You know, on the other

0:27:07.600,0:27:11.760
hand, if you're interested in something

0:27:09.120,0:27:14.159
like a gait analysis, asymmetric knee

0:27:11.760,0:27:16.240
pain may be an important driver, in

0:27:14.159,0:27:19.799
which case that then might be the key

0:27:16.240,0:27:19.799
predictor of interest.
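
The link between the two ways of entering knee pain can be sketched in a few lines. This assumes asymmetry is summarized as the signed difference, right minus left, which is an assumption made here and not something stated in the talk; with an absolute difference the two models would no longer be exactly equivalent. The coefficient values and function names are hypothetical. Under that assumption, a model with separate right and left pain terms is the same model as one with average pain and asymmetry, just re-expressed.

```python
def to_average_asymmetry(coef_right, coef_left):
    """Convert coefficients on (right, left) pain into coefficients on
    (average pain, right minus left)."""
    coef_average = coef_right + coef_left
    coef_asymmetry = (coef_right - coef_left) / 2.0
    return coef_average, coef_asymmetry


def predict_separate(intercept, coef_right, coef_left, right, left):
    """Linear predictor with right and left knee pain entered separately."""
    return intercept + coef_right * right + coef_left * left


def predict_summary(intercept, coef_average, coef_asymmetry, right, left):
    """Linear predictor with average pain and signed asymmetry as predictors."""
    average = (right + left) / 2.0
    asymmetry = right - left
    return intercept + coef_average * average + coef_asymmetry * asymmetry


# Hypothetical coefficients for an outcome such as missed days of work.
intercept, coef_right, coef_left = 1.0, 0.35, 0.25
coef_average, coef_asymmetry = to_average_asymmetry(coef_right, coef_left)

for right, left in [(0, 0), (3, 3), (8, 1), (2, 9)]:
    separate = predict_separate(intercept, coef_right, coef_left, right, left)
    summary = predict_summary(intercept, coef_average, coef_asymmetry, right, left)
    print(right, left, round(separate, 3), round(summary, 3))
```

If the right and left coefficients are equal, the asymmetry coefficient is zero and only average pain matters, which is the simplification described in the answer.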

0:27:20.240,0:27:25.919
Thank you. I'm not seeing any more

0:27:23.039,0:27:28.400
questions, so it's about time to finish

0:27:25.919,0:27:30.400
up for today anyway. So, I want to thank

0:27:28.400,0:27:32.640
you again so much, Dr. McCulloch, for

0:27:30.400,0:27:34.960
that excellent presentation and taking

0:27:32.640,0:27:37.279
time to join us today. And I also want

0:27:34.960,0:27:40.799
to thank everyone else for joining us.

0:27:37.279,0:27:43.360
And uh also as we wrap up, I want to

0:27:40.799,0:27:46.880
invite you all to our upcoming webinar

0:27:43.360,0:27:48.480
uh hosted on June 8th with Dr.

0:27:46.880,0:27:52.000
Grace Lo who will be talking about

0:27:48.480,0:27:55.200
lifetime physical activity data. Um, I

0:27:52.000,0:27:56.799
am sending a link in the chat

0:27:55.200,0:27:59.360
to register for that if you're

0:27:56.799,0:28:01.840
interested, as well as other

0:27:59.360,0:28:05.200
links to previous webinars and the

0:28:01.840,0:28:07.279
post-webinar survey. So yeah, thank you all

0:28:05.200,0:28:12.000
for your time and I hope you all have a

0:28:07.279,0:28:12.000
great rest of your day. Thanks.
