Live Webinar with Dr. Jamie Collins
Working with Data from the FNIH Osteoarthritis Biomarkers Consortium
March 9, 2026
Hi everyone. Thank you all so much for
taking time out of your busy schedules
to attend our oai.edu live webinar series
hosted by the OAI CORE Knowledgebase. I
also want to thank today's speaker Dr.
Jamie Collins who is the associate
director of the orthopedic and arthritis
center for outcomes research at
Brigham and Women's Hospital and associate
professor of orthopedic surgery at
Harvard Medical School. The title of
today's talk is "Working with Data from
the FNIH Osteoarthritis Biomarkers
Consortium: A Nested Case Control Study
within the Osteoarthritis
Initiative." And following this
presentation we should have a few
minutes for Q&A. So if you have any
questions feel free to type them into
the Q&A box at the bottom of your
screen. And if you can't see it
immediately you might need to click the
three dots to see more options. And
so without further ado I'll let Dr.
Collins take over.
Great. Thank you so much, Julieann, and
thanks to Jeff for inviting me to
speak. We're going to talk today
about working with data from the
FNIH OA Biomarkers Consortium.
I wanted to start just by acknowledging
the funding. Scientific and financial
support for the FNIH OA Biomarkers
Consortium was made possible by
the organizations that you see on this
slide.
And just as a brief outline, I'll give
an overview of the nested case control
study, including the rationale and study
design. I'll describe what data are
available and how you access that data.
And then just spend a few minutes on
analytic considerations for working with
these data.
And I'd also just like to acknowledge
the OAI OA biomarkers project team
shown on this slide. These were all
of the people involved in phase one,
in particular Virginia Kraus and David
Hunter, who are the PIs of this project.
And here are some key references
describing the rationale and the study
design. Um in particular that third
reference has a lot of detail on the
study design which we'll go through
briefly today.
So in terms of the rationale for this
study despite the clinical and economic
impact of OA there are currently no
pharmacologic therapies approved by
regulatory agencies to prevent OA or
stop disease progression.
Improvements in clinical trial design
are critically needed to overcome
barriers to the development of disease
modifying treatments
and refinement and improvement of
measures of joint structural change
based on imaging and/or biochemical
biomarkers are needed to identify
individuals with knee OA who are likely to
progress radiographically and
symptomatically and to overcome the
limited responsiveness of existing
imaging biomarkers.
So the aim of the OA Biomarkers
Consortium phase one study was to
establish the prognostic validity of
several imaging and biochemical
biomarkers for knee OA progression.
So we measured knee OA progression
at a later stage, at 36 to 60
months from baseline.
And the study investigated whether
biomarkers measured at baseline were
prognostic of disease progression with
the idea that these could be used for
prognostic enrichment of progressors in
clinical trials.
And it also investigated whether
short-term change in biomarkers measured
over 24 months was predictive of
longer-term progression with the idea
that such biomarkers may be candidates
to assess as potential surrogate
endpoints in trials. So, is there
something that we can measure earlier um
than radiographic progression um that's
still predictive of a clinically
relevant outcome?
So, the study design, it's a nested case
control study within the OAI. And so, by
nested case control study, what we mean
is that all participants were selected
from the larger OAI study.
It's a case control study, so
the case control status was based on
disease progression.
That was measured with
radiographic progression, which was a
loss of medial minimum joint space width
of seven-tenths of a millimeter (0.7 mm) by 24, 36, or 48
months versus baseline. This
threshold was based on the OAI-specific
smallest detectable change.
Cases were also defined by pain
progression, which was a persistent
increase versus baseline in total WOMAC
pain score above the minimal
clinically important difference of nine
points at 24, 36, 48, and 60 months. To
meet this persistent threshold, that
increase had to be maintained at
two or more time points.
So the primary case definition was knees
having both radiographic and pain
progression. That's what we
deem our primary progression status
in this study.
And then the primary control definition
was knees that did not reach criteria
for both end points. So the controls
included knees with X-ray progression
but not pain progression, X-ray only
progressors, knees with pain progression
but not X-ray progression, so pain only
progressors, and knees with neither
X-ray nor pain progression, so a
non-progressor.
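To make the four outcome groups concrete, here is a minimal Python sketch of the case definition as described in the talk. The function and variable names are hypothetical and are not OAI variable names. It assumes that reaching a threshold means greater than or equal to it, that the pain increase may occur at any two of the four visits (the talk does not say whether they must be consecutive), and that groups 1 and 3 are numbered as shown (the talk only confirms that group 2 is the joint space loss only progressor and group 4 the non-progressor).

```python
from typing import Dict

JSW_LOSS_MM = 0.7      # radiographic threshold quoted in the talk
PAIN_INCREASE = 9.0    # WOMAC pain threshold quoted in the talk


def xray_progressor(jsw_baseline: float, jsw_followup: Dict[int, float]) -> bool:
    """True if medial minimum JSW loss reaches the threshold at 24, 36, or 48 months."""
    return any(
        jsw_baseline - jsw_followup[month] >= JSW_LOSS_MM
        for month in (24, 36, 48)
        if month in jsw_followup
    )


def pain_progressor(pain_baseline: float, pain_followup: Dict[int, float]) -> bool:
    """True if the pain increase reaches the threshold at two or more of 24/36/48/60 months."""
    visits_over = sum(
        1
        for month in (24, 36, 48, 60)
        if month in pain_followup
        and pain_followup[month] - pain_baseline >= PAIN_INCREASE
    )
    return visits_over >= 2


def outcome_group(xray: bool, pain: bool) -> int:
    """Assumed numbering: 1 both, 2 X-ray only, 3 pain only, 4 neither."""
    if xray and pain:
        return 1
    if xray:
        return 2
    if pain:
        return 3
    return 4
```

Under these assumptions, only group 1 counts as a case in the primary definition; groups 2 through 4 are all controls.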
So again, these participants were
selected from the larger OAI cohort.
Out of the 4,796
participants in the OAI,
the first major inclusion
criterion was that all participants were
required to have serum and urine specimens,
X-ray joint space width, MR imaging, and
WOMAC data at
both baseline and 24 months. And so,
just to start, this is quite a
select cohort. In order
to be considered for inclusion in this
nested case control study, participants had to have
complete biomarker data at both baseline
and 24 months.
And then from there, eligible
knees had a Kellgren-Lawrence (KL) grade between 1 and 3
at baseline.
And from here, um, these knees were
sorted into four mutually exclusive
outcome groups. So again those four
outcome groups that we described on the
prior slide X-ray and pain progression,
X-ray only progression, pain only
progression and neither X-ray nor pain
progression. Um these were frequency
matched by um BMI and KL strata. And
again um you know further details are
available in that reference at the
bottom. I think the main thing to point
out here is that um this is not a random
selection. If you look for example at
the X-ray and pain progression um on the
far left here, there were 252 knees that
were potentially eligible based on X-ray
and pain progression and 194 of those
were selected. And if you go all the way
to the far right, there were 943 knees
that um were potentially eligible of
non-progressors and 200 of those were
selected. So this nested case control
study oversamples cases, and we'll come
back to that when we talk about the
analytic implications, but
something to keep in mind with the
sampling strategy is that our
progressors, or our cases, are
oversampled relative to the full OAI.
So what data are available?
Imaging biomarkers were assessed at
baseline, 12, and 24 months. This includes
semi-quantitative assessment of MRI
with the MOAKS scores and
quantitative assessment of cartilage,
bone shape, subchondral bone area, and
meniscal volume. And for X-ray we have
subchondral bone trabecular integrity.
It's a lot of data and it can get a
little bit confusing. These measures
were assessed by several different
groups, and each set of biomarkers is in
its own data set. So for example, the
semi-quantitative scores, the centrally
performed longitudinal semi-quantitative
readings with MOAKS, were performed by
Boston Imaging Core Lab, and that's in
one data set.
Chondrometrics did measurements of
cartilage volume, thickness, and other
associated measurements including
subchondral bone area, and those are in
their own data set. So when you go to
download the data and work with the
data, you just have to keep in mind um
that for each vendor or each group that
did an assessment of these data, each
has its own data set um in the data
download. And so if you're looking for
something, you may have to look across
several data sets to find what you're
looking for. And you can see again just
a description of what's available here.
Measures of bone shape, subchondral bone
area, cartilage and meniscal volume,
measurements of subchondral bone plate
shape and curvature, etc.
Biochemical biomarker data are
available again at baseline, 12, and 24
months. Data are available on 11
serum and six urine biomarkers, in
addition to urine creatinine, and you can
see the list on the right there.
For the biochemical biomarker data, I
would really urge people to take a look
at the description. This is the
document you can see in this bullet
here. There's really important
information on the assay methods,
validation, and calculation of reference
intervals. So there's quite a bit of
documentation for this biospecimen data,
and I would again urge you to look
through that description in detail.
This also can get a little bit confusing
because the data are available both
for the FNIH cohort of 600
and for 129 reference control
samples of individuals without OA from
the Johnston County Osteoarthritis
Project. So again, if you go to download
the biochemical biomarker data, you'll
get data both for the FNIH and for
Johnston County. And again, people can
find that a little bit confusing. So I
would again just urge you to look
through that description first to make
sure you understand all of the data sets
and documentation that are available for
the biochemical biomarkers.
If you've not worked with OAI data
before, it can be a little bit
overwhelming at first. But I think
it's a good thing because there's
a lot of description of exactly
what you're looking at. So you'll see
several files for each data set. So
for example here, if you were interested
in working with that Chondrometrics data,
the quantitative measures of cartilage,
you would start by looking at this
descript.pdf.
This is the description of the data
set that includes biomarker assessment
and quantification: exactly how the
markers were assessed and how they were
measured, and it provides references to
publications describing that. The
next two are data sets. One is a SAS
data set and then I think this is a
transfer file. Both of these should
be easily read into any sort of standard
statistical software package. Obviously
the SAS data set can be read with SAS;
it can be read with R, with Python, Stata,
etc. And so there are two versions of the
data set. There's a contents file that
has just the contents of the data
set, so the variable names and labels,
and that can be really helpful just to
orient yourself to the data. And then
finally, the stats file includes descriptive
statistics. And so this is always a nice
check to make sure that you have the
correct data and that it matches what's
in the descriptive statistics file. And
so again, this can get a little bit
overwhelming. As I showed a few slides
ago, there's a separate
set of files for each of those
vendors. So if you wanted to look at
the Imorphics data, you would see a similar
number of documents for that. So when
you do go to download these files,
you do get quite a bit of information.
And so this is just to orient you to
what you're seeing there.
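The check against the stats file can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the column has already been loaded into a list (for example with pandas.read_sas, which reads both SAS data sets and transfer files), the helper names `describe` and `matches` are hypothetical, and the expected values are whatever you copy by hand from the stats file for that variable.

```python
import math
from typing import Dict, Iterable, Optional


def describe(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Return n, number missing, mean, min, and max for one variable."""
    values = list(values)
    # Treat both None and NaN as missing.
    present = [v for v in values if v is not None and not math.isnan(v)]
    return {
        "n": len(present),
        "n_missing": len(values) - len(present),
        "mean": sum(present) / len(present) if present else float("nan"),
        "min": min(present) if present else float("nan"),
        "max": max(present) if present else float("nan"),
    }


def matches(observed: Dict[str, float], expected: Dict[str, float],
            tol: float = 1e-6) -> bool:
    """True when every statistic listed in `expected` agrees within `tol`."""
    return all(
        math.isclose(observed[key], expected[key], abs_tol=tol)
        for key in expected
    )
```

If `matches` returns False for a variable, the usual suspects are reading the wrong file version or a different missing-value coding than the one the stats file assumes.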
To access the data and documentation,
it's at that NDA NIH website where
you get the full Osteoarthritis
Initiative data. If you click on the
full download, it will bring you to a
screen that shows all of the different
data that you can get, and there's a
separate button for the FNIH project.
If you go to download this,
it says biospecimens, but it actually includes
everything. I included a screenshot of
some of the things that you would
download if you were to click on that
biospecimen SAS, and again it's all of the
data for the entire FNIH, including the
clinical data, reference intervals,
this is showing the bone shape, etc.
So you'll get dozens of files, but
this would be everything you need for
FNIH. And again, I realize it says
biospecimens, but this is going to give
you everything if you click on
downloading that.
So I think it's important just to
quickly go through the clinical data, so
this would be that clinical FNIH SAS
data.
These data are one line per participant
and one knee per participant. So again,
just going back to that flowchart,
one knee per participant was selected
for inclusion into the study. So this
clinical data set will have 600
observations in it. I think the most
important variable that you need from
the clinical data set is that case
control status. And so this is in two
places. It's in this case variable,
where it's a numeric variable with
the grouping seen here. And then
there's also group type, which is
actually written out in text. And so
that first participant is in group two,
which is the joint space loss only
progressor. And that last participant
is in group four, which is a
non-progressor, so did not progress on
X-ray or pain.
These data can be linked, as I said, to
the larger OAI data set simply using
the ID variable. So it's the same ID in
both data sets. The top is that
clinical data from the FNIH download and
the bottom is the all clinical data from
the larger OAI. So for example, if you
wanted to get comorbidities and WOMAC
function and merge that in with the FNIH
data, you could do that by ID. So it's
really pretty straightforward.
The one thing to keep in mind is
that one knee per participant
was selected for the FNIH case control
study. And so if you are getting data
from the all clinical data set, for
example, you have to make sure that
you're pulling the right variable. So
for that first participant, their right knee
is included, so you want to make sure that
you're getting the right knee score for
WOMAC disability. For the second
participant, the left knee is included in
the FNIH study, and so you want to make
sure you're pulling the data for the
left knee.
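The merge-then-pick-the-index-knee step can be sketched as follows. This is a minimal illustration, not the OAI's own tooling: the column names (`ID`, `SIDE`, and the right and left score columns) and the "right"/"left" coding are hypothetical placeholders, so check the contents files for the real variable names and side coding before adapting it.

```python
from typing import Any, Dict, List

# Hypothetical column names for illustration only.
ID_COL = "ID"
SIDE_COL = "SIDE"  # assumed coding: "right" or "left"


def attach_index_knee_value(
    fnih_rows: List[Dict[str, Any]],
    oai_rows: List[Dict[str, Any]],
    right_col: str,
    left_col: str,
    new_col: str,
) -> List[Dict[str, Any]]:
    """Merge by ID and keep only the score for the knee selected into FNIH."""
    oai_by_id = {row[ID_COL]: row for row in oai_rows}
    merged = []
    for row in fnih_rows:
        side = row[SIDE_COL]
        if side not in ("right", "left"):
            raise ValueError(f"unexpected side {side!r} for ID {row[ID_COL]}")
        source_col = right_col if side == "right" else left_col
        oai_row = oai_by_id.get(row[ID_COL])
        # Leave the value missing when the ID is absent from the OAI file.
        value = oai_row.get(source_col) if oai_row is not None else None
        merged.append({**row, new_col: value})
    return merged
```

For example, a participant whose FNIH knee is the right knee gets the right-knee score, and one whose FNIH knee is the left gets the left-knee score, regardless of which knee has the worse value.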
And I'm sure, if you watched
Jeff's webinar last month, he
probably went through all of this, but
the way that the side is included in the
OAI data set depends on which data set
you're looking at, whether you're
looking at the clinical data or whether
you're looking at imaging data. So just
a note of caution to pay special
attention to the side that's included.
Again, that will be in that clinical
FNIH data set, to make sure that
you're pulling the right information.
Um so I wanted to just spend a few
minutes talking about analytic
opportunities with this data set. Um so
the first obviously is modeling that
case control outcome. Um and so we have
again our baseline biomarkers and then
short-term change measured over baseline
to 24 months where we'd be predicting
that case control status. Um and again
this is the most straightforward way to
use the data to model the case control
outcome. So again to examine
associations between baseline and/or
short-term change um with joint space
narrowing and pain case control status.
I think a word of caution here
for the FNIH data, but really for any OAI
analysis, is that many people have
used these data and there have been
lots of different analyses, both using
the available FNIH biomarker data and
also by groups who've reanalyzed
the images and quantified new
biomarkers. So I think, as with
any OAI analysis, it's a really good idea
to do a thorough literature review
before you undertake any new analysis.
So what are some opportunities that I
see with this data that maybe have been
a little bit understudied? I think
mediation analysis would be a really
interesting way to use these data. And
this is an example of a paper that was
recently published investigating whether
synovitis mediates the association
between bone marrow lesions and knee
pain using data from the FNIH OA
Biomarkers Consortium. And so the
question with mediation analysis is that
we may see an association between
an exposure, so in this case BML size
score, and an outcome, which in this case was
WOMAC knee pain. And the question that we ask
is whether there's a mediator
through which some of that
association is actually mediated. So in
this case the investigators asked
whether some of the association between
BML score and WOMAC knee pain is actually
mediated through synovitis score. And I
think the FNIH offers a really unique
opportunity to do some of these analyses
because we have longitudinal
biomarker data at baseline, 12, and 24
months. So there's, I think, a
really interesting opportunity for
longitudinal mediation analysis.
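As a rough illustration of the mediation idea, the sketch below splits a total association into a direct part and a part that runs through a mediator, using the classic product-of-coefficients approach with ordinary least squares. This is an assumption-laden toy: it is cross-sectional, has no covariates or confidence intervals, is not the method of the paper mentioned in the talk, and is not a longitudinal mediation model. The function names are hypothetical; `x`, `m`, and `y` stand in for exposure (for example a BML score), mediator (for example a synovitis score), and outcome (for example WOMAC knee pain).

```python
from typing import Dict, Sequence


def _centered_cross(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of cross-products of u and v after centering each at its mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def simple_mediation(x: Sequence[float], m: Sequence[float],
                     y: Sequence[float]) -> Dict[str, float]:
    """Product-of-coefficients decomposition for one exposure and one mediator."""
    sxx = _centered_cross(x, x)
    smm = _centered_cross(m, m)
    sxm = _centered_cross(x, m)
    sxy = _centered_cross(x, y)
    smy = _centered_cross(m, y)
    det = sxx * smm - sxm ** 2
    if sxx == 0 or det == 0:
        raise ValueError("x is constant or x and m are perfectly collinear")
    a = sxm / sxx                            # slope of m on x
    b = (sxx * smy - sxm * sxy) / det        # slope of y on m, adjusting for x
    direct = (smm * sxy - sxm * smy) / det   # slope of y on x, adjusting for m
    total = sxy / sxx                        # slope of y on x alone
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": total}
```

With least squares the total effect equals the direct effect plus the indirect effect, which is a handy sanity check on any implementation.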
I think there's also an
opportunity to apply novel methods to
assess composite biomarkers using
machine learning or otherwise. And so
we have all of these data. We've done
logistic regression. We've done um
random forests. But are there novel ways
that we can try to investigate composite
biomarkers? And I've included a couple
um papers um on the slide here that I
think would sort of offer interesting
approaches. Um I'm really interested in
in generalized additive models, which is
a flexible approach that allows for
nonlinear associations between
predictors and outcomes. And so I think
there's still a lot here in terms of um
trying to investigate combinations of
biomarkers that may um predict the case
control outcome.
So just in terms of analytic
considerations um the question I hear a
lot is can I undertake an analysis other
than predicting case control status. So
what if for example we wanted to look at
a different outcome we wanted to predict
total knee replacement or functional
decline instead of pain or if we wanted
to do maybe a cluster analysis for
phenotyping. Can we kind of reuse these
data for another question other than
that case control um outcome? And the
answer is yes, but with an asterisk. Um,
so you can, and I think it is totally
reasonable to do secondary analyses of a
nested case control study using a
different outcome or answering a
different question. You just really
have to keep in mind that this is a
nested case control study and not a
random sample from the OAI. And as I
mentioned at the beginning, cases
were oversampled. And so when
undertaking a secondary analysis we
really have to be mindful of the
selection bias and of what
population this generalizes to.
And there are statistical methods
for this. Any estimate of
incidence or prevalence is going to be
subject to selection bias. The
recommendation is to use inclusion
probability weighting to account for the
outcome dependent sampling. So you
essentially reweight your estimates by the
inverse of the
chance that that knee got selected
for inclusion into the case control
study. And then another thing to do
would be to stratify the analysis by
case status as a sort of sanity check.
So let's say for example you were
interested in whether baseline imaging
biomarkers were associated with
functional decline. Um you might ask
whether the association between that
baseline biomarker and functional
decline is the same for cases as for
controls. And if it's not, then you
really do have to start to think more
about that selection bias and what does
that mean for the generalizability of
your findings. And again, there are
statistical methods to deal with this.
I've included a paper here that includes
some nice methods for doing this
inclusion probability weighting. I
think the analysis itself is
relatively straightforward, but
those weights are not
something that's available in the
publicly available data. And so calculating them is
something that you would have to work
through.
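A minimal sketch of inclusion probability weighting, using only the two selection counts quoted earlier in the talk (194 of 252 X-ray and pain progressors, 200 of 943 non-progressors). It is a simplification under stated assumptions: it ignores the BMI and KL frequency-matching strata and the earlier requirement of complete biomarker data, both of which a real analysis would need to account for, and the function and group names are hypothetical.

```python
from typing import Dict, Sequence


def inclusion_weights(eligible: Dict[str, int],
                      selected: Dict[str, int]) -> Dict[str, float]:
    """Weight for each group = 1 / (selected / eligible)."""
    weights = {}
    for group, n_selected in selected.items():
        n_eligible = eligible[group]
        if n_selected <= 0 or n_selected > n_eligible:
            raise ValueError(f"implausible counts for group {group!r}")
        weights[group] = n_eligible / n_selected
    return weights


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average; with 0/1 values this is a weighted proportion."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Counts quoted in the talk for two of the four outcome groups.
eligible = {"xray_and_pain": 252, "non_progressor": 943}
selected = {"xray_and_pain": 194, "non_progressor": 200}
weights = inclusion_weights(eligible, selected)
```

Each selected non-progressor knee stands in for several eligible knees, while each selected case stands in for only slightly more than one, which is exactly the oversampling of cases the weights are meant to undo.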
And I did just want to provide a
little bit more information on the
FNIH OA Biomarkers Consortium.
On the FNIH website, you can get
an overview of phase one of the
biomarkers consortium project. I'll
also point you to this slide deck
that's on the ORS website. Way back at
the beginning of the project in 2015,
the investigators gave one of the
pre-congress workshops at the ORS meeting
on the project, and those slides are
available on the ORS website.
Finally, I would like to just
mention that this nested case control
study was phase one of the OA biomarkers
project. In phase two, which was the
PROGRESS OA study, we attempted to
externally validate the highest
performing markers from phase one
in data from the placebo arms of
clinical trials. So the PROGRESS OA
study is not part of the OAI. It includes
data from the placebo arms of clinical
trials, and again we attempted to
externally validate the highest
performing markers. We are working on
making those data publicly available and
working with Jeff to make sure that
once we do make those data publicly
available, we can get the word out and
let everybody access that. But in the
meantime, I've highlighted a publication
and the FNIH description of the
PROGRESS OA study. And then it's
sort of a full circle moment today,
talking about phase one of the
biomarkers consortium. We're
recording a podcast this afternoon with
David Hunter for his Joint Action
podcast
that will describe the third phase of
the biomarkers project, in which we're
attempting to get data from both the
placebo and treatment arms of clinical
trials to investigate some of these
biomarkers. So be on the lookout for the
announcement about the third phase of
the trial, and again that will go up in
the coming weeks on David Hunter's Joint
Action podcast.
And so I wanted to end with the
announcement for the next speaker in
this series. Dr. Michelle Yao, who's an
assistant scientist at Hebrew SeniorLife
and an assistant professor of medicine
at Harvard Medical School, is going to
speak next month about genetic data
resources in the OAI, a primer. For
those of you who don't know Michelle,
she's really one of the preeminent
researchers in OA and genetics. And I
think this is going to be a really
fantastic seminar.
And so with that, I'd like to say thank
you and I'm um happy to take any
questions.
Well, thank you so much, Dr. Collins, for
that excellent presentation. So we'll
move on to Q&A. If anyone has any
questions, feel free to type them in
the Q&A box. I have a question.
Can the FNIH MRI-based data be merged
with other projects that measured the
MRIs?
Yes, that's a great point, and
something I probably should have put on
the slides. There are several
other studies that have done MRI
assessments in OAI. The one that
comes to mind is POMA, which I can't
deabbreviate at the moment, but I think
that was looking at people without OA
at baseline and predicting long-term
total knee replacement. And so those
data can certainly be merged together. I
think there is some overlap across the
studies though. So if you see,
say, semi-quantitative imaging data
from both POMA and FNIH, there is overlap
in who was included in those. And so
you want to make sure that you're
checking that you're not
including somebody twice, but those data
can all be merged. It's the same study
ID for all of the data sets.
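The overlap check described in that answer can be sketched with a set intersection on the shared study ID. This is a minimal illustration: the `ID` column name and the helper names are hypothetical, it works at the participant level (it does not check whether the two studies read the same knee), and keeping the first study's row on overlap is just one possible rule.

```python
from typing import Any, Dict, Iterable, List, Set


def overlapping_ids(ids_a: Iterable[Any], ids_b: Iterable[Any]) -> Set[Any]:
    """IDs that appear in both studies."""
    return set(ids_a) & set(ids_b)


def stack_unique(rows_a: List[Dict[str, Any]], rows_b: List[Dict[str, Any]],
                 id_col: str = "ID") -> List[Dict[str, Any]]:
    """Stack two studies, keeping the first study's row when an ID is in both."""
    seen = {row[id_col] for row in rows_a}
    return list(rows_a) + [row for row in rows_b if row[id_col] not in seen]
```

Printing the size of `overlapping_ids` before stacking is a quick way to see how many participants would otherwise be counted twice.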
I'm not seeing any other questions
at the moment, so I want to say thank you
again so much for coming and giving
your talk today. I also want to thank
you all for
joining and attending this live
webinar. As Dr. Collins mentioned,
next month we have another webinar with
Michelle Yao. I sent a link in the chat
to register, so I definitely hope you
can make that if you're available.
And as I also mentioned earlier, we have
a post-webinar survey that should
populate after the end of the meeting,
and we'd really appreciate it if you take
the time to do that. So without
further ado, thank you so much, and we
hope you all have a great rest of
your day.
