Live Webinar with Dr. Jamie Collins

Working with Data from the FNIH Osteoarthritis Biomarkers Consortium

March 9, 2026

Hi everyone. Thank you all so much for

taking time out of your busy schedules

to attend our oai.edu live webinar series

hosted by the OAI CORE Knowledgebase. I

also want to thank today's speaker Dr.

Jamie Collins who is the associate

director of the Orthopaedic and Arthritis

Center for Outcomes Research at

Brigham and Women's Hospital and associate

professor of orthopedic surgery at

Harvard Medical School. The title of

today's talk is "Working with Data from

the FNIH Osteoarthritis Biomarkers

Consortium: A Nested Case-Control Study

within the Osteoarthritis

Initiative." And following this

presentation we should have a few

minutes for Q&A. So if you have any

questions feel free to type them into

the Q&A box at the bottom of your

screen. And if you can't see it

immediately you might need to click the

three dots uh to see more options. And

so without further ado I'll let uh Dr.

Collins take over.

Great. Thank you so much, Julieann, and

thanks to Jeff for inviting me to

speak. And we're going to talk today

about um working with data from the

FNIH OA Biomarkers Consortium.

I wanted to start just by acknowledging

the funding. Scientific and financial

support for the FNIH OA Biomarkers

Consortium was made possible by

the organizations that you see on this

slide.

And just as a brief outline, I'll give

an overview of the nested case control

study, including the rationale and study

design. I'll describe what data are

available and how you access that data.

And then just spend a few minutes on

analytic considerations for working with

these data.

And I'd also just like to acknowledge

the OAI OA biomarkers project team um

shown on this slide. These were um all

of the people involved in phase one um

in particular Virginia Kraus and David

Hunter who are the PIs of this project.

And here are some key references

describing the rationale and the study

design. Um in particular that third

reference has a lot of detail on the

study design which we'll go through

briefly today.

So in terms of the rationale for this

study despite the clinical and economic

impact of OA there are currently no

pharmacologic therapies approved by

regulatory agencies to prevent OA or

stop disease progression.

Improvements in clinical trial design

are critically needed to overcome

barriers to the development of disease

modifying treatments

and refinement and improvement of

measures of joint structural change

based on imaging and/or biochemical

biomarkers are needed to identify

individuals with knee OA who are likely to

progress radiographically and

symptomatically and to overcome the

limited responsiveness of existing

imaging biomarkers.

So the aim of the OA Biomarkers

Consortium phase one study is to

establish the prognostic validity of

several imaging and biochemical

biomarkers for knee OA progression.

So we measured knee OA progression

sort of at a later stage at 36 to 60

months from baseline.

And the study investigated whether

biomarkers measured at baseline were

prognostic of disease progression with

the idea that these could be used for

prognostic enrichment of progressors in

clinical trials.

And it also investigated whether

short-term change in biomarkers measured

over 24 months was predictive of

longer-term progression with the idea

that such biomarkers may be candidates

to assess as potential surrogate

endpoints in trials. So, is there

something that we can measure earlier um

than radiographic progression um that's

still predictive of a clinically

relevant outcome?

So, the study design, it's a nested case

control study within the OAI. And so, by

nested case control study, what we mean

is that all participants were selected

from the larger OAI study.

Um so, it's a case control study. So,

the case control status was based on

disease progression.

So that was measured with um

radiographic progression which was a

loss of medial minimum joint space width

of 0.7 millimeters by 24, 36, or 48

months versus baseline. And this

threshold was based on the OAI-specific

smallest detectable change.

Um cases were also defined by pain

progression which was a persistent

increase versus baseline in total WOMAC

pain score above the minimal

clinically important difference of nine

points at 24, 36, 48, and 60 months. And to

meet this persistent threshold that

increase had to be maintained at at

least two time points.
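
The two progression definitions above can be written down as a short Python sketch. The thresholds (0.7 mm, nine points, two time points) come from the talk. The function names, the month-to-value dictionary layout, and the use of >= at the threshold are assumptions made for illustration, not the consortium's actual code.

```python
JSW_LOSS_MM = 0.7        # radiographic threshold quoted in the talk
WOMAC_PAIN_MCID = 9      # pain threshold quoted in the talk
MIN_PAIN_VISITS = 2      # "persistent": met at two or more follow-ups


def xray_progression(baseline_jsw, followup_jsw):
    """Medial minimum JSW loss reaching the threshold at 24, 36, or 48 months.

    followup_jsw maps month -> JSW in mm (hypothetical layout).
    """
    return any(
        baseline_jsw - jsw >= JSW_LOSS_MM
        for month, jsw in followup_jsw.items()
        if month in (24, 36, 48)
    )


def pain_progression(baseline_pain, followup_pain):
    """WOMAC pain increase reaching the threshold at two or more follow-ups.

    followup_pain maps month -> WOMAC pain score (hypothetical layout).
    """
    visits_met = sum(
        1
        for month, score in followup_pain.items()
        if month in (24, 36, 48, 60)
        and score - baseline_pain >= WOMAC_PAIN_MCID
    )
    return visits_met >= MIN_PAIN_VISITS


# Toy values for one knee, not real OAI data.
print(xray_progression(4.0, {24: 3.9, 36: 3.1, 48: 3.0}))
print(pain_progression(15, {24: 12, 36: 25, 48: 30, 60: 14}))
```

Missing visits are simply left out of the dictionaries in this sketch; how the study handled missing follow-up is described in the design reference, not here.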

So the primary case definition was knees

having both radiographic and pain

progression. And so that's what we sort

of deem our primary progression status

in this study.

And then the primary control definition

was knees that did not reach criteria

for both end points. So the controls

included knees with X-ray progression

but not pain progression, X-ray only

progressors, knees with pain progression

but not X-ray progression, so pain only

progressors, and knees with neither

X-ray nor pain progression, so a

non-progressor.
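
As a small sketch of the grouping just described, the two progression flags map onto four mutually exclusive groups, and only one of them is the primary case. The label strings and function names are hypothetical and chosen for readability; they are not the values stored in the released data.

```python
def outcome_group(xray_progressed, pain_progressed):
    """Map the two progression flags to one of four mutually exclusive groups."""
    if xray_progressed and pain_progressed:
        return "xray_and_pain"
    if xray_progressed:
        return "xray_only"
    if pain_progressed:
        return "pain_only"
    return "non_progressor"


def is_primary_case(group):
    """Primary cases progressed on both endpoints; all other groups are controls."""
    return group == "xray_and_pain"


for flags in [(True, True), (True, False), (False, True), (False, False)]:
    group = outcome_group(*flags)
    print(flags, group, "case" if is_primary_case(group) else "control")
```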

Um so again these participants were

selected from the larger OAI cohort. Um

so out of the 4,796

participants in the OAI

excuse me, the first major inclusion

criterion was having all required serum

and urine specimens, X-ray joint space

width, MR imaging, and WOMAC data at

both baseline and 24 months. And so, you

know, just to start, this is quite a

select cohort. Um, participants in order

to be considered to be included in this

nested case control study had to have

complete biomarker data at both baseline

and 24 months.

Um, and then from there, um, eligible

knees had KL grade between 1 and 3

at baseline.

And from here, um, these knees were

sorted into four mutually exclusive

outcome groups. So again those four

outcome groups that we described on the

prior slide X-ray and pain progression,

X-ray only progression, pain only

progression and neither X-ray nor pain

progression. Um these were frequency

matched by um BMI and KL strata. And

again um you know further details are

available in that reference at the

bottom. I think the main thing to point

out here is that um this is not a random

selection. If you look for example at

the X-ray and pain progression um on the

far left here, there were 252 knees that

were potentially eligible based on X-ray

and pain progression and 194 of those

were selected. And if you go all the way

to the far right, there were 943 knees

that um were potentially eligible of

non-progressors and 200 of those were

selected. So this um nested case control

study oversamples cases and we'll come

back to that when we talk about um the

analytic implications but that's just

something to keep in mind with the

sampling strategy is that our

progressors or our cases are over

sampled relative to the full OAI.
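
To make the oversampling concrete, the sampling fraction for the two groups whose counts are quoted above can be computed directly. This is a minimal sketch: the group labels are hypothetical, and the counts for the two middle groups are not given in the talk, so they are left out.

```python
# Counts quoted in the talk: (potentially eligible knees, selected knees).
counts = {
    "xray_and_pain": (252, 194),
    "non_progressor": (943, 200),
}


def sampling_fraction(n_eligible, n_selected):
    """Share of eligible knees in a group that were selected into the study."""
    return n_selected / n_eligible


fractions = {group: sampling_fraction(*c) for group, c in counts.items()}

for group, fraction in fractions.items():
    print(f"{group}: {fraction:.1%} of eligible knees selected")
```

Roughly 77% of eligible cases were selected against roughly 21% of eligible non-progressors, which is why unweighted summaries of this sample do not describe the full OAI.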

Um so what data are available? Um

imaging biomarkers were assessed at

baseline, 12, and 24 months, which includes

semi-quantitative assessment of MRI

with the MOAKS scores and then

quantitative assessment of cartilage,

bone shape, subchondral bone area, and

meniscal volume. And for X-ray we have

subchondral bone trabecular integrity.

Um it's a lot of data and it can get a

little bit confusing. Um these measures

were assessed by several different

groups and each set of biomarkers is in

its own data set. Um so for example the

semiquantitative scores, the centrally

performed longitudinal semiquantitative

readings with MOAKS, that was performed by

Boston Imaging Core Lab and that's in

one data set.

Chondrometrics did measurements of

cartilage volume, thickness, and other

associated measurements, including

subchondral bone area, and those are in

their own data set. So when you go to

download the data and work with the

data, you just have to keep in mind um

that for each vendor or each group that

did an assessment of these data, each

has its own data set um in the data

download. And so if you're looking for

something, you may have to look across

several data sets to find what you're

looking for. And you can see again just

a description of what's available here.

Measures of bone shape, subchondral bone

area, cartilage and meniscal volume,

measurements of subchondral bone plate

shape and curvature, etc.

Um biochemical biomarker data are

available again at baseline, 12, and 24

months. Data are available on 11

serum and six urine biomarkers in

addition to urine creatinine and you can

see the list on the right there.

Um for the biochemical biomarker data I

would really urge people to take a look

at the description. This is uh the

document you can see in this bullet

here. Um there's really important

information on the assay methods

validation and calculation of reference

intervals. So um there's quite a bit of

documentation for this biospecimen data

and I would again urge you to to look

through that description in detail. And

this also can get a little bit confusing

um because the data are available both

for the FNIH cohort of 600

but also for 129 reference control

samples of individuals without OA from

the Johnston County Osteoarthritis

Project. So again, if you go to download

the biochemical biomarker data, you'll

get data both for the FNIH and for

Johnston County. And again, people can

find that a little bit confusing. So I

would again just urge you to look

through that description first to make

sure you understand all of the data sets

and documentation that are available for

the biochemical biomarkers.

Um, if you've not worked with OAI data

before, it can be a little bit

overwhelming at first. Um, but I think

it's a good thing because there's a lot,

again, a lot of description of exactly

what you're looking at. So you'll see

several files for each data set. Um so

for example here if you were interested

in working with that Chondrometrics data,

the quantitative measures of cartilage

you would start by looking at this

descript.pdf.

Um this is the description of the data

set that includes biomarker assessment

and quantification. So exactly how the

markers were assessed, how were they

measured and it provides references to

publications describing that. Um the

next two are data sets. um it's a SAS

data set and then um I think this is a

transfer file. Um both of these should

be easily read into any sort of standard

statistical software package. Obviously

the SAS data set can be read with SAS,

it can be read with R, with Python, Stata,

etc. And so there's two versions of the

data set. There's a contents file that

has um just the contents of the data

set. So the variable names and labels,

and that can be really helpful just to

orient yourself to the data. And then

finally, the stats file includes descriptive

statistics. And so this is always a nice

check to make sure that you have the

correct data and that it matches what's

in the descriptive statistics file. And

so again, this can get a little bit

overwhelming as I showed a few slides

ago. Um there's a separate sort of piece

of information um for each of those

vendors. So if you wanted to look at

the Imorphics data, you would see a similar

number of documents for that. So when

you do go to download um these files,

you do get quite a bit of information.

And so this is just to orient you to

what you're seeing there.

To access the data and documentation,

it's at that NDA NIH website where

you get the fuller Osteoarthritis

Initiative data. If you click on the

full download, it will bring you to a

screen that shows all of the different

data that you can get, and there's a

separate button for the FNIH project.

And so if you go to download this,

it says biospecimens, but it actually includes

everything. So I included a screenshot of

some of the things that you would

download if you were to click on that

biospecimen SAS, and again it's all of the

data for the entire FNIH, including the

clinical data, reference intervals,

this is showing the bone shape from etc

so you'll get dozens of files um but

this would be everything you need for

FNIH. And again, I realize it says

biospecimens, but this is going to give

you everything if you click on

downloading that

um so I think it's important just to

quickly go through the clinical data so

this would be that clinical FNIH SAS

data

These data are one line per participant

and one knee per participant. So again

just going back to that um flowchart,

one knee per participant was selected

for inclusion into the study. Um so this

clinical data set will have 600

observations in it. I think the the most

important variables that you need from

the clinical data set is that case

control status. And so this is in two

places. um it's in this case variable

where it's a numeric variable um with

the um grouping seen here. And then

there's also group type which is just

actually written out in text. And so

that first participant is in group two

which is the joint space loss only

progressor. Um and that last participant

is in group four which is a

non-progressor so did not progress on um

x-ray or pain.

These data can be linked as I said to

the larger OAI data set simply using

the ID variable. So it's the same ID in

both data sets. Um the top is that

clinical data from the FNIH download and

the bottom is the all clinical data from

the larger OAI. So for example, if you

wanted to get comorbidities and WOMAC

function and merge that in with the FNIH

data, you could do that by ID. So it's

really pretty straightforward.
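
The ID-based link described here amounts to a left join that keeps every FNIH participant. A minimal standard-library sketch follows; the column names (`ID`, `CASE`, `comorbidity_score`) and the toy rows are hypothetical, and in practice the same join would usually be done in whatever statistical package the data were read into.

```python
def merge_by_id(fnih_rows, oai_rows, id_key="ID"):
    """Left join: keep every FNIH row and add columns from the matching OAI row."""
    oai_by_id = {row[id_key]: row for row in oai_rows}
    merged = []
    for row in fnih_rows:
        extra = oai_by_id.get(row[id_key], {})
        # FNIH values win if a column name appears in both sources.
        merged.append({**extra, **row})
    return merged


# Toy rows, not real data.
fnih = [{"ID": 1, "CASE": 2}, {"ID": 2, "CASE": 4}]
oai = [
    {"ID": 1, "comorbidity_score": 0},
    {"ID": 2, "comorbidity_score": 3},
    {"ID": 3, "comorbidity_score": 1},  # not in FNIH, so dropped by the join
]

merged = merge_by_id(fnih, oai)
print(len(merged), merged[0])
```

A quick check after any real merge is that the row count is still the number of FNIH participants.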

The one thing to keep in mind again is

that, again, one knee per participant

was selected for the FNIH case control

study. And so if you are getting data

from um the all clinical data set for

example you have to make sure that

you're pulling the right variable. So

that first participant, their right knee

is included, so you want to make sure that

you're getting the right knee score for

WOMAC disability. And for the second

participant the left knee is included in

the FNIH study. And so you want to make

sure you're pulling the data for the

left knee.
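
One way to guard against pulling the wrong knee is to select the side-specific column from the index-knee side recorded in the FNIH clinical file. The sketch below assumes a `SIDE` column coded "right" or "left" and columns named like `womac_disability_right`; all of these names and codings are hypothetical, and the real variable names and side coding should be taken from the OAI documentation.

```python
def knee_specific_value(row, stem, side_key="SIDE"):
    """Return the value of a knee-level measure for the participant's index knee."""
    side = row[side_key]
    if side not in ("right", "left"):
        raise ValueError(f"unexpected side code: {side!r}")
    return row[f"{stem}_{side}"]


# Toy merged rows, not real data.
rows = [
    {"ID": 1, "SIDE": "right", "womac_disability_right": 20, "womac_disability_left": 5},
    {"ID": 2, "SIDE": "left", "womac_disability_right": 8, "womac_disability_left": 31},
]

for row in rows:
    print(row["ID"], knee_specific_value(row, "womac_disability"))
```

Raising on an unexpected side code is a deliberate choice here, so that a coding mismatch between data sets fails loudly instead of silently returning the other knee.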

Um, and I'm sure um, if you watched

Jeff's webinar last month um, he

probably went through all of this, but

the way that the side is included in the

OAI data set depends on which data set

you're looking at, whether you're

looking at the clinical data, whether

you're looking at imaging data. So just

a note of caution um, to to pay special

attention to the side that's included.

Again, that will be in that clinical

FNIH data set um, to make sure that

you're pulling the right information.

Um so I wanted to just spend a few

minutes talking about analytic

opportunities with this data set. Um so

the first obviously is modeling that

case control outcome. Um and so we have

again our baseline biomarkers and then

short-term change measured over baseline

to 24 months where we'd be predicting

that case control status. Um and again

this is the most straightforward way to

use the data to model the case control

outcome. So again to examine

associations between baseline and/or

short-term change um with joint space

narrowing and pain case control status.

I think you know a word of caution here

for the FNIH data but really for any OAI

analysis is that many people have

used these data, and there have been

lots of different analyses both using

the available FNIH biomarker data

and also by groups who've reanalyzed

the images and quantified new

biomarkers. So I think, as with

any OAI analysis, it's a really good idea

to do a thorough literature review

before you undertake any new analysis.

Um so what are some opportunities that I

see with this data that maybe have been

a little bit understudied? I think

mediation analysis would be a really

interesting way to use these data. And

this is an example of a paper that was

recently published investigating whether

synovitis mediates the association

between bone marrow lesions and knee

pain using data from the FNIH OA

biomarkers consortium. Um and so the

question with mediation analysis is that

we may see an association between

an exposure, so in this case BML size

score, and an outcome, which in this case was

WOMAC knee pain. And the question that we ask

is whether there's um a mediator

um through which um some of that

association is actually mediated. So in

this case um the investigators asked

whether some of the association between

BML size score and WOMAC knee pain is actually

mediated through synovitis score. And I

think the FNIH offers a really unique

opportunity to do some of these analyses

um because we have longitudinal

biomarker data at baseline 12 and 24

months. So there's, I think, a

really interesting opportunity um for

longitudinal mediation analysis.

I think there's also an

opportunity to apply novel methods to

assess composite biomarkers um using

machine learning or otherwise. Um and so

we have all of these data. We've done

logistic regression. We've done um

random forests. But are there novel ways

that we can try to investigate composite

biomarkers? And I've included a couple

um papers um on the slide here that I

think would sort of offer interesting

approaches. Um I'm really interested in

in generalized additive models, which is

a flexible approach that allows for

nonlinear associations between

predictors and outcomes. And so I think

there's still a lot here in terms of um

trying to investigate combinations of

biomarkers that may um predict the case

control outcome.

So just in terms of analytic

considerations um the question I hear a

lot is can I undertake an analysis other

than predicting case control status. So

what if for example we wanted to look at

a different outcome we wanted to predict

total knee replacement or functional

decline instead of pain or if we wanted

to do maybe a cluster analysis for

phenotyping. Can we kind of reuse these

data for another question other than

that case control um outcome? And the

answer is yes, but with an asterisk. Um,

so you can, and I think it is totally

reasonable to do secondary analyses of a

nested case control study using a

different outcome or answering a

different question. Um, you just really

have to keep in mind that this is a

nested case control study and not a

random sample from the OAI. And as I

mentioned at the beginning, um, cases

were oversampled. And so um when

undertaking a secondary analysis we

really have to be mindful of the

selection bias and sort of um what

population this generalizes to.

Um and there are um statistical methods

for this. So um you know any estimate of

incidence or prevalence is going to be

subject to selection bias. The

recommendation is to use inclusion

probability weighting to account for the

outcome dependent sampling. So you

essentially reweight your estimates by the

chance that that knee got selected um

for inclusion into the um case control

study. And then another thing to do

would be to stratify the analysis by

case status as a sort of sanity check.

So let's say for example you were

interested in whether baseline imaging

biomarkers were associated with

functional decline. Um you might ask

whether the association between that

baseline biomarker and functional

decline is the same for cases as for

controls. And if it's not, then you

really do have to start to think more

about that selection bias and what does

that mean for the generalizability of

your findings. And again, um there are

statistical methods to deal with this.

I've included a paper here that includes

some nice methods for doing this

inclusion probability weighting. I

think the the analysis itself is

relatively straightforward, but actually

calculating those weights um is not

something that's available in the

publicly available data. And so that's

something that you would have to work

through.
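
As a rough illustration of inclusion probability weighting, each selected knee can be weighted by the inverse of its group's sampling fraction, so that it stands in for the eligible knees that were not selected. The sketch below uses only the two group counts quoted earlier in the talk and made-up outcome values. It ignores the frequency matching on BMI and KL strata, which real weights would need to account for, and the function names and group labels are hypothetical.

```python
def inclusion_weight(n_eligible, n_selected):
    """Inverse of the selection probability for one outcome group."""
    return n_eligible / n_selected


def weighted_mean(values, weights):
    """Weighted average of values, using one weight per observation."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Group counts quoted in the talk (eligible, selected).
weights_by_group = {
    "xray_and_pain": inclusion_weight(252, 194),
    "non_progressor": inclusion_weight(943, 200),
}

# Toy records: (outcome group, binary secondary outcome). Not real data.
toy = [
    ("xray_and_pain", 1), ("xray_and_pain", 1), ("xray_and_pain", 0),
    ("non_progressor", 1), ("non_progressor", 0), ("non_progressor", 0),
]

values = [y for _, y in toy]
weights = [weights_by_group[g] for g, _ in toy]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

In this toy example the weighted proportion is lower than the unweighted one, because the oversampled cases, who have the outcome more often here, count for less once the weights are applied.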

Um, and I did just want to provide a

little bit more information on the

FNIH OA Biomarkers Consortium.

Um, so on the FNIH website, you can get

an overview of phase one of the

biomarkers consortium project. Um, I'll

also point you to this um, slide deck

that's on the ORS website. Way back at

the beginning of the project in 2015, um

the investigators gave um one of the

pre-congress workshops at the ORS meeting

on the project and those slides are

available on the ORS website.

Um finally um I would like to just um

mention that this nested case control

study was phase one of the OA biomarkers

project. In phase two, which was the

PROGRESS OA study, we attempted to

externally validate the highest

performing markers um from phase one um

in data from the placebo arms of

clinical trials. So the PROGRESS OA

study is not part of the OAI. It includes

data from the placebo arms of clinical

trials and um again we attempted to

externally validate the highest

performing markers. We are working on

making those data publicly available and

um working with Jeff to make sure that

once we do make those data publicly

available, we can get the word out and

let everybody access that. Um but in the

meantime, I've highlighted a publication

and um the FNIH description of the

PROGRESS OA study. And then it's

sort of a full circle moment today and

talking about phase one of the

biomarkers consortium. Um we're

recording a podcast this afternoon with

David Hunter for his Joint Action

podcast

that will describe the third phase of

the biomarkers project in which um we're

attempting to get data from both the

placebo and treatment arms of clinical

trials to investigate some of these

biomarkers. So be on the lookout for the

announcement about the third phase of

the trial and again that will go up in

the coming weeks on David Hunter's Joint

Action podcast.

Um and so I wanted to end with the

announcement for um the next speaker in

this series, Dr. Michelle Yau, who's an

assistant scientist at Hebrew

SeniorLife and assistant professor of medicine

at Harvard Medical School, is going to

speak um next month about genetic data

resources in the OAI, a primer. Um for

those of you who don't know Michelle,

she's really one of the preeminent

researchers in OA and genetics. Um and I

think this is going to be a really

fantastic seminar.

And so with that, I'd like to say thank

you and I'm um happy to take any

questions.

Well, thank you so much Dr. Collins for

that excellent presentation. Uh so we'll

move on to Q&A. If anyone has any

questions, feel free to uh type them in

the Q&A box. Um I have a question. Uh,

can the FNIH MRI based data be merged

with other projects that measure the

MRIs?

Yes, that's a great point. Um, and

something I probably should have put on

the slides. So, um, there's several

other studies um, that have done um, MRI

assessments in OAI. The one that

comes to mind is POMA, which I can't

deabbreviate at the moment, but I think

that was looking at um people without OA

at baseline and predicting long-term

total knee replacement. Um, and so those

data can certainly be merged together. I

think there is some overlap across the

studies though. So if you see um, you

know, semi-quantitative imaging data

from both POMA and FNIH, there is overlap

in who was included in those. And so

you want to make sure that you're

checking um to make sure you're not

including somebody twice, but those data

can all be merged. It's the same study

ID for all of the data sets.
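
The overlap check mentioned here can be sketched as a combine step that keeps one record per participant. The column names, the `source` tag, and the choice to keep the FNIH record when both studies include someone are assumptions for illustration only.

```python
def combine_without_duplicates(fnih_rows, poma_rows, key=("ID",)):
    """Stack two studies' rows, keeping the first record seen for each key."""
    seen = set()
    combined = []
    for source, rows in (("FNIH", fnih_rows), ("POMA", poma_rows)):
        for row in rows:
            k = tuple(row[column] for column in key)
            if k in seen:
                continue  # already included from an earlier source
            seen.add(k)
            combined.append({**row, "source": source})
    return combined


# Toy rows, not real data; ID 2 appears in both studies.
fnih = [{"ID": 1}, {"ID": 2}]
poma = [{"ID": 2}, {"ID": 3}]

combined = combine_without_duplicates(fnih, poma)
print([(row["ID"], row["source"]) for row in combined])
```

Passing `key=("ID", "SIDE")` instead would deduplicate at the knee level, if both data sets carry a side column.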

I'm not seeing any other questions at

the moment, so I want to say thank you

again so much for coming and giving

our talk today. I also want to thank

you all for joining and

attending this live

webinar. Uh as Dr. Collins mentioned

next month we have another webinar with

Michelle Yau. I sent a link in the chat

to register. So definitely hope you

can make that if you're available.

And as I also mentioned earlier, we have

a post webinar survey uh that should

populate after the end of the meeting

and we'd really appreciate if you take

the time to do that. So without uh

further ado, thank you so much, and we

hope you all have a great rest of

your day.
