Live Webinar with Dr. Michelle Yau
Genetic Data Resources in the OAI: A Primer
April 13, 2026
Welcome, everyone. Thank you all so much for attending our oi.edu live webinar series hosted by the OAI core knowledge base. I want to thank today's speaker, Dr. Michelle Yau, who is an assistant scientist at the Marcus Institute for Aging Research at Hebrew SeniorLife and an assistant professor of medicine at Harvard Medical School. The title of today's talk is "Genetic Data Resources in the OAI: A Primer." Following the presentation, we should have a few minutes for Q&A. So, if you have any questions, feel free to type them into the chat and we'll try to get to them at the end. And now, I'll let Dr. Yau take over.
Great. Thank you.
All right. So, today I'm going to go over some of the genetic data resources in the OAI, if I can get this to advance. The data that we have available are whole-genome genotyping data, which were originally obtained back in 2009-2010 on most of the OAI participants using the Illumina Omni Quad 2.5 million array. This was part of an ancillary study called the Genetic Components of Knee Osteoarthritis (GeCKO) study, which was led by Dr. Rebecca Jackson. It included over 4,000 individuals, most of whom are of European ancestry. We do have almost a thousand African ancestry individuals who are genotyped. The genotyping was done at the Translational Genomics Research Institute.
So with this data, we had a number of quality control metrics that we used. First were the sample exclusions. We excluded individuals who had call rates less than 98%, and I have the number of European ancestry and African ancestry individuals who met those criteria in the parentheses. We also looked for mismatches between reported and genetically determined gender, as well as second-degree or higher relationships among samples, to ensure that we had independent samples. Nowadays you might just calculate a genetic relationship matrix and use that instead, but at the time we excluded individuals who had second-degree or higher relationships with other samples in the cohort. We also excluded individuals with large chromosomal abnormalities, and there were a handful that met those criteria.
Additionally, you'll note that what we've provided are the self-reported races. If you look at the genetic data, there are a number of people who have a self-reported race that is different from their genetically determined race. So that is something you would want to look into if you're doing genetic analyses in OAI, and depending on your question, you may or may not want to use the self-reported race. Typically, for the genome-wide association studies, we've corrected this and used the genetically determined race when splitting into European versus African ancestry analyses.
We also have SNP variant filters. We removed SNPs that were duplicated, and we removed SNPs that had call rates less than 95%. We also had a number of planned duplicate samples so that we could look at discordant genotypes, and where there were two or more discordant genotypes, we excluded those SNPs.
And then it's up to you whether you want to impute these data or not. We've oftentimes imputed the data to the Haplotype Reference Consortium panel, and we have also imputed to the TOPMed imputation panel, but it depends on which reference panel you want to use.
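The call-rate and duplicate-discordance filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline the GeCKO study actually ran: the in-memory layout (a dictionary from sample ID to a list of calls), the function names, and the toy genotypes are all hypothetical, and only the thresholds (98% per sample, 95% per SNP, two or more discordant duplicate calls) come from the talk.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A call is the number of alternate alleles (0, 1 or 2); None marks a failed call.
Call = Optional[int]
# Hypothetical layout: sample ID -> one call per SNP, in the order of snp_ids.
Genotypes = Dict[str, List[Call]]


def call_rate(calls: Sequence[Call]) -> float:
    """Fraction of calls that are not missing (0.0 for an empty sequence)."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def samples_passing(genotypes: Genotypes, min_rate: float = 0.98) -> List[str]:
    """Sample IDs whose call rate across SNPs is at least min_rate."""
    return [s for s, calls in genotypes.items() if call_rate(calls) >= min_rate]


def snps_passing(genotypes: Genotypes, snp_ids: Sequence[str],
                 min_rate: float = 0.95) -> List[str]:
    """SNP IDs whose call rate across samples is at least min_rate."""
    rows = list(genotypes.values())
    return [snp for j, snp in enumerate(snp_ids)
            if call_rate([row[j] for row in rows]) >= min_rate]


def discordant_snps(genotypes: Genotypes, snp_ids: Sequence[str],
                    duplicate_pairs: Sequence[Tuple[str, str]],
                    max_allowed: int = 1) -> List[str]:
    """SNP IDs with more than max_allowed discordant calls among duplicate pairs."""
    flagged = []
    for j, snp in enumerate(snp_ids):
        n_discordant = 0
        for first, second in duplicate_pairs:
            a, b = genotypes[first][j], genotypes[second][j]
            # Only compare pairs where both replicates produced a call.
            if a is not None and b is not None and a != b:
                n_discordant += 1
        if n_discordant > max_allowed:
            flagged.append(snp)
    return flagged


# Toy data, made up for illustration only.
snp_ids = ["rs1", "rs2", "rs3", "rs4"]
genotypes = {
    "A": [0, 1, 2, 0],
    "A_dup": [0, 1, 2, 1],
    "B": [1, 1, None, 0],
    "B_dup": [1, 1, 0, 2],
    "C": [2, 0, 1, 1],
}
print(samples_passing(genotypes))
print(snps_passing(genotypes, snp_ids))
print(discordant_snps(genotypes, snp_ids, [("A", "A_dup"), ("B", "B_dup")]))
```

In practice these filters are usually applied with a dedicated genetics toolkit on the full array rather than with Python lists; the sketch is only meant to make the thresholds concrete.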
We used these data initially in this publication, back in, I think it was, 2013, where we looked at BMD-associated SNPs and knee osteoarthritis. We had taken our genotyping data in OAI and the then-current genome-wide association study for BMD, which at the time I think was from the GEFOS consortium, and we wanted to see whether those BMD SNPs were associated with OA. We found a number of SNPs that were associated with OA, and these were SNPs associated with high BMD that increased the risk of OA. But when we put these SNPs into a polygenic risk score, we found that it was not associated with OA risk. So it was just specific loci that were associated with OA.
We then did a genome-wide association study of structural knee osteoarthritis, including a number of other cohorts: the Johnston County osteoarthritis initiative, the Multicenter Osteoarthritis Study (MOST), and the Genetics of Generalized OA cohort. What we identified was one novel locus and two that were already known. These were the GDF5 locus and FTO. At the time we had decided to split it by European ancestry versus African ancestry, so we published two separate GWAS for that. We had another one looking at genetic determinants of knee osteoarthritis in African-Americans.
We found one novel locus and, interestingly, didn't find any of the European ancestry variants to be associated with knee OA in African ancestry individuals. Most of this was driven by allele frequency differences between the two populations, and we also had quite a small sample size, so it was quite difficult to find genome-wide significant loci in this effort.
These data have been deposited into dbGaP, and you'll see that you can request access to them. There's a description here about the data. Note that this is controlled access only, so you have to meet certain requirements in order to receive the data. The data access committee reviews your application. You have to provide the purpose of your research and why you're using the data. You have to demonstrate that you have IRB approval to use these data, and also agree to the dbGaP data use certification. I'll go over that a little bit more in the next few slides.
One thing to note here is that with the phenotypes in dbGaP, we only have age, gender, and race. So you will have to go to the OAI knowledge base to pull the phenotypes, but then you can link these genetic data to the phenotypes in OAI.
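As a rough sketch of that linking step, the snippet below joins a genetic file to a phenotype file on a shared participant identifier using only the Python standard library. Everything specific here is an assumption made for illustration: the column names ("ID", "prs", "kl_grade"), the participant IDs, and the values are invented, and the real OAI and dbGaP files will have their own layouts and identifier column.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def index_by_id(rows: Iterable[Row], id_column: str) -> Dict[str, Row]:
    """Index rows by participant ID (whitespace-stripped so keys compare cleanly)."""
    return {row[id_column].strip(): row for row in rows}


def link_on_id(genetic: Dict[str, Row],
               phenotypes: Dict[str, Row]) -> Tuple[List[Row], List[str]]:
    """Inner-join genetic rows to phenotype rows; also return unmatched genetic IDs."""
    linked: List[Row] = []
    unmatched: List[str] = []
    for participant_id, genetic_row in genetic.items():
        phenotype_row = phenotypes.get(participant_id)
        if phenotype_row is None:
            unmatched.append(participant_id)
        else:
            linked.append({**phenotype_row, **genetic_row})
    return linked, unmatched


# Made-up example files; real data would be read from disk with open(...).
GENETIC_CSV = "ID,prs\n9000001,0.42\n9000002,-1.10\n9000003,0.05\n"
PHENOTYPE_CSV = "ID,kl_grade\n9000001,2\n9000003,0\n"

genetic = index_by_id(csv.DictReader(io.StringIO(GENETIC_CSV)), "ID")
phenotypes = index_by_id(csv.DictReader(io.StringIO(PHENOTYPE_CSV)), "ID")
linked, unmatched = link_on_id(genetic, phenotypes)
print(len(linked), "linked;", "unmatched:", unmatched)
```

Keeping the list of unmatched IDs makes it easy to check whether a participant is missing from one source before any analysis is run.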
A note about the data access piece of it. You do have to sign a data use certification agreement, and starting in 2025 they made it a little more stringent in terms of the amount of security you need in order to store these data. Any server that you plan to put these data on has to meet the NIST SP 800-171 criteria. This is essentially a federal guideline that gives you the requirements for protecting controlled unclassified information within a non-federal system or organization. It's there to ensure participant safety and privacy. These are pretty stringent criteria, and I think a lot of institutions have had some trouble meeting them. So do check with your IT to make sure that your server meets these criteria. Also note that the data cannot be put on your personal computer; they have to go to a secure location.
Okay. So now, fast forward a few more years. We contributed OAI data to the largest effort to date looking at genetics of OA, as part of the Genetics of OA (GO) consortium. This effort identified almost a thousand loci now associated with OA, of which about half were novel. What's really exciting about this effort is that even though we were a teeny tiny piece of it, the depth of phenotyping that is available in the OAI gives you the opportunity to look at the variants identified in this large GWAS in more detail.
So again, you can try to refine OA phenotypes using the GO consortium GWAS data. One thing we've done is look at how the variants identified in the GO consortium GWAS are associated with structural versus pain phenotypes. We found that there are much stronger associations with pain phenotypes than with structural phenotypes.
Additionally, OAI provides the opportunity to look at a number of imaging phenotypes that are not available in other cohorts or biobanks. There you can take the GO consortium GWAS variants and look at how they are associated with novel MRI phenotypes, for example, or any other imaging phenotypes. I think a number of people are doing machine learning on some of these images, and there could be some novel phenotypes derived from those efforts. I think there was also an effort to look at the genetics underlying knee bone shape as a novel phenotype. So I think there are a number of opportunities to really dive into the GO consortium GWAS and refine the phenotypes that are associated with the variants that are identified.
The other novelty of OAI is that you have longitudinal data. This gives the opportunity to look at genetic associations with longitudinal change in OA outcomes. In fact, we did this many years ago, and we looked at joint space narrowing progression, osteophyte progression, KL progression, and progression to joint replacement. What you'll see in these Manhattan plots is that with the OAI data alone, you can't really find anything that's genome-wide significant. There are a number of peaks that look suggestive or promising, but it's really difficult to do a GWAS in the OAI alone and identify anything. So I think the real value of the OAI genetics data is to refine the genetic variants that have been identified through these huge efforts, and to dial in on the variants that could be associated with progression.
Another place where you can use these genetic data is in polygenic risk score associations. We've used this to look at OA risk stratification. Using the GO consortium data, you can come up with a number of polygenic risk scores: for any OA, hand OA, knee OA, any phenotype that they've looked at in the GO consortium GWAS. You can derive a polygenic risk score from that, apply it to OAI, and look at how those polygenic risk scores perform in predicting OA outcomes in the OAI.
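For readers who have not built one, a polygenic risk score in its simplest form is a weighted sum: for each person, multiply the number of effect alleles carried at each SNP by that SNP's per-allele effect from the discovery GWAS, then add the products up. The sketch below shows that arithmetic and a z-score standardization within the cohort. It assumes a simple unpruned weighted sum, which may differ from how the scores in the talk were built; the SNP IDs, weights, and dosages are invented and are not GO consortium results.

```python
from statistics import mean, pstdev
from typing import Dict


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0 to 2).

    SNPs that have a weight but no dosage for this person are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert raw scores to z-scores within the cohort."""
    values = list(scores.values())
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("all scores are identical; cannot standardize")
    return {person: (s - mu) / sd for person, s in scores.items()}


# Hypothetical per-allele effects (for example, log odds ratios from a GWAS).
weights = {"rs1": 0.10, "rs2": 0.05, "rs3": -0.20}

# Hypothetical effect-allele dosages for three participants.
cohort = {
    "P1": {"rs1": 2, "rs2": 1, "rs3": 0},
    "P2": {"rs1": 0, "rs2": 0, "rs3": 2},
    "P3": {"rs1": 1, "rs2": 2, "rs3": 1},
}

raw = {person: polygenic_score(d, weights) for person, d in cohort.items()}
z = standardize(raw)
for person in cohort:
    print(person, round(raw[person], 3), round(z[person], 3))
```

The effect allele in the weights file has to match the allele being counted in the dosages; a mismatch flips the sign of that SNP's contribution, so allele alignment is worth checking before scoring.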
Additionally, we've used the polygenic risk score approach to look at relationships with other traits through genetic risk. Other traits have large GWAS as well, so we've been able to derive a lipids genetic risk score, for example, or an Alzheimer's disease genetic risk score, and look at how those genetic risk scores associate with OA in the OAI. So I think this is a place where you could really harness what is available in OAI with the genetics data, and combine what's known about genetic risk for OA and other traits, in order to dive into the deep phenotyping available in OAI and see what's associated there.
And of course, for any genetic analysis, you're always looking for validation. It's hard to do an analysis in one cohort and get that published. There are a number of cohorts that have similar data, both the genetics data and the OA phenotypes, available: the Framingham Heart Study, the Johnston County osteoarthritis study, and the Multicenter Osteoarthritis Study. I've just named a few, and I think there are a few more, but if you're looking at North American-based OA cohorts, these would be the primary three, I think.
And finally, I think there are opportunities to integrate these genetic data with other multi 'omics data. There is the Musculoskeletal Knowledge Portal, where we're working on adding a number of functional genetics data as well as GWAS data from all musculoskeletal phenotypes. If you do a GWAS of your favorite trait in OAI, you can upload those data to the Musculoskeletal Knowledge Portal. Within it there are gene expression data, single-cell data, and epigenetics data, and you can integrate the findings from your GWAS with them. I think there's a tool within the Musculoskeletal Knowledge Portal for you to integrate those data, or, more simply, if you just want to look something up, you can do that as well. This portal actually sits within an umbrella of a whole bunch of other portals. There's also a metabolomics portal, for example. So there are ways to link what you're finding in OAI to what's happening across other systems and across other genetic and functional data.
Those are just a few of the ideas that come to mind in terms of using the OAI genetics data, but there are a number of other possibilities. One question is: are there other 'omics data available in the OAI? I think there are, but I have not seen them on a publicly released platform. So if you know of any ancillary studies that would like to make their data publicly available, please let me know. I think it would be a huge benefit to be able to integrate the genetics data with other 'omics data that people have collected. So I think there's more to come in terms of integrating or collecting more 'omics data within the OAI.
Looking forward, what we want to do is aggregate more 'omics data in the OAI, so collect more 'omics data, and also combine these data with other longitudinal cohorts in a mega-analysis. That is, figure out a way to combine everyone's efforts into one, to yield more from the data that everybody is collecting.
Also, right now the OAI knowledge base and the Musculoskeletal Knowledge Portal are separate entities, but we're thinking about how we can integrate these two knowledge bases to benefit the musculoskeletal community. On the one hand, OAI has deep phenotyping for OA that's extremely valuable to the community, and on the other, the Musculoskeletal Knowledge Portal has a number of genetics and functional data available in musculoskeletal tissues. We're working on collecting more single-cell data on tendon, cartilage, bone, anything that's relevant to a musculoskeletal trait. Finding a way to integrate these two, I think, would be ideal to enrich both resources.
So in summary, there are a number of possibilities for using what's already been generated in the OAI, but I think there's more to come. If you have any questions, please let me know. I will be at ORS, so if you want to connect, just find me. And if you're having trouble with the OAI stuff, let me know and I can try to help you out. Anyway, thank you for your time, and I will end there.
Great. Thank you so much, Dr. Yau, for that excellent presentation. We have a few minutes for Q&A, so if you haven't done so, feel free to ask a question in the chat. I'll get started here. Do we know if the people who have GWAS data are similar to the overall OAI study characteristics?
They should be, because we have genetic data on most of the OAI participants.
Great. Another question: are OAI genetic data continually updated?
No, they are not. This is based on a one-time effort to collect genotyping data on these participants, so it's all based on the 2.5 million array. What you can do is impute these data to different reference panels. In that sense you could update the genetic data, but that's just updating the imputed data; the raw genotyping data are not being updated.
Can the GWAS data be linked to the publicly available OAI data, and if so, how? For example, do we merge the files by the public OAI ID?
It should be by the public OAI ID. Okay.
Great. Thank you. I don't see any other questions in the chat right now.
Oh, actually, we just got another question. Someone asked, "Can you say more about the pain analysis?"
Was there something specific you were looking for?
So we had looked at a polygenic risk score with various OA outcomes, looking at its association with structure and its association with various pain phenotypes. We really found that these polygenic risk scores were very much driven by pain, and the associations were much stronger for pain than they were for structure. Luckily, for a lot of the associations, if you looked at the knee OA polygenic risk score, it was associated with knee OA, and if you looked at the hand OA polygenic risk score, it was associated with hand OA, so they tended to be joint specific. But if you look at just the effect sizes, for some reason a lot of these polygenic risk scores had an underlying association with pain.
We have a follow-up to that. They said it's interesting that the genetic markers are strong, whereas biochemical markers require more markers to identify pain phenotypes. Where are your pain results published? Very interesting.
It is published on my desktop, and I will get this paper out. I promise.
Another question: does OAI have 'omics data?
Does OAI have 'omics data? I know that there are ancillary studies that have collected 'omics data in a subset of OAI individuals. I don't think it's been done on a large scale in the OAI. I think there's methylation data that has been collected. I think Francisco Blanco's group has done a number of genetic analyses using haplotyping and methylation data. Those are the ones that I know about. There might be more that I've not seen. Additionally, I think what would be really helpful is if these data were publicly available for others to use. I've not seen those publicly available, but I could be wrong.
Okay. We have one last question before we end. How do you curate knee MRI scores like WORMS, and is there a way to link them to 'omics data?
Yeah. So, I would probably punt this to Jeff. I think what you're talking about are the phenotypes that are available in the OAI knowledge base. I think anything that's in the knowledge base should be linkable to the genetics data. They may not be linkable to the methylation data that others have collected, but certainly they should be linkable to the genetics data.
Yes. Thanks, Michelle. Any of the MRI data that is publicly available through the NDA website would be linkable with the GWAS data. As with other MRI scores, it's important to think about which read projects they were derived from. That's certainly something the knowledge base is willing to sit down with you and talk through: the feasibility, which read projects can be merged, caveats about how to do that, and what statistical and analytic considerations we need to have about doing those merges. But they can be merged with the publicly available clinical data as well as the GWAS data. And to Michelle's point, there are some nested studies within the Osteoarthritis Initiative that have done 'omics. For example, there was a case-cohort of people who had plasma metabolomics completed. One of the goals of the knowledge base is to engage more of these investigators to help get some of this data publicly available. So hopefully you'll be seeing that in the near future.
Great. Thank you. That's all the time we have for today. Thank you again, Dr. Yau, for that excellent presentation, and thank you all for joining us today. As we wrap up, I want to invite you all to our upcoming webinar on May 11th with Dr. Charles McCulloch, who will be talking about data analysis strategies for the OAI. It's going to be another great session, so we hope you all can make it. I sent a link to register in the chat, along with other links to our website and the webinar survey. That concludes today's webinar. Thank you all again, and have a great rest of your day.
