
Maximizing AI success for endometriosis scientific discovery and patient care - Stacey Missmer

International Medical Conference

Endometriosis 2024: Elevating Sampson’s Century Legacy via Deep Dive with AI

For the benefit of Endometriosis Foundation of America (EndoFound)

May 2-3, 2024 - JAY CENTER (Paris Room) - NYC

Getting the slides to come up. We've had this beautiful start on AI, which is this catchall term for many different technologies, many different approaches, philosophies, theoretical approaches. So I'm going to talk just a little bit about the influence of AI and really how we think of that from the big data and patient-level data standpoint, with great opportunities, but also challenges that we face. And a reminder that through this really comprehensive program over the next two days, there are many different aspects and experts, including Noémie Elhadad, who will be speaking. She is a very senior expert in statistics and machine learning and uses a lot of the types of data that I'll be talking about at a really much higher level of analytics. And so there are many, many fields coming together for what we're doing. And at the patient level, we start with and know that there's hugely varied presentation of patients.

Of course there's variation in the symptoms, not just presence and absence, but also severity, duration, time of onset, and response to treatments. All of those are nuanced levels of data, and the more information we have, the more it allows us to use these more advanced analytic approaches to comprehensively and agnostically consider how they are clustered, how they vary across time, and how that then can influence what we understand about pathophysiology, interventions, and long-term outcomes. We also know, of course, that there are these varied lesion-specific presentations, and then there are whole-body health and life-course health issues: what are the comorbidities, and what other inflammatory, angiogenic, rheumatologic, and other aspects are involved. And so really the key is to take these multiple layers of very deeply phenotyped data, static and dynamic information. One of the things we really need to get better at incorporating is that while the genome is static, epigenetics, the transcriptome, the proteome, and the metabolome are changing over time. What is influencing that change, how patients respond, and why some have changes and some have more constant levels of these pathways is critical to know, and then also how those are related to these clinically evident presenting phenotypes and how they're influencing this long-term milieu.

And so we are already seeing quite a bit of evidence from the data that we have so far informing these heterogeneous clusters and pathways. So looking at the genetic data, in work led by Nilufer Rahmioglu, we do now see 42 loci for endometriosis overall. But as we start to delve into how that might vary by patient-specific presentation, we see actually that 38 of these 42 loci are primarily associated with stage III/IV endometriosis, not with stage I/II. And in fact, when we look a little more deeply into that reality, it's actually the patients with endometrioma that have these strong associations with these loci. So far in this discovery, the loci are not strongly associated with patients who are not presenting with endometrioma. And I'm going to come back to some future work about that in a moment. We also are seeing some very interesting interface across biologic markers. And

Let me interrupt Stacey for one second while I tell her something.

Just an extra hour to talk.

Okay, thank you. All right. I don't know if I can go that slow, but we'll go slower. That's okay. We'll fix. So we have a little extra time, so I will slow down a little. So when we're looking at pathophysiologic pathways and trying to determine biomarkers by proteins, our team, led by Naoko Sasamoto, has been very interested in looking at variation by some of these subphenotypic patient presentations. And so looking in a group of adolescents with endometriosis, these are some pathways identified by proteins. Now I'm not going to go into the detail of each of those different protein-driven and protein-defined pathways. Each of the rows is a different biologic pathway as determined by the proteins discovered, and the circles represent up- and downregulation of protein expression. Okay, so red is upregulation, blue is downregulation, and what I do want you to pay attention to is not the individual pathways but these columns.

So the columns are actually defined by the appearance of the peritoneal lesions in these patients. And so red, yellow, hold on, I have to see better on my screen, white, blue, black, and brown are the columns. So red, yellow, white, blue, black, and brown. And as you can see, we are already seeing differential protein expression, completely different pathways, based on this very macro and, to most of us, seemingly uninformative or less informative information about these patients. Now we look in these same patients at these same proteins, but now the columns are symptoms. And so once again, red is upregulation, blue is downregulation, the rows are protein pathways, but now the columns are the presenting pain symptoms for these adolescents. And so dysmenorrhea and acyclic pain are the first two columns on the left, and the column all the way to the right is bowel pain.

Second from the right is bladder pain. And then the two center columns are whether the patient is experiencing life-impacting, life-altering persistent pain. And so that third column is presence or absence of life-impacting symptoms. But that fourth column is whether, a year after onset of treatment, they're still experiencing that same level of life-impacting pain. And so the take-home message here is that as we start to add these very rudimentary initial layers, as we start to look at these biologic pathways, we are seeing differentiation by both the peritoneal lesions themselves and also the symptoms and response to treatment in these patients. And we also see differences here. This now is a little more simplistic. There are fewer pathways that jumped out here. This is now whether the adolescents have localized pelvic pain or whether they have widespread pain, whether they also have, for example, migraines, whether they're also experiencing joint pain and other manifestations of pain more distal from the pelvis.

And so the take-home message here again is that these are very early days of applying these more deeply phenotyped methods, but we are starting to see early hints of real heterogeneity that hopefully then can be perturbed and explored, both for potentially informative underlying physiologic pathways, and also to start identifying which patients we can now better predict will respond to certain treatments that are currently available, and to highlight new pathways for novel treatments. Another area that's very important is this concept of how patients do over time. And so now Naoko and also Amy Shafrir from our team have led analyses around two years of follow-up in these adolescents. All of them have had surgically confirmed, surgically treated, excised endometriosis and are now one to two years into their post-surgical journey. And so this is now looking at non-menstrual, acyclic pain.

This is called a Sankey diagram. I always want it to be a snakey diagram, but it's a Sankey diagram. And just to orient you, this is baseline, year one, and year two. In pink, these are the patients who at baseline were experiencing severe pain, reporting seven or greater on a numeric rating scale. The moderate are in yellow, the yellow lines. They had a numeric rating scale of four to six. And then the blue, which understandably is the smallest group, had reported from one to three for their acyclic pelvic pain. And so when we look over time, we're now seeing where those same patients were at year one and at year two, and a few hallmarks that, again, we really want to zero in on: what possibly can predict, and what can be evident to the clinician, but also could be evident to discovery scientists around treatment development and expectations.
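As a rough illustration of how the flows in a diagram like this could be tabulated, here is a minimal sketch in Python. The severity cut-offs (7 and above, 4 to 6, 1 to 3) come from the talk; the function names, the handling of a score of 0, and the four example patients are assumptions invented for illustration, not the team's data or method.

```python
from collections import Counter


def severity(nrs: int) -> str:
    """Map a 0-10 numeric rating scale score to the bins described in the talk."""
    if not 0 <= nrs <= 10:
        raise ValueError("NRS score must be between 0 and 10")
    if nrs >= 7:
        return "severe"
    if nrs >= 4:
        return "moderate"
    if nrs >= 1:
        return "mild"
    return "none"  # assumption: the talk does not say how a score of 0 is shown


def flows(scores_by_patient, from_idx, to_idx):
    """Count patients moving between severity bins across two time points."""
    return Counter(
        (severity(scores[from_idx]), severity(scores[to_idx]))
        for scores in scores_by_patient.values()
    )


# Hypothetical patients: (baseline, year 1, year 2) scores, invented for illustration.
example = {
    "p1": (8, 8, 9),  # stays severe throughout
    "p2": (7, 2, 1),  # improves by year 1 and stays improved
    "p3": (5, 3, 3),
    "p4": (9, 5, 2),
}

baseline_to_year1 = flows(example, 0, 1)
year1_to_year2 = flows(example, 1, 2)
print(baseline_to_year1)
print(year1_to_year2)
```

Each (from, to) count corresponds to the width of one band between two adjacent columns of the diagram, so the persistently severe group and the improve-and-stay-improved group show up as the ("severe", "severe") and ("mild", "mild") entries.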

We see this group who remain at severe the entire time. Really, with the surgical intervention and the hormonal interventions, we're not seeing any change in these patients. So we want to be able to predict what it is about that group that could better be intervened upon, what it is about that group such that perhaps their pain experience is independent of and not tied to endometriosis or their gynecologic care, and how we can then develop new treatments but also ensure that they're having the correct comprehensive care interventions. We're also equally interested in this group. So this is the group who improve. By year one, they're at lower pain, and that group has become much larger, thankfully, and they're staying at that level, below a numeric rating scale of three. They've now achieved that and are staying there longer term. We now have data at three and four years, and we still do see these distinct groups.

There are still patients who, even many years out, even with repeat surgeries, are staying in that very high severity group, and there are the patients who improve and then stay improved. And so discerning what those groups are is an absolutely important part of precision, personalized medicine. And so the key thing for personalized medicine is, yes, stratifying patients, but also meeting patients at a personal level of what their current symptoms are, what their current goals are, what the greatest life-impacting circumstances are. And so in our adolescents, the focus is solely on their pain and pain management. But as they age, their focus may shift to conception and reproductive health. Their focus may change as they accumulate or improve on different aspects of symptomatology. And so having this level of data to look at, with the precision, rigor, and volume that's necessary for experts such as Dr.

Elhadad to look at with these more advanced modeling techniques, is critical. That's where we hit the key challenges. So the data that I've presented to you are from a cohort of 1,500 patients that we've been following for many years. It's a very time-intensive and data-collection-infrastructure-intensive endeavor, but 1,500 patients, despite the ability to start to discern some of these details, is still quite too small for much of the discovery that we want to do at this level of interaction and nuance. And so here is the complexity. These are some data from medical records that we pulled, in both paper and electronic form, across many years for patients across the United States. And one of the realities, which is often a surprise at meetings such as this, where you are all extremely engaged, thorough, detailed, endometriosis-focused clinicians, is that you are documenting with quite high rigor and detail.

However, for the typical patient, for the majority of patients, this actually isn't happening. And so we see that there's actually a pathology report available for less than half of patients with surgically diagnosed endometriosis. We know for certain that these patients have had a surgery, the surgery has determined that they have endometriosis, and they've been treated as such. But when we go to the medical record and try to abstract some of these phenotypic details that we wish to have, it just does not exist. There's no record at all. For those that have a record, including the surgical report and the pathology information, we look at stage, and this is across three different waves of different groups of patients across the United States. We see that only a minority have any comment whatsoever on endometriosis staging. So no determination whatsoever, not a score, not a statement, nothing at all.

And so again, that information cannot be readily abstracted from the electronic medical record. Now, when we want to even step back to just the macro level of presence or absence of an endometrioma, presence or absence of deep endometriosis, the data are even more thin. Actually, less than 10% mention an endometrioma. And I am not saying were diagnosed with an endometrioma, I'm saying abstractable information to even determine yes or no for an endometrioma. And for deep endometriosis, it's, nope, I flipped them. For deep endometriosis, it's less than 10%. For endometrioma, it's between 10 and 30%, depending on the records. And so again, when we're thinking of this high-level data that we're trying to abstract, we're just not influencing the field enough to ensure that even this minimum data is being routinely documented. And again, I think that one of the things that we have to pay close attention to is that the types of clinicians who are paying attention to our literature, the types of clinicians who are coming to meetings such as this and having discourse, are documenting more thoroughly.

So recognizing what the influences are on these large, AI-intended data abstractions is important, and it is something to try to influence through our national obstetrics and gynecology commissions, but also by thinking of our local hospital systems: what is being documented? This actually gets worse, which won't surprise anyone here, for adenomyosis. So using the Optum Health database, and I had a very small contribution to this work, but pulling data from millions of records using natural language processing for adenomyosis, they found that the rate of documented adenomyosis for 18 to 55 year olds was about 1.7 per 1,000 women. This was the general prevalence using the ICD codes, and then they added natural language processing. A very important thing, though, is that when we pull in all of the information from the natural language processing, we actually see almost a one-third increase, about 30% higher, in patients who now have adenomyosis evidence.

So for most AI work in large claims databases and national and international medical systems databases, they're using just the ICD code, but we all know that the ICD code for adenomyosis is not accurately and adequately documented. Now, when you add this layer of natural language processing, which again needs advanced machine learning skills and collaborative processes, we see this increase from 1.7 to 2.19 per 1,000 women documented. And if we look in 36 to 55 year olds, we again see about 10 times more adenomyosis documented through the natural language processing than just by ICD codes. And we're even seeing this in the younger patients. And so again, we are missing 90% of the patients. I want to step back, though, and remind again what this room doesn't need to be told, but what the general GYN and health discovery world needs to know: this is only the patients who successfully had a clinician who wrote something about adenomyosis in their record.
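To make the arithmetic behind these figures explicit, here is a minimal sketch in Python. It takes the second rate as 2.19 per 1,000 women, which is an assumed reading of the transcript, and the function names are hypothetical. It is not the analysis from the Optum study, only the two back-of-envelope calculations the figures imply.

```python
def relative_increase(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


def fraction_missed(ratio: float) -> float:
    """Share of documented cases an ICD-only search misses when adding
    natural language processing finds `ratio` times as many cases."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1 - 1 / ratio


# Rates per 1,000 women aged 18 to 55, as read from the talk.
icd_only = 1.7
icd_plus_nlp = 2.19  # assumption: the transcript's figure is read as 2.19

print(f"Relative increase: {relative_increase(icd_only, icd_plus_nlp):.0%}")
print(f"Missed with a 10-fold ratio: {fraction_missed(10):.0%}")
```

The first calculation lines up with the "about 30% higher" figure for the overall rate, and the second shows why a roughly tenfold difference corresponds to missing about 90% of documented patients when only ICD codes are used.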

We know that we're missing many, many other patients who are never evaluated, never documented. And so this is truly just the tip of the proverbial iceberg in terms of what we're missing for who gets information documented, who gets included as a case, and who doesn't. I also want to make a plug to this room. As everyone here knows, we have a large problem with the ICD coding itself. It's highly inaccurate for endometriosis. We're now six years, five years past when a multi-professional society team led by Andrew Horne and Lucy Whitaker pulled together recommendations for change to the ICD code for ICD-11. ACOG also submitted some recommended changes that Ted Lee's team had suggested. These have not yet been taken up adequately. One important thing is not only this additional detail on the nuances of endometriosis, useful for both clinical care and discovery, but really treating adenomyosis adequately as a separate code and not conflating it as N80.0, endometriosis of the uterus, which again, as we know, is not only inaccurate, but that inaccuracy means that it's really haphazardly and inconsistently used as a code by clinicians and surgeons.

Another important thing that we need to pay attention to is that we have to be a driving force behind changing the reality that most of the readily available large-scale tissue and biomarker-level databases, ENCODE, Roadmap, GTEx, don't include information on female reproductive system tissues and female reproductive system biomarkers. And so if you are a junior scientist or junior clinician in other fields, you can readily test hypotheses utilizing these large publicly available databases. It draws people to the field. It draws people to create curious and novel hypotheses. It keeps people interested in those areas. And when our junior, excited, enthusiastic, intelligent, creative people go to these databases because they have a question about the female reproductive system in general, but certainly the endometrium and endometriosis or adenomyosis specifically, those data are largely absent. And when they're present, I think GTEx now has a sample of eight.

It might be 18 total from endometrium. There are some data sets that have some ovarian tissue, drawn in by interest around ovarian cancer discovery. But we are very far behind in making sure that we are represented in these databases. And again, the critical problem is that these are quite easily accessed data sets that are generating information and discovery in other fields, and we are missing that advancement. So many of the things that we've been doing to tackle this area a little bit are on an individual investigator level. We have WERF EPHect, the Endometriosis Phenome and Biobanking Harmonisation Project, that now a decade ago recommended a surgical form for standardized and more thorough documentation for the endometriosis surgical patient. It also has a self-report clinical questionnaire that brings in a lot of detailed information about symptoms and life-course exposures that are important in these AI models.

And it also recommended standardized tools for fluid and tissue collection, storage, and documentation. There now are 58 sites, I think actually as of a couple of days ago we're now up to 60 sites, across the globe who are utilizing these tools, mostly utilizing them in their own local work. But this has also advanced quite a few multi-site international collaborative projects. And I'm very proud of this contribution. It's absolutely produced some excellent and evolving discovery, but the numbers are still small because it's driven by individual clinicians and research teams. And so, for example, coming back to the genetic discoveries where we have those 42 loci for endometriosis overall, we've been able to delve down into the endometrioma realities driving those relationships. But we see that for the subphenotype analyses, although we have over 60,000 endometriosis cases overall, we start to get to very small numbers, fewer than 6,000 of these with data adequate to include in these phenotyping analyses.

And so we're currently doing the next phase of these meta-analyses. This pulls in even more data. The wonderful thing is that, in great part because of EndoFound, as there has been more public knowledge and awareness and attention to endometriosis, there is genuinely a large rise in the volume of data available internationally. And so we now have data sets with both genomic and endometriosis data that are going to exceed a hundred thousand cases across the globe. And we've more than doubled the number of cases with adequate subphenotypic data, many because of these additional data from EPHect sites. But even doubling, we're now reaching, for some of these subphenotypes, about 10,000 human beings with genetic and phenotypic data, which for genomics discovery is actually still quite small. And so we have more work to do there. Some of you may be aware that we have some additional EPHect tools that we've been working on, as of just last month.

Paul Yong has led a physical exam documentation tool that again is hoping to impact what data are being collected and documented on a large scale. This was just published last month. I think it's still in that pre-print mode, not the full final proof, but it's available online. Mathew Leonardi and George Condous are leading an imaging tool. Again, these tools are not meant to influence clinical care or action. They are intended to be harmonizing documentation tools, so that we all have data that are equally rigorous but easily harmonized to contribute to these larger modeling approaches. We also have new experimental models tools that are attempting to bring together what standard protocols are being used across the globe, but also, again, that documentation and reporting, so that one scientist knows when they're reading an article how their modeling approach is similar to or may differ from another scientist's, at a level where we can make more thoughtful comparisons.

Katie Burns is leading one of these tools. And Kaylon Bruner-Tran and Erin Greaves are the overall coordinators of this initiative. And so again, we're trying to bring together communication, standardization, and harmonization to increase sample sizes for these AI approaches that are emerging and evolving, but then also to really improve collaboration. And so, although I don't think I've stretched enough for what Dan was hoping, my summary is that we are at an incredibly optimistic and successful point where there are many tools available, in these many aspects of what we lump together as AI, that have been successfully applied to other areas of health and that are readily available to immediately apply to endometriosis and adenomyosis. And that's occurring more. It needs to occur at an even faster pace. This is also where funding and attention come to the fore, but we all have to be continuing to push.

Specific to reproductive systems and reproductive health, we have to push so that our data and information and the tissue details that are specific to the female reproductive system are actually being included thoughtfully and at a large volume. We have to demand that inclusion in these data sets, but also even at other levels. The National Institutes of Health, for example, has a common data elements core, and they have thousands of recommended ways of collecting different data questions. So many of them are at the individual level, like how to ask about knee pain, how to ask about hirsutism, those very specific questions. Despite having, again, tens of thousands of common data elements, endometriosis-associated phenotypes and endometriosis-specific details are completely absent. And even menstrual characteristics are absent across most of them. The HEAL Initiative, which is the NIH's high-level, pain-mechanisms-focused discovery initiative, has what they call the painDETECT form and the PROMIS form. And so those have some pain collection elements. They, again, largely ignore pelvic pain, specifically dysmenorrhea, acyclic pain, and dyspareunia. And so once again, we all have to be using our forces and voices and influence to make sure that female-specific variables and evidence aren't absent. This is one of the key things that, again, I'll go off on a bit of a tangent for time, for a moment anyway,

The time I gave you, it's taken away on the next.

Excellent. Alright, so we found this with the COVID vaccine trials. So many women ended up reporting that they felt they were experiencing menstrual changes after having the vaccine, and yet none of the trials asked about menstrual characteristics. And so ensuring that we are at the forefront in this space is absolutely critical. And that was my last statement. And so I thank you. And I'm just going to finish. We'll have the 16th annual EndoFound medical conference next spring, and I'm hoping to see you all immediately after that at the next World Congress on Endometriosis, which Jason Abbott and Gita Mishra are leading. And we'll have a really nice interface of a surgically focused lens and a bench and population science lens. Thank you.