Endometriosis Foundation of America
Medical Conference – 2012
Why we need new classification systems
G. David Adamson, MD
Thank you very much. It is really a pleasure to be here, and a bit daunting to be the last speaker after so many absolutely outstanding presentations. You have heard a lot of speakers who are taking you into the future about where we can go with endometriosis. In a sense I am going to take everyone back a little bit first. Then we will hopefully look forward to moving into the future and see why this talk, which on its face could appear to be a little bit mundane, is in fact critically important to all the aspects of understanding endometriosis and finding better approaches to it.
Let’s talk a little bit about why we want to classify endometriosis better. I know you have heard some comments this morning from Stacey Missmer that help explain why this is such a problem. In many cases you see a lot of studies in the literature and people are not talking a common language. What is severe disease? What is extensive disease? What is minimal and mild disease? In fact, often when you see minimal, mild, or moderate disease, it is really not entirely clear whether they are talking about endometriosis lesions, fibrotic tissue, or deep infiltrative tissue. In the absence of this clarity on what people are talking about it is very difficult to standardize comparisons. Are we talking about apples and apples or are we talking about apples and oranges? This is especially important of course in all the research applications that we are looking at. If we are not talking about the same thing we cannot really evaluate outcomes and we cannot make plans for how to move forward in the future.
In actual fact, trying to decide what we are going to talk about is much more difficult than it might initially seem. There are a number of items on this list here that are very important for determining a classification system. Indeed, as you will see in a minute, there has not been a lot of agreement historically over this. Getting a consensus about what should be done, in particular among surgeons, who are often fairly individual types of people, has not been easy. There has not been a lot of evidence that has been applied to it. One of the biggest problems that we have in endometriosis, though, is that it presents with so many different symptoms and so many different signs that people have not been able to agree on what is the most important: is it pelvic pain that occurs with menses, is it general pelvic pain, is it infertility, is it ovarian masses, is it rectovaginal disease, is it pain that occurs with other conditions? How is it related to chronic pelvic pain? In the absence of definitions there has been a great deal of confusion when people try to talk about the literature and about what works and what does not work. So we get down to this last item here: recognizes confounding variables. As you know, and as you have heard today, the confounding variables are all those things that make interpretation of cause and effect very difficult.
I very briefly want to run through the history of endometriosis classification systems because it really demonstrates the complexity of this issue and maybe helps to guide us a bit to the future. Of course you have all heard of Sampson, who first of all categorized hemorrhagic cysts. In the next few years Wicks and Larsen used histology. There were other data that were presented regarding anatomic presentation and malignancy. Right here we can see all the different criteria that were used to stage disease. Then we got into further histopathology and pain, then different kinds of relationships, then structure involvement, then physical examination and surgical findings, so nobody agreed on the types of elements that should be included in a classification system. This system by Acosta in 1973 was quite popular. It looked at the site and distribution of lesions. Then we had surgical staging based on a malignancy type of staging system. Dmowski and Cohen in 1975 had more systems, and Kistner had a very popular system in 1977, but it was quite complex and someone had to spend a lot of time after surgery describing all the different types of lesions. V.C. Buttram from Houston revised and expanded on this, and Cohen added laparoscopic findings.
The American Society for Reproductive Medicine [then the American Fertility Society, AFS] was really the first group to come out with a reasonably accepted, widely agreed upon system, which we can see here. This was done way back in 1979, so this is over 30 years ago now. This is the first system that came out. But there were some real issues with this system. It was felt to be flexible. Fortunately there was a form so that people would actually write down what they found, and it did put numbers on things, although these numbers were all arbitrary. And, again, there were arbitrary cut-off points. Arbitrary meaning that experts sat in a room and decided what they thought was important or not and what was the difference between minimal, mild, moderate and severe disease.
Now there were many papers published right after the AFS system came out to evaluate the classification staging system and say, “Well, did it really work or not?” In 1985, based on a lot of these papers, the AFS revised the system slightly and published it again. Really, that is the system that has stayed in place since that time. This is what it looks like and this is what many people have used. But of course this system did not particularly apply to pain or to infertility, although it was thought that it would and it could and it should; in fact in many cases it did not. So there were some major limitations with this. It was very arbitrary. The scoring system was arbitrary. How many points do you get for an ovary, how many do you get for peritoneum and how many do you get for deeply infiltrative disease? Of course the problem was there really was not anything for that. There were 40 points for cul-de-sac obliteration, but there really was no attention paid to bowel disease, ureteral disease or rectovaginal disease. Indeed, when different people looked in a pelvis and tried to score or stage disease they got very different types of results.
The type of lesion, whether it was red or white or black, what have you, was not taken into account on this scoring system although it is still not clear what gain we would get from that with the current system. We will discuss that some more. And, most importantly, there was very poor correlation between extent of disease and pelvic pain. There really has not been a system that showed a correlation so how can you use a system if it does not in fact benefit the patient?
There were some studies done that showed that the staging system did correlate with the tubal condition, and some papers were done on MRI and laparoscopy and endometriomas, but in the end there were very few papers that showed a lot of utility of this system. There were many papers that showed an absence of correlation with the endometriosis staging system. Even though people had been trying to stage endometriosis since 1921, we get to the mid to late 1980s and the 1990s and, 65 to 70 years later, there is still no system that really works. The question is, does the current staging system help? The answer is unfortunately a resounding no.
I want to share with you three newer staging systems which are coming along. The last one being one that I have developed for fertility but I want to mention just briefly a couple of other ones that are out there now that I think have some interest. We will see what happens. The first is the Enzian Classification System which comes from Germany. This was an interesting system that was put together by a number of experts in Germany. I do not have time today to go through a lot of details of these at all but they wanted to look at a number of different aspects of endometriosis and how it might be staged. The people who got together to do this were surgeons by and large and so this was really a surgical staging system. They had a very complex set of numbers and do not even try to remember them; I can give you the reference for this if you want because it is a very, very complex staging system. They did have numbers that they wanted to use to correlate this with pain. But at the end of the day this system is detailed enough that there is almost nobody right now outside of Germany who is attempting to use it because it is just too complicated, and it was not in fact that clear how it should initially be used.
The AAGL has made a real effort over the past several years to develop a classification of endometriosis, and again we could spend an hour talking about this and we do not have an hour. I just want to hit a couple of highlights, because I think this is a very important initiative. Chuck Miller is here today. He has been involved with this, and Mauricio Abrao from Brazil has been very involved with this. I have been a bit involved, along with a number of other people in the room here today, in developing a system. One of the reasons for doing this was the need to identify deep infiltrative endometriosis within the classification. If there has been a singular criticism of the AFS system, other than the fact that it did not really predict much, it was that it did not define deeply infiltrative disease, especially of the bowel, ureter and rectovaginal area. Really the only way it was addressed was the cul-de-sac obliteration score of 40, which could occur with all different types of disease but really was not very descriptive. I think this is a completely legitimate criticism of the system.
People have started arguing about this, and part of it I think came about because of the improved technology that we have at surgery, much of which you have seen today. As surgeons got better at seeing what was in the pelvis, first through laparoscopy where we just looked down the laparoscope, then with better imaging, and now with much improved cameras and visualization, we can see so much more in the pelvis. People came to recognize that endometriosis really had a lot of presentations that had not been all that appreciated before. A few surgeons had appreciated it, but not really very many. As people started recognizing all this extra disease they said, “Well, how are we going to describe this? What are we going to call this? Is this just bowel disease, ureteral, how big is it, what do we do with that?” So more interest came about, and you can see more articles came about.
I want to thank Chuck. The next three to four slides on AAGL came from Chuck Miller who already talked with you today. The AAGL looked at a large number of patients. You can see this is Mauricio’s paper here, and you can see the types of patients that were included. Importantly a lot of the patients had pain and you can see with deep endometriosis over half the patients had pain. Then others had both pain and infertility and a small number just infertility. You can see the distribution of patients. This is something that is very, very important because when you develop any type of system you have to know what kind of patients you are talking about because they are going to have different presenting problems, different types of disease, you need to know that.
When this system was started it was actually just a tabulation system. It went through a couple of iterations. Tabulation means just documentation as opposed to trying to make a classification system, because to really make a classification system you have to tabulate the data first and then you have to analyze it. The idea here was to adequately describe the morphology, or shape, of the disease and get some quantification: basically, what does it look like and how much is there? This system then actually computed totals. David Redwine was very involved with this. It allowed calculation of the Fertility Index, which is a publication I will talk with you about in a minute, included a patient information section, and had statistical calculations. The desired outcome was that if we collect all these data on a lot of people, we will be able to analyze them and determine a better classification system.
One of the really important things to look at of course is to rate the severity of symptoms, and in particular pain. Pain is very, very difficult to assess and I personally like this scale, analogue scale of 0 to 10, where somebody draws a line about how much pain they are having. But there are some, believe it or not, very significant and legitimate criticisms of this as a measure of pain. The NIH, for example, does not use this as a measure of pain when they are looking at studies and will not accept it as a measure of pain. Here we are in 2012 and we still have very significant differences among smart people who are trying to deal with this problem about how we measure pain, how do we describe pain. So, clearly if it is difficult to describe it, and to measure it, we are going to have a difficult time deciding what it means and what types of interventions can help to improve it.
In this tabulation system that was started by AAGL there was a lot of information through this but the concept was abandoned because it was very, very labor intensive. I can tell you that it is because personally I keep track of what I have done at surgery after surgery. You probably know you have to go and write orders on the patient, write notes about what happened and then you have to dictate an operative note. I have always kept special forms on my patients. It actually takes me about 20 minutes after an operation by the time I get finished with all the different kinds of descriptions and things that are necessary. It takes real time and a lot of people are not going to spend the time to do this. If you want to get a broad amount of data you have to make it simpler. The AAGL moved onto a different approach to this and 30 internationally recognized experts in endometriosis surgery were asked to score endometriosis based on location, amount and surgical difficulty. Probably everybody who has talked here today has participated in this. We all filled out this form, and again, I am giving you a very abbreviated version of this today because we do not have time to go through it all. We basically put down numbers about how much of a number do we think a different lesion ought to get based on its impact. How much is this going to affect pain, affect fertility and you can see all these different numbers for the different surgeons that were down here and each of us went across. I do not know where – I am down in here somewhere, I remember that. I do not think I had too many big outliers. It is always interesting to look and see who thought something was worth a whole lot more or a whole lot less in points. But in actual fact there was really a lot of agreement on this.
After this was done and the patients were sort of organized into minimal, mild, moderate or severe they sort of did a little comparison with the ASRM stage and the count of AAGL and what you see is that there actually is some correlation. There are also some real differences, if you look at stage three, almost even distribution, not quite, but pretty even. Stage four tends to be more severe and that is what you would expect. There is a lot of overlap and this just reflects the fact that the ASRM system has some serious difficulties with it and the AAGL system is trying to be more predictive of outcome. When you go and look at the total pain, I am dealing mostly with pain on this, the average AAGL staging you can see that as a stage goes up the number goes up but it does not really change much for two, three or four. There is some discrimination here between really minimal disease but not so much among the more severe disease. If you look at the ASRM there is hardly any difference at all, which has been the criticism of it. The scores really did not predict anything. The AAGL system seems to have some discrimination in this range. Whether additional analysis will allow a different way of looking at the numbers or breaking them down to give us a more predictive ability will be interesting to see.
The concept is that once this scoring system is validated, recognizing that the scoring system came from 30 experts, probably ten of them in the room today, who said, “This is what we think the point score ought to be for an ovarian lesion or a peritoneal lesion or a bowel lesion, etc.,” we will try to analyze that and create a statistically based classification system to see if we can find differences. The idea is to have a staging system that will tell us something about how to manage pain.
I just want to finish by very briefly going through the Endometriosis Fertility Index, which is an index used to predict outcome for fertility, and this is sort of a lifetime labor of love for me. I started coding patients for this about 1982 or 1983 and did it for quite a while, up through about 2000. I still code them, but as for the people who went into this study, we had 801 patients. There were actually about 1400, but we had really extensive data on about 801 patients, all of whom wanted to get pregnant. I want to emphasize that this EFI is only for fertility; it does not apply to pain. The concept was that while the ASRM system was a staging system that was good for standardizing reporting, as we have already mentioned it had very limited success at predicting pregnancy. The objective I had in this study was to develop a useful clinical tool, a tool being something that can do something, called the Endometriosis Fertility Index (EFI), that predicts pregnancy rates in patients with surgically documented disease who attempt non-IVF conception. I will just mention that there has been an abstract that shows it actually does predict some outcomes in IVF. There are reasons for that which I could discuss.
One of the interesting concepts in this was what we called a least function score, which considers how good things looked after surgery. This ovary here had just a very small amount of disease, so we give that a score of four. This was an ovary with an endometrioma that you can see. That endometrioma was removed. She actually had bilateral endometriomas, but the ovaries looked pretty good after removal, so those ovaries got a score of two out of four, which is moderate disease. This is a tube with a bit of endometriosis on it; that too would get a two out of four. Here is some fimbria at the end of the tube, and you can see there is really scarring in here, and this amount of scarring would make this tube about a two as well. I just wanted to share the principles. This is an endometrioma that has been taken out of an ovary that was sutured, and this ovary would also get a two because it was smaller. This is really severe endometriosis here, where you can see we actually did shaving of the bowel and there is salpingitis isthmica nodosa in the tubes, and this patient mostly got ones, so really severe disease.
So out of these scores of what things looked like after surgery, we made what is called the least function score. It is not important to get into the details, except that it basically asked what is the worst-looking structure on the left side and on the right side, and we added up those scores, which went down here. We then looked at other factors that predicted pregnancy rates, but I did not choose these arbitrarily. We looked at about 400 to 500 variables and we let the data tell us what mattered and what did not, and how much it mattered. Not surprisingly, what mattered for fertility was a patient’s age; younger patients got more points because they had a better chance of pregnancy. The years of infertility really mattered. Less than three years is good and more than three years is not good. A patient with a prior pregnancy had a better chance of pregnancy, which is not surprising. Of the ten points in the Endometriosis Fertility Index, five come from history that you know before you go to the operating room. A possible total of three points came from the least function score, which is how good the pelvis looked after surgery. And then we had only one point out of the ten that came from the AFS endometriosis score and one out of the ten from the total score, which means lesions and adhesions. Only 20 percent of the whole score came from the AFS score. These were the only aspects of it which mattered. You add the score up, and we then followed patients, and we actually validated this prospectively on an additional 200+ patients, using the score to see how patients did. The bottom line was that patients with a score of zero to three had only a 10 percent chance of pregnancy over even a whole year, and as the score went up the pregnancy rates got higher.
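As an illustrative sketch only (not taken from the talk or from the published index), the least function score described above can be written in a few lines of Python. The structure names, the 0 to 4 range for each structure, and the function name `least_function_score` are assumptions made for illustration; the talk states only that the worst-looking structure on each side is taken and the two values are added. The point budget at the end simply restates the breakdown given in the talk and does not reproduce the actual EFI cut-offs.

```python
from typing import Mapping


def least_function_score(left: Mapping[str, int], right: Mapping[str, int]) -> int:
    """Add the worst (lowest) post-surgery structure score from each side.

    Assumption: each structure is scored from 0 (worst) to 4 (best).
    The talk mentions scores of one, two, and four "out of four";
    the lower bound of 0 is assumed here.
    """
    for side in (left, right):
        if not side:
            raise ValueError("each side needs at least one scored structure")
        for name, score in side.items():
            if not 0 <= score <= 4:
                raise ValueError(f"{name}: score {score} is outside 0-4")
    # Worst-looking structure on the left plus worst-looking on the right.
    return min(left.values()) + min(right.values())


# Maximum points per component, as stated in the talk (ten in total).
# The dictionary keys are hypothetical labels, not official EFI terms.
EFI_MAX_POINTS = {
    "history": 5,            # age, years of infertility, prior pregnancy
    "least_function": 3,     # how the pelvis looked after surgery
    "afs_endometriosis": 1,  # AFS endometriosis (lesion) score
    "afs_total": 1,          # AFS total score (lesions and adhesions)
}

if __name__ == "__main__":
    # Hypothetical patient: structure scores are made up for illustration.
    left = {"tube": 2, "fimbria": 2, "ovary": 4}
    right = {"tube": 3, "fimbria": 1, "ovary": 2}
    print(least_function_score(left, right))

    afs_points = EFI_MAX_POINTS["afs_endometriosis"] + EFI_MAX_POINTS["afs_total"]
    print(100 * afs_points // sum(EFI_MAX_POINTS.values()), "percent from AFS scores")
```

How the raw least function score is converted into its share of up to three EFI points is not given in the talk, so that mapping is deliberately left out of the sketch.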
What we do is we use this after surgery, after we have done laparoscopy, say how good does the pelvis look, what are this patient’s chances? Patients down here go straight to IVF and patients up here can be left for some time with different non-IVF treatment to try to get pregnant.
The point of the Endometriosis Fertility Index is that it is simple, it is robust, it is validated, it has been shown to work and it predicts pregnancy rates. It is useful in talking to patients about treatment plans. I think this is a system that can work for fertility, and I know others in the room, including Thomas D’Hooghe, who is here, and his group, are using it to see how well it works in their hands. There is a group in Brazil, I think Mauricio Abrao’s, that is using it to see how it works in their situation. I am hoping a lot of people use it. Maybe it can be improved so that we can have a system to predict outcome to help manage patients. What we need now, as you can see, is a similar system for pelvic pain. I am hopeful that maybe the AAGL system, once it is completely analyzed, will give us some increased ability to predict outcome after patients have had surgery so that we can manage their care better.
Thank you very much.