MR. DAWSON: Good morning, members of the panel. Thank you very much for affording the time for us to present the results of our analysis of the data that has been submitted by the sponsor. And thanks also to the company for making the data freely available and also especially for providing excellent documentation, without which nicely provided data usually is not terribly helpful.
We agree with the company that it is a good idea to provide a data summary to the physicians with the downloaded data as an aid in interpreting the trend data. We have, it turns out, gone in a rather different direction. So, what I am going to present to you for the next few minutes is something that we suggest as a possible statistical evaluation scheme that would go with the results for each patient.
Let me start with a table that you have already seen. The company referred to this as category agreement. I have the meter data on the columns; whereas, the company had the meter data on the rows. So, it is just a transpose of what you have already seen and I converted the frequencies into percentages.
I want to make the point here that agreement, as both the company and the Agency have interpreted it, is a function of how many times the observations agree in terms of the glycemic ranges. That is the main diagonal of the 3 by 3 table. If you add up those percentages on the main diagonal, you get about 88 percent of the observations falling in the same ranges. That is, when the meter sees an observation as being below 56 units, then we have this percentage of the time that the sensor is also coming up with the same observation.
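To make the agreement calculation concrete, here is a minimal sketch of how a 3 by 3 category table and its main-diagonal percentage could be computed. The 70 and 180 cutoffs follow the in control range mentioned in the testimony; the function names and the sample readings are invented for illustration and are not the study data or the sponsor's code.

```python
# Illustrative sketch only. Cutoffs follow the 70 to 180 in-control range
# mentioned in the testimony; all names and sample values are assumptions.
LOW_CUTOFF = 70.0
HIGH_CUTOFF = 180.0


def glycemic_category(value):
    """Return 0 (below range), 1 (in control) or 2 (above range)."""
    if value < LOW_CUTOFF:
        return 0
    if value > HIGH_CUTOFF:
        return 2
    return 1


def agreement_table(meter, sensor):
    """Build a 3x3 count table: rows are sensor categories, columns are meter categories."""
    table = [[0, 0, 0] for _ in range(3)]
    for m, s in zip(meter, sensor):
        table[glycemic_category(s)][glycemic_category(m)] += 1
    return table


def percent_agreement(table):
    """Percentage of paired observations that fall on the main diagonal."""
    total = sum(sum(row) for row in table)
    diagonal = sum(table[i][i] for i in range(3))
    return 100.0 * diagonal / total


# Hypothetical paired readings (mg/dL), not taken from the submission.
meter_readings = [60, 100, 200, 150]
sensor_readings = [65, 110, 170, 160]
table = agreement_table(meter_readings, sensor_readings)
print(table, percent_agreement(table))
```

The table keeps the meter on the columns, matching the orientation described above.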
The sensor observations that we are using, of course, are those that have been produced by the calibration. In the data set that we had, about 1,300 of the observations were used for calibration and that left us another 4,000 to use to evaluate the results of the calibrations.
One other thing I need to say about this table is that it does have equivocal zones or the ambiguous cases excluded, so that while 70 to 180 units was indicated as the in control range, a substantial number of cases have been omitted because of the ambiguity: the 20 percent zone that is placed around both the 70 and 180 cutoff points.
There is a problem, obviously, of losing a great many observations. Out of 4,000 observations available, 2,500 have been lost to these ambiguous zones. Along with that, we have introduced the possibility of a biased estimation of the agreement.
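The effect of the equivocal zones can be sketched as a simple filter. This assumes the zone is plus or minus 20 percent of each cutoff (which would put the lower edge at 56, consistent with the figure quoted earlier) and that exclusion is decided on the meter value; both are assumptions for illustration, since the testimony does not spell out the exact rule.

```python
# Illustrative sketch only. The +/- 20 percent zone definition and the choice
# to filter on the meter value are assumptions, not the sponsor's stated rule.
def in_equivocal_zone(value, cutoffs=(70.0, 180.0), margin=0.20):
    """True if value lies within +/- margin (as a fraction) of any cutoff."""
    return any(c * (1 - margin) <= value <= c * (1 + margin) for c in cutoffs)


def drop_equivocal(pairs):
    """Keep only (meter, sensor) pairs whose meter value is outside every zone."""
    return [(m, s) for m, s in pairs if not in_equivocal_zone(m)]


# Hypothetical pairs (mg/dL); two of the four fall inside a zone and are lost.
pairs = [(60, 65), (100, 110), (150, 140), (300, 280)]
print(drop_equivocal(pairs))
```

Because the zones are wide, a filter of this kind can discard a large share of the pairs, which is the data-loss concern raised here.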
So, because of our concerns about the loss of data and the possibility of biased estimation, I want to present to you the same table using all 4,000 observations. I think if you look at the two tables side by side, they don't look drastically different. The agreement rate, based on the main diagonal percentages, is now about 73 percent versus 87 or 88 percent.
So, that is the price that you pay for using all of the observations and it may reflect the possibility that the unbiased estimate of agreement is somewhat less rosy than with the pivotal cases removed.
Now I want to shift over to the subject of using regression to evaluate the results of the calibration. So, what we have done then is to use the data, the evaluation data, the 4,000 observations, to see how well the calibration worked. If the calibrated results are regressed on the meter results, then in a linear regression, you have a slope and an intercept and you would like to see the slope be 1.0. If it is 1.0, then there is a one to one agreement between the sensor and the meter.
And you would like to have an intercept of zero, meaning that the regression line would go through the origin, which would be further consistent with a one to one relationship between the calibrated sensor result and the meter result.
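The regression check described here can be sketched with ordinary least squares: regress the calibrated sensor values on the meter values and compare the slope with 1.0 and the intercept with zero. The function name and the sample values below are invented for illustration; this is a generic least-squares fit, not the Agency's actual analysis code.

```python
# Illustrative sketch only: a plain least-squares fit of sensor on meter.
def fit_line(x, y):
    """Return (slope, intercept, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical paired values (mg/dL): meter on x, calibrated sensor on y.
meter = [50, 100, 150, 200]
sensor = [65, 90, 115, 140]
slope, intercept, r_squared = fit_line(meter, sensor)
print(slope, intercept, r_squared)
```

In this made-up example the points lie exactly on a line, yet the slope is well below 1.0 and the intercept well above zero, so a perfect R squared alone would not indicate one to one agreement.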
What this shows is the -- I beg your pardon -- I want first of all just to sum up what you get if you look at the three by three table for each of the sensors individually. What this shows is that 90 percent to a hundred percent agreement is not really very unusual. In fact, something over a quarter of the meters, and we had 331 meters to work with, something over a quarter of the time, the individual meters had all or nearly all of the observations falling on the main diagonal. So, it means that a substantial proportion of the meters were successful as regards this kind of agreement.
But it also shows that by no means all of the sensors are going to perform that well. So, this is kind of a theme that has run through everything that we have done with the data, which is that the meter clearly agrees with the sensor a lot of the time, but also it does not always agree.
So, now, back to what I prematurely got into, using regression to evaluate how well the sensor agrees with the meter. What we have done here, for each of the 300 or so sensors, is to regress the calibrated results onto the meter results, using all of the evaluation observations for that sensor; that is, net of those that were used for calibration.
Ideally, we would have slopes always in this category of about .9 up to 1.1, which sort of brackets the ideal 1.0. So, you can see that, again, a substantial proportion of the meters had good agreement between the sensor and the meter, but by no means always. In fact, a considerable proportion of the time the slope is below 1.0 by a fair amount.
So, again, this shows that there are times when there is good agreement and there are times when there is not.
Now, as to the other regression parameter, that is, the intercept, ideally we would have an intercept that is close to zero, in this category here. Based on my own massaging of the data, I came up with a category that to me represented a reasonable bracketing of an intercept of zero, and that is this bar right here. You can see that, again, often there is good agreement and often there is not.
This shows a substantial amount of the time that that intercept is high. So, we have a recurring pattern of low slopes and high intercepts. And Greg Campbell, who is going to follow me, is going to tell you something about the significance of that.
Now I want to show you two examples of actual meters and this is a picture that looks very much like what the company has shown you earlier, which is the trend data for a particular sensor and what I have drawn in here is the in control range and for this purpose, I have not excluded the ambiguous cases. This is basically the 180 to 70 and these, again, are the glucose measurements and this is time along the horizontal axis.
The red line represents the evaluation results for the sensor -- the post-calibration results for the sensor. The x's represent the validation results. Those are the times when the sensor can be compared with the actual finger stick result obtained by the patient. This is what we would call a good picture because you have got consistently those x's lined up with the sensor results.
So, this, to us, represents an example where there is good performance on the part of the sensor relative to the meter. Now, what goes along with that are the slope and intercept and we have a slope of .83, which is fairly comfortably close to 1.0 and an intercept of 17.4. Going along with that is the regression R squared of 88 percent. A hundred percent would mean everything falling directly on the line.
Let me just make sure that it is clear that the regression results here are obtained by taking the validation points represented by the x's and then the corresponding points from the sensor reading so that those represent little pairs of values, the x's and the point on the red line that is closest to it corresponds to it in time.
So, it is those pairs then that are entered into the regression. So, you don't expect to see a straight line here. You only get a straight line when you compare the actual paired observations.
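The pairing step described above, matching each finger stick value with the sensor reading closest to it in time, might look something like the sketch below. The function name, the time units and the sample values are assumptions for illustration; the actual matching rule used in the analysis is not stated beyond "closest in time".

```python
# Illustrative sketch only: nearest-in-time pairing of meter and sensor values.
from bisect import bisect_left


def pair_by_time(sensor_times, sensor_values, meter_times, meter_values):
    """Pair each meter reading with the sensor reading nearest in time.

    sensor_times must be sorted in ascending order.
    Returns a list of (meter_value, sensor_value) tuples.
    """
    pairs = []
    for t, m in zip(meter_times, meter_values):
        i = bisect_left(sensor_times, t)
        # Only the neighbours on either side of t can be the nearest reading.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sensor_times)]
        nearest = min(candidates, key=lambda j: abs(sensor_times[j] - t))
        pairs.append((m, sensor_values[nearest]))
    return pairs


# Hypothetical data: times in minutes, glucose in mg/dL.
sensor_times = [0, 5, 10, 15]
sensor_values = [100, 110, 120, 130]
meter_times = [4, 11]
meter_values = [108, 125]
print(pair_by_time(sensor_times, sensor_values, meter_times, meter_values))
```

The resulting pairs, not the full trend line, are what would be entered into the regression.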
Okay. The agreement table shows that out of 23 observations available for validation -- I am sorry I put 13 there -- it is really 23 -- most of the time they are in good agreement in the sense of both being in the in control range. Of 17 out of the 23 observations, both the sensor and the meter are indicating that the patient is in control.
So, we are suggesting that this type of information might be provided to the health care provider along with the trend data, so that in looking at the trend data, I suppose if you look at that, you could -- practically everybody will say, well, this worked. I can really rely on these observations at the points in time where there is not a corresponding meter value. So, you could probably look at a picture like that and get a feeling of comfort that you can really interpret the data.
What we are suggesting is that these numbers down here, since they do basically agree with the picture, may be of assistance when the picture is not as clearcut and this is not in conflict with anything that the company was suggesting to provide in the way of averages, minimum or maximum values and absolute differences and correlations.
That is what I meant by saying that I thought we had gone a rather different direction, but not inconsistent with what the company has done.
Here is the second of our examples and this time the result is not as attractive. You can see that the x's are not always reasonably approximate to the points on the red line. So, this, to us, represents an example of a situation where a sensor has been worn for three days and when you download the information, it looks like it didn't work very well. Something happened, something about that particular device, perhaps, or the way it has been used by the patient or the patient's activities, something has kept it from performing the way you would expect it to or what you would expect to get if the meter could be used in real time basically.
Corresponding to that, we have a slope of .14. That .14 is rather close to zero, rather close to showing no linear relationship at all and intercept very high, 85.7 and an R squared of 13 percent. So, the point I want to make here is that these kinds of numbers can help you in the sense that there is going to be a certain relationship between a good picture in those numbers and a corresponding relationship between those numbers and a bad picture and it may help with the in between cases, where it is not clear what to make of the picture.
Now, one last point that I want to make is that the agreement is still rather good, even though you look at the picture and you look at the regression diagnostics and you don't like it. So, what this indicates is that agreement percentage is difficult to interpret when it is not flat out 100 percent. If it is flat out 100 percent, then all the observations in the validation data agree between the meter and the sensor. But when it is less than a hundred percent, then it is a little ambiguous and it is just because it is a rather coarse measurement scale when you come right down to it, just the three point ordinal scale.
So, that is probably not overall a terribly useful evaluation statistic or may not be. The same thing goes for the correlation coefficient. We have problems interpreting correlation coefficients because the non-linear nature of that statistic makes it susceptible to outlier values. Single observations can drive a correlation very high and if you look at the scattergram, you find out that it is because of an outlier effect. So, you can be considerably misled by a correlation if you don't also have a scattergram.
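The outlier effect on a correlation coefficient is easy to demonstrate with a small sketch. The data below are invented purely to show the effect; they are not from the submission.

```python
# Illustrative sketch only: one extreme point can inflate a Pearson correlation.
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Five hypothetical points with almost no linear relationship.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 2, 3]
r_without_outlier = pearson_r(x, y)

# The same points plus a single extreme observation.
r_with_outlier = pearson_r(x + [50], y + [50])

print(r_without_outlier, r_with_outlier)
```

The first correlation is weak and the second is close to 1, even though the five original points are unchanged, which is why a scattergram is needed alongside the coefficient.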
Thank you very much.
DR. NIPPER: Thank you.
The next presenter is Dr. Gregory Campbell.
While we are changing speakers, I would like to acknowledge that Dr. Falls has joined us. We have all told who we are and why we are. Maybe you could do that while we are getting our next speaker ready to go, Beverly.
DR. HARRINGTON FALLS: I apologize for my lateness, but it has been quite an adventure today.
I am Beverly Harrington Falls. I am in private practice in OB/GYN in High Point, North Carolina and I am a regular member of the panel.
DR. NIPPER: Thank you. We are glad you are here.
Agenda Item: Remarks on Calibration
DR. CAMPBELL: Thank you.
I am the director of the Division of Biostatistics in the Center for Devices. What I would like to do is make some general remarks of things that one might have learned by looking at this particular submission. In particular, in the outline, what I will do is talk about notation, some of the assumptions that are used in the model, one assumption, in particular, of equality of variances, introduce an issue called measurement error and the attenuation of slope and then have three other issues at the end.
So, let me use some notation here. At the risk of making this a little too technical, let me call "X" the finger stick glucose estimate from the home use product in milligrams per deciliter and "W" is the output of the machine before it is downloaded, when it is downloaded, that is to say. So, that is the electrical stimulation value from the interstitial fluid measurements.
The CGMS value that I will talk about here will be the predicted blood glucose that one gets by estimating the glucose using the finger stick values and using the electrical stimulation values. So, at the end of three days, the data is then downloaded. It is married with the finger stick data at the right times and there is a prediction. That CGMS is a prediction.
So, the more general comment is this is the predicted value of blood glucose in a particular experiment.
The situation is that the model of interest, the fitted model, is assumed to be linear; namely, that the predicted value of blood glucose, CGMS value, is a linear function, B0 plus B1W, where "W" is the electrical output of the machine.
The way one would obtain these values of B0 and B1, the intercept and slope of the line, would be through some kind of linear modeling procedure that uses the data W and X.
A general comment is that it is not a sound procedure in a calibration situation, which is what this is, to force the calibration to go through the -- the line to go through the origin. That generally does not optimize the performance.
So, what I would like to do is talk about some of the statistical issues that are present in this submission and that really -- and really might be applicable to other ones as well, because this really is a statistical problem, although it doesn't appear to be.
In particular, the kinds of assumptions that one needs to make in doing this kind of calibration is that the underlying relationship between W -- that is the electrical stimulation output from the machine -- and X -- X is the blood stick glucose meter readings -- is inherently linear.
Now, the company in their submission is aware that there are times when it is not linear and they have some correction in involving the intercept to try and correct for that situation.
The second assumption is that the errors are independent from time period to time period. This is the errors in the model. These errors would be independent -- more reasonable to be independent if the time intervals are long than if the time intervals are short. So, when one is dealing with meter reading paired with predicted glucose values that are very close to each other in time, you may need to worry about things like the autocorrelation structure, the correlation of the errors at that point.
The last assumption then is an assumption that the variance of W at a fixed X is constant no matter what value of X you are at. X, remember, is the blood stick glucose value. W is the electrical stimulation output. The assumption in doing linear regression is that that variance is constant.
So, what I would like to do is address that notion in particular in the context of the submission. So, what one might want to do is plot W versus X or CGMS minus W for X and look for unequal variances. A simple way, I guess, to look at that is this. This is the data from one of the sites for the pairs W and X. This is based on the calculated blood glucose. So, these are the measurements that are used in the calibration stage. CGMS-2 refers to the second calibration scheme that the company used, which is the linear regression calibration scheme.
What one would hope to see here is that the bands were parallel. One would expect to see that instead of this V shape or this fan shape, one would expect to see that all the points are in some parallel bands about the line in this case with slope 1 and intercept 0. So, that may be evidence that there is -- that the variances are not equal and the question in that case is what could one do to try and address that situation.
Well, in 1980, there was a paper in a book called The Biostatistics Case Book, which in fact, the example was glucose monitoring systems and it was suggested there that what might happen is that the standard deviation, which is the square root of the variance is linearly related to the value of blood glucose.
If that were the case, a way to stabilize the variances and do the analysis would be to do logarithms of the X's and logarithms of the W's, do the analysis on the log-log scale and then transform back. You still get a line when you are finished, but you stabilize the variances.
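As a rough sketch of the log-log approach just described, one could fit the line on the logarithms of X and W and then transform predictions back to the original scale. The function names and sample values are invented for illustration, and the sketch assumes all values are strictly positive; it is not the procedure from the cited case book or from the sponsor.

```python
# Illustrative sketch only: fit on the log-log scale, then transform back.
import math


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_log_log(x_meter, w_sensor):
    """Fit log(W) = a + b*log(X); inputs must be strictly positive."""
    log_x = [math.log(v) for v in x_meter]
    log_w = [math.log(v) for v in w_sensor]
    return fit_line(log_x, log_w)


def predict_w(x, slope, intercept):
    """Back-transform a log-scale prediction to the original scale."""
    return math.exp(intercept + slope * math.log(x))


# Hypothetical values in which W is exactly twice X.
x_meter = [50, 100, 200, 400]
w_sensor = [100, 200, 400, 800]
slope, intercept = fit_log_log(x_meter, w_sensor)
print(slope, intercept, predict_w(150, slope, intercept))
```

On the log scale, a spread that grows in proportion to the glucose level becomes roughly constant, so large readings no longer dominate the fit.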
Why is this important? This is important because if you have very large or very small values that are outliers, they will have an undue influence if you ignore this variance inequality issue. There are other ways -- you can also do iteratively reweighted least squares.
Now, another issue, which is somewhat concerning and this is actually data for the entire submission, not merely one site, this is what is called a residual plot in the regression, where you take the predicted value on the X axis, that is CGMS-2 in this case, and you look at it plotted against what are called the residuals.
That is, the difference between the blood glucose measured by the blood stick home use glucometer and the predicted value. What you should see is no pattern there. You should see no pattern and you should see no fan-shaped thing either. The concern here is that there appears to be a slight but perhaps important downward slope in this, which is suggestive that maybe the calibration in this case is not quite right.
That is borne out here in the table at the bottom, which suggests that the slope is .08 and is statistically significant. I think it is important to note that the company has, as has been mentioned by the previous two speakers, been very forthcoming with the data and providing it electronically, and I think that has really moved the process forward immensely. It has accelerated greatly our ability to review this submission. That was provided in two CDs at different times, as well as a transmission over the Internet.
The second issue that I would like to talk about is one that is sort of hidden in the background. It is lurking but it is something that one needs to think about in problems of this sort. What I would call this is measurement error and the attenuation of slope.
The problem basically is that the blood glucose stick, capital X, that is used in the home use product may not accurately reflect the actual true blood glucose. Lower case x is the true blood glucose at a particular time. There is some error component. Call that U, that is associated with the blood stick measurement.
Let's assume for the moment that that error, which is unmeasurable, has mean zero and variance sigma squared sub U. The problem basically is this. If you ignore this measurement error situation, you are trying to predict the true blood glucose and because of that, what happens is what is called the slope attenuation problem.
The slope should be 1 and the intercept should be 0 if everything is done correctly. The problem is if there is measurement error, that will not be the case. What will happen is the slope will attenuate, attenuate towards zero. It will become less than 1. And the other problem is the intercept, which is supposed to be zero will move up.
Now, what does that mean? In the worst case scenario if there is a lot of attenuation, what you get is the predicted value would be very close to the mean. Everything would look like the average. The meter in that case wouldn't work. The meter would just give you an average value all the time. It wouldn't be able to distinguish when people were hypoglycemic, when people are hyperglycemic.
So, that turns out to be an important consideration in these kinds of problems. This is a particular example of what is called regression to the mean. In fact, a slope regresses toward the mean. And it is also called in statistics the errors in variables problem.
There are solutions to this problem. If you know, for example, the variance of the error in U, the difference between the true blood glucose and the home use meter, then if you have a quantitative value for that, then you can adjust for the slope. You essentially blow the slope back up. So, instead of being less than 1, you bring it back up to 1 and you adjust the intercept correspondingly.
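One textbook form of the adjustment mentioned here is the classical errors-in-variables correction: divide the observed slope by the reliability ratio (the share of the meter variance that is not measurement error) and recompute the intercept. The sketch below assumes the error U is independent of the true glucose and that its variance is known; the function name and the numbers are invented for illustration, and the speaker does not state which specific correction was intended.

```python
# Illustrative sketch only: classical correction for slope attenuation,
# assuming the measurement-error variance var_u is known.
def correct_attenuation(observed_slope, mean_x, mean_y, var_x_observed, var_u):
    """Return (corrected_slope, corrected_intercept).

    var_x_observed is the variance of the meter readings X (true value plus error);
    var_u is the variance of the measurement error U.
    """
    if not 0 <= var_u < var_x_observed:
        raise ValueError("var_u must be non-negative and smaller than var_x_observed")
    reliability = (var_x_observed - var_u) / var_x_observed
    slope = observed_slope / reliability
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical numbers: observed slope 0.8 with intercept 30 (150 - 0.8 * 150),
# meter variance 1000 and error variance 200.
slope, intercept = correct_attenuation(0.8, 150.0, 150.0, 1000.0, 200.0)
print(slope, intercept)
```

In this made-up case the correction brings a low slope and high intercept back to approximately 1 and 0, which is the pattern described above.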
There are three other issues that I would like to just briefly mention. One is the issue of calibration versus validation and it is a simple issue that if one is trying to work on your calibration scheme, you have to prospectively validate it. If you get into a loop where you start to do the calibration and then you look at the validation and it is not so good and you go back and you do it again, it begins to have a bias associated with that procedure. So, it is very important if the calibration scheme is changed, that there is some effort to validate it prospectively.
The second issue is that there needs to be some kind of quantitative assessment of the fit. In the two figures that John Dawson had presented, it was pretty clear in a gestalt kind of way if you looked at the first one that that was a pretty good fit. And if you looked at the second one, well, that wasn't as good a fit and one might wonder if you should use the output of the continuous glucose meter in the second case.
There are a number of ways to quantify how well you are doing. You could use R squared. You could use the slope. You could use the absolute difference between the predicted and the actual value of the blood stick and you could use some matrices like the three by three matrices that John talked about or the five by fives. It is important that the schema you use to figure out how well you are doing, that you identify that before you collect the data and try and validate it.
The last issue that I want to briefly address relates to how many data points do you need in the calibration. The company in their presentation mentions that they are up to four points per day and it is calibrated every day, every sensor. In point of fact, almost 40 percent of the time there are one or only two values that are in the time ranges that they have specified.
What that means in terms of the calibration is you are not going to expect that it is going to do very well because one or two points determines a line. You can't measure the error associated with that. So, some way in which you might be able to calibrate across days might be helpful so you would have more points in the calibration scheme.
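The point about one or two calibration values can be illustrated with the usual residual variance estimate, which divides the sum of squared residuals by n minus 2. With two points the line passes through both and there are no degrees of freedom left to estimate error. The function name and the sample points are invented for illustration.

```python
# Illustrative sketch only: residual variance needs more than two points.
def residual_variance(x, y):
    """Return SSE / (n - 2) for a least-squares line, or None if n <= 2."""
    n = len(x)
    if n <= 2:
        # One or two points leave no degrees of freedom for estimating error.
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return sse / (n - 2)


# Two hypothetical calibration points: the error cannot be estimated.
print(residual_variance([1, 2], [1, 3]))
# Three hypothetical points: an estimate becomes available.
print(residual_variance([1, 2, 3], [1, 3, 2]))
```

Pooling calibration points across days, as suggested, would raise n and make such an estimate possible.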
I think that is a good place to stop and turn it back over to Dr. Gutman for the questions.
DR. GUTMAN: I would like to simply read through the FDA questions that we would like to put on the table for your discussion throughout the course of the day. You will have a chance to revisit these and these don't in any way restrict you in pointing out strengths or weaknesses in either the clinical, scientific or statistical design of this device that we have missed.
The first question is: Does the type of data generated by the MiniMed sensor provide information that will be useful in the management of diabetes; that is, use of the device for continuous measurements up to 72 hours on an occasional basis?
The second question is: Patient-to-patient performance differences continue to be observed with the new calibration algorithm developed by the sponsor. Is there any way to identify successfully calibrated patients?
The third question: FDA believes prospective data is necessary to challenge and validate the calibration algorithm, which has been developed using the retrospective data set collected by MiniMed. What suggestions does the panel have on the types of data most useful in such a supplementary setting?
And a reminder that under FDAMA '97, our modernization act, we are always looking for a minimum reasonable data set.
Question No. 4: Testing for interference was done entirely using bench or in vitro studies. Should additional data, in vitro or in vivo, be generated to enhance our understanding of factors potentially confounding device results? What suggestions does the panel have on the types of data most useful in such expanded studies?
Question No. 5: FDA regulations indicate -- and I will quote the regulation and I think Sharon will quote it again later -- "There is a reasonable assurance that a device is effective when it can be determined that in a significant portion of the target population, the use of the device for its intended use and warnings against unsafe use, will provide clinically significant results."
Is the product as currently configured, calibrated and studied likely to be an effective aid in the management of diabetes? And if your answer is "yes," what additional data should be obtained and should this data be obtained as a premarket or postmarket condition of approval?
If your answer is "no," then what suggestions do you have on how to move forward with this new device?
Question No. 6: The sponsor claims the sensor can be used for up to 72 hours. Does the data presented support this claim? If not, what alternative claims or what additional data sets should be requested?
Question 7, there are a series of labeling questions. I would like to actually defer those individual components until we come back to talk about potential labeling.
Last, but in our view certainly not least, what suggestions does the panel have for a device of this type to enhance or to make sure you have an optimal education package for both patients and/or health care providers to help them understand the use of this device?
DR. NIPPER: Thank you, Dr. Gutman. And thank you to the other FDA presenters.
In the roughly half hour we have left before lunch, I would like to proceed to questions and if the panel agrees, maybe we could question the submitter first and then questions for the FDA second.
So, what I would like to do in this particular -- and the way that the panel is run is I would like to just go around the panel rather than have shows of hands. I do it sort of the random toss of the coin and I am going to choose to call on Dr. Rej first. We will move around this way. Then we will start with Dr. Janosky and move around. That way, Dr. Falls gets called on last. So that way she can catch up maybe.