A few random comments 
- This is a lot of work in SAS compared to the ordinal package in R (which handles random effects).
- Sparseness does not have to do with non-proportional odds except in a strange way: the SAS PROC LOGISTIC test for proportional odds (which doesn't reference my student Bercedis Peterson's paper, which invented the test and pointed out its shortcomings) will strongly reject H0: proportional odds even when proportional odds holds perfectly, when cells are sparse.
- I suspect that your observations about standard errors as things become more sparse are related to the above. For proportional odds assessment I rely on partial residual plots.
Regards, Frank
Greetings, I can offer a couple more examples of pain analyses. Both designs are 3-arm RCTs examining pain control during interventional radiology procedures. Thus, at baseline, before the procedure begins, average pain level should be nearly equivalent in the three groups. Patients are repeatedly asked their pain (and anxiety) levels at regular intervals (repeated measures). One key feature is that as patients' procedures are completed, they "drop out," so that observations end when the patient with the longest duration completes. With increasing sparsity, the standard errors around the groups' curves grow substantially!

In study #1 we used a normal mixed model (SAS PROC MIXED or BMDP 5V); in study #2 we used a proportional odds model with random intercepts (SAS PROC NLMIXED). The proportional odds assumption failed owing to the sparsity of data at later times and at higher pain levels. We collapsed levels 9 and 10 into level 8 and then we were OK. We struggled to find an understandable graph of results and ended up showing binary "splits" at various thresholds.

I've appended some SAS code for the analyses in #2, prepared mostly by Ms. Xinyu Li, M.S., coauthor on the second paper. If I had it to do again, I would think more about controlling for baseline covariates and the MAR assumption we relied on. I hope this is helpful, and I'd welcome any comments on the approach we took.
#1 Lang EV, Benotsch EG, Fick LJ, Lutgendorf SK, Berbaum ML, Berbaum KS, Logan H, Spiegel D (2000). Adjunctive non-pharmacological analgesia for invasive medical procedures: a randomised trial. The Lancet, 355(9214): 1486-1490, April 2000. doi:10.1016/S0140-6736(00)02162-0
#2 Lang EV, Berbaum KS, Faintuch S, Hatsiopoulou O, Halsey N, Li X, Berbaum ML, Laser E, Baum J (2006). Adjunctive self-hypnotic relaxation for outpatient medical procedures: A prospective randomized trial with women undergoing large core breast biopsy. Pain, 126(1-3): 155-164, Dec 15, 2006. PMID: 16959427.

Best regards, Mike
Michael L. Berbaum, Ph.D., Director
Methodology Research Core
Institute for Health Research and Policy (MC 275)
University of Illinois at Chicago
1747 West Roosevelt, Room 558
Chicago, Illinois 60608
Tel: (312) 413-0476  Fax: (312) 996-2703
Email: mberbaum@uic.edu
IHRP web site: http://www.ihrp.uic.edu
I hear ya.
Thanks Frank
Hi Knut,
I'm in England, with apologies for the slow reply. Yes, your example isn't controversial. Other combinations are harder to figure out, which is why I like to adjust for baseline as a covariate instead.
Frank,
I agree that this discussion is proving more interesting than I had expected, including two of your recent remarks.
In fact, I wonder whether we should edit this discussion and make it available in some form.
First, I fully agree that "making the subject its own control" is heavily overrated among clinicians. As we have seen, it is less than trivial to formalize the concept of "change", be it a difference, ratio, sign, ... Still, there may be cases where baseline values should be incorporated.
Which brings us to your second comment.
U-scores for multivariate data (Hoeffding 1948) are based on the assumption that, everything else being the same, more of any of the pain characteristics is worse. No linearity, proportionality, or independence is assumed. For instance, if subject A comes in with a lower baseline and a higher outcome VAS than subject B, then A had less of a response than B. I would not expect too much controversy here.
The fundamental difference from many other methods is that ambiguities are allowed. We don't need to make strong assumptions (proportionality, linearity, independence) to ensure that the pairwise ordering among all subjects can be decided.
Knut
Hi Ron,
Thanks for the clarification!
Of course, knowing the scale does not automatically lead to the method, but it restricts the methods one can use: chi-square only for nominal, u-test for ordinal, u- or t-test for interval/ratio only.

It is also important to remember that rank tests are no panacea (see Scheffe, 1959, chapter 10) and that the (apparent, assumed, ...) distribution of the data is not very helpful in choosing between tests that are both asymptotically distribution-free (like the t- and the u-test).

Still, I'd rather use a test that is approximately right than one that is exactly wrong. If all we were interested in were alpha, we might simply toss 17 coins, and if we get fewer than five of either heads or tails, we have a test at the 5% level; no need to even gather data. Hence, we also need to consider which alternatives the tests are sensitive to, like deviations of the arithmetic mean from zero for the t-test vs the tendency of paired comparisons away from 50:50 (not the median!) for the u-test.

What's missing in many of these cookbook rules is that we cannot choose a test merely by looking at the characteristics of the data (distribution) and the variables (scale); we also have to formalize the question (type of alternative) of interest.
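Knut's 17-coin aside above checks out arithmetically; here is a quick stdlib Python sketch of the binomial tail calculation:

```python
from math import comb

# The "test": toss 17 fair coins; reject if we see fewer than five heads
# or fewer than five tails, i.e. at most 4 of either face.
n = 17
tail = sum(comb(n, k) for k in range(5)) / 2 ** n  # P(at most 4 heads)
size = 2 * tail  # same probability on the other tail
print(size)  # about 0.049, just under the 5% level
```

So the coin "test" indeed has size just under 5% while ignoring the data entirely, which is exactly Knut's point about sensitivity to alternatives.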
Knut
Same here! Great discussion! Frank
(k) Enough ramblings for now. Thanks to all for the excuse to avoid doing other stuff on my plate.
Cheers, Ron Thisted
It is easy to convert the odds ratio and other parameters in the model to the mean or median pain score. Also, exceedance probabilities come straight out of the model and are easily interpreted by clinicians.
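To make this concrete, here is a minimal Python sketch under hypothetical intercepts and treatment effect (all numeric values below are assumed for illustration, not from any fitted model): in a proportional odds model the exceedance probabilities P(Y >= j) fall straight out of the logistic form, and for integer scores the mean is just their sum.

```python
from math import exp

def expit(z):
    return 1.0 / (1.0 + exp(-z))

# Hypothetical proportional odds model for pain scores 0..10:
# P(Y >= j | treat) = expit(alpha[j] + beta * treat), one intercept per cutoff j = 1..10.
alpha = [2.2, 1.6, 1.1, 0.6, 0.1, -0.4, -0.9, -1.5, -2.2, -3.0]  # assumed intercepts
beta = -0.7  # assumed treatment log odds ratio (treatment lowers pain)

def exceedance(treat):
    """P(Y >= j), j = 1..10, under the assumed model."""
    return [expit(a + beta * treat) for a in alpha]

def mean_score(treat):
    """E[Y] = sum over j >= 1 of P(Y >= j) for integer scores 0..10."""
    return sum(exceedance(treat))

print([round(p, 3) for p in exceedance(1)])
print(round(mean_score(0), 2), round(mean_score(1), 2))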
(h) Changes in pain (or in other symptoms that also have a subjective or selfreport element) may make sense in some contexts and not in others. For instance, pain after surgery eventually gets better. The focus may be on how rapidly this occurs. On the other hand, if one is studying chronic pain (that is, pain that one would not expect to improve in the natural course of things), then the focus is definitely on how much improvement in pain can be achieved, and in what fraction of patients.
(i) Consider the situation in which patients are randomized to two treatments, and two hours later a VAS pain measurement is taken. If the point of the exercise is testing the null hypothesis of no difference between groups, lots of sensible and familiar tests will work just fine, in the sense that they will be valid tests of H0 and will have (approximately) the right size. For this purpose, the choice between a t-test and a proportional odds regression-based measure (taking each unique observed VAS score as a "cutpoint") will depend upon the alternatives against which one wants to have greatest power.
(j) If the point is to estimate the size of the treatment effect, then one has to have some sense of what differences on some scale mean. In the context of anesthesia for certain particular operative procedures, for instance, VAS pain measurements of 30 or below are considered adequately low scores, and pretreatment scores of 50 or 60 are typical. In this context the mean and SD of VAS scores have a clinical interpretation (which may not extend to other contexts). The odds ratio (from a proportional odds-based analysis) would not be easily understood or communicated, and it would be hard to relate to what clinicians already understand about how the VAS works in this particular context, even if it were the basis for a more sensitive test of the differences between the VAS distributions under the two treatments.
I haven't experienced that problem. You can model baseline using dummies or using quadratic or spline functions.
(g) I agree with Knut that changes in pain can (and often should) be analyzed using methods other than simply taking the numerical difference in pain scales. Transition models (with a small number of defined ordered categories) are often successful at doing this. Proportional odds models, while incredibly useful for comparing groups at a single point in time (such as the completion of a randomized clinical trial), are less easily used when one wants to make inferences conditional on, say, a baseline variable that itself is measured as an ordered category (for instance, baseline pain assessment).
I don't think that follows. I agree that clinicians think this is more interpretable but I think they are largely fooling themselves, mainly because of floor and ceiling effects. An unbiased estimate of current status is going to be quite useful, and can be calibrated in the sense you are saying, by including baseline level (or a spline function of it) as a covariate.
(f) Changes in pain scores (as opposed to changes in other kinds of scores) can be particularly important, since within-subject pain scores are likely to be much better calibrated than between-patient scores. So from an interpretability standpoint, clinicians and others often find changes to be more compelling than raw scores. And as we know, if the within-subject correlation exceeds 0.5, there are efficiency gains to the use of difference scores.
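The 0.5 threshold in (f) is just arithmetic: with equal variances sigma^2 at baseline X and follow-up Y, Var(Y - X) = 2 * sigma^2 * (1 - rho), which drops below Var(Y) = sigma^2 exactly when rho > 0.5. A one-line check:

```python
# Var(Y - X) = 2*sigma^2*(1 - rho) when Var(X) = Var(Y) = sigma^2, so the
# difference score beats the raw follow-up score exactly when rho > 0.5.
sigma2 = 1.0

def var_change(rho):
    return 2 * sigma2 * (1 - rho)

for rho in (0.3, 0.5, 0.7):
    print(rho, var_change(rho), var_change(rho) < sigma2)
```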
Often I see major nonproportionality yet the PO model fit better than all the other models I was entertaining.
(e) I agree with Frank that the proportional odds and related models are not known (or used) widely enough. As with almost all models, the assumptions under which they work best (constant proportionality of odds between groups) always hold only approximately. Conditional on actually using a proportional odds model, examining the extent to which proportionality holds, and critically assessing the extent to which it really matters whether it holds, are also not done widely enough.
The real problem with the central limit theorem is that for a given dataset we don't know if it applies (this is more true for highly skewed Y).
(d) The utility of a particular analysis depends more on the study design and the substantive question than on the scale of measurement. The central limit theorem works wonders in many situations. (For instance, if one applies the two-sample t-test to binary data, which is ordinal scale at most, the test is essentially equivalent to the chi-squared test for comparing proportions.)
Very nice discussion, Ron. It should be noted that the Wilcoxon test almost always tests a stochastic ordering hypothesis that is relevant. We tend to get ourselves in trouble when we use the t or normal approximation for getting P-values with the Wilcoxon. If you have scale differences (or other differences beyond a simple translation) you can get very accurate P-values using the general U-statistic standard error, as implemented in the R Hmisc package's rcorr.cens function.
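This is not Hmisc's rcorr.cens itself, but the idea can be sketched in stdlib Python: estimate the concordance probability c = P(X < Y) (ties counted half) and compute its standard error from the per-observation U-statistic components ("placements") rather than from the shift-model rank-sum variance.

```python
from math import sqrt
from statistics import mean, variance

def concordance_with_se(x, y):
    """Concordance c = P(X < Y) + 0.5*P(X = Y), with a U-statistic standard
    error built from per-observation placement values (DeLong-style)."""
    def psi(a, b):
        return 1.0 if a < b else (0.5 if a == b else 0.0)
    v10 = [mean(psi(a, b) for b in y) for a in x]  # placement of each x among the y's
    v01 = [mean(psi(a, b) for a in x) for b in y]  # placement of each y among the x's
    c = mean(v10)  # grand mean over all (x, y) pairs
    se = sqrt(variance(v10) / len(x) + variance(v01) / len(y))
    return c, se

c, se = concordance_with_se([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(c, se)
```

Because the variance is estimated from the data rather than assumed from a pure-shift model, the resulting standard error remains honest when the two distributions differ in more than location.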
Dear Laura Lee, Frank, Knut, Greg, et al:
A few random thoughts on pain stimulated by the (less random) notes of others:
(a) Regarding Laura Lee's original request, Thomas Permutt at FDA has done some very thoughtful work on analyzing pain outcomes in the context of clinical trials. I am not sure whether his work has been published, but it has been influential in the design and analysis of Phase III studies of drugs intended to affect pain. A key reference is the IMMPACT recommendations (2005, "Core outcome measures for chronic pain clinical trials: IMMPACT recommendations," Pain 113: 9-19).
(b) The emphasis on scale of measurement (ordinal, interval, ratio, etc.) has the potential to sidetrack us from the most important questions of design, analysis, and inference. As often as not, focusing on scale of measurement can be misleading. It is particularly pernicious when it leads to automatic choices of the "correct" statistical analysis based on measurement characteristics and not consideration of the study design, distributional characteristics of the measurements, subject-matter knowledge, and identification of the question that really needs to be answered. The outstanding paper by Velleman and Wilkinson makes a convincing case. [Velleman, P. F. & Wilkinson, L. (1993). Nominal, ordinal, interval, and ratio typologies are misleading. The American Statistician, 47, 65-72.]
(c) The identification of a particular statistical test with a scale of measurement often gets things badly wrong. For instance, it is commonly stated that the t-test assumes an interval scale, while the Wilcoxon (Mann-Whitney) test assumes only an ordinal scale. That is not correct. In fact, the two-sample Wilcoxon procedure relies on the assumption that the two distributions differ only in location and not in shape: in particular, that variances and skewness are identical in the two distributions, and that one is simply a shifted version of the other. Simply having an ordinal scale of measurement is not sufficient to justify the validity of the Wilcoxon test. Indeed, if the two groups are normally distributed and have the same mean, but one standard deviation is twice the size of the other, the size of a nominal 0.05 Wilcoxon test is actually 0.074 (JW Pratt, JASA 1964, 59: 665-680).
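Pratt's size distortion is easy to reproduce by simulation. Here is a stdlib Python sketch using the normal-approximation rank-sum test; the group sizes and SDs below are my own choices (a small group with twice the SD of a large group) to make the distortion visible, so the rejection rate will not match Pratt's 0.074 figure exactly.

```python
import random
from math import erf, sqrt

def ranksum_p(x, y):
    """Two-sided p-value for the rank-sum test via the normal approximation
    (continuous data assumed, so ties are ignored)."""
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    w = sum(r for r, (v, g) in enumerate(pooled, start=1) if g == 0)  # x-group rank sum
    mu = n * (n + m + 1) / 2
    sd = sqrt(n * m * (n + m + 1) / 12)
    z = abs(w - mu) / sd
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * upper normal tail

random.seed(7)
reps, rejections = 4000, 0
for _ in range(reps):
    x = [random.gauss(0, 1) for _ in range(80)]  # large group, SD 1
    y = [random.gauss(0, 2) for _ in range(20)]  # small group, SD 2; same mean
    if ranksum_p(x, y) < 0.05:
        rejections += 1
print(rejections / reps)  # well above the nominal 0.05
```

Both groups are centered at zero, so the null hypothesis of equal means (and of P(X < Y) = 1/2) holds, yet the nominal 0.05 test rejects noticeably more often.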
Hi Laura,
If it adds to it, our group has also worked on brain imaging and meta-analysis aspects of pain research:
Leung A, Duann J, McGreevy K, Li E, Xu R, Donohue M, et al. The supraspinal pain pathway of the thermal grill illusion. NeuroImage, 2009; 47(Supplement 1): S61.
Leung AY, Donohue M, Xu R, Lee R, Lefaucheur J, Khedr E, Saitoh Y, Andre-Obadia N, Rollnik J, Wallace M, Chen R. rTMS in suppressing neuropathic pain: a meta-analysis. The Journal of Pain, 2009; 10(12): 1205-16.
Thanks,
Ronghui (Lily) Xu
Professor, Division of Biostatistics and Bioinformatics
Department of Family and Preventive Medicine and Department of Mathematics
Director, CTRI Design and Biostatistics
University of California, San Diego
9500 Gilman Drive, Mail Code 0112
La Jolla, CA 92093-0112
Hi Knut,
Good discussion. I think the score you've specified will make even more assumptions than the proportional odds assumption though.
I don't think that change scores will do a better job of adjusting for baseline differences, because of floor and ceiling effects.
Best, Frank
A deterministic reply to a random comment: analyzing changes in pain could potentially adjust for differences in baseline pain perception without the need to make assumptions about proportionality. Of course, "analyzing changes" does not necessarily mean "computing differences of scores". For instance, one could score a particular subject's response (outcome vs baseline) as:
- the number of subjects with a larger-or-equal baseline and a smaller-or-equal outcome (smaller effect), minus
- the number of subjects with a smaller-or-equal baseline and a larger-or-equal outcome (larger effect).
These 'u-scores' would score changes on one (or several) ordinal outcomes without computing differences.
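A literal stdlib Python transcription of this scoring rule (the data below are made-up (baseline, outcome) pairs; note that each subject's comparison with itself lands in both counts and cancels):

```python
def u_score(i, data):
    """Score subject i's response exactly as described above: subjects with a
    larger-or-equal baseline and smaller-or-equal outcome, minus subjects with
    a smaller-or-equal baseline and larger-or-equal outcome."""
    b_i, o_i = data[i]
    first = sum(1 for b_j, o_j in data if b_j >= b_i and o_j <= o_i)
    second = sum(1 for b_j, o_j in data if b_j <= b_i and o_j >= o_i)
    return first - second  # self-comparison appears in both sums and cancels

# Hypothetical (baseline VAS, outcome VAS) pairs:
data = [(7, 3), (5, 4), (6, 6), (4, 2)]
scores = [u_score(i, data) for i in range(len(data))]
print(scores)
```

Only unambiguous pairwise orderings contribute; pairs where neither subject dominates the other add nothing to either count, which is exactly the "ambiguities are allowed" property.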
Knut
A random comment: I think it is a mistake to analyze change in pain status. The difference in two ordinal scales is not ordinal. There are many reasons to have the final pain severity as the outcome, adjusted for initial severity as a baseline covariate.
A nice feature of the proportional odds model is that you can have as many categories as you have unique Y values.
Frank
Hi all
Like John Connett, I was involved in the Shlay (1998) study. The more we looked at and analyzed the Gracely continuous scale as the study went on, I think it is fair to say, the less we believed that it measured something real. We also had a "Global Pain Relief Scale" that was ordered "Complete, A lot, ..., None, Pain got worse" and was more believable in interpretation and in analysis. Fortunately the results were consistent.
The rationale for the Gracely scale was that a previous study in diabetic peripheral neuropathy had used the scale, so for the HIV and acupuncture design there were prior data, and there was a lot of support from clinicians for using the same scale. In retrospect, it was not a great idea to perpetuate a bad endpoint.
With hindsight, the simple global pain relief scale made for a much better endpoint and analysis and was much more interpretable. We used ordinal response models.
Kathryn
Kathryn Chaloner, PhD
319 384 5029
Hi Laura Lee,
I find the empirical "validation" for using methods based on the linear model for individual VAS scales less convincing, but the real problem lies in the complexity of measuring complex phenomena, such as pain, on a variety of scales.

Shlay (1998): "Patients rated their pain in a diary once daily, choosing from the Gracely scale of 13 words that describe the intensity. The words had been assigned magnitudes on the basis of ratio-scaling procedures that demonstrated internal consistency, reliability, and objectivity. ... Comparison of treatment groups for the primary end point of change in pain, as measured by the pain diary, used a linear model with baseline characteristics, clinical unit, and option (factorial or single factor) as covariates."

Griffith (2008): "The primary outcome measure was the mean difference in the subjects' self-reported pain scores before and after the administration of the initial medication treatment. A pain score reduction of 3 or more points after the initial treatment was considered clinically effective and used as a cutoff point to dichotomize the primary outcome measure for multivariate statistical analyses."

I agree with Frank that ordinary regression may not be appropriate to generate valid comprehensive scores. Trying to avoid the problem by dichotomizing the outcomes at an arbitrary cutoff point may also not be a good solution.
Under WebServices/MuStat, CTSPedia offers biostatistical tools (spreadsheets, an R package, and a Web server) that help to resolve some of these problems by creating scores/metrics that are intrinsically valid, because fewer assumptions need to be made and, thus, empirically "validated".
BTW, in a collaboration with the NINR we are currently using the same WebServices/MuStat to screen for genetic risk factors of fibromyalgia, yet another way of addressing the many open questions in pain research using the novel methods and tools developed by BERD.
Here are the references:
Morales (2008): www.bepress.com/sagmb/vol7/iss1/art19/ (complex phenotypes, such as pain)
Wittkowski (2010): www.ncbi.nlm.nih.gov/pubmed/20652502 (comprehensive overview in a book with many CTSA contributions)
Rubio (2011): www.ncbi.nlm.nih.gov/pubmed/21284015 (on the cross-fertilization of BERD, developing metrics applied both by and to BERD practitioners)
Knut
Laura,
An example of pain measurement and analysis in an acupuncture study:
Shlay, Chaloner, et al. (1998) "Acupuncture and amitriptyline for pain due to HIV-related peripheral neuropathy," JAMA 280: 1590-1595.
John C.
Greg,
I'm amazed that I still see people analyzing ordinal scales using ordinary regression. The proportional odds model and its cousins are still not known to vast areas of research.
Frank
Laura,
Pain research usually involves a visual analog scale (VAS) measurement of pain. There is confusion, however, over whether these can be analyzed as a continuous variable (interval scale) or whether they should be considered an ordered categorical variable (ordinal scale). That is, there is inconsistency in how these scales are analyzed. You could clear up this confusion in your talk.
I introduce this topic on page 3 of Ch 26 of my course manual, which is available in the educational materials section of CTSpedia. Another website for it is given in the footnote on the first page. In that chapter, I provide citations and justification for why it can be analyzed as an interval scale. I have attached two papers I cited.
On p 41 of Ch 21, I give the taxonomy of levels of measurement, which you might use as background material.
Thanks,
Greg
Dear Laura Lee,
Attached is a study that was published in Journal of Pain and conducted at Northwestern University Department of Emergency Medicine by one of our senior residents, a junior faculty person, and me. We conducted a retrospective cohort study to compare "Metoclopramide Versus Hydromorphone for the Emergency Department Treatment of Migraine Headache."
I use this study in my Intermediate Epidemiology course to illustrate the different types of confounding by indication. While we adjusted for potential confounding by severity of the migraine headache in the adjusted relative risk comparison of reduction in migraine pain, we did not adjust for the potential confounding by indication from nausea or vomiting, which is frequently associated with more severe migraine headaches and is often treated with metoclopramide, an antiemetic medication. Thus, there is still potential for confounding in the study.
Let me know if this is useful to you and if you have any questions.
Demetrios N. Kyriacou (Jim)
Title: BERD - Pain Presentation

Description (problem to be explored):
I have been asked to give a 20-minute synopsis of "Resources available through the CTSA Biostatistics, Epidemiology and Research Design (BERD) Key Function Committee" on Wednesday to the CTSA Pain research interest group. I plan to ask the group what research design issues are currently at the forefront of developments in pain research and will report back on what needs and questions I am asked. That said, can folks on our BERD list email me with ideas or examples I can use in my talk? I plan to go through the BERD Watch series looking for ideas but hoped you all might have additional thoughts. I will also talk about CTSpedia, and say hey, each of your CTSAs has folks very interested in talking with you early and often! Plus, local BERD people can use the KFC to find expertise that may not be available locally.

Contributor/Email: Laura Lee Johnson (johnslau@mail.nih.gov)

Disclaimer: The views expressed within CTSpedia are those of the author and must not be taken to represent policy or guidance on behalf of any organization or institution with which the author is affiliated.