Rating scales are increasingly the primary outcome measures in clinical trials. However, clinically meaningful interpretation of such outcomes requires that the scales used satisfy basic requirements (scaling assumptions) within the data. These are rarely tested. The SF-36 is the most widely used patient-reported rating scale. Its scaling assumptions have been challenged in neurological disorders but remain untested in Parkinson's disease (PD). We therefore tested these by analyzing SF-36 data from 202 PD patients (54% men; mean age 70) to determine if it was legitimate to report scores for the eight SF-36 scales and its two summary measures of physical and mental health, and if those scores were reliable and valid. Results supported generation of the eight SF-36 scale scores and their reliabilities were generally good (≥ 0.74 in all but one instance). However, we found limitations that question the meaningfulness of four scales and other limitations that restrict the ability of four scales to detect change in clinical trials (floor/ceiling effects, 19.6-46.2%). The two SF-36 summary measures were not found to be valid indicators of physical and mental health. This study demonstrates important limitations of the SF-36 and provides the first evidence-based guidelines for its use in PD. The limitations of the SF-36 demonstrated here may explain some unexpected findings in previous studies. However, the main implication is a general one for the clinical research community regarding requirements for reporting rating scale endpoints. Specifically, investigators should routinely provide scale evaluations based on data from within major clinical trials.