
Title Scatterplots and Bivariate Density Plots
Graph Displayed class
Graph Subgroup General Principles
Classification-Graph Type Scatterplot
Code Added Yes
Date Original
Original Date April 14, 2011
Modified Date

Contributor/Email Richard Forshee (email: Richard.Forshee@fda.hhs.gov)
Disclaimer The opinions expressed in this document are those of the author and may not represent the opinions of the U.S. Food and Drug Administration or other authors.
Type of Data Continuous
Type of Analysis Bivariate
Description and Purpose Scatterplots are used to represent and explore the relationship between two continuous variables. Each axis represents one of the variables, and each data point is plotted at the (x, y) coordinates given by its values on the two variables. Scatterplots can help to identify whether two variables have a linear or non-linear relationship, or whether they appear to be unrelated. Unusual observations, such as extreme values, possible coding errors, or possible outliers, can often be identified in a scatterplot, as can clusters of observations that do not conform to the general pattern. Exploring these divergent clusters can lead to important insights.
Datasets CDISC Datasets
Data The following plots are based on the ADLBC data set available on the CTSPedia website (see Links to CDISC). The scatterplots examine the relationship between blood glucose levels (mmol/L) at baseline and at week 26 of the clinical trial.
Example1Title Basic Scatterplot
Example1Description The Example1 scatterplot suggests a linear relationship between baseline blood glucose levels and the levels measured at week 26 of the trial. There is one extreme value at a baseline of about 22 mmol/L and a week 26 value of about 15 mmol/L. While extreme, this value does not appear to be discrepant from the overall pattern in the data. There is a large cluster of observations near 5 mmol/L at both baseline and week 26, and the density of points in this area of the graph has produced some overstrikes (two or more data points overlapping).
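A quick way to gauge how much overstriking a scatterplot will contain is to count exact duplicate (x, y) pairs before plotting. The sketch below is illustrative only: it assumes Stata's bundled bpwide data set (variables bp_before and bp_after) as a stand-in for ADLBC, so the variable names and axis titles are not those of the glucose example.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

* Basic scatterplot of the follow-up value against the baseline value
twoway (scatter bp_after bp_before, msize(small)), ///
    xtitle("Baseline") ytitle("Follow-up") aspectratio(1) legend(off)

* Count exact duplicate (x, y) pairs; surplus copies plot as overstrikes
duplicates report bp_before bp_after
display "Observations hidden by overstrikes: " r(N) - r(unique_value)
```

Near-duplicates that overlap only because of marker size are not counted by this check, so the reported number is a lower bound on visual overlap.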
Example1Image
Example2Title Scatterplot Overlays Confidence Ellipses
Example2Description A variety of overlays can be added to scatterplots to improve the visualization of the relationship between the two variables.

Confidence Ellipses
Confidence ellipses (Ref. 1, Ref. 2, Ref. 3) draw an ellipse that contains a specified percentage of all the observations. Confidence ellipses can help to identify the nature of the relationship between the two variables. A narrow ellipse that is tilted away from the horizontal axis suggests a strong linear relationship between the two variables. An ellipse that is closer in shape to a circle suggests that the linear relationship between the two variables is weak or absent.

The confidence ellipse plots for the blood glucose lab results suggest a linear relationship between baseline and week 26 results.
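The geometry behind such an overlay can be sketched without the user-written ellip command. The example below is not the method used for the figure: it draws a normal-theory ellipse from the sample means, standard deviations, and correlation, which covers about 90% of the observations only if the data are roughly bivariate normal. It assumes Stata's bundled bpwide data set as a stand-in for ADLBC; the names t, ex, and ey are illustrative.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

* Sample moments of the two variables
quietly correlate bp_after bp_before
local rho = r(rho)
quietly summarize bp_before
local mx = r(mean)
local sx = r(sd)
quietly summarize bp_after
local my = r(mean)
local sy = r(sd)

* Radius within which a bivariate normal places 90% of its mass
local c = sqrt(invchi2(2, 0.90))

* Trace the ellipse, one point per observation, with t running from 0 to 2*pi
generate double t  = 2*_pi*(_n - 1)/(_N - 1)
generate double ex = `mx' + `c'*`sx'*cos(t)
generate double ey = `my' + `c'*`sy'*(`rho'*cos(t) + sqrt(1 - `rho'^2)*sin(t))

twoway (scatter bp_after bp_before, msize(small)) ///
       (line ey ex, lpattern(dash)), ///
    legend(order(2 "90% normal-theory ellipse"))
```

As the correlation approaches zero, the sin(t) term dominates ey and the ellipse's axes align with the plot axes; as it approaches 1 or -1, the ellipse collapses toward a tilted line, which is the narrow shape described above.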


Example2Image
Example3Title Scatterplot Overlays Linear Fit
Example3Description Linear fit overlays
It may be helpful to overlay a scatterplot with a statistical model fit to the data. Here we add two overlays to assist in the interpretation of the data. First, a line for the linear regression estimates and a pair of curves for the 95% confidence interval of the prediction are plotted to show the association between the blood glucose levels at baseline and at week 26. Second, the reference line of identity, y=x, has been added to help assess how similar blood glucose levels are at baseline and at week 26.
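A minimal sketch of the two overlays, assuming Stata's bundled bpwide data set as a stand-in for ADLBC (the variable names are therefore not those of the glucose example). By default, twoway lfitci draws the interval for the mean prediction; its stdf option would instead draw the wider interval for an individual forecast.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

twoway (lfitci bp_after bp_before) ///
       (scatter bp_after bp_before, msize(small)) ///
       (function y = x, range(bp_before) lpattern(dash)), ///
    xtitle("Baseline") ytitle("Follow-up") ///
    legend(order(2 "Linear fit" 1 "95% CI" 4 "Line of identity (y = x)"))

* The fitted slope and intercept behind the overlay
regress bp_after bp_before
```

Points below the identity line are observations whose follow-up value is lower than their baseline value, so the gap between the fitted line and the identity line gives a quick visual read of systematic change.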
Example3Image
Example4Title Scatterplot Overlays Fractional Polynomial Fit
Example4Description Fractional polynomial fit overlays
It is also possible to include a non-linear model that has been fit to the data. Here we use a fractional polynomial model. The results are very similar to the linear model for these data.
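One way to judge whether the non-linear fit adds anything is to draw both fits on the same axes. The sketch below assumes Stata's bundled bpwide data set as a stand-in for ADLBC and uses the default fractional polynomial settings of twoway fpfit.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

twoway (scatter bp_after bp_before, msize(small)) ///
       (lfit  bp_after bp_before) ///
       (fpfit bp_after bp_before, lpattern(dash)), ///
    xtitle("Baseline") ytitle("Follow-up") ///
    legend(order(2 "Linear fit" 3 "Fractional polynomial fit"))
```

If the two curves nearly coincide over the range of the data, the simpler linear model is usually adequate for describing the relationship.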
Example4Image
Example5Title Scatterplot by Subgroups
Example5Description Plotting by subgroups
Contrasting symbols
Scatterplots are often used to explore possible differences between subgroups. Men and women, for example, may respond differently to a medical treatment. The scatterplot below uses squares as the marker symbol for females and circles as the marker symbol for males. It is difficult to see any patterns by sex in the scatterplot because of the large cluster of overlapping points in the general vicinity of baseline 5 mmol/L and week 26 5 mmol/L.
Example5Image
Example6Title Scatterplot by Subgroups - Separate Panels
Example6Description Plotting by subgroups in separate panels
An alternative method for comparing subgroups is to plot the data in side-by-side panels. We have also added a linear fit overlay to the plots in this example. The differences between men and women are clearer in this scatterplot. The relationship between baseline and week 26 glucose values appears to be stronger for women, as indicated by the steeper slope of the fitted regression line. This may be influenced by the extreme value at a baseline of about 22 mmol/L and a week 26 value of about 15 mmol/L.
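Side-by-side panels can also be produced in a single command with the by() option, which keeps the axis scales identical across panels. The sketch below assumes Stata's bundled bpwide data set, whose sex variable stands in for the sex variable in ADLBC; it is an alternative to saving and combining separate graphs, not the approach used for the figure.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

* One panel per subgroup, each with its own linear fit and 95% CI
twoway (lfitci bp_after bp_before) ///
       (scatter bp_after bp_before, msize(small)), ///
    by(sex, legend(off) note("")) ///
    xtitle("Baseline") ytitle("Follow-up")

* Fitted slopes by subgroup, to put numbers on the visual comparison
bysort sex: regress bp_after bp_before
```

Refitting the regressions with and without a suspected extreme value is a simple way to check how much a single observation drives a difference in slopes.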
Example6Image
Example7Title Sunflower Density Plots
Example7Description Sunflower density plots
As mentioned previously, there is a cluster of values in the vicinity of baseline 5 mmol/L and week 26 5 mmol/L. The density of this cluster causes some overstrikes and makes it difficult to see the details of the distribution in a standard scatterplot. In these situations, an alternative approach is to use a bivariate density plot, such as a sunflower density plot (Ref. 4). In a sunflower density plot, the plot area is divided into bins. Bins with few observations show the individual points, while denser bins are drawn as sunflowers whose petals indicate the number of observations in the bin.
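A minimal sketch using Stata's built-in sunflower command, assuming the bundled bpwide data set as a stand-in for ADLBC. The binwidth(5) value is an arbitrary choice for illustration; it sets the bin width in units of the x variable.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

* Wider bins pool more points into each sunflower
sunflower bp_after bp_before, binwidth(5) ///
    xtitle("Baseline") ytitle("Follow-up")
```

Trying a few bin widths is worthwhile: bins that are too narrow reproduce the original scatterplot, while bins that are too wide hide the shape of the dense region.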

Example7Image
Example8Title Scatterplots for Change versus Average Measurement
Example8Description Scatterplots for Change versus Average Measurement
The Bland-Altman plot, also known as the Tukey mean-difference plot, is a method for assessing the agreement between two different assays. It plots the mean of the two measurements on the x-axis and the difference between them on the y-axis. It is used to check the assumption that differences are independent of initial values, i.e., that the difference is computed on the correct transformation of the response. A B-A plot that shows non-constant variation about the horizontal line, or a non-horizontal trend, provides evidence that an incorrect change score has been used.
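The check for a non-horizontal trend can be made explicit by overlaying a fitted line on the difference-versus-average plot. The sketch below assumes Stata's bundled bpwide data set as a stand-in for ADLBC. The dashed horizontal lines at the mean difference plus or minus 1.96 standard deviations follow the commonly used limits-of-agreement convention; they are an addition for illustration and are not part of the figure described here.

```stata
* Illustrative sketch; bpwide ships with Stata and stands in for ADLBC
sysuse bpwide, clear

generate avg  = (bp_after + bp_before)/2
generate diff = bp_after - bp_before

* Mean difference and conventional limits (mean +/- 1.96 SD)
quietly summarize diff
local m  = r(mean)
local lo = r(mean) - 1.96*r(sd)
local hi = r(mean) + 1.96*r(sd)

twoway (scatter diff avg, msize(small)) ///
       (lfit diff avg, lpattern(dash)), ///
    yline(`m') yline(`lo' `hi', lpattern(shortdash)) ///
    xtitle("Average of baseline and follow-up") ///
    ytitle("Follow-up minus baseline") ///
    legend(order(2 "Trend in differences"))

* A slope near zero is consistent with differences that do not depend on level
regress diff avg
```

A fan shape in the scatter, with differences spreading out as the average increases, is the non-constant variation mentioned above and often suggests analyzing the data on a log or ratio scale.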



1-Wikipedia: Bland-Altman plot. This article is licensed under the GNU Free Documentation License. It uses material from the Bland Altman plot Wikipedia article. The list of authors can be viewed in the history page.

2-Altman DG, Bland JM (1983). "Measurement in medicine: the analysis of method comparison studies". Statistician 32: 307-317.

3-Bland JM, Altman DG (1986). "Statistical methods for assessing agreement between two methods of clinical measurement". Lancet 1 (8476): 307-310.

Example8Image
References
Ref. 1: Alexandersson, A. 1998. gr32: Confidence ellipses. Stata Technical Bulletin 46: 10-13. In Stata Technical Bulletin Reprints, vol. 8, 54-57. College Station, TX: Stata Press.

Ref. 2: Alexandersson, A. 2004. Graphing confidence ellipses: An update of ellip for Stata 8. Stata Journal 4(3): 242-256.

Ref. 3: McCartin, B. 2003. A Geometric Characterization of Linear Regression. Statistics: A Journal of Theoretical and Applied Statistics 37(2): 101-117.

Ref. 4: Dupont WD and Plummer WD. Density distribution sunflower plots. Journal of Statistical Software, 2003; 8(3): 1-5. Posted at http://www.jstatsoft.org/v08/i03/paper. Accessed June 14, 2011. See also the Journal of Computational and Graphical Statistics, 2003; 12: 247.

Ref. 5: Dupont WD, Plummer WD. Using density distribution sunflower plots to explore bivariate relationships in dense data. Stata Journal, 2005; 5(3): 371-84.



Software Program R, Stata
Software Stata 11
CodeExample1 Code for Examples
Stata 11 Software

**********************************************************
**
** Scatterplot Examples for FDA-Industry-Academia Safety Graphics WG
** Richard Forshee, FDA/CBER/OBE
**
** Last updated April 12, 2010
**
**********************************************************

clear

cd "C:\Documents and Settings\forshee\My Documents\CTSPedia"

use ADLBC // This data set is available on CTSPedia in SAS Transport format
datasignature confirm // Confirming that the data have not changed

keep if lbtestcd=="GLUC" // Selecting only glucose tests

** Checking summary statistics for quality control
summarize blstresn lbstresn
bysort visit: summarize blstresn lbstresn

stem blstresn if visit=="WEEK 26"
stem lbstresn if visit=="WEEK 26"

** Creating basic scatterplot comparing baseline and week 26 results

twoway (scatter lbstresn blstresn if visit=="WEEK 26", msize(small)), ///
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(off)


graph export scatterbasic.eps, replace

** Generating data for a 50% and 90% confidence ellipse
** Requires --ellip-- from the SSC
** Use --ssc install ellip-- to install

ellip lbstresn blstresn if visit=="WEEK 26", c(f) level(90) g(ey90 ex90)
ellip lbstresn blstresn if visit=="WEEK 26", c(f) level(50) g(ey50 ex50)

** Creating a scatterplot with a 50% and 90% confidence ellipse overlay
twoway ///
(scatter lbstresn blstresn if visit=="WEEK 26", msize(small)) ///
(scatter ey90 ex90, connect(l) msymbol(none)) ///
(scatter ey50 ex50, connect(l) msymbol(none) lpattern(dash)), ///
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) xlabel(0(5)25) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(order(2 3) rows(2) label(2 "90% of observations") ///
label(3 "50% of observations"))

graph export scatterellip.eps, replace

** Creating a scatterplot with a linear regression overlay
twoway ///
(lfitci lbstresn blstresn if visit=="WEEK 26", clpattern(blank) ciplot(rline) alpattern(dash)) ///
(scatter lbstresn blstresn if visit=="WEEK 26", msize(small)) ///
(lfit lbstresn blstresn if visit=="WEEK 26") ///
(function y = x, range(0 25) lpattern(shortdash)), /// reference line of identity described in Example 3
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(rows(3) order(4 1 5) label(1 "95% Confidence Interval of Prediction") ///
label(4 "Linear Prediction") label(5 "Line of Identity (y = x)"))

graph export scatterlfit.eps, replace

** Creating a scatterplot with a fractional polynomial overlay
twoway ///
(fpfitci lbstresn blstresn if visit=="WEEK 26", clpattern(solid) ciplot(rline) alpattern(dash)) ///
(scatter lbstresn blstresn if visit=="WEEK 26", msize(small)), ///
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(rows(2) order(2 1) label(1 "95% Confidence Interval of Prediction") ///
label(2 "Fractional Polynomial Prediction"))

graph export scatterfpfit.eps, replace

** Scatterplot comparing men and women (single plot)
twoway ///
(scatter lbstresn blstresn if visit=="WEEK 26" & sex=="M", msize(small) msymbol(smcircle)) ///
(scatter lbstresn blstresn if visit=="WEEK 26" & sex=="F", msize(small) msymbol(smsquare)), ///
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(label(1 "Males") label(2 "Females"))

graph export scattermf.eps, replace

** Scatterplot comparing men and women in two panels with lfit overlay

twoway ///
(lfitci lbstresn blstresn if visit=="WEEK 26" & sex=="M", clpattern(solid) ciplot(rline) alpattern(dash)) ///
(scatter lbstresn blstresn if visit=="WEEK 26" & sex=="M", msize(small) msymbol(smcircle)), ///
title("Blood Glucose Lab Results for Males") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)", margin(medium)) ///
ytitle("Week 26 Result (mmol/L)", margin(medium)) ///
xscale(range(0 25)) xlabel(0(5)25) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(order(2 1) label(1 "95% CI") label(2 "Prediction"))

graph save male, replace


twoway ///
(lfitci lbstresn blstresn if visit=="WEEK 26" & sex=="F", clpattern(solid) ciplot(rline) alpattern(dash)) ///
(scatter lbstresn blstresn if visit=="WEEK 26" & sex=="F", msize(small) msymbol(smcircle)), ///
title("Blood Glucose Lab Results for Females") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)", margin(medium)) ///
ytitle("Week 26 Result (mmol/L)", margin(medium)) ///
xscale(range(0 25)) xlabel(0(5)25) yscale(range(0 25)) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(order(2 1) label(1 "95% CI") label(2 "Prediction"))

graph save female, replace

graph combine male.gph female.gph
graph export scattermf2panels.eps, replace


** Creating sunflower density plot
twoway sunflower lbstresn blstresn if visit=="WEEK 26", ///
title("Blood Glucose Lab Results") ///
subtitle("Baseline Compared to Week 26") ///
xtitle("Baseline Result (mmol/L)") ///
ytitle("Week 26 Result (mmol/L)") ///
xscale(range(0 25)) yscale(range(0 25)) xlabel(0(5)25) ylabel(0(5)25) ///
aspectratio(1) xsize(3) ysize(4) ///
legend(rows(3) label(1 "Single Observation"))

graph export sunflower.eps, replace

**
** Plotting Change vs Average
** Bland-Altman or Tukey mean-difference plot
**

gen avg = (lbstresn+blstresn)/2
label var avg "Average glucose at baseline and visit"

gen diff = lbstresn-blstresn
label var diff "Difference in glucose between visit and baseline"

twoway (scatter diff avg if visit=="WEEK 26", msize(small)), ///
title("Blood Glucose at Baseline and Week 26") ///
subtitle("Difference versus Average") ///
xtitle("Average Blood Glucose at Baseline and Week 26 (mmol/L)") ///
ytitle("Difference Between Blood Glucose" "at Week 26 and Baseline (mmol/L)") ///
yscale(range(-10(5)10)) ylabel(-10(5)10)

graph export scatterblandaltman.eps, replace
R-Code - Attachment R-code for sunflower
Permission Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Reference Image Forshee_Scatterplot_Example1_200.png
Topic revision: r11 - 21 Jun 2012 - 11:30:55 - MaryBanach
 

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.