Title Basic Histogram
Graph Displayed class
Graph Subgroup General Principles
Classification-Graph Type Histogram
Date Original
Original Date January 4, 2010
Contributor/Email Richard Forshee (email: Richard.Forshee@fda.hhs.gov)
Disclaimer The opinions expressed in this document are those of the author and may not represent the opinions of the U.S. Food and Drug Administration or other authors.
Type of Data Continuous
Type of Analysis Univariate
Description and Purpose Histograms are used to represent the distribution of a single continuous variable. A histogram groups individual observations into bins of a specified (usually equal) width and counts the number of observations in each bin. Rectangles are drawn so that the height (or width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.

Histograms are distinct from bar charts. Bar charts are for categorical data, and by convention the rectangles in a bar chart do not touch.
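The binning step and the three height scales described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: the `bin_counts` helper, the seed, and the bin width of 0.1 are assumptions made for the example, and Python's `random.betavariate` will not reproduce the Stata draws.

```python
import math
import random


def bin_counts(values, start, width):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width).

    Hypothetical helper for illustration; assumes every value is >= start.
    """
    n_bins = max(1, math.ceil((max(values) - start) / width))
    counts = [0] * n_bins
    for v in values:
        # Clamp so a value sitting exactly on the top edge lands in the last bin.
        k = min(int((v - start) / width), n_bins - 1)
        counts[k] += 1
    return counts


rng = random.Random(2010)  # arbitrary seed, chosen only for repeatability
x = [rng.betavariate(2, 5) for _ in range(100)]

width = 0.1
n = len(x)
freq = bin_counts(x, 0.0, width)
percent = [100 * c / n for c in freq]      # bar heights sum to 100
density = [c / (n * width) for c in freq]  # bar areas (height * width) sum to 1

for k, (f, p, d) in enumerate(zip(freq, percent, density)):
    lo = k * width
    print(f"[{lo:.1f}, {lo + width:.1f})  freq={f:3d}  percent={p:5.1f}  density={d:5.2f}")
```

The three scales differ only by a constant factor when bins have equal width, so the shape of the histogram is the same in each case; only the axis labels change.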
Example1Title Basic Histogram Composite
Example1Description All examples use 100 data points randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution bounded between 0 and 1. The actual data are shown in the stem-and-leaf plot below.

Stem-and-Leaf Plot of Randomly Generated Data Used for Examples
plot in units of 0.01
0* | 334
0. | 6778889
1* | 011112333344
1. | 5555667889
2* | 0123333444
2. | 55666678999
3* | 00122222222334
3. | 55566889999
4* | 011114
4. | 5567789
5* | 1223
5. | 5788
6* | 1
Example1Image
Example2Title Complementary Examples
Example2Description Types of complementary examples:
Kernel density overlays
Theoretical distribution overlays
Rug plots
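As a rough sketch of what these three overlays compute, the Python below evaluates a kernel density estimate, the theoretical Beta(2,5) density, and a text rug strip for simulated data. The Gaussian kernel, the normal-reference bandwidth rule, the seed, and the helper names are all assumptions made for this illustration; they are not the defaults of Stata's `kdensity`, and the simulated values differ from the Stata data.

```python
import math
import random
import statistics


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` with a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(t):
        return norm * sum(math.exp(-0.5 * ((t - v) / bandwidth) ** 2) for v in data)

    return density


def beta25_pdf(t):
    """Beta(2,5) density, 30 * t * (1 - t)**4 on [0, 1] and zero elsewhere."""
    return 30 * t * (1 - t) ** 4 if 0 <= t <= 1 else 0.0


rng = random.Random(2010)  # arbitrary seed, chosen only for repeatability
x = sorted(rng.betavariate(2, 5) for _ in range(100))

# Normal-reference rule of thumb for the bandwidth (an assumption of this sketch).
h = 1.06 * statistics.stdev(x) * len(x) ** (-1 / 5)
kde = gaussian_kde(x, h)

for i in range(11):
    t = i / 10
    print(f"x={t:.1f}  kernel estimate={kde(t):.3f}  Beta(2,5) density={beta25_pdf(t):.3f}")

# Rug plot: one tick per observation along the axis, drawn here as a 50-column strip.
rug = [" "] * 50
for v in x:
    rug[min(int(v * 50), 49)] = "|"
print("".join(rug))
```

The kernel estimate is a smoothed summary of the sample, the theoretical curve shows the distribution the sample was drawn from, and the rug strip marks where the individual observations fall.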

Example2Image
Example3Title Histogram Pitfalls
Example3Description Histogram pitfalls: the visual representation is very sensitive to the choice of bin width (bin size).
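The effect of bin width can be checked directly on the values listed in the stem-and-leaf plot above. The sketch below is an illustration in Python rather than the Stata code behind the figure; it works in integer units of 0.01 to avoid floating-point edge effects, and the helper name `counts_for_width` is hypothetical. Because it uses the values as displayed in the stem-and-leaf plot, counts near bin edges may differ slightly from a histogram of the unrounded data.

```python
# Stem-and-leaf rows copied from the plot above: (tens digit, leaves), units of 0.01.
ROWS = [
    (0, "334"), (0, "6778889"),
    (1, "011112333344"), (1, "5555667889"),
    (2, "0123333444"), (2, "55666678999"),
    (3, "00122222222334"), (3, "55566889999"),
    (4, "011114"), (4, "5567789"),
    (5, "1223"), (5, "5788"),
    (6, "1"),
]
data = [10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves]


def counts_for_width(values, width):
    """Frequency per bin [k*width, (k+1)*width), starting at 0, in integer units."""
    counts = [0] * (max(values) // width + 1)
    for v in values:
        counts[v // width] += 1
    return counts


# Widths of 10, 5, and 1 correspond to bin widths of 0.1, 0.05, and 0.01.
counts_by_width = {w: counts_for_width(data, w) for w in (10, 5, 1)}

for w, counts in counts_by_width.items():
    print(f"bin width {w / 100:.2f}: {len(counts)} bins")
    for k, c in enumerate(counts):
        print(f"  {k * w / 100:.2f} {'#' * c}")
```

With a width of 0.05 the bins coincide with the rows of the stem-and-leaf plot. The widest bins smooth away most of the detail, while the narrowest bins produce a ragged picture with many empty or near-empty bins.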
Example3Image
References These concepts have been discussed by many authors, but Cox NJ, Speaking Stata: Graphing distributions, The Stata Journal (2004) 4, Number 1, pp. 66-88 was particularly helpful as I prepared this description.
Software Program Stata
Software Stata v. 11
CodeExample1 - Attachment basichist.eps: basichist.eps
CodeExample1 *******************************************************
**
** Histogram Examples for FDA-Industry-Academia Safety Graphics WG
** Richard Forshee, FDA/CBER/OBE
**
** Last updated September 17, 2010
**
** This file benefited from Cox NJ, Speaking Stata:
** Graphing Distributions. Stata Journal 2004.
**
**********************************************************

** Generate random data from a Beta distribution with alpha=2, beta=5
** These parameters generate right-skewed data bounded between 0 and 1

clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

** Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

** Histograms with overlays

** Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Kernel Density")) ///
saving(over_kd, replace)

** Normal
summ x // Generate summary statistics
local m=`r(mean)' // Place mean into a local macro
local sd=`r(sd)' // Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Normal Distribution")) ///
saving(over_normal, replace)

** Rug Plot
gen pipe = "|" // Create a vertical line symbol
gen where=-0.1 // Create a variable for vertical placement of the rug plot

** Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
title("Rug Plot Overlay") ///
saving(over_rug, replace) ///
plot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
legend(off) plotregion(margin(medium))

graph combine over_kd.gph over_normal.gph over_rug.gph, ///
row(1) xsize(6) ysize(3) ///
title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")


** Pitfalls
** Bin width

histogram x, start(0) width(0.1) ///
title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
title("Bin Width Can Affect the Shape of a Histogram") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
row(1) xsize(6) ysize(3)
Keywords bin size, complementary graphs, pitfalls
Permission Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Reference Image Histogram_examples_reference_200.jpg
Topic revision: r16 - 19 Jun 2012 - 13:41:53 - MaryBanach
 

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.