Histograms

Last updated by Richard Forshee on September 17, 2010

Type of data: continuous

Type of analysis: univariate

Description and purpose:

Histograms are used to represent the distribution of a single continuous variable. A histogram groups individual observations into bins (intervals) of a specified, usually equal, width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.
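
The binning step described above can be sketched by hand in Stata. This is a minimal illustration, not how the histogram command is implemented. It regenerates the example data, assigns each observation to a bin of width 0.1 starting at 0 (both values are assumptions chosen for illustration), and derives the three possible bar heights. The variable names bin, freq, pct, dens, and tag are hypothetical.

```stata
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

local w = 0.1                   // assumed bin width, for illustration only
gen bin = floor(x/`w')          // bin index: 0 for [0,0.1), 1 for [0.1,0.2), ...
bysort bin: gen freq = _N       // frequency: number of observations in the bin
gen pct  = 100*freq/_N          // percentage of all observations
gen dens = freq/(_N*`w')        // density: bar areas (height x width) sum to 1

* Show one row per bin
egen tag = tag(bin)
list bin freq pct dens if tag, noobs
```

The three scales differ only by a constant factor when all bins have the same width, so the shape of the histogram is the same whichever one is plotted.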

Histograms are distinct from bar charts. Bar charts are for categorical data, and by convention the rectangles in a bar chart do not touch.

Examples:

All examples use 100 data points randomly generated from a Beta(2,5) distribution, a right-skewed distribution bounded between 0 and 1. The data are shown in the stem-and-leaf plot below.

Stem-and-Leaf Plot of Randomly Generated Data Used for Examples (plot in units of 0.01)

  0*   | 334
  0.   | 6778889
  1*   | 011112333344
  1.   | 5555667889
  2*   | 0123333444
  2.   | 55666678999
  3*   | 00122222222334
  3.   | 55566889999
  4*   | 011114
  4.   | 5567789
  5*   | 1223
  5.   | 5788
  6*   | 1

Basic Histogram Examples

pic1.png

Complementary Graphs:

Kernel density overlays
Theoretical distribution overlays
Rug plots
pic2.png
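
The theoretical overlay in the code for this page is a normal curve fitted to the sample mean and standard deviation. Because the example data are known to come from a Beta(2,5) distribution, the generating density could be overlaid instead. The following is a hedged sketch using Stata's betaden() function; the title and legend text are illustrative choices, not part of the original examples.

```stata
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

* Histogram on the density scale with the generating Beta(2,5) density on top
twoway (histogram x, start(0)) ///
	(function y = betaden(2,5,x), range(0 1)), ///
	title("Beta(2,5) Distribution Overlay") ///
	legend(order(2) label(2 "Beta(2,5) density"))
```

The overlay is only meaningful when the histogram is drawn on the density scale, which is the default for histogram.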

Potential pitfalls

The visual representation is very sensitive to the choice of bin width.

pic3.png
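
One reason the choice matters is that different rules of thumb give different numbers of bins for the same data. When neither bin() nor width() is specified, the Stata manual documents the default number of bins as min(sqrt(N), 10*ln(N)/ln(10)). The sketch below computes that default for the N = 100 used in the examples, alongside Sturges' rule, which is shown only for comparison and is not used anywhere in the original examples. The scalar names are hypothetical.

```stata
local N = 100    // sample size used in the examples

* Default number of bins for histogram, per the Stata manual
scalar k_default = min(sqrt(`N'), 10*ln(`N')/ln(10))

* Sturges' rule, a common alternative (an assumption added for comparison)
scalar k_sturges = ceil(ln(`N')/ln(2) + 1)

display "Stata default number of bins: " k_default
display "Sturges rule number of bins:  " k_sturges
```

For N = 100 the two rules give 10 and 8 bins respectively, so even automatic choices can produce visibly different histograms.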

Reference: These concepts have been discussed by many authors, but Cox NJ, Speaking Stata: Graphing distributions, The Stata Journal (2004) 4, Number 1, pp. 66-88 was particularly helpful as I prepared this description.

Code (Stata 11):

**********************************************************
** 
**   Histogram Examples for FDA-Industry-Academia Safety Graphics WG
**   Richard Forshee, FDA/CBER/OBE
**
**   Last updated September 17, 2010
**
**   This file benefited from Cox NJ, Speaking Stata: 
**	 Graphing Distributions.  Stata Journal 2004.
**
**********************************************************

**  Generate random data from a beta distribution alpha=2, beta=5
**  This set of parameters generates right-skewed data between 0 and 1

clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

**  Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
	row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
	subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

**  Histograms with overlays

**  Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
	title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
	legend(order(2) label(2 "Kernel Density")) ///
	saving(over_kd, replace)

**  Normal
summ x				// Generate summary statistics
local m=`r(mean)'	// Place mean into a local macro
local sd=`r(sd)'	// Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
	title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
	legend(order(2) label(2 "Normal Distribution")) ///
	saving(over_normal, replace)

**  Rug Plot
gen pipe = "|"  // Create a vertical line symbol
gen where=-0.1  // Create a variable for vertical placement of the rug plot

**  Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
	title("Rug Plot Overlay") ///
	saving(over_rug, replace) ///
	addplot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
	legend(off) plotregion(margin(medium)) 
	
graph combine over_kd.gph over_normal.gph over_rug.gph, ///
	row(1) xsize(6) ysize(3) ///
	title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
	subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

**  Pitfalls
**  Bin width

histogram x, start(0) width(0.1) ///
	title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
	title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
	title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
	title("Bin Width Can Affect the Shape of a Histogram") ///
	subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
	row(1) xsize(6) ysize(3)

Topic revision: r1 - 02 Feb 2011 - 20:28:03 - MaryBanach
 

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.