### Description and purpose:

Histograms represent the distribution of a single continuous variable. A histogram groups individual observations into bins (contiguous intervals of values) of a specified, usually equal, width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.
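
To make the relationship between frequency, percentage, and density concrete, the following is a minimal Stata sketch that tabulates the bins by hand. The bin width of 0.1 and the variable names (`bin`, `freq`, `pct`, `density`, `lower`) are assumptions chosen for illustration only; the `histogram` command performs the equivalent grouping internally.

```stata
* Sketch only: tabulate histogram bins by hand.
* The bin width (0.1) and all variable names are illustrative assumptions.
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

local w = 0.1                        // assumed bin width
gen bin = floor(x/`w')               // bin index: 0 covers [0, 0.1), 1 covers [0.1, 0.2), ...
bysort bin: gen freq = _N            // frequency: number of observations in the bin
gen pct     = 100*freq/_N            // percentage: share of all observations
gen density = freq/(_N*`w')          // density: frequency / (n * bin width)

* Keep one row per bin and list the results
bysort bin: keep if _n == 1
gen lower = bin*`w'                  // lower edge of each bin
list lower freq pct density, noobs clean
```

With equal-width bins the three scales differ only by a constant factor, so the shape of the histogram is the same on all three; the density scale is the one on which the rectangle areas sum to 1.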

Histograms are distinct from bar charts. Bar charts are for categorical data, and by convention the rectangles in a bar chart do not touch.

### Examples:

All examples use 100 data points that were randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution that is bounded between 0 and 1. The generated data are shown in the stem-and-leaf plot below.
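
For readers who want the shape of this distribution in closed form, the standard Beta density with the parameters used here works out as follows. This is a supplementary derivation from the textbook Beta formulas, not part of the original example.

```latex
\begin{aligned}
f(x) &= \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \qquad 0 \le x \le 1, \\
B(2,5) &= \frac{\Gamma(2)\,\Gamma(5)}{\Gamma(7)} = \frac{1 \cdot 24}{720} = \frac{1}{30}, \\
f(x) &= 30\,x\,(1-x)^{4}, \\
\text{mean} &= \frac{\alpha}{\alpha+\beta} = \frac{2}{7} \approx 0.286, \qquad
\text{mode} = \frac{\alpha-1}{\alpha+\beta-2} = \frac{1}{5} = 0.2 .
\end{aligned}
```

The mean lies to the right of the mode, which is the right skew that the histograms of these data should display.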

Stem-and-leaf plot of the randomly generated data used for the examples (plot in units of 0.01):

```
0*   | 334
0.   | 6778889
1*   | 011112333344
1.   | 5555667889
2*   | 0123333444
2.   | 55666678999
3*   | 00122222222334
3.   | 55566889999
4*   | 011114
4.   | 5567789
5*   | 1223
5.   | 5788
6*   | 1
```

Figure: Basic Histogram Examples

### Complementary Graphs:

- Kernel density overlays
- Theoretical distribution overlays
- Rug plots

### Potential pitfalls

- The visual representation is very sensitive to the choice of bin width.

Reference: These concepts have been discussed by many authors, but Cox NJ, Speaking Stata: Graphing distributions, The Stata Journal (2004) 4, Number 1, pp. 66-88, was particularly helpful as I prepared this description.

### Code (Stata 11):

```
**********************************************************
**
**   Histogram Examples for FDA-Industry-Academia Safety Graphics WG
**   Richard Forshee, FDA/CBER/OBE
**
**   Last updated September 17, 2010
**
**   This file benefited from Cox NJ, Speaking Stata:
**	 Graphing Distributions.  Stata Journal 2004.
**
**********************************************************

**  Generate random data from a beta distribution alpha=2, beta=5
**  This set of parameters generates highly skewed data between 0 and 1

clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

**  Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

**  Histograms with overlays

**  Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Kernel Density")) ///
saving(over_kd, replace)

**  Normal
summ x				// Generate summary statistics
local m=`r(mean)'	// Place mean into a local macro
local sd=`r(sd)'	// Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
legend(order(2) label(2 "Normal Distribution")) ///
saving(over_normal, replace)

**  Rug Plot
gen pipe = "|"  // Create a vertical line symbol
gen where=-0.1  // Create a variable for vertical placement of the rug plot

**  Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
title("Rug Plot Overlay") ///
saving(over_rug, replace) ///
addplot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
legend(off) plotregion(margin(medium))

graph combine over_kd.gph over_normal.gph over_rug.gph, ///
row(1) xsize(6) ysize(3) ///
title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

**  Pitfalls
**  Bin width

histogram x, start(0) width(0.1) ///
title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
title("Bin Width Can Affect the Shape of a Histogram") ///
subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
row(1) xsize(6) ysize(3)
```
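
The overlay section above uses a fitted normal curve. Because these data were generated from a known Beta(2,5) distribution, a theoretical distribution overlay can also use the generating density itself. The following is a minimal sketch under that assumption; the title text and the saved file name `over_beta` are illustrative choices, not part of the original file.

```stata
* Sketch only: overlay the generating Beta(2,5) density instead of a fitted normal.
* The title text and the file name over_beta are illustrative assumptions.
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

* histogram defaults to the density scale, so the curve and bars are comparable
twoway (histogram x, start(0)) ///
       (function y = betaden(2,5,x), range(0 1)), ///
       title("Beta(2,5) Distribution Overlay") ///
       xtitle("Response Variable, arbitrary scale of 0-1") ///
       legend(order(2) label(2 "Beta(2,5) Density")) ///
       saving(over_beta, replace)
```

In practice the generating distribution is rarely known, so an overlay like this is mainly useful for simulated data or for checking a hypothesized model against observed data.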

Topic revision: r1 - 02 Feb 2011 - 20:28:03 - MaryBanach

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.