
# Histograms

(Last updated by Richard Forshee on September 17, 2010)

# Description and purpose:

Histograms represent the distribution of a single continuous variable. A histogram groups individual observations into bins of a specified (usually equal) width and counts the number of observations in each bin. Rectangles are drawn so that the height (or the width, for a horizontal histogram) represents the frequency, percentage, or density of the observations in each bin. By convention, the rectangles in a histogram touch one another.
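The binning step described above can be sketched by hand in Stata. This is a minimal illustration, not part of the original example file: the bin width of 0.1 and the variable names `bin`, `n`, `percent`, and `density` are assumptions chosen for the sketch.

```stata
* Illustrative sketch: tabulate a histogram by hand
* (assumed bin width 0.1, bins starting at 0)
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

local h = 0.1                       // assumed bin width
local N = _N                        // total number of observations
gen bin = floor(x/`h')              // bin index: 0 for [0,0.1), 1 for [0.1,0.2), ...
contract bin, freq(n)               // one row per non-empty bin; n = count in that bin

gen double percent = 100*n/`N'      // percentage of observations in each bin
gen double density = n/(`N'*`h')    // density: bar areas (density x width) sum to 1
list bin n percent density, noobs
```

The three scales differ only by a constant factor when all bins have the same width, so the shape of the histogram is the same whichever one is plotted; only the axis labels change.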

Histograms are distinct from bar charts. Bar charts are for categorical data and, by convention, the rectangles in a bar chart do not touch.

# Examples:

All examples use 100 data points that were randomly generated from a Beta(2,5) distribution. The Beta(2,5) is a right-skewed distribution that is bounded between 0 and 1. The data are shown in the stem-and-leaf plot below.
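For reference, the Beta(2,5) density can be written out explicitly. This is a standard result for the beta distribution rather than something taken from the example code; it shows why the data are bounded by 0 and 1 and concentrated toward the lower end of the range.

```latex
f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}, \quad 0 \le x \le 1,
\qquad
B(2,5) = \frac{\Gamma(2)\,\Gamma(5)}{\Gamma(7)} = \frac{1 \cdot 24}{720} = \frac{1}{30}
\;\Rightarrow\;
f(x) = 30\,x\,(1-x)^{4}

\text{mean} = \frac{\alpha}{\alpha+\beta} = \frac{2}{7} \approx 0.286,
\qquad
\text{mode} = \frac{\alpha-1}{\alpha+\beta-2} = \frac{1}{5} = 0.2
```

The mode (0.2) lies below the mean (about 0.286), which is consistent with a right-skewed distribution.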

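## Stem-and-Leaf Plot of Randomly Generated Data Used for Examples

Plot in units of 0.01: each stem is the tenths digit and each leaf the hundredths digit, so the first row (0* | 334) shows the values 0.03, 0.03, and 0.04. Stems marked * hold leaves 0-4 and stems marked . hold leaves 5-9.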

```
0*   | 334
0.   | 6778889
1*   | 011112333344
1.   | 5555667889
2*   | 0123333444
2.   | 55666678999
3*   | 00122222222334
3.   | 55566889999
4*   | 011114
4.   | 5567789
5*   | 1223
5.   | 5788
6*   | 1
```

## Complementary Graphs - Attached as JPEG file

Kernel density overlays
Theoretical distribution overlays
Rug plots
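As a sketch of a theoretical distribution overlay, the generating Beta(2,5) density can be drawn over the histogram with Stata's built-in `betaden(a,b,x)` function. This is an illustrative variant, not the author's code: the original example file overlays a normal density fitted to the sample mean and standard deviation, and the title and legend text below are assumptions.

```stata
* Illustrative sketch: overlay the generating Beta(2,5) density on the histogram
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

* Inside -twoway function-, x is the function argument, not the data variable
twoway (histogram x, start(0)) ///
       (function y = betaden(2,5,x), range(0 1)), ///
    title("Beta(2,5) Density Overlay") ///
    xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Beta(2,5) density"))
```

Because the histogram is drawn on the density scale by default, the bars and the curve share the same vertical axis and can be compared directly.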

## Potential pitfalls - Attached as JPEG file

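The visual impression given by a histogram is very sensitive to the choice of bin width: wide bins can hide features of the distribution, while narrow bins can produce a noisy, spiky shape.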
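One way to make the bin choice visible is to compute it and state it explicitly rather than rely on the default. The sketch below assumes the rule given in Stata's `help histogram` for the case where neither `bin()` nor `width()` is specified, k = min(sqrt(N), 10*ln(N)/ln(10)); for N = 100 this is min(10, 20) = 10. The local macro name `k` and the graph title are illustrative.

```stata
* Illustrative sketch: default bin count when neither bin() nor width() is given
clear
set seed 85360497
set obs 100
gen x = rbeta(2,5)

quietly count
local k = min(sqrt(r(N)), 10*ln(r(N))/ln(10))   // rule as described in -help histogram-
display "Default number of bins for N = " r(N) ": " `k'

* Stating the bin choice explicitly documents it for readers of the code
histogram x, start(0) bin(10) title("10 bins, set explicitly")
```

Comparing this graph with versions that use other `bin()` or `width()` values is a quick check on whether a feature of the histogram reflects the data or only the binning.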

Reference: These concepts have been discussed by many authors, but Cox NJ, Speaking Stata: Graphing distributions, The Stata Journal (2004) 4, Number 1, pp. 66–88 was particularly helpful as I prepared this description.

# Code (Stata 11):

## Highlighted Code


```
********************************************************
* Histogram Examples for FDA-Industry-Academia Safety Graphics WG
* Richard Forshee, FDA/CBER/OBE
*
* Last updated September 17, 2010
*
* This file benefited from Cox NJ, Speaking Stata:
* Graphing Distributions. Stata Journal 2004.
*
*********************************************************

* Generate random data from a beta distribution alpha=2, beta=5
* This set of parameters generates highly skewed data between 0 and 1

clear
set seed 85360497 // serial number from the first dollar bill in my wallet
set obs 100
gen x = rbeta(2,5)

label var x "Response Variable, arbitrary scale of 0-1"

stem x, round(0.01)

** Basic histograms
twoway histogram x, title("Frequency") freq start(0) saving(basic_freq, replace)
twoway histogram x, title("Percentage") percent start(0) saving(basic_perc, replace)
twoway histogram x, title("Density") start(0) saving(basic_dens, replace)

graph combine basic_freq.gph basic_perc.gph basic_dens.gph, ///
    row(1) xsize(6) ysize(3) title("Basic Histogram Examples") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

** Histograms with overlays

** Kernel Density
twoway (histogram x, start(0)) (kdensity x), ///
    title("Kernel Density Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Kernel Density")) ///
    saving(over_kd, replace)

** Normal
summ x              // Generate summary statistics
local m=`r(mean)'   // Place mean into a local macro
local sd=`r(sd)'    // Place standard deviation into a local macro

twoway (histogram x, start(0)) (function y=normalden(x,`m',`sd'), range(0 1)), ///
    title("Normal Distribution Overlay") xtitle("Response Variable, arbitrary scale of 0-1") ///
    legend(order(2) label(2 "Normal Distribution")) ///
    saving(over_normal, replace)

** Rug Plot
gen pipe = "|"   // Create a vertical line symbol
gen where=-0.1   // Create a variable for vertical placement of the rug plot

** Histogram with a scatter plot underneath to produce rug plot

histogram x, start(0) ///
    title("Rug Plot Overlay") ///
    saving(over_rug, replace) ///
    plot(scatter where x, ms(none) mlabel(pipe) mlabpos(0)) ///
    legend(off) plotregion(margin(medium))

graph combine over_kd.gph over_normal.gph over_rug.gph, ///
    row(1) xsize(6) ysize(3) ///
    title("Histograms with Kernel Density, Normal Distribution, and Rug Plot Overlays") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100")

* Pitfalls
* Bin width

histogram x, start(0) width(0.1) ///
    title("0.1 bin width") saving(width_10, replace)
histogram x, start(0) width(0.05) ///
    title("0.05 bin width") saving(width_05, replace)
histogram x, start(0) width(0.01) ///
    title("0.01 bin width") saving(width_01, replace)

graph combine width_10.gph width_05.gph width_01.gph, ///
    title("Bin Width Can Affect the Shape of a Histogram") ///
    subtitle("Randomly generated data, Beta(2,5) distribution, n=100") ///
    row(1) xsize(6) ysize(3)
```

## Disclaimer

DISCLAIMER: The views expressed within CTSpedia are those of the author and must not be taken to represent policy or guidance on the behalf of any organization or institution with which the author is affiliated.

## Permission

PERMISSION: Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Topic revision: r10 - 09 Jul 2012 - 14:47:49 - MaryBanach

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.