Think Bayes in Scala
A Scala implementation of the classes and functions used in the great book Think Bayes by Allen B. Downey, available for free (and open-source) here.
Quick start
The code in this repository is available as a library and can be used in Scala 2.11.x projects by adding the following dependency to build.sbt:
libraryDependencies += "net.ruippeixotog" %% "think-bayes" % "0.1"
Core classes
Probability mass functions
The Pmf class is arguably the core collection in Think Bayes, since the book focuses on solving problems with discrete approximations instead of continuous mathematics. Building and manipulating a Pmf is straightforward:
scala> import thinkbayes._
import thinkbayes._
scala> val pmf = Pmf('a' -> 0.2, 'b' -> 0.2, 'c' -> 0.6)
pmf: thinkbayes.Pmf[Char] = Map(a -> 0.2, b -> 0.2, c -> 0.6)
scala> pmf.prob('a')
res0: Double = 0.2
scala> pmf.prob(_ < 'c')
res1: Double = 0.4
scala> pmf.sample()
res2: Char = c
scala> pmf.printChart()
a 0.2 ##########
b 0.2 ##########
c 0.6 ##############################
A Pmf is implemented as an immutable map and can be used as such:
scala> pmf.size
res3: Int = 3
scala> pmf.map { case (k, v) => ((k + 1).toChar, v) }
res4: thinkbayes.Pmf[Char] = Map(b -> 0.2, c -> 0.2, d -> 0.6)
scala> pmf.filter(_._1 == 'a').normalized
res5: thinkbayes.Pmf[Char] = Map(a -> 1.0)
scala> pmf.foldLeft("")(_ + _._1)
res6: String = abc
scala> pmf.toList
res7: List[(Char, Double)] = List((a,0.2), (b,0.2), (c,0.6))
Specialized Pmf merging methods can model more complex problems in a very concise manner:
scala> def die(n: Int) = Pmf(1 to n)
die: (n: Int)thinkbayes.Pmf[Int]
scala> die(6)
res8: thinkbayes.Pmf[Int] = Map(5 -> 0.16666666666666666, 1 -> 0.16666666666666666, 6 -> 0.16666666666666666, 2 -> 0.16666666666666666, 3 -> 0.16666666666666666, 4 -> 0.16666666666666666)
scala> die(6).mean
res9: Double = 3.5
scala> (die(6) ++ die(6)).printChart() // sum of two dice
2 0.0277 #
3 0.0555 ##
4 0.0833 ####
5 0.1111 #####
6 0.1388 ######
7 0.1666 ########
8 0.1388 ######
9 0.1111 #####
10 0.0833 ####
11 0.0555 ##
12 0.0277 #
scala> val bag = Pmf(List(die(4), die(6), die(8), die(12), die(20))) // a bag containing 5 different dice
bag: thinkbayes.Pmf[thinkbayes.Pmf[Int]] = Map(Map(5 -> 0.08333333333333333, 10 -> 0.08333333333333333, 1 -> 0.08333333333333333, 6 -> 0.08333333333333333, 9 -> ...
scala> bag.mixture.printChart() // roll of a random die from the bag
1 0.135 ######
2 0.135 ######
3 0.135 ######
4 0.135 ######
5 0.0850 ####
6 0.0850 ####
7 0.0516 ##
8 0.0516 ##
9 0.0266 #
10 0.0266 #
11 0.0266 #
12 0.0266 #
13 0.0100
14 0.0100
15 0.0100
16 0.0100
17 0.0100
18 0.0100
19 0.0100
20 0.0100
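To make the arithmetic behind the two charts above concrete, here is a minimal plain-Scala sketch of summing two independent distributions and of mixing several distributions. It does not use the library: the names DiceSketch, Dist, uniformDie, sum and mixture are hypothetical and chosen only for illustration, and the library's own implementation may work differently.

```scala
// Illustrative sketch only; none of these names belong to the think-bayes API.
object DiceSketch {
  type Dist[K] = Map[K, Double]

  // Uniform distribution over the faces 1..n.
  def uniformDie(n: Int): Dist[Int] =
    (1 to n).map(face => face -> 1.0 / n).toMap

  // Distribution of X + Y for independent X and Y: add up the joint
  // probability of every pair of outcomes that produces the same total.
  def sum(a: Dist[Int], b: Dist[Int]): Dist[Int] = {
    val pairs = for ((x, px) <- a.toSeq; (y, py) <- b.toSeq) yield (x + y, px * py)
    pairs.groupBy(_._1).map { case (total, ps) => total -> ps.map(_._2).sum }
  }

  // Mixture: weight each component by the probability of picking it,
  // then add up the weighted probabilities of each outcome.
  def mixture(components: Seq[(Dist[Int], Double)]): Dist[Int] = {
    val weighted = for ((dist, w) <- components; (x, p) <- dist.toSeq) yield (x, w * p)
    weighted.groupBy(_._1).map { case (x, ps) => x -> ps.map(_._2).sum }
  }

  // Sum of two six-sided dice.
  val twoDice: Dist[Int] = sum(uniformDie(6), uniformDie(6))

  // Roll of a die picked uniformly at random from a bag of five dice.
  val bagRoll: Dist[Int] = mixture(Seq(4, 6, 8, 12, 20).map(n => uniformDie(n) -> 0.2))
}
```

With these definitions, DiceSketch.twoDice(7) is 6/36 and DiceSketch.bagRoll(1) is 0.135, in line with the charts printed above.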
The Distributions extension provides methods for creating common Pmf instances, such as Gaussian or Poisson distributions.
Bayesian suites
The implementation of Suite provided in this library does not extend Pmf; it is rather provided as a trait which applications can implement to model specific problems:
scala> case class Dice(hypos: Seq[Int]) extends SimpleSuite[Int, Int] {
| val pmf = Pmf(hypos) // which dice from `hypos` are we rolling?
| def likelihood(data: Int, hypo: Int) = if(hypo < data) 0 else 1.0 / hypo
| }
defined class Dice
scala> val prior = Dice(List(4, 6, 8, 12, 20))
prior: Dice = Dice(List(4, 6, 8, 12, 20))
scala> prior.printChart()
4 0.2 ##########
6 0.2 ##########
8 0.2 ##########
12 0.2 ##########
20 0.2 ##########
scala> val posterior = prior.observed(6) // after a 6 is rolled
posterior: thinkbayes.Suite[Int,Int] = thinkbayes.Suite$$anon$1@120fb03e
scala> posterior.printChart()
4 0.0
6 0.3921 ###################
8 0.2941 ##############
12 0.1960 #########
20 0.1176 #####
The same prior could be built directly with:
scala> val prior = Suite[Int, Int](Pmf(List(4, 6, 8, 12, 20))) { (d, h) =>
| if (h < d) 0 else 1.0 / h
| }
prior: thinkbayes.Suite[Int,Int]{val pmf: thinkbayes.Pmf[Int]} = thinkbayes.Suite$$anon$1@130dd39f
Multiple observations can be given to the Suite in bulk, which can yield more numerically stable results:
scala> posterior.observed(6, 8, 7, 7, 5, 4).printChart()
4 0.0
6 0.0
8 0.9432 ###############################################
12 0.0552 ##
20 0.0015
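The posteriors shown in this section can be reproduced by hand with the usual Bayesian update: multiply each prior probability by the likelihood of the observation and renormalize. The sketch below does this with a plain Scala Map. The names UpdateSketch and update are hypothetical, the observations are applied one at a time, and nothing here is meant to describe how Suite is implemented internally.

```scala
// Illustrative sketch only; not the library's Suite implementation.
object UpdateSketch {
  // Likelihood of rolling `data` on a die with `hypo` sides
  // (the same rule used by the Dice example above).
  def likelihood(data: Int, hypo: Int): Double =
    if (hypo < data) 0.0 else 1.0 / hypo

  // One Bayesian update: prior times likelihood, then renormalize.
  def update(prior: Map[Int, Double], data: Int): Map[Int, Double] = {
    val unnormalized = prior.map { case (hypo, p) => hypo -> p * likelihood(data, hypo) }
    val total = unnormalized.values.sum
    unnormalized.map { case (hypo, p) => hypo -> p / total }
  }

  // Uniform prior over the five dice.
  val prior: Map[Int, Double] = Seq(4, 6, 8, 12, 20).map(_ -> 0.2).toMap

  // Posterior after a 6 is rolled.
  val afterSix: Map[Int, Double] = update(prior, 6)

  // Posterior after the six further rolls, applied one by one.
  val afterAll: Map[Int, Double] = Seq(6, 8, 7, 7, 5, 4).foldLeft(afterSix)(update)
}
```

Here UpdateSketch.afterSix(6) is about 0.3921 and UpdateSketch.afterAll(8) is about 0.9432, matching the charts above.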
Cumulative distribution functions
A Cdf can be created just like a Pmf. It supports efficient querying for the cumulative probability on a given value (prob) and for the value at a given percentile (value):
scala> val cdf = Cdf('a' -> 0.2, 'b' -> 0.2, 'c' -> 0.6)
cdf: thinkbayes.Cdf[Char] = CategoricalCdf(Vector((a,0.2), (b,0.4), (c,1.0)))
scala> cdf.prob('b')
res10: Double = 0.4
scala> cdf.value(0.5)
res11: Char = c
scala> cdf.value(0.35)
res12: Char = b
scala> cdf.printChart()
a 0.2 ##########
b 0.4 ####################
c 1.0 ##################################################
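As a rough illustration of what the two queries compute, the following plain-Scala sketch builds cumulative probabilities with a running sum and looks up the value at a percentile. The names CdfSketch, cumulative and value are hypothetical, and the lookup uses a linear scan for brevity; the library's Cdf may well use a different and more efficient approach.

```scala
// Illustrative sketch only; not the library's Cdf implementation.
object CdfSketch {
  // Turn (value, probability) pairs, already sorted by value,
  // into (value, cumulative probability) pairs.
  def cumulative[K](pmf: Seq[(K, Double)]): Vector[(K, Double)] = {
    val running = pmf.map(_._2).scanLeft(0.0)(_ + _).tail
    pmf.map(_._1).zip(running).toVector
  }

  // Smallest value whose cumulative probability reaches p.
  // Falls back to the last value if rounding keeps the total below p.
  def value[K](cdf: Vector[(K, Double)], p: Double): K =
    cdf.find(_._2 >= p).map(_._1).getOrElse(cdf.last._1)

  val cdf: Vector[(Char, Double)] =
    cumulative(Seq('a' -> 0.2, 'b' -> 0.2, 'c' -> 0.6))
}
```

With this sketch, CdfSketch.value(CdfSketch.cdf, 0.5) is 'c' and CdfSketch.value(CdfSketch.cdf, 0.35) is 'b', the same answers as the REPL session above.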
Unlike Pmf, Cdf does not implement the Map trait and, therefore, does not inherit the common Scala collection methods. If you need to use those, you can easily convert a Cdf to and from a Pmf:
scala> cdf.toPmf
res13: thinkbayes.Pmf[Char] = Map(a -> 0.2, b -> 0.2, c -> 0.6)
scala> cdf.toPmf.toCdf
res14: thinkbayes.Cdf[Char] = CategoricalCdf(Vector((a,0.2), (b,0.4), (c,1.0)))
Probability density functions
A Pdf can be created from a Scala real-valued function and provides a density method for calculating the density at a given value:
scala> val pdf = Pdf { x => math.max(-x * x + 1, 0) }
pdf: thinkbayes.Pdf = thinkbayes.Pdf$$anon$3@744cb6e3
scala> pdf.density(0)
res15: Double = 1.0
scala> pdf.density(0.5)
res16: Double = 0.75
A BoundedPdf is a Pdf whose domain has known lower and upper bounds.
scala> val bpdf = Pdf(-1.0, 1.0) { x => math.max(-x * x + 1, 0) }
bpdf: thinkbayes.BoundedPdf{val lowerBound: Double; val upperBound: Double} = thinkbayes.Pdf$$anon$2@397820d5
Both can be converted to a Pmf given a range or sequence of discrete values to compute. A BoundedPdf can alternatively be given a step value only. In both cases, the probabilities of the returned Pmf are normalized:
scala> pdf.toPmf(0.0 to 1.0 by 0.1).printChart()
0.0 0.1398 ######
0.1 0.1384 ######
0.2 0.1342 ######
0.30000000000000004 0.1272 ######
0.4 0.1174 #####
0.5 0.1048 #####
0.6000000000000001 0.0895 ####
0.7000000000000001 0.0713 ###
0.8 0.0503 ##
0.9 0.0265 #
1.0 0.0
scala> bpdf.toPmf(0.2).printChart()
-1.0 0.0
-0.8 0.0545 ##
-0.6 0.0969 ####
-0.3999999999999999 0.1272 ######
-0.19999999999999996 0.1454 #######
0.0 0.1515 #######
0.20000000000000018 0.1454 #######
0.40000000000000013 0.1272 ######
0.6000000000000001 0.0969 ####
0.8 0.0545 ##
1.0 0.0
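The normalization mentioned above can be checked by hand: evaluate the density at every grid point and divide each result by the sum of all of them. The plain-Scala sketch below does that for the first example. The names DiscretizeSketch and discretize are hypothetical, and this is only one plausible reading of the conversion, not the library's code.

```scala
// Illustrative sketch only; not the library's toPmf implementation.
object DiscretizeSketch {
  // Evaluate the density at each grid point and rescale the results
  // so that the returned probabilities sum to 1.
  def discretize(density: Double => Double, grid: Seq[Double]): Seq[(Double, Double)] = {
    val weights = grid.map(density)
    val total = weights.sum
    grid.zip(weights.map(_ / total))
  }

  // Grid 0.0, 0.1, ..., 1.0, built from integers so that floating-point
  // error does not accumulate from one step to the next.
  val grid: Seq[Double] = (0 to 10).map(_ / 10.0)

  val pmf: Seq[(Double, Double)] =
    discretize(x => math.max(-x * x + 1, 0), grid)
}
```

The eleven densities add up to about 7.15, so the probability at 0.0 is 1 / 7.15, roughly 0.1398, as in the first chart above.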
The Distributions extension provides methods for creating common Pdf instances, such as Gaussian or Exponential distributions.
Extensions
This library was designed so that only the core operations needed to create and manipulate the structures presented above are included in the classes themselves. Additional features can be added by importing modules from the package extensions.
Plotting
The Plotting module provides support for graphical plotting, leveraging the powerful JFreeChart library with a custom theme. Pmf, Suite, Cdf and BoundedPdf instances can be plotted, as long as their keys have an Ordering (for plotting bar charts) or Numeric (for plotting XY line charts) implicit in scope:
scala> import thinkbayes.extensions.Plotting._
import thinkbayes.extensions.Plotting._
scala> val xyChart = bpdf.plotXY("-x^2 + 1")
xyChart: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@290e640d
scala> val barChart = prior.plotBar("prior")
barChart: scalax.chart.CategoryChart = scalax.chart.ChartFactories$BarChart$$anon$3@5c3e1ebe
New series can be added to a previously created chart. This is useful for comparing differences between two distributions or Bayesian suites:
scala> posterior.plotBarOn(barChart, "after a 6 is rolled")
res17: barChart.type = scalax.chart.ChartFactories$BarChart$$anon$3@5c3e1ebe
Other attributes of the chart, such as the title and the axis labels, can be optionally specified.
Distributions
The Distributions module provides integration with the distribution implementations from Apache Commons Math, as well as several methods for creating Pmf and Pdf instances for common distributions:
scala> import thinkbayes.extensions.Distributions._
import thinkbayes.extensions.Distributions._
scala> poissonPmf(3.0).plotBar("")
res18: scalax.chart.CategoryChart = scalax.chart.ChartFactories$BarChart$$anon$3@6736cd9d
scala> val tri: Pdf = new org.apache.commons.math3.distribution.TriangularDistribution(0.0, 0.5, 2.0)
tri: thinkbayes.Pdf = thinkbayes.extensions.Distributions$$anon$1@7b5cdeb6
scala> tri.bounded(0.0, 2.0).plotXY("")
res19: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@55a7c8a3
Finally, we can estimate a Pdf from a sequence of samples using kernel density estimation:
scala> estimatePdf(Seq(1, 2, 2, 4, 4, 4, 9, 9, 9, 9, 11, 11, 15, 19)).bounded(0, 20).plotXY("")
res20: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@1c15725
Stats
The Stats module is a simple extension that adds the calculation of quantiles and credible intervals to Pmf and Cdf instances:
scala> import thinkbayes.extensions.Stats._
import thinkbayes.extensions.Stats._
scala> normalPmf(2.5, 1.5).quantile(0.5)
res21: Double = 2.5
scala> normalPmf(0.0, 1.0).credibleInterval(0.9)
res22: (Double, Double) = (-1.6440000000000001,1.6440000000000001)
Sampling
Pmf merging methods such as mixture or join yield results that are as accurate as they can be, but they are also computationally expensive. The Sampling module provides probabilistic alternatives based on sampling, which can be the only practical choice for large Pmf instances:
scala> import thinkbayes.extensions.Sampling._
import thinkbayes.extensions.Sampling._
scala> val dieList = Seq.fill(100)(die(6)) // a hundred dice
dieList: Seq[thinkbayes.Pmf[Int]] = List(Map(5 -> 0.16666666666666666, 1 -> 0.16666666666666666, 6 -> 0.16666666666666666, 2 -> 0.16666666666666666, 3 -> 0.16666666666666666, 4 -> 0.16666666666666666),...
scala> val xyChart = dieList.reduce(_ ++ _).plotXY("exact")
xyChart: scalax.chart.XYChart = scalax.chart.ChartFactories$XYLineChart$$anon$17@30015846
scala> sampleSum(dieList, 10000).plotXYOn(xyChart, "sampled")
res23: xyChart.type = scalax.chart.ChartFactories$XYLineChart$$anon$17@81f0a53
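The idea behind the sampled alternative can be sketched without the library: instead of combining the distributions exactly, draw many random outcomes and count how often each total appears. The names SamplingSketch and sampledSum below are hypothetical, the fixed seed is only there to make runs repeatable, and the library's sampleSum may be implemented differently.

```scala
import scala.util.Random

// Illustrative sketch only; not the library's sampleSum implementation.
object SamplingSketch {
  // Monte Carlo estimate of the distribution of the sum of `count` fair
  // dice with `sides` faces each: roll them `samples` times and use the
  // relative frequency of each total as its estimated probability.
  def sampledSum(count: Int, sides: Int, samples: Int, rng: Random): Map[Int, Double] = {
    val totals = Seq.fill(samples)(Seq.fill(count)(rng.nextInt(sides) + 1).sum)
    totals.groupBy(identity).map { case (total, hits) =>
      total -> hits.size.toDouble / samples
    }
  }

  // A hundred six-sided dice, estimated from 10000 samples.
  val estimate: Map[Int, Double] =
    sampledSum(count = 100, sides = 6, samples = 10000, rng = new Random(42))

  // Mean of the estimated distribution; the exact value is 100 * 3.5 = 350.
  val mean: Double = estimate.toSeq.map { case (total, p) => total * p }.sum
}
```

The estimate gets closer to the exact distribution as the number of samples grows, while its cost depends on the number of samples rather than on the size of the exact result.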
Examples
A number of examples and problems explored throughout Think Bayes are implemented in the package examples in the test directory. Each one is accompanied by the original problem description, and I made an effort to make the steps of each problem as clear as possible.
Copyright
Copyright (c) 2014-2017 Rui Gonçalves. See LICENSE for details.