Variable binned and unbinned fitting in RooFit

Kak W. 2020/03/09

This compares regular binned fit, variable binned fit, and unbinned fitting in RooFit. There are caveats in preparing histogram in a variable binned fit, and the result tends to have bias. It looks like unbinned fit is a better choice if regular binned fit causes problems.

In [1]:
import ROOT as r
from ROOT import RooFit as rf
from array import array
from time import time
%jsroot
Welcome to JupyROOT 6.18/04
In [2]:
def midline():
    line = r.TLine(-3,0,3,0)
    line.SetLineStyle(3)
    line.SetLineWidth(3)
    return line
l = midline()
def residual( frame ):
    hresid = frame.residHist()
    frame2 = x.frame(rf.Title("Residual")) 
    frame2.addPlotable(hresid,"P")
    frame2.SetYTitle("Residual")
    frame2.Draw()
    l.Draw()
    return hresid, frame2
def pull( frame ):
    hpull = frame.pullHist()
    frame3 = x.frame(rf.Title("Pull")) 
    frame3.addPlotable(hpull,"P")
    frame3.SetYTitle("Pull")
    frame3.Draw()
    l.Draw()
    return hpull, frame3
In [3]:
r.gRandom.SetSeed(int(time()))
In [4]:
xbin = [-3,-2,-1,-0.2,0.1,0.3,0.5,0.8,1,1.3,1.5,2,3]
xbinreg = [-3,-2.5,-2,-1.5,-1.,-0.5,0.,0.5,1.,1.5,2.,2.5,3.]
tbin = r.RooBinning(len(xbin)-1, array('d', xbin))
h1 = r.TH1F("h1","gaussian", len(xbin)-1, array('f', xbin))
h2 = r.TH1F("h2","gaussian", 12, -3, 3 )
## fill with fake data
h1.FillRandom("gaus")
h2.FillRandom("gaus")
h1a = h1.Clone()
h1.SetLineWidth(2)
h1a.SetLineWidth(2)
h1a.SetLineColor(2)
h2.SetLineColor(6)

Trick

Key is to normalize the histogram correctly

When plotting a pdf against a histogram, RooFit takes the bin width multiplied with sum of bin content to get the correct normlization for the pdf. So, normally, the ratio is the bin width. This is what RooFit shows on the y-axis label

For a variable binned histogram, the effective bin width is calculated by the ratio of width-weighted integral and the naive intergral (sum of bin contents). In other words, RooFit needs to normalize histogram to bin width when plotting and fitting (red line), but plot pdf according to the unrewieghted histogram (Blue line)

In [5]:
binratio = (xbin[-1]-xbin[0])*1./(len(xbin)-1)
binratio
Out[5]:
0.5
In [6]:
h1a.Scale( 1. ,"width" )
h2.Scale( 1. , "width" )
In [7]:
nratio = h1a.Integral("width")/h1a.Integral()
print "effective bin width: %g  actual number of events: %g  sum of bin content: %g"\
            %(nratio, h1a.Integral("width"), h1a.Integral())
effective bin width: 0.369736  actual number of events: 5000  sum of bin content: 13523.2
In [8]:
 h1.Integral("width")/h1.Integral()
Out[8]:
0.542479999089241
In [9]:
c=r.TCanvas()
h1a.Draw("e")  # variable binning
h1.Draw("e same")  # reweighted variable binning
h2.Draw("hist same")  # even binning


c.Draw()
In [10]:
wk = r.RooWorkspace("gaus") 
wk.factory("Gaussian::pdf(x[-3,3], mean[0,-2,2], sigma[1,0,5] )")
wk.factory("Gaussian::originalpdf(x, mean0[0], sigma0[1] )")
Out[10]:
<ROOT.RooGaussian object ("originalpdf") at 0x69260f0>
In [11]:
x = wk.var("x")
func_pdf = wk.pdf("pdf")
func_pdf_ori = wk.pdf("originalpdf")

Import histogram to RooFit. Reweighted histogram is good for visualization but RooFit likes the un-reweighted version

In [12]:
datahist  = r.RooDataHist( 'datahist',   'data', r.RooArgList(x), h1 )
datahist2 = r.RooDataHist( 'datahist2',  'data2',r.RooArgList(x), h2 )
[#1] INFO:DataHandling -- RooDataHist::adjustBinning(datahist): fit range of variable x expanded to nearest bin boundaries: [-3,3] --> [-3,3]

Generating data for unbinned fit

In [13]:
data = func_pdf.generate(r.RooArgSet(x),5000)
In [14]:
wk.Print()
RooWorkspace(gaus) gaus contents

variables
---------
(mean,mean0,sigma,sigma0,x)

p.d.f.s
-------
RooGaussian::originalpdf[ x=x mean=mean0 sigma=sigma0 ] = 1
RooGaussian::pdf[ x=x mean=mean sigma=sigma ] = 1

In [15]:
x.setRange("center", -2.5, 2.5)
x.setRange("full", -3.,3.)
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'center' created with bounds [-2.5,2.5]
[#1] INFO:Eval -- RooRealVar::setRange(x) new range named 'full' created with bounds [-3,3]

Fitting Step

We import the unreweighted histogram into RooFit and make sure it uses the bin error instead of poisson error.

In [16]:
fitresult = func_pdf.fitTo(datahist, 
               rf.SumW2Error(True),
               rf.Save(r.kTRUE),
               rf.PrintLevel(-1),
               #rf.Range("center")
              )
[#1] INFO:Minization -- createNLL: caching constraint set under name CONSTR_OF_PDF_pdf_FOR_OBS_x with 0 entries
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Fitting -- RooAbsPdf::fitTo(pdf) Calculating sum-of-weights-squared correction matrix for covariance matrix
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [17]:
wk.var("mean").Print()
wk.var("sigma").Print()
RooRealVar::mean = -0.0134654 +/- 0.0149597  L(-2 - 2) 
RooRealVar::sigma = 1.03873 +/- 0.0113845  L(0 - 5) 

Fit result with variable bins

The result is closest to correct for now with variable binning. The mean and sigma are both correct to first order, and the plotted normalization looks correct.

Still the fit biases towards the tails which have small error bars and center further away from the mean than the actual pdf. This causes mean to be off center, and larger-than-expected sigma ( to a lesser extent, a regular binned fit has the same issue, see below )

In [18]:
canvas = r.TCanvas("cfit","Fitter",800,500)
frame = x.frame( rf.Title("Variable Binning") )
datahist.plotOn( frame )
func_pdf.plotOn( frame, rf.LineColor(2))# rf.Normalization(1/1.4,r.RooAbsReal.Relative))
frame.Draw()
canvas.Draw()
In [19]:
canvas2 = r.TCanvas("cfit","Fitter",800,300)
res = residual(frame)
canvas2.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: cfit
In [20]:
pul = pull(frame)
canvas2.Draw()

Unbinned fit

An unbinned fit performs the best but it will take some time to prepare data beforehand. The fitted pdf correctly passes below the data points on the tails.

In [21]:
func_pdf.fitTo(data,
               #rf.Range("center"),
               rf.SumW2Error(True),
               rf.Save(r.kTRUE),
               rf.PrintLevel(-1),)
Out[21]:
<ROOT.RooFitResult object ("fitresult_pdf_pdfData") at 0x7037030>
[#1] INFO:Minization -- createNLL picked up cached consraints from workspace with 0 entries
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Fitting -- RooAbsPdf::fitTo(pdf) Calculating sum-of-weights-squared correction matrix for covariance matrix
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [22]:
wk.var("mean").Print()
wk.var("sigma").Print()
RooRealVar::mean = 0.0280298 +/- 0.0142249  L(-2 - 2) 
RooRealVar::sigma = 0.993054 +/- 0.0106349  L(0 - 5) 
In [23]:
canvas = r.TCanvas("cfit","Fitter",800,500)
frame = x.frame(  rf.Title("Unbinned Fit")  )
data.plotOn( frame  , rf.Binning(tbin) )
func_pdf.plotOn( frame, rf.LineColor(2) ,
                 #rf.NormRange("full"),
                  #rf.Range("full"),
               )
frame.Draw()
canvas.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: cfit
In [24]:
canvas2 = r.TCanvas("cfit","Fitter",800,300)
res = residual(frame)
canvas2.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: cfit
In [25]:
pul = pull(frame)
canvas2.Draw()

regular binning

As a comparison here is the regular way to do it with even binning

In [26]:
func_pdf.fitTo(datahist2,
               rf.SumW2Error(True),
               rf.Save(r.kTRUE),
               rf.PrintLevel(-1))
Out[26]:
<ROOT.RooFitResult object ("fitresult_pdf_datahist2") at 0x7038fe0>
[#1] INFO:Minization -- createNLL picked up cached consraints from workspace with 0 entries
[#1] INFO:Minization -- RooMinimizer::optimizeConst: activating const optimization
[#1] INFO:Fitting -- RooAbsPdf::fitTo(pdf) Calculating sum-of-weights-squared correction matrix for covariance matrix
[#1] INFO:Minization -- RooMinimizer::optimizeConst: deactivating const optimization
In [27]:
wk.var("mean").Print()
wk.var("sigma").Print()
RooRealVar::mean = 0.00287352 +/- 0.0148111  L(-2 - 2) 
RooRealVar::sigma = 1.02964 +/- 0.0112295  L(0 - 5) 
In [28]:
canvas = r.TCanvas("cfit","Fitter",800,500)
frame = x.frame(  rf.Title("Regular Binning")  )
datahist2.plotOn( frame ) 
func_pdf.plotOn( frame, rf.LineColor(2)) 
frame.Draw()
canvas.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: cfit
In [29]:
canvas2 = r.TCanvas("cfit","Fitter",800,300)
res = residual(frame)
canvas2.Draw()
Warning in <TCanvas::Constructor>: Deleting canvas with same name: cfit
In [30]:
pul = pull(frame)
canvas2.Draw()
In [31]:
!bash /home/kakw/forjupyter/jupyterconvert.sh /home/kakw/public_html/ 2020_03_09_RooFit_variablebins.ipynb 
Thu Mar 12 01:09:29 EDT 2020
[NbConvertApp] Converting notebook 2020_03_09_RooFit_variablebins.ipynb to slides
[NbConvertApp] Writing 431656 bytes to 2020_03_09_RooFit_variablebins.slides.html
[NbConvertApp] Converting notebook 2020_03_09_RooFit_variablebins.ipynb to html
[NbConvertApp] Writing 426619 bytes to 2020_03_09_RooFit_variablebins.html
In [ ]: