Appendix B: Fitting with few counts/bin

Cash (1979) showed that the minimization criterion is a very bad one if any of the observed data bins had few counts. A better criterion is to use a likelihood function:

where are the observed data and the values of the function. Minimizing C for some model gives the best-fit parameters. Furthermore, this statistic can be used in the same, familiar way as the statistic to find confidence intervals. One finds the parameter values that give

where N is the same number that gives the required confidence for the number of interesting parameters as for the case.

Castor (priv. comm.) has pointed out that a better function to use is :

This differs from the first function by a quantity that depends only upon the data. In the limit of a large number of counts this second function does provide a goodness-of-fit criterion similar to that of and it is now used in XSPEC. It is important to note that the C-statistic assumes that the error on the counts is pure Poisson, and thus it cannot deal with data that already has been background subtracted, or has systematic errors.

With background

Arnaud (2003, ApJ submitted) has extended the method of Cash to include the case when a background spectrum is also in use. Note that this requires the source and background spectra to both be available, it does not work on a background-subtracted spectrum.

Suppose we have an observation which produces events in the spectral bins in an exposure time of . This observation includes events from the source of interest along with background events. Further suppose that we perform a background observation which generates events in an exposure time . If the source count rate in bin i is then the new fit statistic is

where

and

The sign of d_i in f_i is chosen so that f_i > 0.

In the limit of large numbers of counts/bin a second-order Taylor expansion shows that W tends to

[sum i=1,N]([S_i-t_s*y_i-t_s*f_i]^2/(t_s*y_i+t_s*f_i) + [B_i-t_b*f_i]^2/(t_b*f_i))

which is distributed as with degrees of freedom, where the model has M parameters (including the normalization).

Practice

No background

XSPEC’s default minimization algorithm is a variant of Marquardt's method (the Levenberg-Marquadt algorithm) described in §11.5 of Data Reduction and Error Analysis for the Physical Sciences by Bevington (2002). (The reader is advised that this description is designed to be read in conjunction with Bevington.) This method is appropriate for fitting models that are twice differentiable. We illustrate the implementation of this W statistic using this method. The algorithm works by finding a matrix and a vector such that the equation :

gives sensible values of the change in parameters, , for the fitting function. Bevington §11.4 gives the derivation of a and b and shows that b is parallel to the gradient of .

Now the C-statistic has a gradient with respect to the parameters of the fitting function of :

So, following Bevington, expand about :

substitute into C and minimize with respect to the changes in the parameters :

dC/d(da_k) = 2*[sum over i][dy_0(x_i)/da_k - y_i*(y_0(x_i)+[sum over j]((dy_0(x_i)/da_j)*da_j))^-1 * dy_0(x_i)/da_k]

setting this to zero we obtain, to first order in the parameter changes,

[sum over i]((y_i/y_0(x_i)-1)*dy_0(x_i)/da_k) = [sum over j]([sum over i]((y_i/y_0^2(x_i))(dy_0(x_i)/da_j)(dy_0(x_i)/da_k)da_j))

or :

where:

These a and b then are substituted for those used in the case and the algorithm works as required. Note that is to first order in partial derivatives in y, evaluated at y₀.

There is one further difference in XSPEC between the and likelihood methods, which is caused by the fact that XSPEC uses an analytic formula for setting the model normalization. In the case, this means multiplying the current model by:

where is the error on . In the likelihood case the corresponding factor is:

With background

An analogous argument to the above can be followed through for the W statistic. We need the partial derivatives of W which are evaluated as follows.

$dW/da_j = 2*[sum over i]({t_s+g_i(t_s+t_b)-S_i(1+g_i)/(y_i+f_i)-B_i*g_i/f_i}dy_i/da_j)$

$d^2W/(da_j*da_k) = 2*[sum over i]({(t_s+t_b)h_i - S_i*h_i/(y_i+f_i) - S_i(1+g_i)^2/(y_i+f_i)^2 - B_i*h_i/f_i - B_i*g_i^2/f_i^2}(dy_i/da_j)(dy_i/da_k))$

where

and

Note that in this case there is no analytic formula that can be used to set the model normalization.

References

Arnaud, K. A. 2003, Astrophysical Journal, submitted

Bevington, P. 2002 Data Reduction and Error Analysis for the Physical Sciences Third Edition (McGraw Hill:Columbus)

Cash, W. 1979 Parameter estimation in astronomy through application of the likelihood ratio Astrophysical Journal 228, 939