Linear Regression and Bayesian Inference: An Astrophysics Perspective
I recently had the excellent experience of being a mentor at a machine learning + astrophysics summer school for undergraduate students at l’Université de Montréal (https://www.astro.umontreal.ca/astromatic/). As part of my duties (which were really a pleasure), I was asked to put together a hands-on example for students to learn about linear regression and Bayesian inference. Enjoy!!!
In this hands-on example, we are going to use two different techniques to determine the best-fit parameters and their associated errors for the famous M-sigma relation. This relation maps an easily observable quantity, the velocity dispersion of a galaxy, to a non-observable — the mass of the galaxy’s central supermassive black hole. We are going to be using two methodologies: 1) a classic linear regression approach and 2) a Bayesian inference approach. Throughout this notebook, you will be exploring how we go about applying these techniques and what they mean for our analysis.
Pre-requisite: Imports
Part 1: Familiarization with the data
In this section, we are going to familiarize ourselves with the data. This portion will be so that we can all have a baseline understanding of the data.
We are going to be using data from the 2009 paper entitled: THE M–σ AND M–L RELATIONS IN GALACTIC BULGES, AND DETERMINATIONS OF THEIR INTRINSIC SCATTER. In this paper, the authors fit a functional form of the relationship between the mass of the supermassive black hole and the velocity dispersion of stars in galaxies. The functional form is the following:
If, however, we simply take this equation, we can ask ourselves what values of α and β best fit the data. Also, we can determine the errors on these values. Let’s take a look at the data which was taken directly from the paper.
So we have fifty galaxies in our sample. All masses are quoted in terms of solar masses (M_⊙), all dispersions in terms of km/s, and all distances in kiloparsecs. What does this look like if we plot the masses vs the dispersions including errors?
Part 2: Data Analysis
In this section, we will explore how to use linear regression and bayesian inference to set estimates with errors on the parameters governing the M-sigma relation
In this example, we are going to assume that our cost function is the *mean square error* (MSE) which has the following form:
where $f_{\theta}(x)$ is the assumed functional form of the relationship between $x$ and $y$ evaluated for some values of $\theta$ where $\theta$ represents our set of variables. $m$ is the number of data points in our sample.
We will be using the *scikit-learn* implementation called by **mean_squared_error**.
Since you have already seen the theory of linear regression, we will simply be using a commonly-used implementation of linear regression that is present in *sklearn*.
Let’s see how well we did! We can take a look at the reduced chi-squared value which is hopefully near-ish to one. We can then plot our line of best fit and see how it worked!
coefficient of determination: 0.698146674681597
$\alpha$ : -8.20E-01
$\beta$ : 3.91E+00
Great!!! So we can see that our analysis did a pretty decent job but note that we didn’t take errors into account! We will do this using Bayesian Inference via the package *emcee*.
In order to run the Bayesian analysis, we need to define our likelihood and our prior functions. Let’s get down to it!
Finally, we can define our initial values and walk! We are going to take our solution from a standard linear regression as our initial guesses
Additionally, we can use the corner package to see the posterior functions.
Part 3: Final Results
Awesome! Let’s see how this looks in our M-$\sigma$ plot.
And there you go!
I hope this was an interesting and instructive example of how to use linear regression and include errors (note I only added the y-axis errors) in calculations to obtain the best-fit parameters and their uncertainties!