Regression is a popular modelling approach in which a response variable is modelled as a function of certain predictors, in order to understand the relations between the variables. In a previous post I used a linear model (the bread and peace model) and showed various ways to solve the equation.

In this post, I want to fit a 2 level linear model using the concept of random effects modelling. There is a lot of literature available on this subject; some references are shown below:

[1] Bates, D. M. (2010). lme4: Mixed-Effects Modeling with R. link

[2] West, B. T., Welch, K. B., & Gałecki, A. T. (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall/CRC. http://doi.org/10.1002/sim.3167

[3] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis, Third Edition. Texts in Statistical Science. Chapman & Hall/CRC.

The code in this post can be seen on the github page here.

The simulated data in this example consists of 30 observations of a Normally distributed variable (so in generalised linear modelling speak, the link function is the *identity*). These 30 observations can be sub-grouped into 5 individual groups. We assume that there is a common relationship between these 5 groups (e.g. 5 machines of the same type, or 5 persons from the same cohort). This grouping of course requires some domain knowledge and common sense. Ref [3] has a nice introductory chapter on hierarchical modelling which is a recommended read, in particular the concept of exchangeability. The data also has one predictor variable.

The figure shows the data in 3 panels: *Top Left* shows the data at the population level (a higher level, where all observations are part of one big group); *Top Right* shows the same data but with the 5 sub-groups coloured differently; the *Bottom* panel shows the 5 groups plotted separately, one per panel (Ref [1] has some nice examples of how to plot structured data in various ways). Looking at this figure, we can see that the slope in each group is about equal, but the intercept differs from group to group. We would like to account for this in our modelling by treating the group intercepts as randomly distributed around the central intercept.
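Data of this shape can be simulated along the following lines (a sketch only: the seed and the parameter values here are my assumptions, since the original simulation settings are not shown in the post):

```r
set.seed(20)  # assumed seed, not the one used in the post
## 5 groups with 6 observations each = 30 observations
group = factor(rep(1:5, each = 6))
## group level intercept deflections, centred on 0
ranIntercepts = rnorm(5, mean = 0, sd = 1.6)
## one predictor variable
pred = runif(30, 1, 15)
## identity link: common slope, group-specific intercept, Normal noise
resp = 2 + ranIntercepts[as.numeric(group)] + 1.5 * pred + rnorm(30, 0, 2)
dfData = data.frame(resp = resp, pred = pred, group = group)
str(dfData)
```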

```r
################### first use normal regression and lme4 package
fit.lm = lm(resp ~ pred, data=dfData)
summary(fit.lm)

# Call:
# lm(formula = resp ~ pred, data = dfData)
#
# Residuals:
#     Min      1Q  Median      3Q     Max
# -5.3043 -1.6194 -0.4117  1.5824  6.7582
#
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  2.28521    0.92127    2.48   0.0194 *
# pred         1.50345    0.08684   17.31   <2e-16 ***
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#
# Residual standard error: 2.616 on 28 degrees of freedom
# Multiple R-squared:  0.9146, Adjusted R-squared:  0.9115
# F-statistic: 299.7 on 1 and 28 DF,  p-value: < 2.2e-16

# residual se = 2.616
####################################################
#### fit a 2 level model with random intercept
library(lme4)
fit.lme = lmer(resp ~ pred + (1 | group), data=dfData, REML=F)
summary(fit.lme)

# Linear mixed model fit by maximum likelihood  ['lmerMod']
# Formula: resp ~ pred + (1 | group)
#    Data: dfData
#
#      AIC      BIC   logLik deviance df.resid
#    141.3    146.9    -66.6    133.3       26
#
# Scaled residuals:
#     Min      1Q  Median      3Q     Max
# -1.5264 -0.7096 -0.2699  0.4328  2.6193
#
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 2.664    1.632
#  Residual             3.777    1.943
# Number of obs: 30, groups:  group, 5
#
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)  1.89979    1.00879   1.883
# pred         1.54593    0.06604  23.408
#
# Correlation of Fixed Effects:
#      (Intr)
# pred -0.594

## random effect or group level sd = 1.632
## population level sd = 1.943
## total = 1.632 + 1.943 = 3.575
```

The interesting numbers to look at are the coefficients for the population (higher level group) in the simple linear model and in the model with random effects: the slope coefficients are about the same, but the intercepts differ between the two models. The second, and perhaps more important, set of numbers is the amount of residual noise left unexplained. The simpler model has a residual standard deviation of 2.6, while the 2 level model has 2 noise terms: the random effect (group level) standard deviation of 1.6 and the remaining population level standard deviation of 1.9 (which is smaller than in the simpler model). This suggests that a random effects model, by accounting for the structure of the data, can explain more of the sampling noise in the data.
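One convenient way to summarise this partition of the noise is the intraclass correlation, i.e. the proportion of the total variance attributable to the grouping, computed here from the variance components reported by lmer:

```r
## variance components from the lmer fit (standard deviations 1.632 and 1.943)
vGroup = 1.632^2   # between-group variance
vResid = 1.943^2   # residual (population level) variance
## intraclass correlation: share of variance explained by the grouping
icc = vGroup / (vGroup + vResid)
round(icc, 3)
# [1] 0.414
```

So roughly 41% of the noise that the simple linear model lumped into its residual is accounted for by the group structure.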

Now let's break this model down into parts and write our own models: the first is an MCMC based approach using Stan, and the second uses optimisation with a resampling based approach. In both cases we are trying to estimate multiple parameters in a parameter space, hoping to converge to a point.

The code for the Stan model is in the same directory of the repository at github – and is structurally very similar to the R code shown below, which defines the log posterior function (see the post on parameter estimation for details).

```r
mylogpost = function(theta, data){
  ## data
  resp = data$resp # response variable
  mModMatrix = data$mModMatrix
  ## this mapping variable maps each group level intercept to each
  ## data point at the likelihood level
  groupIndex = data$groupIndex
  ## parameters to track/estimate
  sigmaRan = exp(theta['sigmaRan']) # random effect scale/sd
  sigmaPop = exp(theta['sigmaPop']) # population level sd
  betas = theta[grep('betas', names(theta))] # regression coefficients for population
  iGroupsJitter = theta[grep('ran', names(theta))] # random effect jitters for the group deflections
  ## each group contributes a jitter centred on 0
  ## population intercept + random jitter
  ivBetaRand = betas[1] + iGroupsJitter
  ## expand these intercepts using the mapping variable, giving a
  ## unique intercept for each observation according to its group
  ivIntercept = ivBetaRand[groupIndex]
  iFitted = as.matrix(mModMatrix[,2:ncol(mModMatrix)]) %*% betas[2:ncol(mModMatrix)]
  iFitted = ivIntercept + iFitted # identity link so no transformation required
  ## priors
  # sigma priors
  lhp = dunif(sigmaRan, 0, 2, log=T) + dunif(sigmaPop, 0, 5, log=T)
  # random effects prior
  lran = sum(dnorm(iGroupsJitter, 0, sigmaRan, log=T))
  # priors for the betas
  lp = sum(dcauchy(betas, 0, 10, log=T))
  # the likelihood function
  lik = sum(dnorm(resp, iFitted, sigmaPop, log=T))
  val = lhp + lran + lp + lik
  return(val)
}
```

Some points about this code snippet:

- sigmaRan is the standard deviation for the group level. All groups are assumed to have the same noise centred on zero. This is a general assumption and may not be true if you have different types of groups. If you feel brave, you can assign a different distribution to each type of group.
- sigmaPop is the standard deviation of the population distribution at the likelihood level.
- I have set the priors for sigmaRan and sigmaPop as uniform(0, 2) and uniform(0, 5) respectively. This is not strictly necessary, but restricting the range can help if you are having problems with convergence.
- The population level regression coefficients are given a 0 centred Cauchy distribution.
- We are tracking 9 parameters in this optimisation problem: 2 standard deviations, 2 regression coefficients and 5 group level deflections.
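The mapping step inside mylogpost is perhaps the least obvious part, so here it is in isolation (toy numbers chosen for illustration, not the fitted values):

```r
## expanding group level intercepts to the observation level
betas = c(2, 1.5)                             # population intercept and slope
iGroupsJitter = c(0.4, 0.7, -0.6, 1.9, -2.4)  # one deflection per group
groupIndex = rep(1:5, each = 6)               # maps each of 30 rows to its group
## group-specific intercepts: population intercept + jitter
ivBetaRand = betas[1] + iGroupsJitter
## expand to one intercept per observation via the mapping variable
ivIntercept = ivBetaRand[groupIndex]
length(ivIntercept)  # 30 - one intercept per data point
ivIntercept[1:6]     # first 6 observations, all from group 1, share 2.4
```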

```r
fit.lap = mylaplace(mylogpost, start, lData)

### use the results from laplace and SIR sampling with a t proposal density
### to take a sample
tpar = list(m=fit.lap$mode, var=fit.lap$var*2, df=4)
## get a sample using sir (sampling importance resampling with a t proposal density)
s = sir(mylogpost, tpar, 5000, lData)
apply(s, 2, mean)[-c(1:2)]
#     betas1     betas2       ran1       ran2       ran3       ran4       ran5
#  1.8598225  1.5462439  0.3754429  0.7316874 -0.5849880  1.7242024 -2.1096244
exp(apply(s, 2, mean)[1:2])
# sigmaPop sigmaRan
# 2.082105 1.351142
```

We use the sampling importance resampling (SIR) algorithm with a t proposal density to generate a random sample for our parameters, using the mode and covariance matrix calculated earlier with the function *mylaplace*.
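Schematically, SIR amounts to drawing from the proposal, weighting each draw by target density over proposal density, and resampling in proportion to those weights. This is a sketch of the idea on a one-dimensional toy target, not the LearnBayes implementation used above:

```r
## schematic SIR: draw from proposal, weight by target/proposal, resample
sir.sketch = function(logpost, rproposal, ldproposal, n, data){
  theta = rproposal(n)                                  # draws from proposal
  lw = sapply(theta, logpost, data) - ldproposal(theta) # log importance weights
  w = exp(lw - max(lw))                                 # stabilise before normalising
  sample(theta, n, replace = TRUE, prob = w)            # resample by weight
}

## toy check: target N(5, 1), t proposal centred at 5 with scale 2
set.seed(42)
logtarget = function(x, data) dnorm(x, 5, 1, log = TRUE)
s = sir.sketch(logtarget,
               function(n) 5 + 2 * rt(n, df = 4),
               function(x) dt((x - 5) / 2, df = 4, log = TRUE) - log(2),
               5000, NULL)
round(mean(s), 1)  # close to the target mean of 5
```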

I show the results from all 4 methods below, and you can see they are pretty similar.

```r
############### compare the results from stan, lmer and the log posterior function
# Inference for Stan model: linearRegressionRandomEffects.
# 4 chains, each with iter=5000; warmup=2500; thin=1;
# post-warmup draws per chain=2500, total post-warmup draws=10000.
#
#                    mean se_mean    sd   2.5%    25%    50%    75%  97.5% n_eff  Rhat
# betas[1]          1.775   0.038 1.620 -1.736  0.935  1.841  2.738  4.786  1853 1.000
# betas[2]          1.549   0.001 0.070  1.411  1.503  1.549  1.597  1.685  5649 1.001
# sigmaRan          2.961   0.051 2.471  0.875  1.728  2.414  3.487  8.093  2377 1.001
# sigmaPop          2.094   0.005 0.327  1.563  1.863  2.056  2.280  2.843  4928 1.001
# rGroupsJitter[1]  0.503   0.036 1.623 -2.559 -0.419  0.434  1.326  3.996  2083 1.000
# rGroupsJitter[2]  0.901   0.037 1.635 -2.068 -0.055  0.808  1.733  4.493  1997 1.000
# rGroupsJitter[3] -0.636   0.036 1.618 -3.838 -1.516 -0.646  0.225  2.813  2036 1.000
# rGroupsJitter[4]  2.140   0.036 1.660 -0.808  1.131  2.013  3.008  5.892  2077 1.000
# rGroupsJitter[5] -2.448   0.035 1.652 -5.842 -3.378 -2.406 -1.502  0.792  2179 1.000

# > summary(fit.lme)
# Linear mixed model fit by maximum likelihood  ['lmerMod']
# Formula: resp ~ pred + (1 | group)
#    Data: dfData
#
# Random effects:
#  Groups   Name        Variance Std.Dev.
#  group    (Intercept) 2.664    1.632
#  Residual             3.777    1.943
# Number of obs: 30, groups:  group, 5
#
# Fixed effects:
#             Estimate Std. Error t value
# (Intercept)  1.89979    1.00879   1.883
# pred         1.54593    0.06604  23.408

## Random effect jitters from lme
# > t(ranef(fit.lme)$group)
#                     1         2          3       4         5
# (Intercept) 0.3888984 0.7627657 -0.6863616 1.94236 -2.407663

# > fit.lap$mode[-c(1:2)]
#     betas1     betas2       ran1       ran2       ran3       ran4       ran5
#  1.9037216  1.5456933  0.3897490  0.7787912 -0.7075750  1.8900023 -2.3491993
# > exp(fit.lap$mode)[1:2]
# sigmaPop sigmaRan
# 1.805638 1.463412

## SIR sampling - these will change slightly each time you run it
# > apply(s, 2, mean)[-c(1:2)]
#     betas1     betas2       ran1       ran2       ran3       ran4       ran5
#  1.8598225  1.5462439  0.3754429  0.7316874 -0.5849880  1.7242024 -2.1096244
# > exp(apply(s, 2, mean)[c(1:2)])
# sigmaPop sigmaRan
# 2.082105 1.351142
```

However, in problems with many parameters and groups, the optimisation based approach may not converge, so an optimisation+SIR, Stan or lmer based approach, or a different optimisation library, may be required.

## Model Quality Scores

In a previous post we calculated some model quality scores. However, I had been struggling to calculate the log likelihood and AIC for a random effects model, due to a mistake I was making in the calculations. Eventually I figured it out with help from another post I found here.

```r
## first write the log predictive density function
library(mnormt) # for the multivariate normal density dmnorm
lpd = function(theta, data){
  betas = theta # vector of betas i.e. regression coefficients for population
  ## data
  resp = data$resp # response variable
  mModMatrix = data$mModMatrix
  group = data$group
  sigmaRan = exp(data$sigmaRan) # random effect scale/sd
  sigmaPop = exp(data$sigmaPop) # population level sd
  # calculate fitted value
  iFitted = mModMatrix %*% betas
  ######## perform notational acrobatics
  # construct the variance covariance matrix
  z = model.matrix(resp ~ 0 + group)
  zt = t(z)
  mRan = diag(sigmaRan^2, nrow=nlevels(group), ncol=nlevels(group))
  mRan = z %*% mRan %*% zt
  mPop = diag(sigmaPop^2, nrow=length(resp), ncol=length(resp))
  mCov = mRan + mPop
  ## likelihood function with posterior theta
  return(dmnorm(resp, iFitted, mCov, log=T))
}

lData$group = dfData$group
lData$sigmaRan = log(1.632)
lData$sigmaPop = log(1.943)
theta = fixef(fit.lme)
lpd(theta, lData)
# [1] -66.63871
## compare with lme
logLik(fit.lme)
# 'log Lik.' -66.63871 (df=4)
```

The code above shows that to calculate these scores we need the log predictive density function, and that to calculate the log likelihood correctly we need to use a multivariate normal density with a covariance matrix that accounts for both the population level and the group level variances.
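The same marginal covariance construction can be seen on a toy scale, with the AIC following directly from the log likelihood (toy numbers here: 2 groups of 2 observations, with the standard deviations rounded from the lmer fit):

```r
## marginal covariance of a random-intercept model: V = Z G Z' + sigma^2 I
group = factor(c(1, 1, 2, 2))          # 2 groups, 2 observations each
z = model.matrix(~ 0 + group)          # 4 x 2 group indicator matrix
sigmaRan = 1.6; sigmaPop = 1.9
mRan = z %*% diag(sigmaRan^2, 2) %*% t(z)  # block structure from the grouping
mCov = mRan + diag(sigmaPop^2, 4)          # add residual variance on the diagonal
## observations in the same group share covariance sigmaRan^2
mCov[1, 2]  # 2.56
mCov[1, 3]  # 0 - observations in different groups are independent
## AIC from the log likelihood: -2*logLik + 2*df, where df = 4 here
## (intercept, slope, sigmaRan, sigmaPop), matching lme4's reported 141.3
-2 * -66.63871 + 2 * 4
```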

I have been looking into calculating the WAIC for structured models and doing cross-validation (which is on my todo list); however, this can get tedious when you have to take the structure of the groups into account. I will try to cover that in a future post.