We just created, in the previous post, a simple univariate, multivariable regression function ** ks_lm0** that calculates regression slopes and intercept.

However, it would be interesting not only to find out what our quotient (or factor) space X/Y would be (which is defined by the *bi (i !=0 ) *regression* *coefficients), and what particular equivalence relation R would be of our dataset A with its image B on Y*l* we project it to, and what the projection function (**T** or **R**) we have to Y*l* and Y*0* spaces respectively, but also look at what our projected dataset B becomes in the new space of the reduced dimensionality Yl (or the same in Y0, because we found out before that in the linear space **d**=**f**). Actually that is the main goal of the whole exercise we just done – we projected our data from the space with the less “convenient” structure into one with a more “convenient” (with the reduced dimensionality) structure. BTW, that thing is a part of *Factor Analysis *(no surprise we are dealing with factor spaces).

What we want to find (see previous chapter) is the transform **M**. We just found (in ** ks_lm0**) the above mentioned transform

**R**(which is the diagonal matrix with one row made from

*bi*coefficients (

*i*!=

*0*)), and we know that

**NM**=

**R**, i.e.

**M**=

**N**-1

**R**. In general case, if we want to rotate

**uv**basis relative to the

**ij**one (BTW, which is a part of

*Principal Component Analysis*)

**N**-1 transform would be all rows, except one 0 row matrix, but we won’t do it now, and save it for later, and just declare that basis for our Space of new “synthetic variables” Yl would be

**u**=

*a*

**i**+

*b*

**k**,

**v**=

*c*

**j**+

*d*

**k**, and even

*a=b=c=d=1*. I.e. we do not rotate new basis (yet), and do not significantly scale the unit vectors length (only do that trigonometrically). Therefore

**N**-1:

(1 0 … 1)

(0 1 … 1)

(. . … .)

(0 0 … 0)

So, let us stop for a moment at a more visually “comprehendable” 3D representation of our data, where we look only at glucosylated hemoglobin (glyhb), cholesterol/hdl (rati)o and systolic blood pressure (bp.1s) dimensions, and see how our data would look in 3D and 2D representation, after we have collapsed glyhb dimension, and created 2 synthetic variables of the linear combination of glyhb & ratio and glyhb & bp.1s. To do that we will enhance our new ** ks_lm** function to not only calculate slopes and interception as in

**, but also transformation matrices**

*ks_lm0***R**and

**N**-1, and the very transformed datasets Xl and Yl.

First, we use *lattice* package for 3D scatter plot of the original (blue) data X in coordinates V3=glyhb, V1=ration, V2=bp.1s, and projected (magenta) data Xl into quotient space member Yl (regression plane) and then converted back to the X space basis:

*load(url(“http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/diabetes.sav”))*

*ks_model <- ks_lm( diabetes, c(“glyhb”), c(“ratio”,”bp.1s”) )*

*xdf <- as.data.frame(t(ks_model$X))*

*xdf[, “V4”] <- rep_len(1, nrow(ydf))*

*xldf <- as.data.frame(t(ks_model$Xl))*

*xldf[, “V4”] <- rep_len(2, nrow(ydf))*

*xxldf <- rbind(xdf, xldf)*

*cloud(V3 ~ V1*V2, xxldf, groups=V4)*

And then, we use *ggplot2* package to draw a 2D scatter plot of our new synthetic variables in the Yl space:

*ydf <- as.data.frame(t(ks_model$Y))*

*ydf[, “V3”] <- rep_len(2, nrow(ydf))*

*xdf <- as.data.frame(t(ks_model$X))*

*xdf[, “V3”] <- rep_len(1, nrow(ydf))*

*xydf <- rbind(ydf, xdf)*

*ggplot(xydf, aes(x=V1, y=V2, colour=V3))+*

* geom_point(alpha=0.5)*

Now, having visually seen how our linear regression function work on 3D data, and how it may be used for dimensionality/factor reduction, and where in it is a place for principle component manipulation, let us proceed playing with it on a larger dimension data…

**Appendix **

**ks_lm** <- function (df, yname, xnames){

* all.names = c( unlist(xnames), unlist(yname) )*

* clean.df <- df[complete.cases( df[, all.names] ), all.names]*

* nr <- nrow(clean.df)*

* nc <- ncol(clean.df)*

* model <- list()*

* model$X <- matrix(do.call(rbind,clean.df), nc)*

* x <- clean.df*

* x[, yname] <- NULL*

* x[, nc] <- rep_len(1, nr)*

*y <- clean.df[, yname]*

* X <- matrix( do.call(cbind,x), nrow(x) )*

*b1 <- solve(t(X) %*% X) %*% t(X) %*% y*

* #Projection onto 0 member of quotient space X/Y*

* model$R <- diag(rep_len(1, nc))*

* model$R[nc,1:nc] <- b1*

* model$R[nc,nc] <- 0*

* #X to Y Basis rotation*

* model$Nm1 <- diag(rep_len(1, nc))*

* model$Nm1[,nc] <- rep_len(1, nc)*

* model$Nm1 <- model$Nm1[-nc,]*

* model$h0 <- rep_len(0, nc)*

* model$h0[nc] <- b1[ncol(b1)]*

* model$X0 <- model$R %*% model$X*

* model$Xl <- model$X0[, 1:nr] + model$h0*

* model$Y <- model$Nm1 %*% model$X0*

* model*

*}*