Title: | Finite Mixture of Multivariate Censored/Missing Data |
---|---|
Description: | Fits finite mixture models for censored and/or missing data using several multivariate distributions. Point estimation and asymptotic inference (via the empirical information matrix) are offered, as well as censored data generation. Pairwise scatter and contour plots can be generated. The available multivariate distributions are the normal, Student-t and skew-normal. This package is a complement to Lachos, V. H., Moreno, E. J. L., Chen, K. & Cabral, C. R. B. (2017) <doi:10.1016/j.jmva.2017.05.005> for the multivariate skew-normal case. |
Authors: | Francisco H. C. de Alencar [aut, cre], Christian E. Galarza [aut], Larissa A. Matos [ctb], Victor H. Lachos [ctb] |
Maintainer: | Francisco H. C. de Alencar <[email protected]> |
License: | GPL (>= 2) |
Version: | 3.1 |
Built: | 2025-02-09 02:59:47 UTC |
Source: | https://github.com/cran/CensMFM |
Index of help topics:
CensMFM-package
: Finite Mixture of Multivariate Censored/Missing Data
fit.FMMSNC
: Fitting Finite Mixture of Multivariate Distributions.
rMMSN
: Random Generator of Finite Mixture of Multivariate Distributions.
rMMSN.contour
: Pairwise Scatter Plots and Histograms for Finite Mixture of Multivariate Distributions.
rMSN
: Generating from Multivariate Skew-normal and Normal Random Distributions.
The CensMFM package provides comprehensive tools for fitting and analyzing finite mixture models on censored and/or missing data using several multivariate distributions. This package supports the normal, Student-t, and skew-normal distributions, facilitating point estimation and asymptotic inference through the empirical information matrix. Additionally, it allows for the generation of censored data.
Key functions include:
fit.FMMSNC
: Fits finite mixtures of censored and/or missing multivariate distributions using an EM-type algorithm. This function supports skew-normal, normal, and Student-t distributions.
rMMSN.contour
: Generates pairwise scatter plots and contour plots for analyzing the relationships within the fitted models.
rMMSN
: Provides functionality to generate random realizations from a finite mixture of multivariate distributions, particularly useful for simulation studies involving censored data.
rMSN
: Focuses on generating random realizations from multivariate Skew-normal and Normal distributions.
This package serves as an extension and complement to the methodologies presented in the paper by Lachos, V. H., Moreno, E. J. L., Chen, K. & Cabral, C. R. B. (2017) <doi:10.1016/j.jmva.2017.05.005>, specifically for the multivariate skew-normal case.
Cabral, C. R. B., Lachos, V. H., & Prates, M. O. (2012). Multivariate mixture modeling using skew-normal independent distributions. Computational Statistics & Data Analysis, 56(1), 126-142.
Prates, M. O., Lachos, V. H., & Cabral, C. (2013). mixsmsn: Fitting finite mixture of scale mixture of skew-normal distributions. Journal of Statistical Software, 54(12), 1-20.
C.E. Galarza, L.A. Matos, D.K. Dey & V.H. Lachos. (2019) On Moments of Folded and Truncated Multivariate Extended Skew-Normal Distributions. Technical report. ID 19-14. University of Connecticut.
F.H.C. de Alencar, C.E. Galarza, L.A. Matos & V.H. Lachos. (2019) Finite Mixture Modeling of Censored and Missing Data Using the Multivariate Skew-Normal Distribution. Technical report. ID 19-31. University of Connecticut.
See also: fit.FMMSNC, rMSN, rMMSN and rMMSN.contour
It fits a finite mixture of censored and/or missing multivariate distributions (FM-MC). The available distributions are the multivariate Skew-normal, normal and Student-t. It uses an EM-type algorithm to iteratively compute maximum likelihood estimates of the parameters.
fit.FMMSNC(cc, LI, LS, y, mu = NULL, Sigma = NULL, shape = NULL, pii = NULL, nu = NULL, g = NULL, get.init = TRUE, criteria = TRUE, family = "SN", error = 1e-05, iter.max = 350, uni.Gama = FALSE, kmeans.param = NULL, cal.im = FALSE)
cc |
vector of censoring indicators. For each observation it takes 0 if non-censored, 1 if censored. |
LI |
the matrix of lower limits of dimension |
LS |
the matrix of upper limits of dimension |
y |
the response matrix with dimension |
mu |
a list with |
Sigma |
a list with |
shape |
a list with |
pii |
a vector of weights for the mixture (dimension of the number |
nu |
the degrees of freedom for the Student-t distribution case, being a vector with dimension |
g |
number of mixture components. |
get.init |
Logical, |
criteria |
Logical, |
family |
distribution family to be used. Available distributions are the Skew-normal ("SN"), normal ("Normal") or Student-t ("t") distribution. |
error |
relative error for stopping criterion of the algorithm. See details. |
iter.max |
the maximum number of iterations of the EM algorithm. |
uni.Gama |
Logical, |
kmeans.param |
a list with alternative parameters for the kmeans function when generating initial values. List by default is
|
cal.im |
Logical, |
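The roles of cc, LI and LS can be illustrated with a small base-R sketch that mirrors the package examples: censored cells get cc = 1 with their censoring interval stored in LI and LS, and missing cells are treated as censored on (-Inf, Inf). The helper name make_limits and the detection limit dl are hypothetical and not part of CensMFM. Entries for observed cells are left at 0, as in the examples, on the assumption that they are not used by the fit.

```r
# Hypothetical helper (not part of CensMFM): build cc, LI and LS for
# left-censoring at a detection limit `dl`, treating NA cells as missing.
make_limits <- function(y, dl) {
  miss <- is.na(y)
  left <- !miss & y <= dl           # observed values at or below the limit

  cc <- matrix(0, nrow(y), ncol(y))
  cc[left | miss] <- 1              # 1 = censored or missing, 0 = observed

  LI <- LS <- cc                    # same starting point as the package examples
  LI[cc == 1] <- -Inf               # lower bound for censored and missing cells
  LS[left] <- dl                    # left-censored: value lies in (-Inf, dl]
  LS[miss] <- Inf                   # missing: value lies in (-Inf, Inf)

  y[left] <- dl                     # record the limit itself for censored cells
  list(cc = cc, LI = LI, LS = LS, y = y)
}

y <- matrix(c(1.2, -0.5, NA, 3.1, 0.1, 2.4), ncol = 2)
lims <- make_limits(y, dl = 0.2)
lims$cc
```

The resulting list holds the four objects that fit.FMMSNC expects as its first arguments; right or interval censoring would follow the same pattern with finite values in LI.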
The information matrix is calculated with respect to the entries of the square root matrix of Sigma, using the empirical information matrix. Disclaimer: the inference is asymptotic, so it should only be relied on for reasonably large sample sizes. The stopping criterion is abs(loglik_new/loglik_old - 1) < epsilon, where loglik_new and loglik_old are the log-likelihood values at two successive iterations and epsilon is the error argument.
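A minimal sketch of such a relative-change stopping rule is given below. It assumes the criterion compares the log-likelihood at two successive iterations; the names has_converged, loglik_new and loglik_old are illustrative and do not come from the package.

```r
# Relative-change stopping rule on the log-likelihood (a sketch; the
# function and argument names are illustrative, not package code).
has_converged <- function(loglik_new, loglik_old, error = 1e-05) {
  abs(loglik_new / loglik_old - 1) < error
}

has_converged(-512.3401, -512.3400)   # tiny relative change
has_converged(-512.34, -530.10)       # log-likelihood still changing noticeably
```

Because the rule is relative, the same error value behaves comparably for data sets whose log-likelihoods differ by orders of magnitude.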
It returns a list that, depending on the case, contains one or more of the following objects:
mu |
a list with |
Sigma |
a list with |
Gamma |
a list with |
shape |
a list with |
nu |
a vector with one element containing the value of the degrees of freedom |
pii |
a vector with |
Zij |
a |
yest |
a |
MI |
a list with the standard errors for all parameters. |
logLik |
the log-likelihood value for the estimated parameters. |
aic |
the AIC criterion value for the estimated parameters. |
bic |
the BIC criterion value for the estimated parameters. |
edc |
the EDC criterion value for the estimated parameters. |
iter |
number of iterations until the EM algorithm converges. |
group |
a |
time |
time in minutes until the EM algorithm converges. |
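The likelihood-based criteria in this list can be related to logLik with a short base-R sketch. AIC and BIC follow their standard definitions; the EDC penalty 0.2 * sqrt(n) is a common choice in the literature and is only an assumption here, so the values reported by the package may differ. The function name info_criteria and the inputs are illustrative.

```r
# Likelihood-based criteria from a log-likelihood value (smaller is better).
# k = number of free parameters, n = number of observations.
# The EDC penalty 0.2 * sqrt(n) is an assumption made for this sketch.
info_criteria <- function(loglik, k, n) {
  c(aic = -2 * loglik + 2 * k,
    bic = -2 * loglik + k * log(n),
    edc = -2 * loglik + k * 0.2 * sqrt(n))
}

info_criteria(loglik = -850.2, k = 17, n = 200)
```

When comparing fits from different families or different numbers of components on the same data, the model with the smallest value of a given criterion is preferred.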
The uni.Gama parameter refers to the Gamma matrix for the Skew-normal distribution, while for the normal and Student-t distributions it refers to the Sigma matrix.
See also: rMSN, rMMSN and rMMSN.contour
library(CensMFM)

mu <- Sigma <- shape <- list()
mu[[1]] <- c(-3, -4)
mu[[2]] <- c(2, 2)
Sigma[[1]] <- matrix(c(3, 1, 1, 4.5), 2, 2)
Sigma[[2]] <- matrix(c(2, 1, 1, 3.5), 2, 2)
shape[[1]] <- c(-2, 2)
shape[[2]] <- c(-3, 4)
nu <- c(0, 0)
pii <- c(0.6, 0.4)
percent <- c(0.1, 0.2)
n <- 200
g <- 2
seed <- 654678
set.seed(seed)

test <- rMMSN(n = n, pii = pii, mu = mu, Sigma = Sigma, shape = shape,
              percent = percent, each = TRUE, family = "SN")
Zij <- test$G
cc <- test$cc
y <- test$y

## left censoring ##
LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- y[cc == 1]

# full analysis may take a few seconds more...
test_fit.cc0 <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                           pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                           family = "Normal", error = 0.0001, iter.max = 200,
                           uni.Gama = FALSE, cal.im = FALSE)

test_fit.cc <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                          pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                          family = "SN", error = 0.00001, iter.max = 350,
                          uni.Gama = FALSE, cal.im = TRUE)

## missing data ##
pctmiss <- 0.2 # 20% of missing data in the whole data
# g is reused here as the number of columns of y (2 in this example)
missing <- matrix(runif(n * g), nrow = n) < pctmiss
y[missing] <- NA

cc <- matrix(nrow = n, ncol = g)
cc[missing] <- 1
cc[!missing] <- 0

LI <- cc
LS <- cc
LI[cc == 1] <- -Inf
LS[cc == 1] <- +Inf

test_fit.mis <- fit.FMMSNC(cc, LI, LS, y, mu = mu, Sigma = Sigma, shape = shape,
                           pii = pii, g = 2, get.init = FALSE, criteria = TRUE,
                           family = "SN", error = 0.00001, iter.max = 350,
                           uni.Gama = FALSE, cal.im = TRUE)
It generates random realizations following a multivariate finite mixture of Skew-normal (family == "SN") and normal (family == "Normal") distributions under censoring. The censoring level can be set as a percentage and can be adjusted per group if desired.
rMMSN(n = NULL, mu = NULL, Sigma = NULL, shape = NULL, percent = NULL, each = FALSE, pii = NULL, family = "SN")
n |
number of observations |
mu |
a list with |
Sigma |
a list with |
shape |
a list with |
percent |
Percentage of censored data in each group or data as a whole (see next item). |
each |
If |
pii |
a vector of weights for the mixture of dimension |
family |
distribution family to be used for generating the data. Options are "SN" for the Skew-normal and "Normal" for the Normal distribution respectively. |
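The way mixture weights and component parameters combine can be sketched in base R as a two-stage draw: first a group label according to pii, then an observation from that group's distribution. The sketch below uses normal components only and no censoring; it illustrates the idea and is not the package's implementation.

```r
# Sketch of two-stage mixture sampling with normal components (base R only).
set.seed(20)
n     <- 100
pii   <- c(0.6, 0.4)
mu    <- list(c(-3, -4), c(2, 2))
Sigma <- list(matrix(c(3, 1, 1, 4.5), 2, 2), matrix(c(2, 1, 1, 3.5), 2, 2))

# Stage 1: group labels drawn with probabilities pii
G <- sample(seq_along(pii), size = n, replace = TRUE, prob = pii)

# Stage 2: draw each observation from its group's bivariate normal
y <- matrix(NA_real_, n, 2)
for (j in seq_along(pii)) {
  idx <- which(G == j)
  z <- matrix(rnorm(length(idx) * 2), ncol = 2)
  # z %*% chol(Sigma) has covariance Sigma, since t(R) %*% R = Sigma
  y[idx, ] <- sweep(z %*% chol(Sigma[[j]]), 2, mu[[j]], "+")
}

table(G)
```

The label vector G here plays the same role as the G element returned by rMMSN.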
It returns a list that, depending on the case, contains one or more of the following objects:
y |
a |
G |
a vector of length |
cutoff |
a vector containing the censoring cutoffs per group. |
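One way a censoring percentage can translate into a cutoff is through an empirical quantile. The base-R sketch below left-censors a single variable this way; it is an assumption about the general idea and not a description of how rMMSN computes its cutoffs.

```r
# Sketch (an assumption, not the package's code): left-censor one variable
# so that roughly `percent` of its values fall at or below the cutoff.
set.seed(1)
y <- rnorm(200, mean = 2, sd = 1.5)
percent <- 0.1

cutoff <- unname(quantile(y, probs = percent))
cc <- as.integer(y <= cutoff)     # 1 = censored, 0 = observed
y_obs <- pmax(y, cutoff)          # censored values are recorded at the cutoff

mean(cc)                          # realized censoring proportion
```

With each = TRUE the same calculation would be repeated within every group, giving one cutoff per group.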
See also: fit.FMMSNC, rMSN and rMMSN.contour
library(CensMFM)

mu <- Sigma <- shape <- list()
mu[[1]] <- c(-3, -4)
mu[[2]] <- c(2, 2)
shape[[1]] <- c(-2, 2)
shape[[2]] <- c(-3, 4)
Sigma[[1]] <- matrix(c(3, 1, 1, 4.5), 2, 2)
Sigma[[2]] <- matrix(c(2, 1, 1, 3.5), 2, 2)
pii <- c(0.6, 0.4)
percent <- c(0.1, 0.1)
family <- "SN"
n <- 100

set.seed(20)
rMMSN(n = n, pii = pii, mu = mu, Sigma = Sigma, shape = shape,
      percent = percent, each = TRUE, family = family)
It plots the scatter plots with density contours for different multivariate distributions. Possible options are the Skew-normal (family == "SN"), Normal (family == "Normal") and Student-t (family == "t") distributions. Groups are shown in different colors. Histograms are shown on the diagonal.
rMMSN.contour(model = NULL, y = NULL, mu = NULL, Sigma = NULL, shape = NULL, nu = NULL, pii = NULL, Zij = NULL, contour = FALSE, hist.Bin = 30, contour.Bin = 10, slice = 100, col.names = NULL, length.x = c(0.5, 0.5), length.y = c(0.5, 0.5), family = "SN")
model |
an object returned by the fit.FMMSNC function. |
y |
the response matrix with dimension |
mu |
a list with |
Sigma |
a list with |
shape |
a list with |
nu |
the degrees of freedom for the Student-t distribution case, being a vector with dimension |
pii |
a vector of weights for the mixture of dimension |
Zij |
a matrix of dimension |
contour |
If |
hist.Bin |
number of bins in the histograms. Default is 30. |
contour.Bin |
creates evenly spaced contours in the range of the data. Default is 10. |
slice |
desired length of the sequence for the variables grid. This grid is built for the contours. |
col.names |
names passed to the data matrix |
length.x |
a vector of dimension 2 with the values to be subtracted from the minimum and added to the maximum observation on the x-axis, respectively. Default is c(0.5, 0.5). |
length.y |
a vector of dimension 2 with the values to be subtracted from the minimum and added to the maximum observation on the y-axis, respectively. Default is c(0.5, 0.5). |
family |
distribution family to be used. Available distributions are the Skew-normal ("SN"), normal ("Normal") or Student-t ("t") distribution. |
If the model object is used, the user still has the option to choose the family. If the model object is not used, the user must input all other parameters. The rMMSN function may be used to generate data.
This functions works well for any length of and
, but contour densities are only shown for
.
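The roles of slice, length.x and length.y can be illustrated with a base-R sketch that builds the evaluation grid and computes a two-component normal mixture density on it. The helper dmvn2 and the data ranges are illustrative assumptions; the package's own grid construction may differ.

```r
# Bivariate normal density evaluated at the rows of x (base R only).
dmvn2 <- function(x, mu, Sigma) {
  d <- sweep(x, 2, mu)                        # centre each column
  q <- rowSums((d %*% solve(Sigma)) * d)      # squared Mahalanobis distances
  exp(-0.5 * q) / (2 * pi * sqrt(det(Sigma)))
}

pii   <- c(0.6, 0.4)
mu    <- list(c(-3, -4), c(2, 2))
Sigma <- list(matrix(c(3, 1, 1, 4.5), 2, 2), matrix(c(2, 1, 1, 3.5), 2, 2))

# Grid limits: pad an (illustrative) data range by length.x and length.y.
slice <- 100
length.x <- length.y <- c(0.5, 0.5)
x_range <- c(-8, 6)      # stand-in for range(y[, 1])
y_range <- c(-10, 7)     # stand-in for range(y[, 2])
xs <- seq(x_range[1] - length.x[1], x_range[2] + length.x[2], length.out = slice)
ys <- seq(y_range[1] - length.y[1], y_range[2] + length.y[2], length.out = slice)

grid <- as.matrix(expand.grid(xs, ys))        # first column varies fastest
dens <- pii[1] * dmvn2(grid, mu[[1]], Sigma[[1]]) +
        pii[2] * dmvn2(grid, mu[[2]], Sigma[[2]])
z <- matrix(dens, nrow = slice)               # z[i, j] = density at (xs[i], ys[j])
```

Calling contour(xs, ys, z, nlevels = 10) from base graphics would then draw the contour lines. A larger slice gives smoother contours at the cost of more density evaluations.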
See also: fit.FMMSNC and rMMSN
library(CensMFM)

mu <- Sigma <- shape <- list()
mu[[1]] <- c(-3, -4)
mu[[2]] <- c(2, 2)
Sigma[[1]] <- matrix(c(3, 1, 1, 4.5), 2, 2)
Sigma[[2]] <- matrix(c(2, 1, 1, 3.5), 2, 2)
shape[[1]] <- c(-2, 2)
shape[[2]] <- c(-3, 4)
nu <- 0
pii <- c(0.6, 0.4)
percent <- c(0.1, 0.2)
n <- 100
seed <- 654678
set.seed(seed)

test <- rMMSN(n = n, pii = pii, mu = mu, Sigma = Sigma, shape = shape,
              percent = percent, each = TRUE, family = "SN")

## SN ##
SN.contour <- rMMSN.contour(model = NULL, y = test$y, Zij = test$G, mu = mu,
                            Sigma = Sigma, shape = shape, pii = pii,
                            family = "SN")

# Plotting contours may take some time...

## SN ##
SN.contour <- rMMSN.contour(model = NULL, y = test$y, Zij = test$G, mu = mu,
                            Sigma = Sigma, shape = shape, pii = pii,
                            contour = TRUE, family = "SN")

## Normal ##
N.contour <- rMMSN.contour(model = NULL, y = test$y, Zij = test$G, mu = mu,
                           Sigma = Sigma, shape = shape, pii = pii,
                           contour = TRUE, family = "Normal")

## t ##
t.contour <- rMMSN.contour(model = NULL, y = test$y, Zij = test$G, mu = mu,
                           Sigma = Sigma, shape = shape, pii = pii,
                           nu = c(4, 3), contour = TRUE, family = "t")
It generates random realizations from a multivariate Skew-normal and Normal distribution.
rMSN(n, mu, Sigma, shape)
n |
number of observations. |
mu |
a numeric vector of length |
Sigma |
a numeric positive definite matrix with dimension |
shape |
a numeric vector of length |
It returns an n x p matrix, where p is the length of mu, containing the generated random realizations.
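For readers who want to see how skew-normal draws can be produced, the base-R sketch below uses a standard stochastic representation: Y = mu + Delta * |T0| + Gamma^(1/2) T1, with delta = shape / sqrt(1 + sum(shape^2)), Delta = Sigma^(1/2) delta and Gamma = Sigma - Delta Delta'. Whether rMSN uses exactly this construction is an assumption; the function names sqrt_mat and r_skew_normal_sketch are hypothetical.

```r
# Symmetric square root of a positive definite matrix via eigendecomposition.
sqrt_mat <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical sketch of skew-normal generation (not the package's code).
r_skew_normal_sketch <- function(n, mu, Sigma, shape) {
  p <- length(mu)
  delta <- shape / sqrt(1 + sum(shape^2))
  Delta <- as.vector(sqrt_mat(Sigma) %*% delta)
  Gamma <- Sigma - tcrossprod(Delta)          # positive definite since |delta| < 1

  t0 <- abs(rnorm(n))                         # half-normal scalars
  t1 <- matrix(rnorm(n * p), n, p) %*% sqrt_mat(Gamma)   # rows ~ N(0, Gamma)
  sweep(outer(t0, Delta) + t1, 2, mu, "+")    # add the location to every row
}

set.seed(10)
mu <- c(-3, -4)
Sigma <- matrix(c(3, 1, 1, 4.5), 2, 2)
shape <- c(-3, 2)
x <- r_skew_normal_sketch(10, mu = mu, Sigma = Sigma, shape = shape)
dim(x)
```

Setting shape to a vector of zeros makes Delta vanish, so the draws reduce to a multivariate normal with mean mu and covariance Sigma.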
See also: fit.FMMSNC, rMMSN and rMMSN.contour
library(CensMFM)

mu <- c(-3, -4)
Sigma <- matrix(c(3, 1, 1, 4.5), 2, 2)
shape <- c(-3, 2)
rMSN(10, mu = mu, Sigma = Sigma, shape = shape)