Full Information Maximum Likelihood
Full Information Maximum Likelihood (FIML) is a principled statistical method for estimating model parameters when some data are missing. Rather than discarding incomplete cases or imputing single values, FIML works from the joint distribution implied by a model and maximizes the observed-data likelihood, so every case contributes whatever values it has. This approach is widely used in fields that rely on complex models and incomplete data, notably in structural equation modeling and in analyses of panel data and longitudinal studies. By using all available information, FIML often achieves greater efficiency and less bias than ad hoc remedies such as listwise deletion, provided its assumptions are reasonable.
Overview
At its core, FIML treats missing data as an intrinsic part of the data-generating process specified by the model. For each observation, the method writes a likelihood contribution that reflects only the observed components of that observation. These contributions are multiplied across cases to form the overall likelihood or, equivalently, their logarithms are summed to form the log-likelihood that is maximized. When no data are missing, FIML reduces to the familiar form of maximum likelihood estimation based on complete data. The “full information” label emphasizes that the approach uses all available information across variables and time periods, rather than discarding cases with any missing values.
FIML is particularly common in the analysis frameworks that rely on a probabilistic specification of relationships among variables, such as Structural equation modeling and models for Latent variable constructs. In these settings, FIML integrates over the missing portions of the data implicitly through the model, avoiding the need to fill in missing values explicitly.
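For concreteness, the casewise contribution can be written out for one common special case. The expression below is a sketch that assumes a multivariate normal model with mean vector mu(theta) and covariance matrix Sigma(theta); the symbols y_i^o, p_i, mu_i, and Sigma_i are notation introduced here for illustration.

```latex
\ell(\theta) = \sum_{i=1}^{N} \ell_i(\theta), \qquad
\ell_i(\theta) = -\frac{p_i}{2}\log(2\pi)
  - \frac{1}{2}\log\bigl\lvert \Sigma_i(\theta) \bigr\rvert
  - \frac{1}{2}\,\bigl(y_i^{o} - \mu_i(\theta)\bigr)^{\top}
    \Sigma_i(\theta)^{-1}
    \bigl(y_i^{o} - \mu_i(\theta)\bigr)
```

Here y_i^o holds the p_i variables observed for case i, and mu_i(theta) and Sigma_i(theta) are the sub-vector and sub-matrix of the model-implied moments that correspond to those variables. Under normality, marginalizing over the missing variables amounts to dropping their rows and columns, which is why no explicit integration appears. When every case is complete, p_i equals the full number of variables and the expression is the ordinary complete-data normal log-likelihood.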
Foundations and mechanics
Likelihood and observed data: For each unit, the method constructs the contribution to the likelihood from the observed variables, marginalizing (integrating or summing) over the unobserved values under the model’s joint distribution. The total FIML objective is the sum of the resulting log-likelihood contributions across units. This builds on the general idea of the likelihood function in statistics and the specific concept of the observed-data likelihood.
Missing data patterns: FIML accommodates arbitrary patterns of missingness, as long as the model specifies a coherent joint distribution for all variables involved. The estimation is carried out with software that can handle the necessary integrals or analytical forms, depending on the model and data type. See also discussions of Missing data and related mechanisms.
Assumptions about the missingness process: A central practical consideration is the missingness mechanism. FIML typically relies on the assumption of missing at random, or MAR, meaning that the probability of a value being missing can depend on observed data but not on the missing value itself. See Missing at random for details. If missingness depends on unobserved data (MNAR, or Missing Not At Random), standard FIML can yield biased estimates unless the missing-data mechanism is modeled or sensitivity analyses are conducted. See Missing not at random for more on this distinction.
Parameter properties: Under regularity conditions, FIML estimators are consistent and asymptotically normal, and standard errors are derived from the observed-data information matrix. In large samples, these properties help ensure reliable inference even when some data are missing.
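The casewise construction described above can be sketched in a few lines of code. The example below is a minimal illustration, not the implementation used by any particular package: it assumes a multivariate normal model, marks missing values with None, and uses only the Python standard library. The function names (cholesky, mvn_logpdf, fiml_loglik) and the toy data are hypothetical. It evaluates the observed-data log-likelihood at given parameter values; a full FIML routine would pass this function to a numerical optimizer.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvn_logpdf(y, mu, sigma):
    """Log density of a multivariate normal, computed via a Cholesky factor of sigma."""
    p = len(y)
    low = cholesky(sigma)
    # Forward substitution: solve low @ z = (y - mu).
    z = []
    for i in range(p):
        s = sum(low[i][k] * z[k] for k in range(i))
        z.append((y[i] - mu[i] - s) / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    quad = sum(v * v for v in z)
    return -0.5 * (p * math.log(2.0 * math.pi) + log_det + quad)

def fiml_loglik(rows, mu, sigma):
    """Observed-data log-likelihood; each row uses only its non-missing entries."""
    total = 0.0
    for row in rows:
        obs = [j for j, v in enumerate(row) if v is not None]
        if not obs:
            continue  # a fully missing case contributes nothing
        y_obs = [row[j] for j in obs]
        mu_obs = [mu[j] for j in obs]
        sigma_obs = [[sigma[j][k] for k in obs] for j in obs]
        total += mvn_logpdf(y_obs, mu_obs, sigma_obs)
    return total

# Hypothetical toy data: three cases on two variables, None marks a missing value.
rows = [[0.5, 1.0], [1.5, None], [None, -0.2]]
mu = [0.0, 0.0]
sigma = [[1.0, 0.5], [0.5, 1.0]]
print(fiml_loglik(rows, mu, sigma))
```

The first case contributes a bivariate normal log density, while the second and third contribute univariate log densities for whichever variable was observed. No case is dropped and no value is filled in. In practice a linear algebra library would replace the hand-written Cholesky routine, which is included here only to keep the sketch self-contained.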
Assumptions, controversies, and debates
MAR vs MNAR: The practical success of FIML hinges on the plausibility of MAR in a given application. Critics warn that MAR cannot be tested from the observed data alone in many settings and that violations can bias estimates. Proponents argue that MAR is a reasonable working assumption in many applied contexts and that FIML, under MAR, often outperforms listwise deletion and simple imputation. Researchers sometimes perform sensitivity analyses to assess how results change under different missing-data assumptions, a practice that some funding bodies and journals increasingly emphasize.
Model specification and misspecification risk: Since FIML relies on a specified joint distribution for all variables, misspecification of the model (e.g., incorrect distributional assumptions, wrong structural relations) can distort estimates. Critics point out that reliance on a single model may mask alternative explanations or data features; supporters counter that FIML remains efficient and transparent when models are well-specified and tested against data via fit indices and diagnostic checks.
Comparisons with imputation-based approaches: An ongoing debate in practice concerns when to prefer FIML versus modern imputation methods such as Multiple imputation followed by standard analysis. Proponents of MI argue that it provides a flexible framework that can handle a variety of missing-data mechanisms and model types, with results that are robust to certain kinds of misspecification when pooling is done carefully. FIML, by contrast, estimates parameters in a single step under the specified model, which can be more straightforward to implement in some SEM contexts and may yield gains in efficiency when the MAR assumption holds.
Computational considerations: FIML can be computationally intensive, especially for complex models with many variables or non-Gaussian outcomes. Advances in optimization algorithms and software have mitigated this, but practice still requires careful model specification, convergence diagnostics, and sometimes simplifications to ensure tractable estimation. See discussions of Robust statistics and Likelihood-based methods for related practical considerations.
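The contrast between FIML and listwise deletion under MAR can be illustrated with a small simulation. The sketch below rests on several assumptions chosen for illustration: bivariate normal data, a fully observed variable x, and a variable y whose chance of being missing depends only on x. For this particular pattern the maximum likelihood estimate of the mean of y has a known closed form (a regression adjustment of the complete-case mean), so no optimizer is needed. The function names, sample size, and parameter values are hypothetical.

```python
import math
import random

def simulate(n, seed=0):
    """Draw (x, y) pairs; y is set to None with a probability that depends only on x (MAR)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + rng.gauss(0.0, 0.6)  # the true mean of y is 0
        p_missing = 1.0 / (1.0 + math.exp(-2.0 * x))  # larger x -> y more often missing
        data.append((x, None if rng.random() < p_missing else y))
    return data

def mean(values):
    return sum(values) / len(values)

def complete_case_mean_y(data):
    """Listwise deletion: average y over cases where y was observed."""
    return mean([y for _, y in data if y is not None])

def ml_mean_y(data):
    """ML estimate of E[y] for bivariate normal data with x complete and y partly missing."""
    x_all = [x for x, _ in data]
    obs = [(x, y) for x, y in data if y is not None]
    x_obs = [x for x, _ in obs]
    y_obs = [y for _, y in obs]
    xb, yb = mean(x_obs), mean(y_obs)
    # Slope of the regression of y on x among the complete cases.
    slope = sum((x - xb) * (y - yb) for x, y in obs) / sum((x - xb) ** 2 for x in x_obs)
    # Shift the complete-case mean of y by the gap between the full-sample and complete-case means of x.
    return yb + slope * (mean(x_all) - xb)

data = simulate(5000)
print("complete-case mean of y:", round(complete_case_mean_y(data), 3))
print("ML (FIML) mean of y:    ", round(ml_mean_y(data), 3))
```

Because cases with large x tend to lose their y value, the complete cases over-represent small x and the complete-case mean of y is pulled below zero. The likelihood-based estimate uses the x values of the incomplete cases to correct for this. If missingness depended on y itself (MNAR), the same correction would no longer be sufficient.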
Applications and extensions
Structural equation modeling: FIML is a standard tool for estimating SEMs with incomplete data, including models with several latent variables, multiple indicators, and complex covariance structures. Software such as LISREL, Mplus, AMOS and other SEM toolkits implement FIML under various data types and model specifications.
Longitudinal and panel data: In settings with repeated measures, FIML efficiently uses data from individuals even when some waves are missing. This makes it a preferred option in disciplines ranging from psychology to economics where panel data are common.
Non-normal outcomes and robustness: For outcomes that deviate from normality, researchers can combine FIML with robust standard errors or use distributional forms that better match the data. In some contexts, researchers apply Satorra-Bentler corrections or other robust approaches to ensure valid inference under nonstandard conditions.
Extensions and variants: FIML can be adapted to multilevel or hierarchical models, multigroup analyses, and measurement-invariance testing. It also pairs with pattern-mixture or selection-model approaches to address MNAR concerns, giving researchers a framework for sensitivity analyses within a coherent likelihood-based paradigm.
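The selection-model and pattern-mixture approaches mentioned above differ in how they factor the joint distribution of the data and the missingness indicators. The following sketch uses notation assumed here for illustration: y = (y_obs, y_mis) is the complete data, r is the vector of missingness indicators, and theta, psi, phi, and pi are parameter vectors.

```latex
% Selection model: outcome model times a model for missingness given the outcomes
f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi)

% Pattern-mixture model: outcome distribution within each missingness pattern
f(y, r \mid \phi, \pi) = f(y \mid r, \phi)\, f(r \mid \pi)

% MAR: missingness depends on y only through its observed part
f(r \mid y_{\mathrm{obs}}, y_{\mathrm{mis}}, \psi) = f(r \mid y_{\mathrm{obs}}, \psi)
```

Under MAR, and provided theta and psi are distinct parameters, the missingness term in the selection-model factorization does not involve y_mis and factors out of the observed-data likelihood for theta. This is why standard FIML can ignore the missingness mechanism under MAR. Under MNAR the term does not factor out, so the mechanism must be modeled explicitly, and a sensitivity analysis varies the assumed form of that model.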
Practical considerations and examples
When to use FIML: If the model is well justified by theory, some data are missing, and the MAR assumption is reasonable given the observed data, FIML often yields less biased estimates and higher efficiency than dropping incomplete cases or using simple single-value imputation. See Maximum likelihood estimation for foundational ideas and Missing data for context on how data can be incomplete.
Interpretability and reporting: As with any model-based method, practitioners should report the extent and pattern of missing data, the assumed missingness mechanism (MAR or MNAR) and its justification, the model specification, and the procedures used to check robustness. Clear documentation helps readers assess whether the conclusions rest on plausible assumptions.
Software and implementation: FIML is available in many contemporary statistical packages used in research and policy analysis. Researchers often rely on the software’s built-in facilities to specify the model, declare missing-data patterns, and obtain standard errors and fit statistics that reflect the observed-data likelihood.
See also