Stacked generalization

6.0k Citations (citations of this article)
2.5k Readers (Mendeley users who have this article in their library)

Abstract

This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory. © 1992 Pergamon Press Ltd.
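
As an illustration of the scheme the abstract describes (not code from the paper), the sketch below implements two-level stacked generalization in Python with scikit-learn. Each level-0 generalizer is taught with part of the learning set and asked to guess the rest; those held-out guesses become the inputs of the second space, and a level-1 generalizer learns to map them to the correct guess. The specific models, the K-fold partitioning, and the names fit_stack and predict_stack are illustrative assumptions, not the paper's exact construction (Wolpert's original partition is closer to leave-one-out).

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def fit_stack(X, y, level0, level1, n_splits=5):
    # Inputs of the second space: each level-0 model's guesses on data
    # it was not taught with.
    meta_X = np.zeros((len(X), len(level0)))
    for train_idx, hold_idx in KFold(n_splits=n_splits).split(X):
        for j, model in enumerate(level0):
            model.fit(X[train_idx], y[train_idx])             # teach with part of the set
            meta_X[hold_idx, j] = model.predict(X[hold_idx])  # guess the rest
    level1.fit(meta_X, y)     # level-1 learns to combine/correct the guesses
    for model in level0:
        model.fit(X, y)       # refit level-0 models on the full learning set
    return level0, level1

def predict_stack(X, level0, level1):
    meta_X = np.column_stack([m.predict(X) for m in level0])
    return level1.predict(meta_X)

# Example on a toy surface-fitting task.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
level0 = [DecisionTreeRegressor(max_depth=4), KNeighborsRegressor(n_neighbors=5)]
level1 = LinearRegression()
fit_stack(X, y, level0, level1)
print(predict_stack(X[:5], level0, level1))

With a single level-0 generalizer, the same recipe applies: the level-1 model then estimates and corrects that one generalizer's error rather than combining several.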

Cited by

Deep Learning in neural networks: An overview (14,110 citations)
Wrappers for feature subset selection (7,184 citations)
Learning deep architectures for AI (6,700 citations)

Citation (APA)

Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1

Readers' Seniority

PhD / Post grad / Masters / Doc: 1,047 (68%)
Researcher: 282 (18%)
Professor / Associate Prof.: 132 (9%)
Lecturer / Post doc: 71 (5%)

Readers' Discipline

Computer Science: 766 (62%)
Engineering: 332 (27%)
Agricultural and Biological Sciences: 69 (6%)
Mathematics: 69 (6%)

Article Metrics

Mentions
  Blog Mentions: 2
  News Mentions: 4
  References: 8
Social Media
  Shares, Likes & Comments: 34
