
Self-Training Converts Weak Learners to Strong Learners in Mixture Models

APA

Frei, S. (2021). Self-Training Converts Weak Learners to Strong Learners in Mixture Models [Talk]. The Simons Institute for the Theory of Computing. https://simons.berkeley.edu/talks/self-training-converts-weak-learners-strong-learners-mixture-models

MLA

Frei, Spencer. Self-Training Converts Weak Learners to Strong Learners in Mixture Models. The Simons Institute for the Theory of Computing, 6 Dec. 2021, https://simons.berkeley.edu/talks/self-training-converts-weak-learners-strong-learners-mixture-models

BibTeX

@misc{scivideos_18842,
  author    = {Frei, Spencer},
  title     = {Self-Training Converts Weak Learners to Strong Learners in Mixture Models},
  publisher = {The Simons Institute for the Theory of Computing},
  year      = {2021},
  month     = {dec},
  language  = {en},
  url       = {https://simons.berkeley.edu/talks/self-training-converts-weak-learners-strong-learners-mixture-models},
  note      = {Talk 18842, see \url{https://scivideos.org/index.php/Simons-Institute/18842}}
}
          
Spencer Frei (UC Berkeley)
Talk number: 18842
Source repository: Simons Institute

Abstract

We consider a binary classification problem where the data comes from a mixture of two rotationally symmetric log-concave distributions. We analyze the dynamics of a nonconvex gradient-based self-training algorithm that uses a pseudolabeler to generate labels for unlabeled data and updates the pseudolabeler based on a "supervised" loss that treats the pseudolabels as if they were ground-truth labels. We show that, provided the initial pseudolabeler has classification error smaller than some absolute constant, self-training produces classifiers with error at most epsilon greater than that of the Bayes-optimal classifier using O(d/epsilon^2) unlabeled examples. That is, self-training converts weak learners to strong learners using only unlabeled examples. We show that gradient descent on the logistic loss with O(d) labeled examples (i.e., a number independent of epsilon) suffices to produce a pseudolabeler such that the aforementioned results hold.
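
The talk page gives no code; as an illustration only, the following is a minimal sketch of the kind of gradient-based self-training loop the abstract describes, run on a synthetic spherical Gaussian mixture (one instance of a rotationally symmetric log-concave mixture). The specifics below -- the +/- mu Gaussian model, the step size, the iteration count, and the noisy initial pseudolabeler -- are illustrative assumptions, not details from the talk.

    import numpy as np

    def logistic_loss_grad(w, X, y):
        """Gradient of the average logistic loss: mean_i log(1 + exp(-y_i <w, x_i>))."""
        margins = y * (X @ w)
        # d/dw log(1 + exp(-m_i)) = -y_i * x_i / (1 + exp(m_i))
        coeffs = -y / (1.0 + np.exp(margins))
        return (X * coeffs[:, None]).mean(axis=0)

    def self_train(w_init, X_unlabeled, lr=0.1, n_steps=200):
        """Self-training: pseudolabel the unlabeled data with the current classifier,
        then take a gradient step on the logistic loss, treating the pseudolabels
        as if they were ground-truth labels."""
        w = w_init.copy()
        for _ in range(n_steps):
            pseudo_y = np.sign(X_unlabeled @ w)   # pseudolabels from the current w
            pseudo_y[pseudo_y == 0] = 1.0
            w -= lr * logistic_loss_grad(w, X_unlabeled, pseudo_y)
        return w

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        d, n = 50, 20_000
        mu = 2.0 * np.ones(d) / np.sqrt(d)        # class mean direction, ||mu|| = 2
        y_true = rng.choice([-1.0, 1.0], size=n)
        X = y_true[:, None] * mu + rng.standard_normal((n, d))  # Gaussian mixture at +/- mu

        # "Weak" initial pseudolabeler: correlated with mu but noisy.
        w0 = mu + 0.5 * rng.standard_normal(d)
        w = self_train(w0, X)

        err0 = np.mean(np.sign(X @ w0) != y_true)
        err = np.mean(np.sign(X @ w) != y_true)
        print(f"initial error {err0:.3f} -> self-trained error {err:.3f}")

In this toy run the labels y_true are used only to measure error, never during self-training, matching the abstract's point that the improvement from weak to strong learner comes entirely from unlabeled examples.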