What is Self-Supervised Learning?
Self-supervised learning is a training paradigm where models learn representations from unlabeled data by solving pretext tasks that generate supervisory signals from the data itself. It powers the pre-training of foundation models and reduces dependence on expensive labeled data.
Self-supervised learning occupies a space between supervised learning (which requires labeled data) and unsupervised learning (which discovers structure without explicit prediction targets). It creates supervisory signals from the data itself by defining pretext tasks that force the model to learn useful representations. The key insight is that predicting parts of the data from other parts requires understanding the underlying structure.
For language, the dominant self-supervised objectives are next-token prediction (GPT) and masked token prediction (BERT). These objectives require the model to develop deep language understanding to succeed, resulting in representations that transfer effectively to downstream tasks.
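The next-token objective can be made concrete with a small numeric sketch. The snippet below uses a toy vocabulary and random logits as a stand-in for a real model's outputs (both are illustrative assumptions, not any particular model); the essential point is that the targets are simply the input sequence shifted by one position, so the "labels" come for free from the raw text.

```python
import numpy as np

# Toy vocabulary and a sentence encoded as token ids (illustrative only).
vocab = ["the", "cat", "sat", "on", "mat"]
tokens = np.array([0, 1, 2, 3, 4])  # "the cat sat on mat"

# Inputs are all tokens except the last; targets are the same sequence
# shifted left by one -- each position predicts the *next* token.
inputs, targets = tokens[:-1], tokens[1:]

rng = np.random.default_rng(0)
# Stand-in for model outputs: random logits of shape (positions, vocab).
logits = rng.normal(size=(len(inputs), len(vocab)))

# Cross-entropy loss of the next-token objective.
shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
```

Masked token prediction (BERT-style) differs only in which positions are hidden: instead of always predicting the next token, a random subset of tokens is replaced with a mask symbol and the model predicts the originals.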
For vision, self-supervised methods include contrastive learning (SimCLR, MoCo), which trains models to produce similar representations for different augmentations of the same image while separating different images; masked image modeling (MAE), which masks portions of images and trains the model to reconstruct them; and self-distillation (DINO, DINOv2), which trains a student network to match the output of a momentum-updated teacher network.
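The contrastive idea can be sketched in a few lines. This is a simplified NT-Xent-style loss on synthetic data, not the full SimCLR recipe: the arrays `z_a` and `z_b` stand in for embeddings of two augmentations of the same batch of images, and the batch size, dimensionality, and temperature are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for a batch of 4 images: z_a and z_b stand in
# for the model's outputs on two random augmentations of the same images.
z_a = rng.normal(size=(4, 8))
z_b = z_a + 0.1 * rng.normal(size=(4, 8))  # second view, slightly perturbed

# L2-normalize so dot products are cosine similarities.
z_a /= np.linalg.norm(z_a, axis=1, keepdims=True)
z_b /= np.linalg.norm(z_b, axis=1, keepdims=True)

temperature = 0.5
sim = z_a @ z_b.T / temperature  # (4, 4) pairwise similarity matrix

# NT-Xent-style loss: each row's diagonal entry (the matching augmented
# view) is the "correct class"; other images in the batch are negatives.
shifted = sim - sim.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -np.diag(log_probs).mean()
```

Minimizing this loss pulls the two views of each image together while pushing apart views of different images, which is the core mechanism behind SimCLR and MoCo.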
Self-supervised learning has been transformative because it leverages the vast amounts of unlabeled data available on the internet. The web contains orders of magnitude more unlabeled text and images than any labeled dataset. Self-supervised pre-training on this data produces foundation models with broad knowledge that can be specialized for specific tasks through fine-tuning with much smaller labeled datasets.
How Self-Supervised Learning Works
The model creates its own training signal from unlabeled data by hiding or corrupting parts of the input and training to predict or reconstruct the missing parts. For text, this means predicting masked or next words. For images, this means reconstructing masked patches or matching augmented views. Learning to solve these tasks requires understanding the data structure.
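The hide-and-predict recipe above can be sketched for images in the style of masked image modeling. Everything here is synthetic and simplified (a random 8x8 "image", 2x2 patches, a trivial mean-value predictor instead of a real network), but it shows how the supervisory signal is manufactured: the masked patches themselves become the regression targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 8x8 grayscale "image" split into 2x2 patches (16 patches total).
image = rng.random((8, 8))
patches = image.reshape(4, 2, 4, 2).swapaxes(1, 2).reshape(16, 4)

# Hide 75% of the patches, as in MAE-style masked image modeling.
mask = rng.permutation(16) < 12   # True = hidden
visible = patches[~mask]          # model input
targets = patches[mask]           # reconstruction targets

# A real model would predict the hidden patches from the visible ones;
# here a trivial baseline (predict the mean visible value) stands in.
prediction = np.full_like(targets, visible.mean())
loss = ((prediction - targets) ** 2).mean()  # MSE on masked patches only
```

No labels were needed at any point: the data was split into a visible part and a hidden part, and the hidden part supplies the training targets.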
Career Relevance
Self-supervised learning is the foundation of modern AI pre-training. Understanding these methods provides crucial context for how foundation models acquire their capabilities. It is important for research roles and for practitioners who need to understand model behavior.
Frequently Asked Questions
How does self-supervised learning differ from unsupervised learning?
Self-supervised learning uses a defined objective derived from the data (like predicting masked words), while unsupervised learning discovers structure without a specific objective (like clustering). Self-supervised methods typically learn more useful representations because they optimize a concrete task.
Why is self-supervised learning important?
It enables pre-training on vast amounts of unlabeled data, which is much more abundant than labeled data. This pre-training produces the foundation of knowledge in models like GPT, BERT, and their successors.
Is self-supervised learning knowledge needed for AI careers?
For research and advanced engineering roles, understanding self-supervised learning is important. For application-focused roles, knowing that pre-trained models use self-supervised learning provides useful context for making model selection and fine-tuning decisions.
Related Terms
- Pre-training
Pre-training is the initial phase of training where a model learns general representations from large-scale data using self-supervised objectives. It provides the foundation of knowledge and capabilities that subsequent fine-tuning adapts for specific tasks.
- Foundation Model
A foundation model is a large AI model trained on broad data that can be adapted to a wide range of downstream tasks. Examples include GPT-4, Claude, LLaMA, and DALL-E. They represent a paradigm shift toward general-purpose models that serve as a base for many applications.
- BERT
BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model developed by Google that reads text in both directions simultaneously. It established new benchmarks across many NLP tasks and popularized the pre-train then fine-tune paradigm.
- Transfer Learning
Transfer learning is a technique where knowledge gained from training on one task is applied to a different but related task. It is the foundation of the pre-train and fine-tune paradigm that makes modern AI practical for the vast majority of applications.
- Supervised Learning
Supervised learning is the most common ML paradigm where a model learns from labeled training data to make predictions on new data. The "supervision" comes from known correct answers (labels) that guide the learning process.