Artificial Intelligence
Out of context: Reply #1970
- 2,576 Responses
- ********-21
"SEAL is a framework that enables language models to generate their own finetuning data and optimization instructions—called self-edits—in response to new tasks or information.
SEAL learns to generate these self-edits via reinforcement learning (RL), using downstream task performance after a model update as the reward.
Each training iteration involves the model generating a self-edit based on a task context, applying the self-edit via supervised finetuning, evaluating the updated model, and reinforcing edits that improve performance.
This process is implemented with a lightweight reinforcement learning algorithm called ReST^EM, which does rounds of selecting high-reward samples using rejection sampling and reinforcing via SFT.
We demonstrate SEAL in two domains: (1) Knowledge Incorporation, where the model integrates new factual information by generating logical implications as synthetic data, and (2) Few-Shot Learning, where the model autonomously selects data augmentations and training hyperparameters to adapt to new abstract reasoning tasks."
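For anyone trying to picture the loop described in the quote, here is a rough toy sketch in Python. It is not the SEAL code: every name in it (`STRATEGIES`, `TOY_ACCURACY`, `rest_em_round`, and so on) is made up for illustration, the "model" is just a table of sampling weights, and the finetuning and evaluation steps are replaced by a lookup of pretend accuracies. It only shows the shape of the loop: sample self-edits, keep the ones that beat a baseline (rejection sampling), then reinforce the kept ones.

```python
import random

# Toy stand-in: every name below is hypothetical, not from the SEAL codebase.
# A "self-edit" is just a choice among a few strategies, and the "policy" is
# a table of sampling weights over those strategies.
STRATEGIES = ["implications", "paraphrase", "copy_verbatim"]

# Pretend downstream accuracy after finetuning on each kind of self-edit.
# These numbers are invented for the sketch, not results from the paper.
TOY_ACCURACY = {"implications": 0.6, "paraphrase": 0.4, "copy_verbatim": 0.2}


def sample_self_edit(weights, rng):
    """Sample one self-edit (a strategy name) in proportion to the weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def evaluate_after_update(self_edit):
    """Stand-in for 'finetune on the self-edit, then score the updated model'."""
    return TOY_ACCURACY[self_edit]


def rest_em_round(weights, baseline, rng, num_samples=20, step=0.5):
    """One round: sample self-edits, keep high-reward ones, reinforce them."""
    samples = [sample_self_edit(weights, rng) for _ in range(num_samples)]
    # Rejection sampling: keep only edits whose reward beats the baseline.
    kept = [s for s in samples if evaluate_after_update(s) > baseline]
    # Stand-in for SFT on the kept samples: nudge their weights upward.
    new_weights = dict(weights)
    for s in kept:
        new_weights[s] += step
    return new_weights, kept


def train(rounds=5, seed=0):
    """Run a few rounds from uniform weights and return the final weights."""
    rng = random.Random(seed)
    weights = {name: 1.0 for name in STRATEGIES}
    for _ in range(rounds):
        weights, _ = rest_em_round(weights, baseline=0.3, rng=rng)
    return weights


if __name__ == "__main__":
    print(train())
```

In this toy, strategies whose pretend accuracy is below the baseline are never reinforced, so the sampling weights drift toward the ones that help. In the real system the reward would come from actually finetuning and evaluating a model, which is the expensive part this sketch skips.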
- Now this is something fresh :P (********)
- so if it gets downvotes continually it learns from that. incredible. (kingsteven)
- no need to be snarky, you can just go upvote utopian's posts if that will make you feel better (********)
- I just don't get it, are you getting paid affiliate money to post these? (utopian)
- No one here likes you grafician, get lost (********)