Artificial Intelligence

Out of context: Reply #1970

2,719 Responses

********
-21
"SEAL is a framework that enables language models to generate their own finetuning data and optimization instructions—called self-edits—in response to new tasks or information.
SEAL learns to generate these self-edits via reinforcement learning (RL), using downstream task performance after a model update as the reward.
Each training iteration involves the model generating a self-edit based on a task context, applying the self-edit via supervised finetuning, evaluating the updated model, and reinforcing edits that improve performance.
This process is implemented with a lightweight reinforcement learning algorithm called ReST^EM, which does rounds of selecting high-reward samples using rejection sampling and reinforcing via SFT. We demonstrate SEAL in two domains: (1) Knowledge Incorporation, where the model integrates new factual information by generating logical implications as synthetic data, and (2) Few-Shot Learning, where the model autonomously selects data augmentations and training hyperparameters to adapt to new abstract reasoning tasks."
https://jyopari.github.io/posts/…
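To make the loop in the quote concrete, here is a toy sketch of the "sample, filter by reward, reinforce" cycle it describes. Everything in it is an assumption for illustration, not SEAL's actual code: the "model" is just a table of weights over three made-up self-edit types, `toy_reward` stands in for "finetune on the self-edit, then evaluate the updated model", and bumping a weight stands in for the SFT step.

```python
import random


def rest_em_round(weights, reward_fn, num_samples, rng):
    """One round: sample self-edits, keep the rewarded ones, reinforce them."""
    edits = list(weights)
    probs = [weights[e] for e in edits]
    # Sample candidate self-edits from the current (toy) policy.
    samples = rng.choices(edits, weights=probs, k=num_samples)
    # Rejection sampling: keep only samples with positive reward.
    kept = [s for s in samples if reward_fn(s) > 0]
    # Stand-in for SFT on the kept samples: raise their weight.
    new_weights = dict(weights)
    for s in kept:
        new_weights[s] += 1.0
    return new_weights, kept


def train(weights, reward_fn, rounds=5, num_samples=20, seed=0):
    """Run several rounds with a seeded RNG so the run is repeatable."""
    rng = random.Random(seed)
    for _ in range(rounds):
        weights, _ = rest_em_round(weights, reward_fn, num_samples, rng)
    return weights


def toy_reward(edit):
    # Hypothetical reward: pretend only "implications" self-edits
    # improve the updated model's downstream score.
    return 1 if edit == "implications" else 0


initial = {"implications": 1.0, "verbatim_copy": 1.0, "paraphrase": 1.0}
final = train(initial, toy_reward)
print(final)
```

Only the rewarded edit type gains weight, so later rounds sample it more often. In the real setup the expensive part is the reward, since each candidate needs its own finetune and evaluation.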
********
- Now this is something fresh :P
  ********
- so if it gets downvotes continually it learns from that. incredible.
  kingsteven
- no need to be snarky, you can just go upvote utopian's posts if that will make you feel better
  ********
- I just don't get it, are you getting paid affiliate money to post these?
  utopian
- No one here likes you grafician, get lost
  ********