Artificial Intelligence

Out of context: Reply #1970

  • 2,576 Responses
  • ********
    -21

    "SEAL is a framework that enables language models to generate their own finetuning data and optimization instructions—called self-edits—in response to new tasks or information.

    SEAL learns to generate these self-edits via reinforcement learning (RL), using downstream task performance after a model update as the reward.

    Each training iteration involves the model generating a self-edit based on a task context, applying the self-edit via supervised finetuning, evaluating the updated model, and reinforcing edits that improve performance.

    This process is implemented with a lightweight reinforcement learning algorithm called ReST^EM, which does rounds of selecting high-reward samples using rejection sampling and reinforcing via SFT. We demonstrate SEAL in two domains: (1) Knowledge Incorporation, where the model integrates new factual information by generating logical implications as synthetic data, and (2) Few-Shot Learning, where the model autonomously selects data augmentations and training hyperparameters to adapt to new abstract reasoning tasks."
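
    For anyone trying to picture the loop in that quote, here is a toy sketch in Python. It is not the SEAL code: the "model" is a single number, a "self-edit" is a sampled number, and every function name and constant below is made up for illustration. It only shows the shape of one round: sample self-edits, apply each one, keep the ones that beat the baseline (rejection sampling), then move the generator toward the kept ones (an SFT-like step).

```python
import random


def sample_self_edit(policy_mean, rng):
    """Hypothetical stand-in for the model writing a self-edit (here: a number)."""
    return rng.gauss(policy_mean, 1.0)


def apply_self_edit(weight, edit):
    """Hypothetical stand-in for supervised finetuning on the self-edit."""
    return weight + 0.5 * edit


def evaluate(weight, target=3.0):
    """Hypothetical stand-in for downstream task performance (higher is better)."""
    return -abs(target - weight)


def seal_round(weight, policy_mean, rng, num_samples=16):
    """One toy round: sample edits, keep the improving ones, reinforce them."""
    baseline = evaluate(weight)
    kept = []
    for _ in range(num_samples):
        edit = sample_self_edit(policy_mean, rng)
        # Rejection sampling: keep an edit only if the updated model beats the baseline.
        if evaluate(apply_self_edit(weight, edit)) > baseline:
            kept.append(edit)
    # SFT-like reinforcement: shift the edit generator toward the kept edits.
    if kept:
        policy_mean = sum(kept) / len(kept)
    return policy_mean, kept


rng = random.Random(0)
policy_mean, kept = seal_round(weight=0.0, policy_mean=0.0, rng=rng)
```

    One simplification to keep in mind: in the quoted description the same model both writes the self-edits and gets updated by them, while this toy keeps the generator (`policy_mean`) and the updated parameter (`weight`) separate so the two steps are easy to see.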

    https://jyopari.github.io/posts/…

    • Now this is something fresh :P
      ********
    • so if it gets downvotes continually it learns from that. incredible.
      kingsteven
    • no need to be snarky, you can just go upvote utopian's posts if that will make you feel better
      ********
    • I just don't get it, are you getting paid affiliate money to post these?
      utopian
    • No one here likes you grafician, get lost
      ********
