Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Metadata

  • Author: readwise.io
  • Full Title: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  • URL: 58089842

Highlights

  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model