Pelayo Arbués

Direct Preference Optimization: Your language model is secretly a reward model

Apr 16, 2025, 1 min read

  • articles
  • literature-note

Metadata

  • Authors: Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
  • Full Title: Direct Preference Optimization: Your language model is secretly a reward model
  • URL: https://readwise.io/reader/document_raw_content/58089842

Highlights

  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model
