Facebook Pushes Its Llama 4 AI Model to the Right, Wants to Present “Both Sides” [404 Media]

BlueMonday1984@awful.systems · 6 months ago

corbin@awful.systems · 6 months ago

It’s well-known folklore that reinforcement learning with human feedback (RLHF), the standard post-training paradigm, reduces “alignment,” the degree to which a pre-trained model has learned features of reality as it actually exists. Quoting from the abstract of the 2024 paper Mitigating the Alignment Tax of RLHF:

LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting pretrained abilities, which is also known as the alignment tax.