Facebook Pushes Its Llama 4 AI Model to the Right, Wants to Present “Both Sides” [404 Media] (www.404media.co)
Posted by BlueMonday1984@awful.systems to TechTakes@awful.systems · English · 19 days ago
corbin@awful.systems · 17 days ago

It’s well-known folklore that reinforcement learning with human feedback (RLHF), the standard post-training paradigm, reduces “alignment,” the degree to which a pre-trained model has learned features of reality as it actually exists. Quoting from the abstract of the 2024 paper, Mitigating the Alignment Tax of RLHF (alternate link):

“LLMs acquire a wide range of abilities during pre-training, but aligning LLMs under Reinforcement Learning with Human Feedback (RLHF) can lead to forgetting pretrained abilities, which is also known as the alignment tax.”
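For illustration only: a minimal sketch of one way to quantify the “alignment tax” the abstract describes, as the average accuracy a model loses on a fixed eval suite after RLHF post-training. The task names and scores below are made-up placeholders, not figures from the paper, and this is not the paper’s actual evaluation methodology.

```python
# Hypothetical sketch: compare a pre-trained base model's benchmark accuracies
# against the same model after RLHF, and report the average drop ("tax").
# All numbers are invented placeholders for illustration.
from statistics import mean

base_scores = {"arithmetic": 0.62, "trivia_qa": 0.71, "code_completion": 0.48}
rlhf_scores = {"arithmetic": 0.55, "trivia_qa": 0.69, "code_completion": 0.41}

def alignment_tax(base: dict[str, float], tuned: dict[str, float]) -> float:
    """Average per-task accuracy lost after post-training (positive = regression)."""
    return mean(base[task] - tuned[task] for task in base)

if __name__ == "__main__":
    for task in base_scores:
        print(f"{task}: {base_scores[task]:.2f} -> {rlhf_scores[task]:.2f}")
    print(f"average alignment tax: {alignment_tax(base_scores, rlhf_scores):.3f}")
```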