People are talking about the new Llama 3.3 70b release, which has generally better performance than Llama 3.1 (approaching 3.1’s 405b performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3

However, something to note:

Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.

Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?

Comparing the model cards: 3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md 3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

The same knowledge cutoff, same amount of training data, and same training time give me hope that it’s just a better finetune of maybe Llama 3.1 405b.

  • ᗪᗩᗰᑎ@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    0
    ·
    1 month ago

    I was also not sure what this meant, so I asked Google’s Gemini, and I think this clears it up for me:


    This means that the creators of Llama 3.3 have chosen to release only the version of the model that has been fine-tuned for following instructions. They are not making the original, “pretrained” version available.

    Here’s a breakdown of why this is significant:

    • Pretrained models: These are large language models (LLMs) trained on a massive dataset of text and code. They have learned to predict the next word in a sequence, and in doing so, have developed a broad understanding of language and a wide range of general knowledge. However, they may not be very good at following instructions or performing specific tasks.
    • Instruction-tuned models: These models are further trained on a dataset of instructions and desired outputs. This fine-tuning process teaches them to follow instructions more effectively, generate more relevant and helpful responses, and perform specific tasks with greater accuracy.

    In the case of Llama 3.3 70B, you only have access to the model that has already been optimized for following instructions and engaging in dialogue. You cannot access the initial pretrained model that was used as the foundation for this instruction-tuned version.

    Possible reasons why Meta (the creators of Llama) might have made this decision:

    • Focus on specific use cases: By releasing only the instruction-tuned model, Meta might be encouraging developers to use Llama 3.3 for assistant-like chat applications and other tasks where following instructions is crucial.
    • Competitive advantage: The pretrained model might be considered more valuable intellectual property, and Meta may want to keep it private to maintain a competitive advantage.
    • Safety and responsibility: Releasing the pretrained model could potentially lead to its misuse for generating harmful or misleading content. By releasing only the instruction-tuned version, Meta might be trying to mitigate these risks.

    Ultimately, the decision to release only the instruction-tuned model reflects Meta’s strategic goals for Llama 3.3 and their approach to responsible AI development.