• 0 Posts
  • 5 Comments
Joined 1 year ago
Cake day: June 26th, 2023


  • You need an absolutely insane amount of data to train LLMs. Hundreds of billions to tens of trillions of tokens. (A token isn’t the same as a word, but with numbers this massive it doesn’t even matter for the point.)

    Wikipedia just doesn’t have enough data to make an LLM off of, and even if you could do it and get okay results, it’ll only know how to write text in the style of Wikipedia. While it might be able to tell you all about how different cultures most commonly cook eggs, I doubt you’ll get any recipe out of it that makes sense.

    If you were to take some base model (such as Llama or GPT) and fine-tune it on Wikipedia data, you’d probably get a “Llama in the style of Wikipedia” result, and that may be what you want, but more likely not.
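
To put the data-volume point in rough numbers, here is a back-of-the-envelope sketch. The Wikipedia word count (a few billion words for English article text) and the tokens-per-word ratio are my own ballpark assumptions, not measured values, and the two budgets are just sample points from the "hundreds of billions to tens of trillions" range mentioned above.

```python
# Order-of-magnitude comparison only; every constant here is an assumption.
WIKIPEDIA_WORDS = 4.5e9   # assumed: rough size of English Wikipedia article text
TOKENS_PER_WORD = 1.3     # assumed: common rule of thumb for English tokenizers


def estimate_tokens(words, tokens_per_word=TOKENS_PER_WORD):
    """Convert a word count into an approximate token count."""
    return words * tokens_per_word


def coverage(corpus_tokens, budget_tokens):
    """Fraction of a training-token budget that a corpus could supply."""
    return corpus_tokens / budget_tokens


wiki_tokens = estimate_tokens(WIKIPEDIA_WORDS)

# Sample budgets picked from the range quoted in the comment above.
for label, budget in [("hundreds of billions", 3e11), ("tens of trillions", 1e13)]:
    share = coverage(wiki_tokens, budget)
    print(f"{label}: Wikipedia would cover roughly {share:.2%} of the budget")
```

Even if the assumed word count is off by a factor of two, the corpus still lands at a small fraction of either budget, which is the whole point.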




  • Docker, using the nextcloud:stable image (not the all-in-one) with Postgres, behind nginx, and finally ZFS with 2x modern HDDs for storage. I run the stock apps plus a small handful, and have carried the same database through many versions over the last 5 years.

    It’s usable, but definitely not snappy.

    The web interface for files is fine. Not instantaneous at all but not a huge problem. I have about 1TB of files (images and videos) in one folder, then varying files everywhere else. I suspect that the number of files (but probably not the size) is causing the slowdown.

    Switching to, for example, the notes app is incredibly slow, and the NC Android app is just as bad.
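
One way to test the file-count suspicion above is to compare the number of files against total bytes per folder. This is only a minimal sketch: `tree_stats` is a hypothetical helper name, and where the Nextcloud data directory lives depends on how the container volumes are mounted, so the path has to be supplied for each setup.

```python
import os


def tree_stats(root):
    """Return (file_count, total_bytes) for all files found under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link counts as its own size.
                total_bytes += os.lstat(path).st_size
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                continue
            file_count += 1
    return file_count, total_bytes
```

Calling `tree_stats` on the big media folder and on a few of the others gives a count and a size for each. If the slow folders are the ones with many entries rather than many bytes, that would support the idea that file count is what hurts.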