• dogslayeggs@lemmy.world
    5 days ago

    Sure. My company has a database of all technical papers written by employees over the last 30-ish years. Nearly all of these contain proprietary information from other companies (we deal with tons of other companies and have access to their data), so we can’t build a public LLM, nor can we feed our data into one. So we created an internal-only LLM trained only on our data.

    • Fmstrat@lemmy.world
      4 days ago

      I’d bet my lunch this internal LLM is a fine-tuned open-weight model, which already has lots of public data baked into it. Not complaining about what your company has done, as I think that makes sense, just providing a counterpoint.
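
      For context, “open-weight base plus internal data” usually looks roughly like the sketch below, i.e. LoRA fine-tuning of a public checkpoint on a private corpus. The base model name, file path, and hyperparameters here are my own placeholders, not anything the OP described:

      ```python
      # Hypothetical sketch: adapt an open-weight base model to internal papers with LoRA.
      # The base checkpoint already contains weights learned from public data; only small
      # adapter matrices get trained on the private corpus.
      from datasets import load_dataset
      from peft import LoraConfig, get_peft_model
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                DataCollatorForLanguageModeling, Trainer, TrainingArguments)

      base = "mistralai/Mistral-7B-v0.1"  # placeholder open-weight base
      tokenizer = AutoTokenizer.from_pretrained(base)
      tokenizer.pad_token = tokenizer.eos_token
      model = AutoModelForCausalLM.from_pretrained(base)

      # Freeze the base weights; attach small trainable LoRA adapters to the attention projections.
      model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                               target_modules=["q_proj", "v_proj"],
                                               task_type="CAUSAL_LM"))

      # Hypothetical internal corpus: one paper (or chunk) per line of a text file.
      papers = load_dataset("text", data_files={"train": "internal_papers.txt"})["train"]
      papers = papers.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
                          remove_columns=["text"])

      Trainer(
          model=model,
          args=TrainingArguments(output_dir="internal-llm",
                                 per_device_train_batch_size=1,
                                 num_train_epochs=1),
          train_dataset=papers,
          data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
      ).train()
      ```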

    • utopiah@lemmy.world
      4 days ago

      Are you training solely on your own data, or are you fine-tuning an existing LLM, or is it RAG?

      I’m not an expert, but AFAIK training an LLM from scratch requires, by definition, a vast amount of text, so I’m skeptical that ANY company publishes enough papers to do so. I understand if you can’t share more about the process. Maybe my saying “AI” was too broad.
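
      To make the distinction concrete: a RAG setup trains nothing at all. It just embeds the internal papers, retrieves the chunks relevant to a question, and hands them to whatever model you already host. A rough sketch (the embedding model, toy corpus, and prompt format are my own illustrative assumptions, not the OP’s setup):

      ```python
      # Hypothetical RAG sketch over an internal paper archive: embed, retrieve, assemble a prompt.
      import numpy as np
      from sentence_transformers import SentenceTransformer

      # Toy stand-in for ~30 years of internal papers, pre-chunked into passages.
      papers = [
          "Paper A: fatigue analysis of the supplier bracket ...",
          "Paper B: thermal margins assumed for the avionics enclosure ...",
      ]

      embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
      doc_vecs = embedder.encode(papers, normalize_embeddings=True)

      def retrieve(question: str, k: int = 2) -> list[str]:
          """Return the k internal passages most similar to the question."""
          q = embedder.encode([question], normalize_embeddings=True)[0]
          scores = doc_vecs @ q  # cosine similarity, since the vectors are normalized
          return [papers[i] for i in np.argsort(scores)[::-1][:k]]

      question = "What thermal margin did we assume for the avionics enclosure?"
      context = "\n\n".join(retrieve(question))

      # The assembled prompt goes to whichever internally hosted LLM is available;
      # the model itself is never trained on the proprietary papers.
      prompt = f"Answer using only this internal context:\n{context}\n\nQuestion: {question}"
      ```

      So the “vast amount of text” problem only bites if you train from scratch; refining an existing model or doing RAG over it sidesteps that entirely.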