Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head

0x815@feddit.de · 1 year ago

P1r4nha@feddit.de · 1 year ago

Practically, you would have to separate the model architecture from the weights. The weights are licensed for research use only, while the architecture is the actual scientific contribution. Maybe also some instructions on how best to train the model.

The only problem is that you can't really prove whether someone just retrained the research weights or trained from scratch starting from randomized weights. Certain alterations to the architecture are also possible, for example using only the "headless" model.

I think there's some research into detecting retraining, but I imagine it's not foolproof.
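To make the detection problem concrete, here is a toy sketch in Python (standard library only). It treats the published architecture as a hypothetical list of layer shapes, uses small random perturbations as a stand-in for continued training, and compares flattened weight vectors by cosine similarity. That comparison is a simple heuristic chosen for illustration, not any particular published detection method. Weights derived from the research checkpoint stay close to it, while weights trained from a fresh random initialization come out close to orthogonal.

```python
import math
import random

# Hypothetical "architecture": just a list of layer shapes (rows, cols).
# This part is the freely shareable scientific contribution.
ARCHITECTURE = [(16, 32), (32, 8)]

def init_weights(arch, rng):
    """Randomly initialize a flat weight vector for the given architecture."""
    n = sum(rows * cols for rows, cols in arch)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def fine_tune(weights, rng, step=0.05):
    """Toy stand-in for further training: small perturbations of existing weights."""
    return [w + rng.gauss(0.0, step) for w in weights]

def cosine_similarity(a, b):
    """Cosine of the angle between two equally long weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

rng = random.Random(0)
research = init_weights(ARCHITECTURE, rng)  # the "research use only" weights
retrained = fine_tune(research, rng)        # continued training from those weights
scratch = init_weights(ARCHITECTURE, rng)   # fresh random init, same architecture

print(f"research vs retrained: {cosine_similarity(research, retrained):.3f}")
print(f"research vs scratch:   {cosine_similarity(research, scratch):.3f}")
```

A check this naive is easy to defeat: permuting the hidden units of a layer (together with the matching rows and columns of the neighbouring layers) changes the weight vector without changing what the network computes, and real training can move weights much further than this toy perturbation. That is one reason such detection is hard to make foolproof.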

frog 🐸@beehaw.org · 1 year ago

I kind of think that, as proofs of concept, the AI models are quite interesting. I don't like the content they produce much, because it is just so utterly same-y, so I haven't yet seen anything that made me go "wow, that's amazing". But the actual architecture behind them is pretty cool.

But at this point, they've gone beyond researching an interesting idea into full-on commercial enterprises. If we don't have an effective means of retraining the existing models to remove the data that isn't licensed for commercial use (which is most of it), then it seems the only ethical way to move forward would be to start again with more selective training data, including only what is licensed for commercial use. Now that the research into how to create these models has been done, it should be quicker to build new ones from more ethically sourced training data.
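Starting again with selective data amounts to a default-deny filter over the training corpus. The sketch below assumes each document carries a hypothetical SPDX-style license identifier and uses an illustrative allowlist; anything without license information, or with a non-commercial license, is left out.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative allowlist (an assumption, not a complete or authoritative list)
# of SPDX-style identifiers for licenses that permit commercial use.
COMMERCIAL_OK = {"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"}

@dataclass
class Document:
    source: str
    license: Optional[str]  # None when no license information is available
    text: str

def commercially_usable(doc: Document) -> bool:
    """Default-deny: only documents with an allowlisted license pass."""
    return doc.license in COMMERCIAL_OK

# Hypothetical corpus for illustration.
corpus = [
    Document("blog-a", "CC-BY-4.0", "..."),
    Document("news-b", None, "..."),
    Document("forum-c", "CC-BY-NC-4.0", "..."),
    Document("repo-d", "MIT", "..."),
]

training_set = [doc for doc in corpus if commercially_usable(doc)]
print([doc.source for doc in training_set])
```

Excluding unlabelled data by default shrinks the corpus considerably, which is the trade-off the comment above accepts in exchange for only training on material that is cleared for commercial use.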