The Apple M5 Pro chip is rumored to be announced later this week with improved GPU cores and faster inference. With consumer hardware getting better and better, and AI labs squishing models down to fit in tiny amounts of VRAM, it's becoming increasingly feasible to have an assistant that has absorbed the entirety of the internet's knowledge built right into your PC or laptop, all running privately and securely offline. The future is exciting, everyone; we are closing the gap.
Yeah, that's never going to happen, I'm afraid. The models do get denser and better at transformative tasks, but you will simply never be able to ask that 22B 4-bit quant for the birthdates of obscure but historically important Bolivian politicians. That's a matter of information density, and it's not a useful application for these models anyway.
It's going to be irrelevant, of course, once there's a convenient 1-click way to integrate your local Kiwix server into your model's knowledge base in Open WebUI. There's no need to waste VRAM on Wikipedia and Stack Overflow knowledge.
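The glue isn't even that much code. Here's a rough sketch of the idea: run a full-text search against a local kiwix-serve instance and hand the stripped-down result text to the model as retrieved context. The endpoint path, parameter names, port, and ZIM name are all assumptions from memory, so check them against your kiwix-serve version before trusting it.

```python
# Minimal sketch: query a local kiwix-serve instance and return plain text
# the model can use as context. Endpoint and parameter names are assumed;
# verify against your kiwix-serve version.
import requests
from bs4 import BeautifulSoup

KIWIX = "http://localhost:8080"           # assumed kiwix-serve address
BOOK = "wikipedia_en_all_nopic"           # example ZIM name, yours will differ

def kiwix_context(query: str) -> str:
    """Run a full-text search on the local ZIM and return the results as plain text."""
    resp = requests.get(f"{KIWIX}/search",
                        params={"books.name": BOOK, "pattern": query},
                        timeout=10)
    resp.raise_for_status()
    # The results page is plain HTML (titles plus snippets); strip the markup
    # and feed the text to the model as retrieved context instead of relying
    # on whatever it memorised during training.
    return BeautifulSoup(resp.text, "html.parser").get_text(" ", strip=True)

if __name__ == "__main__":
    print(kiwix_context("Bolivian presidents")[:500])
```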
Kiwix and privacy-respecting metasearch integration will be useful for this
Page Assist can already let Ollama models search the internet with SearXNG. But even though it often finds what I'm asking for, it's still lacking: it's just a one-shot search, and it won't try a different search query if the first one doesn't return anything helpful.
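Here's a sketch of the retry behaviour I mean: if the first SearXNG pass comes back empty, let the model reword the query and try again. This assumes a local SearXNG instance with the JSON format enabled in settings.yml and the ollama Python package; the port and model name are just examples.

```python
# Sketch of an iterative search loop: retry with a model-suggested query when
# SearXNG returns nothing. SearXNG address and model name are assumptions.
import requests
import ollama

SEARX = "http://localhost:8888/search"    # assumed SearXNG address
MODEL = "llama3.1:8b"                     # example model

def searx(query: str) -> list[dict]:
    """One search pass; returns the top few result dicts (title, url, content)."""
    r = requests.get(SEARX, params={"q": query, "format": "json"}, timeout=10)
    r.raise_for_status()
    return r.json().get("results", [])[:5]

def search_with_retries(question: str, attempts: int = 3) -> list[dict]:
    query = question
    for _ in range(attempts):
        results = searx(query)
        if results:
            return results
        # Nothing useful came back: ask the model for a reworded query and retry.
        reply = ollama.chat(model=MODEL, messages=[{
            "role": "user",
            "content": (f"The web search '{query}' returned nothing useful. "
                        f"Suggest one better search query for: {question}. "
                        "Reply with the query only.")}])
        query = reply["message"]["content"].strip()
    return []
```

Even a dumb loop like that would cover most of the cases where the first query whiffs.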
Totally agreed. LLMs shouldn't be asked to know things; it's counterproductive. They should be asked to DO things, and use available tools to do that.