
  • Most AI platforms use massive models with trillions of parameters that activate all their computational power for every single query.

    The first part is probably right: frontier models are likely around a trillion parameters total, though we don’t know for sure. But I don’t think the second part is correct. It’s almost certain the big proprietary models, like all the recent large open models, use a Mixture-of-Experts setup like Thaura does, where only a fraction of the parameters are activated for each query, because it’s just cheaper to train and run.
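    For anyone unfamiliar with how MoE saves compute, here’s a toy sketch of top-k expert routing (plain NumPy, made-up sizes; this is not Thaura’s or GLM’s actual code): the router scores every expert for a token, but only the top k expert weight matrices actually run, so most of the model’s parameters sit idle on any given query.

```python
import numpy as np

# Toy top-k Mixture-of-Experts layer -- illustrative only, hypothetical sizes.
rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 8, 2, 16                                      # assumed, tiny for demo
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]   # one weight matrix per expert
router = rng.standard_normal((D, N_EXPERTS))                        # gating/router weights

def moe_layer(x):
    """Route one token vector x through only TOP_K of the N_EXPERTS experts."""
    logits = x @ router                         # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]           # indices of the best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                    # softmax over the chosen experts only
    # Only these TOP_K matrices are used; the other N_EXPERTS - TOP_K never run.
    return sum(w * (x @ experts[i]) for i, w in zip(top, weights))

token = rng.standard_normal(D)
print(moe_layer(token).shape)   # (16,) -- same output, ~TOP_K/N_EXPERTS of the expert compute
```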

    While traditional web research might take you 10 minutes of clicking through pages and consuming cookies from major search engines, Thaura uses a fraction of the energy and provides the same information.

    This part is pretty misleading. It’s very unclear how the energy use of an LLM query compares to that of a web search, but even assuming they’re roughly equal (most estimates put LLM queries higher), the LLM still has to run its own web searches to find the information if you’re using it for research, so the comparison is fairly moot. Also, the “consuming cookies” part isn’t an energy problem but a privacy one, so I’m not sure why it’s brought up in this context.

    Thaura uses a “mixture of expert” models with 100 billion total parameters, but only activates 12 billion per query.

    Going to the actual website, it does credit the “GLM-4.5 Air architecture”, but the article doesn’t mention GLM, or the company behind it (Z.ai), at all. Given that this is likely a finetune of the freely released GLM model, it feels odd that the Thaura team seem so reluctant to credit the Z.ai team.
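    To see how “100 billion total parameters, but only 12 billion active” can both be true, here’s the back-of-the-envelope accounting with made-up layer and expert sizes (chosen only so the totals land near those figures; this is not GLM-4.5 Air’s real configuration):

```python
# Illustrative parameter accounting only -- all sizes below are made up.
n_layers      = 46        # assumed
shared_params = 133e6     # per layer: attention + shared parts (assumed)
expert_params = 31.9e6    # per layer: one expert FFN (assumed)
n_experts     = 64        # experts per layer (assumed)
top_k         = 4         # experts routed per token (assumed)

total  = n_layers * (shared_params + n_experts * expert_params)
active = n_layers * (shared_params + top_k * expert_params)
print(f"total ≈ {total / 1e9:.0f}B, active per query ≈ {active / 1e9:.0f}B")
# total ≈ 100B, active per query ≈ 12B
```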

    These companies are often controlled by US-based corporations whose political stance supports occupation, apartheid, and Western hegemony - not human rights or global justice.

    Reading below and also looking at their website, the hosting and inference are done by US firms (DigitalOcean, TogetherAI) in datacenters located in the EU. That’s not inherently bad from a privacy standpoint, given encryption, but it does feel disjointed that they rail against US firms and Western hegemony while simultaneously relying on those firms’ services to run Thaura.

    While I don’t think the Thaura team had bad intentions in fine-tuning their model and building the service, I feel this is a pretty misleading article that also doesn’t give any significant details about Thaura, such as its performance. They also haven’t given back to the community by releasing their model weights, despite building on an open model themselves. Personally, I think it’s better to stick with Z.ai, Qwen, DeepSeek, etc., who actually release their models to the community and pretrain them themselves.