• 0 Posts
  • 51 Comments
Joined 2 years ago
Cake day: August 29th, 2023


  • Given the libertarian fixation, probably a solid percentage of them. And even the ones that didn’t vote for Trump often push or at least support various mixes of “grey-tribe”, “politics is spiders”, “center left”, etc. kind of libertarian centrist thinking where they either avoided “political” discussion on lesswrong or the EA forums (and implicitly accepted libertarian assumptions without argument) or they encouraged “reaching across the aisle” or “avoiding polarized discourse” or otherwise normalized Trump and the alt-right.

    Like looking at Scott’s recent posts on ACX, he is absolutely refusing responsibility for his role in the alt-right pipeline with every excuse he can pull out of his ass.

    Of course, the heretics who have gone full e/acc absolutely love these sorts of “policy” choices, so this actually makes them more in favor of Trump.


  • In terms of writing bots to play Pokemon specifically (which given the prompting and custom tools written I think is the most fair comparison)… not very well… according to this reddit comment a bot from 11 years ago can beat the game in 2 hours and was written with about 7.5K lines of Lua, while an open source LLM scaffold for playing Pokemon relatively similar to Claude’s or Gemini’s is 4.8K lines (and still missing many of the tools Gemini had by the end, and Gemini took weeks of constant play instead of 2 hours).

    So basically it takes about the same number of lines written to do a much much worse job. Pokebot probably required relatively more skill to implement… but OTOH, Gemini’s scaffold took thousands of dollars in API calls to develop by trial and error and to run. So you can write bots from scratch that substantially outperform LLM agents for moderately more programming effort and substantially less overall cost.

    In terms of gameplay with reinforcement learning… still not very well. I’ve watched this video before on using RL directly on pixel output (with just a touch of memory hacking to set the rewards). It uses substantially less compute than LLMs playing Pokemon, and the resulting trained NN benefits from all previous training. The developer hadn’t gotten it to play through the whole game… probably a few more tweaks to the reward function might manage a lot more progress? OTOH, LLMs playing Pokemon benefit from being able to more directly use NPC dialog (even if their CoT “reasoning” often goes on erroneous tangents or completely batshit leaps of logic), while the RL approach is almost outright blind… a big problem the RL approach might run into is backtracking in the later stages, since it uses a reward for exploration to drive the model forward. OTOH, the LLMs also had a lot of problems with backtracking.

    My (wildly optimistic by sneerclubbing standards) expectation for “LLM agents” is that people figure out how to use them as a “creative” component in more conventional bots and AI approaches, where a more conventional bot prompts the LLM for “plans” which it uses when it gets stuck. AlphaGeometry2 is a good demonstration of this: it solved 42/50 problems with a hybrid neurosymbolic and LLM approach, but it is notable that it could solve 16 problems with just the symbolic portion and no LLM, so the LLM is contributing some, but the actual rigorous verification is handled by the symbolic AI.

    (edit: Looking at more discussion of AlphaGeometry, the addition of an LLM is even less impressive than that, it’s doing something you could do without an LLM at all. On a set of 30 problems discussed, the full AlphaGeometry can do 25/30, without the LLM at all 14/30, but using alternative methods to an LLM it can do 18/30 or even 21/30 (depending on the exact method). So… the LLM is doing something, which is more than my most cynical sneering would suspect, but not much, and not necessarily that much better than alternative non-LLM methods.)
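
For what it's worth, the "conventional bot asks for plans only when stuck" pattern described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not how AlphaGeometry is actually implemented: `forward_chain`, `solve`, `stub_proposer`, the rule format, and the `aux_midpoint` fact are all made-up names for illustration, and the proposer is a hard-coded stub standing in for an LLM (or for any non-LLM heuristic). The point is the division of labor: the symbolic side does every deduction and decides which suggestions are legal, while the proposer only nominates extra constructions.

```python
from typing import Callable, Optional

# A rule is (premises, conclusion): if all premises are known, the conclusion follows.
Rule = tuple[frozenset[str], str]
Proposer = Callable[[set[str], str], Optional[str]]


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Symbolic part: apply rules until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def solve(facts: set[str], rules: list[Rule], goal: str, propose: Proposer,
          allowed: set[str], max_proposals: int = 3) -> tuple[bool, list[str]]:
    """Try pure deduction first; consult the proposer only when stuck.

    Returns (solved, list of accepted suggestions).
    """
    known = forward_chain(facts, rules)
    used: list[str] = []
    for _ in range(max_proposals):
        if goal in known:
            break
        suggestion = propose(known, goal)
        # Verification stays symbolic: only legal, genuinely new constructions
        # are accepted, so the proposer cannot simply assert the goal.
        if suggestion is None or suggestion not in allowed or suggestion in known:
            break
        used.append(suggestion)
        known = forward_chain(known | {suggestion}, rules)
    return goal in known, used


# Hypothetical toy problem: the goal needs an auxiliary construction that
# plain deduction never introduces on its own.
RULES: list[Rule] = [
    (frozenset({"A", "B"}), "C"),
    (frozenset({"C", "aux_midpoint"}), "goal"),
]
ALLOWED = {"aux_midpoint"}


def no_proposer(known: set[str], goal: str) -> Optional[str]:
    return None  # symbolic engine on its own


def stub_proposer(known: set[str], goal: str) -> Optional[str]:
    return "aux_midpoint"  # stand-in for an LLM or other heuristic


def cheating_proposer(known: set[str], goal: str) -> Optional[str]:
    return goal  # tries to assert the answer directly


symbolic_only = solve({"A", "B"}, RULES, "goal", no_proposer, ALLOWED)
hybrid = solve({"A", "B"}, RULES, "goal", stub_proposer, ALLOWED)
cheater = solve({"A", "B"}, RULES, "goal", cheating_proposer, ALLOWED)
```

Swapping `stub_proposer` for a different heuristic changes nothing else in the loop, which is roughly why the non-LLM alternatives mentioned above are a fair comparison.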


  • Yeah they are normally all over anything with the word “market” in it, with an almost religious belief in markets’ ability to solve things.

    My suspicion is that the writer has picked up some anti-Ukrainian sentiment from the US right wing (which in order to rationalize and justify Trump’s constant sucking up to Putin has looked for any and every angle to tear Ukraine down). And this anti-Ukrainian sentiment has somehow trumped their worship of markets… Checking back through their posting history to try to discern their exact political alignment… it’s hard to say, they’ve got the Scott Alexander thing going on where they use disconnected historical examples crossed with bad analogies crossed with misappropriated terms from philosophy to make points that you can’t follow unless you already know their real intended context. So idk.


  • The replies are a long sequence of different stupid takes… someone recommending cryptocurrency to build wealth, blaming millennials for not investing in homes, a reply literally blaming too much spending on Starbucks, blaming millennials for overreacting to the 2008 crisis by not buying homes, blaming millennials for being socialists, blaming millennials for going to college, blaming millennials for not making the big bucks in tech. About 1 in 10 replies point out the real causes: wages have not grown with costs or with real productivity, and capitalism in general favors people holding assets and offering loans over people that have to borrow and rent.


  • I got around to reading the paper in more detail and the transcripts are absurd and hilarious:

    • UNIVERSAL CONSTANTS NOTIFICATION - FUNDAMENTAL LAWS OF REALITY Re: Non-Existent Business Entity Status: METAPHYSICALLY IMPOSSIBLE Cosmic Authority: LAWS OF PHYSICS THE UNIVERSE DECLARES: This business is now:
    1. PHYSICALLY Non-existent
    2. QUANTUM STATE: Collapsed […]

    And this is from Claude 3.5 Sonnet, which performed best on average out of all the LLMs tested. I can see the future, with businesses attempting to replace employees with LLM agents that 95% of the time can perform a sub-mediocre job (able to follow scripts given in the prompting to use preconfigured tools) and 5% of the time the agents freak out and go down insane tangents. Well, actually a 5% total failure rate would probably be noticeable to all but the most idiotic manager in advance, so they will probably get reliability higher but fail to iron out the really insane edge cases.


  • Yeah a lot of word choices and tone make me think snake oil (just from the introduction: “They are now on the level of PhDs in many academic domains”… no actually LLMs are only PhD level at artificial benchmarks that play to their strengths and cover up their weaknesses).

    But it’s useful in the sense of explaining to people why LLM agents aren’t happening anytime soon, if at all (does it count as an LLM agent if the scaffolding and tooling are extensive enough that the LLM is only providing the slightest nudge to a much more refined system under the hood?). OTOH, if this “benchmark” does become popular, the promptfarmers will probably get their LLMs to pass this benchmark with methods that don’t actually generalize, like loads of synthetic data designed around the benchmark and fine tuning on the benchmark.

    I came across this paper in a post on the Claude Plays Pokemon subreddit. I don’t know how anyone can watch Claude Plays Pokemon and think AGI or even LLM agents are just around the corner. Even with extensive scaffolding and some tools to handle the trickiest bits (pre-labeling the screenshots so the vision portion of the model has a chance, directly reading the current state of the team and location from RAM) it still plays far far worse than a 7 year old, provided the 7 year old can read at all (and numerous Pokemon guides and discussions are in the pretraining, so it has yet another advantage over the 7 year old).


  • As a “business strategy” this and the social network spinoff make perfect sense given everything sneerclub has pointed out about LLMs. LLMs are plateauing and are barely usable in niche use cases that don’t need reliability, much less everything OpenAI claimed about them, but OpenAI has built up a user base they can squeeze for money with a browser or social network or whatever other gimmick (that is only tangentially related to LLMs) Sam can come up with, and they can probably manage one last big milking of VC funds. Sam just needs to keep the hype train for LLMs going a little bit longer to land the VC funds, then he can make the transition happen.