• 0 Posts
  • 19 Comments
Joined 1 year ago
Cake day: June 17th, 2023


  • Maybe a bit late, but I’ve worked on this kind of functionality. I did not work on the algorithm, but the guys who did say:

    • Voice print data is not enough to reconstruct the voice
    • There is an input validation step specifically to detect TTS, so you may have trouble. EDIT: nvm, I misread TTY as TTS lol. Still interesting that they check for this.

    This is of course based on trust, but these are the claims.







  • I think the part that annoys me the most is the hype around it, just like blockchain. People who don’t know any better claiming magic.

    We’ve had a few sequence-specific architectures over the years: LSTM, GRU and now Transformers. Each was broadly better than the last at sequence-to-sequence transformation, and at least for the last one the original task was language translation. We eventually figured out these guys have a bit of clairvoyance too: they can make accurate predictions based on past data, or at least accurate enough to bet on, and you can bet traders of various stripes have already made billions off that fact. I’ve even seen a transformer-based weather model. It did OK, but transformers are better at language.

    And that’s all it is! ChatGPT is a Transformer in the predictive stance. It looks at a transcript of a conversation and predicts what a human is most likely to say next. It’s a very complex transformation of historical data. If you give it the exact same transcript with sampling turned off (or the random seed fixed), it gives the exact same answer. It is, in the literal, mathematically rigorous sense, entirely incapable of an original thought. Any perceived sentience is a shadow of OpenAI’s army of annotators or the corpus it was trained on, and I have a hard time assigning sentience to tomorrow’s forecast, which may well have used similar technology. It’s just an ultra-fancy search engine index.

    Anyways, that’s my rant done I guess. Call it a cynical engineer’s opinion. To be clear, I think it’s a fantastic and useful technology, and it WILL change how we interact with machines. It can do fancy things when combined with “shell” code driving its UI, like multi-step “agents” or running code, and I actually hope OpenAI extends it far into the future. But I sincerely think any form of AGI will be something entirely different to LLMs, or at least they’ll only form a small part of it as an encoder/decoder for its thoughts.

    EDIT: Added some paragraph spacing. Sorry, went into a more broad AI rant rather than staying on topic about coding specifically lol
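
To make the "exact same transcript, exact same answer" point concrete, here is a minimal sketch of next-token generation. It is not how ChatGPT is implemented: the lookup table `NEXT_TOKEN_PROBS`, its probabilities and the `generate` helper are all made up for illustration, and a real model computes the distribution with a neural network instead of a table. It only shows that greedy decoding is a pure function of the input, and that any variation comes from an explicit random sampler.

```python
import random

# Hypothetical toy "model": invented next-token probabilities,
# standing in for what a real network would compute.
NEXT_TOKEN_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate(prompt, steps, temperature=0.0, rng=None):
    """Extend `prompt` by up to `steps` tokens using the toy table."""
    tokens = prompt.split()
    for _ in range(steps):
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:  # no known continuation, stop early
            break
        if temperature == 0.0:
            # Greedy decoding: always take the most likely token.
            next_token = max(dist, key=dist.get)
        else:
            # Sampling: rescale probabilities by temperature, then draw.
            sampler = rng or random
            words = list(dist)
            weights = [dist[w] ** (1.0 / temperature) for w in words]
            next_token = sampler.choices(words, weights=weights, k=1)[0]
        tokens.append(next_token)
    return " ".join(tokens)


# Greedy decoding: same input, same output, every time.
print(generate("the", 3))

# Sampling is only repeatable if the random seed is fixed.
print(generate("the", 3, temperature=1.0, rng=random.Random(42)))
```

With `temperature=0.0` the output depends only on the prompt. With sampling enabled, two runs agree only when they share a seed, which is why hosted chat models can give different answers to the same prompt.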




  • I remember hearing a while back that Musk made an executive decision at Tesla to not use LIDAR. I thought: “That’s a stupid decision. At least invest in making it better if you think it’s not sufficient”, and I’ve had quite a negative view of his engineering abilities ever since. Seeing as a Tesla can be fooled by a projector these days, I’m willing to die on that hill. I will admit that he is an exceptional businessman (most people would piss away a fortune if given one), but an engineer he is not, not by a loooooooong way.


  • I’ve not played with it much, but does it always describe the image first like that? I’ve been trying to think about how the image input actually works. My personal suspicion is that it uses an off-the-shelf visual understanding network (think reverse Stable Diffusion) to generate a description, then just uses GPT normally to complete the response. This could explain the disconnect here, where it can’t erase what the visual model wrote, but that could all fall apart if it doesn’t always follow this pattern. Just thinking out loud here.


  • Thanks for the detailed reply. I see that I did indeed misunderstand what he was saying. I’m an R&D engineer, so I guess my knee-jerk response to character-level mischief is exactly what you said: it can’t see them anyway. I already knew that, so I dismissed that possible interpretation in my mind straight out of the gate. Maybe I should assume zero knowledge of internal AI workings when reading commentary in the wild.

    Edit: Actually just thought of a good analogy for this. Say I play a sound and then ask you what it is. You might reply “it sounds like a bell”, but if I asked for the exact composition of frequencies that made the sound, you might not be able to say. Similarly, the AI sees a group of letters as a definite “thing” (token), but it doesn’t know what actually went into it because its “ears” (tokenizer) already reduced it to a simpler signal.
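
A rough sketch of that "ears" step, for anyone who wants to see it. This is a toy greedy longest-match tokenizer, not the BPE tokenizer GPT models actually use, and the `VOCAB` table and `tokenize` function are invented for illustration. The point is only that the model receives the integer IDs, not the letters.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands
# of pieces from data; these entries are made up for the example.
VOCAB = {
    "straw": 0, "berry": 1, "bell": 2,
    "s": 3, "t": 4, "r": 5, "a": 6, "w": 7,
    "b": 8, "e": 9, "y": 10, "l": 11,
}


def tokenize(text):
    """Split `text` into token IDs, always taking the longest known piece."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return ids


# The model would see only these integers, never the spelling.
print(tokenize("strawberry"))  # two IDs for ten letters
print(tokenize("tell"))        # falls back to single-letter tokens
```

Once "strawberry" has become two opaque IDs, a question like "how many r's are in it?" can only be answered from whatever the model memorised about those IDs during training, not by looking at the letters.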