Bing’s GPT-4 With Image Input Can Break Captchas

𝕊𝕚𝕤𝕪𝕡𝕙𝕖𝕒𝕟@programming.dev · 1 year ago

vcmj@programming.dev · 1 year ago

I’ve not played with it much, but does it always describe the image first like that? I’ve been trying to think about how the image input actually works. My personal suspicion is that it uses an off-the-shelf visual understanding network (think reverse Stable Diffusion) to generate a description, then just uses GPT normally to complete the response. This could explain the disconnect here, where it can’t erase what the visual model wrote, but that could all fall apart if it doesn’t always follow this pattern. Just thinking out loud here.
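To make the guess concrete, here is a rough Python sketch of that two-stage idea. It is only an illustration of the suspicion above, not how Bing actually works: `describe_image`, `complete_text`, and `answer` are hypothetical stand-ins with canned outputs, and no real model or API is called.

```python
def describe_image(image: bytes) -> str:
    """Hypothetical stand-in for an off-the-shelf image captioning model."""
    # A real captioner would look at the pixels; this stub returns a canned caption.
    return f"An image ({len(image)} bytes) that appears to contain some text."


def complete_text(prompt: str) -> str:
    """Hypothetical stand-in for a plain text-only language model."""
    # A real model would generate a continuation of the prompt; this stub
    # returns a canned refusal to mimic the behaviour being discussed.
    return "Sorry, I can't help with reading that text."


def answer(image: bytes, question: str) -> str:
    # Stage 1: the vision model writes a description on its own.
    description = describe_image(image)
    # Stage 2: the language model only ever sees text, never the image.
    prompt = f"Image description: {description}\nUser: {question}\nAssistant:"
    reply = complete_text(prompt)
    # The description is produced before the language model runs, so a
    # later refusal cannot take back what the first stage already wrote.
    return f"{description}\n{reply}"


if __name__ == "__main__":
    print(answer(b"fake image bytes", "What does the text say?"))
```

If the real system worked this way, the description would always come first, which is exactly the pattern that would need checking.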

𝕊𝕚𝕤𝕪𝕡𝕙𝕖𝕒𝕟@programming.dev · 1 year ago

Unfortunately, I don’t yet have access to it, so I can’t check whether the description always comes first. But your theory sounds interesting, and I hope we’ll be able to find out more soon.