Not Total Recall (1990)
ordellrb@lemmy.world to linuxmemes@lemmy.world · 6 months ago (edited) · image post
R00bot@lemmy.blahaj.zone · 6 months ago
I can’t imagine it’d be that hard to write some code that does that using an existing AI model.

You’re probably right.

not_amm@lemmy.ml · 6 months ago
I found a small command that runs KDE Spectacle (screenshot software) together with Tesseract, so I can OCR a screenshot whenever I want. I only had to install Tesseract and the language data for my main language. You could easily do the same with an API and/or a local AI model.

MacN'Cheezus@lemmy.today · 6 months ago
Llava and Bakllava are two Ollama models that can not only extract text but also describe what’s happening on screen. Using tesseract-ocr, as the other guy suggested, is probably simpler and less resource-intensive, though.
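A minimal Python sketch of a Spectacle + Tesseract pipeline like the one not_amm describes. It is not necessarily the command not_amm found. It assumes both `spectacle` and `tesseract` are on `PATH` and that Spectacle's background mode exits only after the screenshot has been saved. The Spectacle flags used are `-b` (background, no GUI), `-n` (no notification), `-r` (rectangular region) and `-o` (output file). Passing `stdout` as Tesseract's output base prints the text instead of writing a `.txt` file. The function names are hypothetical.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def spectacle_cmd(output: Path) -> list[str]:
    # -b: run without the GUI, -n: suppress the notification,
    # -r: let the user drag a rectangular region, -o: save to this file
    return ["spectacle", "-b", "-n", "-r", "-o", str(output)]


def tesseract_cmd(image: Path, lang: str = "eng") -> list[str]:
    # An output base of "stdout" makes tesseract print the recognized text
    # instead of writing a .txt file; -l selects the installed language data.
    return ["tesseract", str(image), "stdout", "-l", lang]


def ocr_region(lang: str = "eng") -> str:
    """Capture a user-selected region with Spectacle and OCR it with Tesseract."""
    for tool in ("spectacle", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        shot = Path(tmp) / "region.png"
        # Assumption: spectacle returns only after the file has been written.
        subprocess.run(spectacle_cmd(shot), check=True)
        if not shot.exists():
            raise RuntimeError("no screenshot saved (selection cancelled?)")
        result = subprocess.run(
            tesseract_cmd(shot, lang), check=True, capture_output=True, text=True
        )
        return result.stdout.strip()
```

Binding a small script that calls `print(ocr_region())` to a global shortcut would give a "select a region, get its text" workflow. Passing a different installed language code as `lang` switches the OCR language.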
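For the Ollama route MacN'Cheezus mentions, a local vision model can be queried over Ollama's HTTP API. This sketch assumes Ollama is running at its default address (`http://localhost:11434`) and that `llava` (or `bakllava`) has already been pulled. The helper names and the default prompt are hypothetical. The screenshot goes base64-encoded in the `images` field of a `/api/generate` request, with streaming turned off so the reply arrives as one JSON object.

```python
import base64
import json
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_payload(
    image_bytes: bytes,
    model: str = "llava",
    prompt: str = "Transcribe any text in this screenshot, then describe what is on screen.",
) -> dict:
    # Vision models take images as base64 strings in the "images" list;
    # stream=False asks Ollama for a single JSON object instead of chunks.
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_screenshot(path: Path, model: str = "llava", url: str = OLLAMA_URL) -> str:
    payload = build_payload(path.read_bytes(), model)
    # Supplying a request body makes urllib send a POST request.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # The non-streaming reply carries the generated text in "response".
        return json.load(response)["response"]
```

Each call runs a full vision model over the image. That is the extra resource cost, compared with plain Tesseract, that the comment points out.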