When do you think there will be an Android app that could accurately perform real time object recognition?

PumpkinDrama@reddthat.com · edited 2 years ago

When do you think there will be an Android app that could accurately perform real time object recognition?

over_clox@lemmy.world · edited 2 years ago

Modern AI, as you’re seeing it today, is processed by massive data centers online with thousands of processing units running in parallel, not by your local device. Your device would be way too slow to expect any sort of realtime object recognition, at least with the current state of technology.

TL;DR - I don’t think it’ll happen anytime soon, at least not on your local device. It would take a super fast and steady connection to the AI service.

deegeese@sopuli.xyz · edited 2 years ago

Don’t underestimate the potential for optimization when you can constrain the problem to a narrow range of uses. Model pruning and custom silicon go far. Voice assistants used to be purely cloud compute, but a lot of common use cases are done on device now.
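To make the pruning point concrete, here is a minimal sketch of magnitude pruning, one common form of model pruning: the smallest weights are zeroed so the model can be stored and run more cheaply. The helper name `magnitude_prune` and the sample weights are made up for illustration; real frameworks prune whole tensors and usually fine-tune afterwards to recover accuracy.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Hypothetical helper for illustration only, not any framework's API.
    `sparsity` is the fraction of weights to remove (0.0 to 1.0).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Made-up weights: half of them are close to zero and contribute little.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
print(magnitude_prune(weights, 0.5))
```

Zeroed weights only save time or memory when the runtime or hardware can skip them, which is where the custom silicon part comes in.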

over_clox@lemmy.world · 2 years ago

Yes, I’ve been testing FUTO Voice Recognition lately. It’s awesome as hell, but it is far from realtime. And this ain’t even object recognition, it’s only voice recognition.

https://voiceinput.futo.org/

https://play.google.com/store/apps/details?id=org.futo.voiceinput

☂️-@lemmy.ml · edited 5 months ago

deleted by creator

over_clox@lemmy.world · 2 years ago

We’re not talking about image rendering, we’re talking about image recognition. Although they may seem related, they are not.

It’s one thing to sling a 3D model and textures to a GPU, but it’s totally a different thing to take a photo and sling it against a humongous AI model being run at a datacenter with billions of images to compare it to.

☂️-@lemmy.ml · edited 5 months ago

deleted by creator

over_clox@lemmy.world · edited 2 years ago

Recognizing a face is one thing, that’s more or less just knowing certain geometries. Recognizing who that face actually is, or what model car that is, or whatever, requires processing through a huge database of information.

Also, as of right now, not all AI systems are even smart enough to distinguish a human from a monkey. They both have faces yo…

☂️-@lemmy.ml · edited 5 months ago

deleted by creator

over_clox@lemmy.world · 2 years ago

No shit Watson, that’s my whole point. AI as anyone today knows it is cloud based, meaning you’re tethered to the internet. Your device can’t process it all by its little measly lonesome self.

☂️-@lemmy.ml · edited 5 months ago

deleted by creator

qwertyqwertyqwerty@lemmy.world · 2 years ago

Honestly, I expect some form of it in the next five years. Tech can move fast when it wants to and there’s 💵 involved.

over_clox@lemmy.world · 2 years ago

Is Moore’s Law Finally Dead?

Zarxrax@lemmy.world · 2 years ago

There is an app called Object Detector which does this. It’s not particularly accurate and can’t recognize a lot of objects, but it does run on phones in realtime.

ryathal@sh.itjust.works · 2 years ago

An isolated phone, not for a while. A phone with a dedicated 5G connection would be pretty close.

4am@lemm.ee · edited 2 years ago

Frigate already does this on a Raspberry Pi or Intel NUC. It would be power-hungry on a phone, but if you’re not training ML models and are just looking for objects the model already knows, the tech is ready today.

EDIT: Here’s Google’s guide on using a custom TensorFlow Lite model for object detection on Android with ML Kit: https://developers.google.com/ml-kit/vision/object-detection/custom-models/android

fartsparkles@sh.itjust.works · 2 years ago

Don’t know about an Android app but YOLOv8 Detect and similar models can detect objects in videos and classify them.
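For anyone curious what happens after a detector like this proposes boxes: detectors typically emit many overlapping candidates per object, and a post-processing step called non-maximum suppression (NMS) keeps only the best one. Below is a rough sketch of greedy NMS in plain Python. The function names, box format `(x1, y1, x2, y2)` and sample detections are assumptions for illustration, not the YOLOv8 API.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy per-class NMS over (box, score, label) tuples (illustrative)."""
    kept = []
    # Visit detections from most to least confident.
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, label = det
        # Keep a box unless it overlaps a kept box of the same class too much.
        if all(label != k[2] or iou(box, k[0]) < iou_threshold for k in kept):
            kept.append(det)
    return kept


# Made-up detections: two overlapping "cat" boxes and one separate "dog" box.
detections = [
    ((0, 0, 10, 10), 0.9, "cat"),
    ((1, 1, 11, 11), 0.8, "cat"),
    ((50, 50, 60, 60), 0.7, "dog"),
]
for box, score, label in nms(detections):
    print(label, score, box)
```

On video this runs once per frame, so its cost adds to the model's inference time.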

Altima NEO@lemmy.zip · 2 years ago

Google Goggles used to be able to do that

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 🏆@yiffit.net · 2 years ago

Google Lens still can.

QubaXR@lemmy.world · 2 years ago

GPT-4 with image uploads gets pretty damn close, though it’s not real-time and it’s processed server-side.

Em Adespoton@lemmy.ca · 2 years ago

Think about what Apple currently has: dedicated ML processing chip with multiple cores, and yet the on-device object recognition is still an “overnight while plugged in” process for a single image, and only detects a limited number of object types.

Real-time mobile offline object recognition is still the mythical “at least ten years out.” It needs improvements in processors, sample sets, training data and algorithms to get to real-time.

breadsmasher@lemmy.world · 2 years ago

Doesn’t Google Lens basically do this already?

over_clox@lemmy.world · 2 years ago

Yes, but OP is referring to realtime object recognition. Although we don’t have to wait very long for object recognition right now, we still have to wait a bit. That’s not quite realtime.
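One way to pin down “realtime” is a per-frame time budget: at a given frame rate, each frame has to be fully processed before the next one arrives. The sketch below does that arithmetic. The latency figures are invented for illustration, not measurements of any app or service, and it assumes frames are processed one at a time with no pipelining.

```python
def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def is_realtime(latency_ms, fps=30):
    """True if one frame can be processed within the frame budget."""
    return latency_ms <= frame_budget_ms(fps)


def max_fps(latency_ms):
    """Highest frame rate sustainable when each frame takes latency_ms."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# Illustrative numbers only (assumptions, not benchmarks).
on_device_ms = 25.0       # small pruned model running locally
cloud_ms = 80.0 + 120.0   # network round trip + server-side inference

print(is_realtime(on_device_ms))  # 25 ms fits in the ~33 ms budget at 30 fps
print(is_realtime(cloud_ms))      # 200 ms does not
print(max_fps(cloud_ms))          # frames per second the cloud path sustains
```

With these made-up numbers, a short wait of a fifth of a second per result works out to about 5 frames per second, which is why a short delay still isn’t realtime video.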

dope@lemm.ee · 2 years ago

Object recognition.

Then “action recognition”. I mean, what the objects are doing. To each other and such.

Then you have a narration machine. Which could be nice.

over_clox@lemmy.world · edited 2 years ago

Very related, not my work though…

Self Aware Lara Croft

This seems like something of an ideal effort, but it also took lots of human work to even prepare the AI model, and many many runs to refine it.

We’re not even close to realtime object recognition at this point. Delayed recognition, yes. Realtime, no.

Edit: The Self Aware Lara Croft series recently dropped its latest video, Level 7…

https://piped.video/watch?v=SYX4CwyZ1LM