In Star Wars Rebels, there was an E-XD-series infiltrator droid that could quickly take inventory of everything in a Rebel warehouse. With the advanced object recognition capabilities of modern AI, it’s only a matter of time before an Android app can accurately and rapidly identify and catalog objects in real time from a video capture. This could work like a home inventory app, except that users would only need to record video while moving around the house instead of taking pictures and labeling items. When do you think such an app will become available? Also, what is the closest app available right now?

edit: I didn’t say offline or on-device; I don’t know why everyone assumes that. I mean a service offered through an Android app.

  • over_clox@lemmy.world
    1 year ago

    Modern AI, as you’re seeing it today, is processed by massive data centers online with thousands of processing units running in parallel, not by your local device. Your device would be way too slow to expect any sort of realtime object recognition, at least with the current state of technology.

    TL;DR - I don’t think it’ll happen anytime soon, at least not on your local device. It would take a super fast and steady connection to the AI service.

    • deegeese@sopuli.xyz
      1 year ago

      Don’t underestimate the potential for optimization when you can constrain the problem to a narrow range of uses. Model pruning and custom silicon go far. Voice assistants used to be purely cloud compute, but a lot of common use cases are done on device now.
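To make "model pruning" concrete, here is a minimal sketch of one common form, magnitude pruning, which zeroes the weights with the smallest absolute values. The function name `prune_by_magnitude`, the `sparsity` parameter, and the sample weights are all hypothetical and chosen for illustration; this is not a claim about how any particular phone vendor or voice assistant shrinks its models.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of weights with the smallest-magnitude fraction set to 0.0.

    Illustrative only: real frameworks prune tensors layer by layer and
    usually fine-tune afterwards to recover accuracy.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Made-up weights: half of them are dropped, the large ones survive.
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = prune_by_magnitude(weights, sparsity=0.5)
print(pruned)
```

Zeroed weights only save time or memory when the runtime or hardware can skip them, which is where the custom silicon mentioned above comes in.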

    • umbrella@lemmy.ml
      1 year ago

      dunno, some mobile devices are starting to ship with pretty passable gpus nowadays

      • over_clox@lemmy.world
        1 year ago

        We’re not talking about image rendering, we’re talking about image recognition. Although they may seem related, they are not.

        It’s one thing to sling a 3D model and textures to a GPU, but it’s a totally different thing to take a photo and sling it against a humongous AI model being run at a datacenter with billions of images to compare it to.

        • umbrella@lemmy.ml
          1 year ago

          image recognition is also done on gpus, a powerful enough gpu on say, a phone can do a variety of ai tasks

          a mobile integrated intel gpu can already do facial recognition on a video stream for example

          data centers have to be big because they centralize a lot of work

          • over_clox@lemmy.world
            1 year ago

            Recognizing a face is one thing; that’s more or less just knowing certain geometries. Recognizing who that face actually is, or what model car that is, or whatever, requires processing through a huge database of information.

            Also, as of right now, not all AI systems are even smart enough to distinguish a human from a monkey. They both have faces yo…

              • over_clox@lemmy.world
                1 year ago

                No shit Watson, that’s my whole point. AI as anyone today knows it is cloud based, meaning you’re tethered to the internet. Your device can’t process it all by its little measly lonesome self.

                • umbrella@lemmy.ml
                  1 year ago

                  you should look up what frigate is.

                  my desktop gpu can generate ai art pretty quickly too

  • Zarxrax@lemmy.world
    1 year ago

    There is an app called Object Detector that does this. It’s not particularly accurate and can’t recognize many objects, but it does run on phones in real time.

  • fartsparkles@sh.itjust.works
    1 year ago

    Don’t know about an Android app but YOLOv8 Detect and similar models can detect objects in videos and classify them.
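A detector like this only gives per-frame results, so an inventory app would still need to merge them across the video. The sketch below shows one naive way to do that, assuming each frame's detections arrive as `(label, confidence)` tuples. That input format, the function name `build_inventory`, the `min_confidence` threshold, and the sample data are all hypothetical; this is not the output format of YOLOv8 or any specific library.

```python
from collections import Counter


def build_inventory(frames, min_confidence=0.5):
    """Estimate item counts from per-frame detections.

    frames: list of frames, each a list of (label, confidence) tuples.
    The count for a label is the largest number of instances seen in any
    single frame, so one chair visible in 200 frames is counted once.
    """
    inventory = Counter()
    for detections in frames:
        # Count confident detections of each label within this frame.
        per_frame = Counter(
            label for label, conf in detections if conf >= min_confidence
        )
        for label, count in per_frame.items():
            inventory[label] = max(inventory[label], count)
    return dict(inventory)


# Made-up detections for three frames of a living-room walkthrough.
frames = [
    [("chair", 0.91), ("chair", 0.88), ("lamp", 0.40)],
    [("chair", 0.93), ("tv", 0.77)],
    [("tv", 0.81), ("lamp", 0.65)],
]
inventory = build_inventory(frames)
print(inventory)
```

The max-per-frame rule undercounts identical items that never share a frame, such as one lamp in each of two rooms. Handling that would need object tracking or some notion of location, which this sketch leaves out.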

  • QubaXR@lemmy.world
    1 year ago

    GPT-4 with image uploads gets pretty damn close, though it’s not real-time and it’s processed server-side.

  • Em Adespoton@lemmy.ca
    1 year ago

    Think about what Apple currently has: a dedicated ML processing chip with multiple cores, and yet the on-device object recognition is still an “overnight while plugged in” process for a single image, and only detects a limited number of object types.

    Real-time, offline object recognition on mobile is still the mythical “at least ten years out.” It needs improvements in processors, sample sets, training data and algorithms to get to real-time.

    • over_clox@lemmy.world
      1 year ago

      Yes, but OP is referring to realtime object recognition. Although we don’t have to wait very long for object recognition right now, we still have to wait a bit. That’s not quite realtime.
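The line between "realtime" and "wait a bit" can be put in numbers: at a given frame rate, each recognition has to finish inside one frame interval, or frames must be skipped. The arithmetic is sketched below; the function names and the 25 ms and 120 ms latencies are hypothetical examples, not measurements of any real device or service.

```python
import math


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


def keeps_up(latency_ms, fps):
    """True if one recognition pass finishes within one frame interval."""
    return latency_ms <= frame_budget_ms(fps)


def frames_to_skip(latency_ms, fps):
    """Frames that must be dropped between passes to avoid a growing backlog."""
    return max(0, math.ceil(latency_ms / frame_budget_ms(fps)) - 1)


# Hypothetical latencies checked against 30 fps video (about 33 ms per frame).
fast = keeps_up(25, 30)          # 25 ms fits inside the budget
slow = keeps_up(120, 30)         # 120 ms does not
skipped = frames_to_skip(120, 30)
print(fast, slow, skipped)
```

A model that has to skip frames can still feel responsive for inventory purposes, since household objects do not move between frames. Whether that counts as realtime depends on the definition used.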

      • dope@lemm.ee
        1 year ago

        Object recognition.

        Then “action recognition”. I mean, what the objects are doing. To each other and such.

        Then you have a narration machine. Which could be nice.

        • over_clox@lemmy.world
          1 year ago

          Very related, not my work though…

          Self Aware Lara Croft

          This seems like something of an ideal effort, but it also took lots of human work to even prepare the AI model, and many many runs to refine it.

          We’re not even close to realtime object recognition at this point. Delayed recognition, yes. Realtime, no.

          Edit: The Self Aware Lara Croft series recently dropped its latest video, Level 7…

          https://piped.video/watch?v=SYX4CwyZ1LM