The vast majority of computer vision research leads to technology that surveils human beings, a new preprint study that analyzed more than 20,000 computer vision papers and 11,000 patents spanning three decades has found. Crucially, the study found that computer vision papers often refer to human beings as “objects,” a convention that both obfuscates how common surveillance of humans is in the field, and objectifies humans by definition.
I am totally in favor of criticizing researchers for doing science that actually serves corporate interests. I wrote a whole thing doing that just last week. I actually fully agree with the main point made by the researchers here, that people in fields like machine vision are often unwilling to grapple with the real-world impacts of their work, but I think complaining that they use the word “object” for humans is distracting, and a bit of a misfire. “Object detection” is just the term of art for recognizing anything, humans included, and of course humans are the object that interests us most. It’s a bit like complaining that I objectified humans by calling them a “thing” when I included humans in “anything” in my previous sentence.
Again, I fully agree with much of their main thesis. This is a really important point:
And I do agree that sometimes, it’s wise to update our language to be more respectful, but I’m not convinced that in this instance it’s the smoking gun they’re portraying it as. The structures that make this technology evil here are very well understood, and they matter much more than the fairly banal language we’re using to describe the tech.
This still just feels like a muddying of technical language. If you were to write an article about autopilot killing somebody and use “object” to refer to them, that’s certainly dehumanization, but saying that an object detection algorithm performs poorly on humans doesn’t feel like it is.
Part of the problem is that in general we aren’t talking about specialized human detection models that incorporate things like pose estimation. Instead it is almost always a general object detection algorithm, and referring to the same models differently based on the subject just adds muddiness.
I’m mostly familiar with AI within healthcare, and in my workplace, any released model is going to have a number of conversations and evaluations about the technical performance, practical impact on patients, and general ethics of the model. Those conversations blend, but it’s harmful to make the language less clear in any one of those contexts.