« Currently, the only known law enforcement client using Rekognition is the Washington County Sheriff’s Office in Oregon. (Other clients may exist, but Amazon has refused to disclose who they might be.) Worryingly, when asked by Gizmodo if they adhere to Amazon’s guidelines where this strict confidence threshold is concerned, the WCSO Public Information Officer (PIO) replied, “We do not set nor do we utilize a confidence threshold.” »
That's worrisome indeed. Understanding confidence thresholds is essential to using software like this. Thinking they don't "utilize" one means they don't understand the first thing about it.
> Understanding confidence thresholds is essential to using software like this. Thinking they don't "utilize" one means they don't understand the first thing about it.
Thresholding is not the only way to make use of a classifier that outputs a confidence score. In particular, the slide shown in the article indicates that they use the confidence score to sort search results in descending order. That means there is no fixed threshold, and the PIO's statement is not worrisome at all.
The system described is much less dangerous (in terms of overconfident decisions) than one that only returns results matching a threshold, because it means that the user also occasionally sees low-confidence results and learns that they can't simply defer decisions to the machine.
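To make the distinction concrete, here is a minimal sketch contrasting the two ways of consuming a confidence score: hard thresholding versus simply returning everything ranked. The scores and the `candidates` list are invented for illustration; this is not Rekognition's actual API.

```python
# Invented (candidate, confidence) pairs from a hypothetical face search.
candidates = [("A", 0.93), ("B", 0.61), ("C", 0.58), ("D", 0.12)]

def threshold_matches(results, threshold=0.85):
    """Hard threshold: only 'confident' hits are ever shown to the user."""
    return [(name, score) for name, score in results if score >= threshold]

def ranked_matches(results):
    """No fixed threshold: everything comes back, best match first."""
    return sorted(results, key=lambda r: r[1], reverse=True)

print(threshold_matches(candidates))  # [('A', 0.93)]
print(ranked_matches(candidates))     # all four candidates, weakest last
```

With the ranked version the operator always sees the weak tail of the list, which is exactly the point about not deferring decisions to the machine.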
"We were following the AI leads" is the perfect excuse for parallel construction[1] (evidence laundering).
If Rekognition supports limiting the search to the people in a specific town (or other small group), it might be possible to force someone to be somewhere on the list of matches. That would give bad actors the perfect tool for manufacturing "probable cause".
I've seen how capricious some can be with their work tools. I dated a girl whose dad ran a background check on everyone she dated without their knowledge or consent. I was 19 at the time and it seemed wrong, but reporting one of the higher-up police officers in my town, who now knew all my details, seemed like a freedom-limiting move.
You're assuming that there is a wide distribution of confidence levels in every search. It's very possible with grainy footage that no result is above a reasonable threshold, but the results will still be shown as an ordered list.
Without _some_ threshold, or training in what these kinds of scores mean, it's easy for a user to think "well, they were the best match" and take a negative action like bringing them in for questioning or introducing the match as evidence, even if it's only 10% +/- 5%.
Seeing a single result just above the threshold leads a person to believe it's a match. Seeing the 200 people within a few points of the threshold might lead someone to think it was actually completely indeterminate, with a large number of potential suspects not far from each other.
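As a rough illustration of that "indeterminate" case, the sketch below (invented scores, nothing vendor-specific) counts how many candidates sit within a few points of the best match; when that count is large, the ranking by itself identifies nobody.

```python
def near_top_count(scores, margin=0.05):
    """How many candidates fall within `margin` of the best score?"""
    best = max(scores)
    return sum(1 for s in scores if best - s <= margin)

# Grainy-footage scenario: no result is confident and everything clusters together.
scores = [0.14, 0.13, 0.13, 0.12, 0.11, 0.11, 0.10]
print(max(scores), near_top_count(scores))  # best 0.14, all 7 candidates in the cluster
```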
> Thinking they don't "utilize" one means they don't understand the first thing about it.
They probably have at least one person (or several) who knows what they're doing. After discussing the situation, that person probably just told the department to set it to whatever the minimum is, because the department likely wanted the maximum possible number of matches: they want to manually pick through the list rather than trust the (new and untested) machine to spit out a short list.
As other commenters have mentioned, setting the software wide open has the side benefit of exposing the users to a lot of obvious false positives, which (hopefully) prevents them from forming a habit of assuming the machine is usually or always right. Unless there's a drastic increase in transparency, I'm still not a fan of the police deploying facial recognition tech.
The language used by the PIO is worrying, but further down in the article he explains that they don't use the software for "matching". And that makes sense... Instead of filtering out all but the best matches by setting the confidence threshold really high, you set it much lower and sift the results manually.
This is essentially just a tool to quickly filter a very large data set down to possible matches. Interestingly enough, the police and many of the letter agencies have already had software that does this for over 25 years. Rekognition will likely perform better, but it still seems weird to me that Amazon would make this move at all. What is the impetus? They obviously aren't going to make much money off of this, compared to their other income sources.
I'm disappointed that a tool with a record of poor test results on women and non-white people is even allowed in a "production" environment. We need a higher bar for what counts as commercial-ready for AI and facial recognition in things that impact people, and particularly the justice system. We need to be much more cautious.
> The technology provides leads for investigators, but ultimately identifying suspects is a human-based decision-making process, not a computer-based one.
It does provide leads to police officers, which is a great tool. Many criminals are recidivists and already exist in the database.
Many police officers know their 'clientele' very well by face and name, so they can run checks on the computer anytime they want, as long as it is justified. I don't see anything wrong with using an automated way of doing it by picture.
> It does provide leads to police officers, which is a great tool.
From now on, whenever a computer crime is suspected I'll make sure to forward your name as a lead on the off-chance that you had anything to do with the crime.
Policing should start from available evidence leading to suspects, not start from suspects and then see which of them match the evidence.
That is a nice sentiment, but I can tell you based on personal experience that it is a pipe dream. Leads are usually the bottleneck or the foundation of an investigation. When they start drying up, everything slows down, and everyone starts to get worried that they will just run out of places to investigate. Rekognition is probably marginally more effective than the face-matching software that our LE already uses to find possible leads.
> From now on, whenever a computer crime is suspected I'll make sure to forward your name as a lead on the off-chance that you had anything to do with the crime.
How exactly do you believe investigations work? You don't think that rounding up the usual suspects and asking questions is a legitimate investigative tool? There isn't always a direct link from the available evidence to a person. Should they just give up and close the case?
Let me put it another way: someone is stealing packages from your doorstep. You happen to be the neighbor of a person you know has been arrested for burglary. That fact doesn't even enter your thought process?
Past behavior is a good indicator of future behavior and the real world isn't CSI. Detectives need to follow any leads they can get their hands on.
> From now on, whenever a computer crime is suspected I'll make sure to forward your name as a lead on the off-chance that you had anything to do with the crime.
Policing starts from any leads available, not evidence. Evidence is used in a court of law.
This article, and all the "fear of FR" articles in general, do not describe how FR is actually used. 'What is the process of an FR analysis?' sure would be useful in this discussion.
It's quite simple: a face image is submitted, and a result set sorted by "match score" comes back. This is the same gallery, now sorted with respect to the submitted face image. The "confidence threshold" (also called the verification threshold, among a number of other similar terms) is the point within the sorted results after which one can generally discard possible matches, because the candidate faces bear little resemblance to the submitted face. Imagine a series of faces in a horizontal row: the left-most has the highest match score, the next one to the right is the next highest in the sort, and so on. The left-most image looks the most like the submitted face image, and each image to the right is a little bit less similar.
This confidence threshold loses accuracy when a) the submitted image is less than perfect (often the case), b) the gallery is composed of images that are less than perfect (often the case), and even c) the types of cameras used to generate the images are radically different, with different dynamic ranges and potentially poor video encoding parameters. As a result, a few things are done: 1) multiple images (even if of less than ideal quality, and if available) are added to the gallery for each person, and 2) the confidence threshold is treated as a soft number, with sliders to dynamically grow and shrink the net cast by the analysis. Also notice this is interactive, which implies a human operator. The entire FR process is a selection guidance tool, not an authoritative selection tool. The FR operator is keenly aware that they are sifting through a large number of low-quality data points. FR is a tool to help sift sand.
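For readers who prefer code to prose, here is a minimal sketch of that workflow under some obvious simplifications: a cosine similarity over made-up embeddings stands in for the proprietary match score, and the "threshold" is just an adjustable cut-off an operator could slide up or down. None of this is any vendor's actual API.

```python
import numpy as np

# Hypothetical gallery: person id -> face embedding (stand-in for real FR features).
rng = np.random.default_rng(42)
gallery = {pid: rng.random(128) for pid in ["p1", "p2", "p3", "p4", "p5"]}

def match_score(a, b):
    """Cosine similarity as a stand-in for a proprietary match score."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ranked_search(probe, gallery, soft_threshold=0.5):
    """Rank the whole gallery against the probe; the threshold only marks a soft cut-off."""
    scored = sorted(((match_score(probe, emb), pid) for pid, emb in gallery.items()),
                    reverse=True)
    return [(pid, score, score >= soft_threshold) for score, pid in scored]

probe = rng.random(128)
for pid, score, above in ranked_search(probe, gallery, soft_threshold=0.8):
    print(f"{pid}: {score:.2f} {'above cut-off' if above else 'below cut-off'}")
```

Sliding `soft_threshold` changes only where the "below cut-off" label starts; the operator still sees the full ranked row of candidates.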
Disclaimer: I am a lead developer of a leading FR product. Not Amazon's.
Also, to speak about racial bias in FR: it is a stretch to call it "racial bias"; it is perhaps more of a "low facial dynamic range invisibility". People with very dark skin and people with very pale skin are in the same situation: there is very low variation from the brightest non-highlight part of their face to the darkest part of their face. This means there is very little distinguishable information for FR to use, if it can even identify the portion of an image that contains a face. Such individuals are simply hard to see with FR, and even when identified and tracked as a face, there is very little information to use for differentiation. I don't know if I'd call that a "racial bias". These people are almost FR-invisible, which benefits them in this situation, making them very hard to distinguish with FR analysis.
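To make "low facial dynamic range" a bit more tangible, here is a toy sketch using synthetic grayscale patches and a crude percentile-based contrast measure (real FR pipelines are far more sophisticated; the numbers are purely illustrative).

```python
import numpy as np

def facial_contrast(face_gray):
    """Crude contrast proxy: spread between bright (non-highlight) and dark facial regions."""
    lo, hi = np.percentile(face_gray, [5, 95])  # ignore extreme highlights and shadows
    return hi - lo

rng = np.random.default_rng(0)
# Synthetic stand-ins: a well-exposed face crop vs. a very low-dynamic-range one.
high_range_face = rng.normal(loc=128, scale=40, size=(64, 64)).clip(0, 255)
low_range_face = rng.normal(loc=40, scale=5, size=(64, 64)).clip(0, 255)

print(facial_contrast(high_range_face))  # large spread: plenty of detail to work with
print(facial_contrast(low_range_face))   # small spread: very little to differentiate on
```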
Wish you'd spend your time on something that will move the needle in a more positive direction. The long term applications of the tech that you are making are downright scary.
Is there a path to artificial general intelligence that doesn't have computers being able to recognize people's faces in the same way that people are able to recognize each other?
There are quite a few simpler paths that do not require AI. I hope - but I am not sure - that in the longer term we will come to our senses and see that without proper safeguards in place stuff like this has more negative than positive applications. Once we cross that bridge we can start to apply the tech within those lines, but if we do not draw them first we will have a lot of grief and those lessons will be that much harder learned. Maybe that's the only way we will progress but I really would like to see this happen without a large mess.
There is a joke that technology is like four wheel drive (itself a form of technology), you will still get stuck, but in harder places. In other words, the tech that gets you into trouble is not necessarily the tech that will get you out of that trouble, it may need a more advanced technology to solve a problem created by a previous level.
It's a very tough decision as a scientist. Plowshares and weapons are made from the same steel; one man's fertilizer is another man's bomb. But FR to me is enabling tech that has limited upside and unlimited downside.
Who do you think is correcting FR users' misunderstandings and ensuring the poor practices you're reading about do not come true? I think my skills are critical here.
Making FR more 'perfect' will also allow the less desirable regimes to go after their dissidents in a more 'perfect' way; there are some historic parallels where that might have changed the balance completely. Now, of course there is such a thing as the distinction between the tool (say, a hammer) and its use (say, hammering a nail in), and the alternative uses those tools can be put to.
The people at Google and Facebook are slowly figuring out that their data driven perfect future may not be one that they themselves (or their descendants) would like to live in. Face recognition for computers enables a whole pile of use cases that are downright scary and I would love for such tech to be stalled as long as possible.
What makes you think that you (or anyone else for that matter) are capable of preventing the abuse of FR? I appreciate that you and many others in the space are working with the best intentions, it just seems like any potential benefits of FR are dwarfed by its potential use as a tool of oppression.
What makes you think that you (or anyone else for that matter) are capable of preventing the abuse of FR by walking away and declaring an easily created technology simply taboo? It is going to be developed, like it or not. May as well have intimate knowledge of it, help to educate others, and actually know how to defeat it. Walking away you're just blind.
So when a client (read: police department) comes to you to create a predictive model of a "likely suspect's" face, how will you reconcile centuries of biased and discriminatory policing as your training set?
Per the racial bias comment, that does sound somewhat dismissive of the concept. Recognition performance and impostor/genuine distributions as a function of country of origin are explicitly measured in a few places within NIST's face recognition verification tests, after all. Certainly dynamic range can play a role, though my impression from other results makes me think that is an oversimplification.
Why is it considered an acceptable risk to deploy something that can't recognize large swaths of the population? If your FR fails in tests to recognize Oprah Winfrey, it's broken.
A poor understanding of ROC curves is the root problem here. It's a concept that is surprisingly difficult to explain to people with low maths ability, and that leads to misuse, even setting aside the fact that the "confidence" is not a true probability. My anecdote: I made a classifier to speed up the search through a list of images. It worked great and saved lots of money. There was an auto-accept (no review) for a confidence greater than 95%. After a while they got back to me and said, "We have run some analysis and it is getting about 5% of the auto-accepts wrong; how can we make it get them all correct?"
Of course set the threshold to 100%.
But then it doesn't accept anything.
Then give me a few years and more devs to develop the classifier to make it more powerful.
Can't you just tweak it to make it better...
And so it goes
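For anyone who missed the point of that exchange: the fraction of auto-accepted items that turn out wrong is fixed by where the threshold sits on the score distribution, and with the same model the only way to drive it to zero is to accept nothing. A toy sketch with made-up scores and labels (not the actual system from the anecdote):

```python
# Made-up (score, is_actually_correct) pairs from a hypothetical classifier.
results = [(0.99, True), (0.97, True), (0.96, False), (0.95, True),
           (0.90, True), (0.80, False), (0.60, False), (0.40, False)]

def auto_accept_stats(results, threshold):
    """Count how many items clear the auto-accept bar and how many of those are wrong."""
    accepted = [ok for score, ok in results if score >= threshold]
    return len(accepted), accepted.count(False)

for t in (0.95, 0.98, 1.00):
    n_accepted, n_wrong = auto_accept_stats(results, t)
    print(f"threshold {t:.2f}: {n_accepted} auto-accepted, {n_wrong} wrong")
# Raising the threshold trades wrong accepts for fewer accepts;
# at 1.00 nothing is auto-accepted at all -- exactly the punchline above.
```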