The fact that technology companies have been grossly negligent and irresponsible isn't a reason to not regulate them: It's proof regulation needs to be much, much stronger.
My suspicion is that the concern over racism in machine learning is rooted in two things. The first is just the general modern trend of accusing anything you don't like of being racist, because everybody hates racism and wants to fight it. And the second is the fear, on the part of people who make a living fighting racism, that machine learning might actually put them out of a job.
Because machine learning is basically a paperclip optimizer. You tell it to maximize a thing, it maximizes the thing and minimizes everything else. Racism isn't paperclips, so the paperclip optimizer will happily smash it in favor of making more paperclips. And then they're out of business.
Because when you look at the criticism of this stuff, it generally looks like this: ~12% of the population is black, only ~5% of the selected applicants are black, so the algorithm is accused of racism.
But nothing is that simple, because all kinds of things like income and education level and so on correlate with race, so you have to take all of those things into account before you can tell what's going on. And taking into account all of the available data is how machine learning works.
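To make the point about correlated covariates concrete, here is a minimal sketch with purely synthetic data (the group share, the income relationship, and the selection rule are all assumptions for illustration): a selector that never sees group membership can still show a raw selection-rate gap, so the raw gap alone doesn't tell you what the model keyed on.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.binomial(1, 0.12, n)               # 1 = minority group, ~12% of the pool
income = rng.normal(50 - 10 * group, 15, n)    # a covariate that happens to correlate with group
# Selection depends *only* on income, never on group.
p_select = 1 / (1 + np.exp(-(income - 55) / 5))
selected = rng.binomial(1, p_select)

for g in (0, 1):
    print(f"group {g}: selection rate {selected[group == g].mean():.1%}")
# The rates differ even though group was never an input; to tell whether a
# selector is actually keying on group membership you have to condition on the
# correlated covariates.
```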
Which isn't to say that you couldn't make an algorithm racist. Tell it to optimize for applicants with a particular skin color and it does. But then your problem isn't with the algorithm, it's with the jackasses who asked for that.
What to optimize for is a much more general and difficult question. (Hint: Not paperclips.)
I don't get how you go from this statement to then explaining exactly how racism gets embedded in algorithms: by using the biased data we have in the real world...
Likewise, if the system is trained to duplicate human decision-making (like who gets loans), interesting things can happen: if the decision-makers unconsciously favored whites over blacks, the algorithm could wind up weighing skin color or stereotypically Black or Latino names negatively, meaning that the final model is explicitly racist, just because there is a correlation in the training data. That doesn't mean we shouldn't use deep learning, it means that it's not responsible to just fit the training data and ship without testing for such problems.
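As a sketch of what "testing for such problems" before shipping could look like (the function and argument names here are hypothetical, not from any particular library): hold the protected attribute out of training entirely and use it only to compare the trained model's behaviour across groups on held-out data.

```python
import numpy as np

def audit_by_group(preds, y_true, group):
    """Compare approval rates and false-negative rates across groups.

    preds / y_true are 0-or-1 arrays from a held-out set; `group` is the
    protected attribute, kept out of training and used only for this audit.
    """
    for g in np.unique(group):
        mask = group == g
        approval = preds[mask].mean()
        positives = y_true[mask] == 1
        fnr = (preds[mask][positives] == 0).mean() if positives.any() else float("nan")
        print(f"group {g}: approval rate {approval:.1%}, false-negative rate {fnr:.1%}")
```

A large gap in either number on held-out data is the kind of red flag you want to catch before shipping, even when the group feature was never an explicit input.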
All you are doing here is convincing me that tech companies are just runaway trains with nobody at the controls!
Can you explain or understand the algorithms humans use to drive cars?
Machine learning is very widely used in the sciences and extremely beneficial to humanity in uncountably many ways and assuredly countless more to come. Of course technologies can be used for evil but so can nearly everything that exists. I believe your proposal comes from a desire to help or better the world, but to ban all non-human-readable algorithms is frankly ridiculous and demonstrates a naive understanding of the issue. It sounds a lot like the calls by the U.S. Congress to ban encryption.
- In medicine: your doctor should be responsible for your diagnosis and the drug company is responsible for defective drugs, except when they get away with it through lobbying and good lawyers.
- In physics: I'm not sure it's as big a problem as in social networks. But consider this case: if you cannot reproduce the result of an experiment because an ML model is cryptic, that would create a huge credibility issue in science.
I'm not sure a human-readable algorithm exists for ranking all the web pages in the world based on natural language input. In fact, I'm pretty sure such an algorithm does not, and potentially cannot, exist, given the failure of every approach to NLP that wasn't based on massive amounts of text data and complex models.
Are you willing to make Google 10% as effective to achieve your goal of a human-readable algorithm?
This generally has worked well. On the other hand, actually attempting to manipulate search results based on automated handling of content is what has given us countless censorship debates, or simply failures where even uncontroversial content is removed or downranked because it tripped some strange rule by containing a 'bad word'. Facebook recently banned clothing ads for disabled people[1] because, it turns out, the ML system only cared about the wheelchair, not the person in it.
It's actually fairly straightforward to build recommender systems on transparent, graph-based algorithms (a minimal sketch follows below), and it gives you the added advantage of not discriminating in strange ways.
[1]https://www.nytimes.com/2021/02/11/style/disabled-fashion-fa...
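Here is one minimal co-occurrence sketch of such a transparent, graph-based recommender (the structure and names are illustrative assumptions, not a reference to any particular system): every recommendation can be explained as "people who interacted with X also interacted with Y", which an opaque learned model can't offer.

```python
from collections import Counter, defaultdict

def build_cooccurrence(interactions):
    """interactions: iterable of (user, item) pairs from an interaction graph."""
    items_by_user = defaultdict(set)
    for user, item in interactions:
        items_by_user[user].add(item)
    cooc = defaultdict(Counter)          # item -> Counter of co-consumed items
    for items in items_by_user.values():
        for a in items:
            for b in items:
                if a != b:
                    cooc[a][b] += 1
    return cooc

def recommend(cooc, user_items, k=5):
    """Rank items by how often they co-occur with what the user already has."""
    scores = Counter()
    for item in user_items:
        scores.update(cooc.get(item, Counter()))
    for item in user_items:              # don't recommend what they already have
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(k)]
```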
It's trivial to generate webs of fake, interlinked content and use them specifically to feed incoming links to valuable pages. Or to comment-spam websites so aggressively it ruins them. Or to strike secret deals between high-ranking sites to exchange links even though the sites aren't related. There are countless examples of black-hat techniques that break PageRank.
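A toy illustration of the link-farm trick (networkx, a tiny hand-built graph; the page names are made up): a ring of fake pages that all point at one target can lift that target above the organically linked pages.

```python
import networkx as nx

g = nx.DiGraph()
# A small "organic" web: a few real pages linking to each other.
g.add_edges_from([("home", "docs"), ("docs", "blog"), ("blog", "home")])
# A link farm: fake pages that interlink and all funnel links to "spam_target".
farm = [f"farm{i}" for i in range(20)]
for i, page in enumerate(farm):
    g.add_edge(page, farm[(i + 1) % len(farm)])  # keep the farm itself connected
    g.add_edge(page, "spam_target")

rank = nx.pagerank(g)
print(sorted(rank.items(), key=lambda kv: -kv[1])[:3])
# "spam_target" comes out on top despite having no genuine audience, which is
# exactly the kind of manipulation described above.
```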
I am sorry, but you simply can't build a sustainable search engine without deeply understanding the user intent and the meaning behind the indexed pages.
There are also countless adversarial examples that trick ML algorithms. In fact this is in many ways worse, because of the 'idiot savant' character of ML systems, which are almost always oblivious to context and can be tricked in ways that aren't apparent from the design of the system.
In contrast to systems that are legible or even formally verifiable, ML systems are entirely unable to provide any guarantees. When someone breaks PageRank, at least it's apparent how they broke it. When an ML system mistakes a turtle with a fractal pattern on its shell for a gun, nobody knows how to fix the system in any reliable way, other than to feed it more data and pray.
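A minimal sketch of the adversarial-example phenomenon in the fast-gradient-sign style (PyTorch, with an untrained stand-in model and a random "image", so everything here is illustrative rather than a real attack): each pixel moves by at most epsilon, yet the prediction can flip, and nothing in the model's structure tells you how to rule that out.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in "image"
label = torch.tensor([3])

loss = loss_fn(model(x), label)
loss.backward()

epsilon = 0.1
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1)  # nudge every pixel against the model

print("prediction before:", model(x).argmax().item())
print("prediction after: ", model(x_adv).argmax().item())
# The perturbation is small and bounded, but there is no legible rule you can
# inspect to guarantee it can't change the output.
```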
One company controls 80% of what is found on the internet. They set rules, restrictions, and penalties that are not public. They do not pass any sort of regulatory muster. They rip and tear through businesses standing in their way. They wipe out a person's online existence for reasons they never explain. They use every advantage they can to tweak a human's emotions, drives, and needs to feed them more and more advertisements.
And you suggest that those trying to use every advantage they can to rank higher are the unscrupulous ones?
Google's fight to keep search results crisp ended soon after they began selling advertising. Google long ago quit innovating to make search better for people; they've made it better for advertisers instead.
I agree that you don't need NLP to rank webpages (though it certainly helps), but you do need it to parse the kinds of queries given to search engines these days. The days of logical OR and NOT are long gone I'm afraid.
> It's actually fairly straightforward to build recommender systems on transparent, graph-based algorithms and it gives you the added advantage of not discriminating in strange ways.
I think other commenters have addressed the PageRank issue, but I'd be super interested in papers doing the work you note above.
Absolutely. If it can't be done responsibly and ethically, perhaps it should not be done.
Tell me, how did your brain come up with what you wrote? How do I validate that it isn't racist, sexist, or slanted towards encouraging violence and harm?
Not to mention Facebook’s are even more difficult to test. Tangentially related: remember when you could use “View As” on your profile page to see what your profile looked like to others? It doesn’t work anymore; it only works for Public and Yourself, and you can no longer choose the person to view as.
It’d be great to test these algorithms. We can’t. They need to be designed and instrumented so this is possible.
male guest: "now first of all, let me just start by saying I'm not racist..."
female guest: "pfft..."
host: "ah see you made a noise there, but a lot of people accuse him of being a racist, so I think it's very helpful to know that he actually isn't one..."
In other words the solution to this should be antitrust enforcement and decentralization of power.
There's already a term for people with this view:
An apt comparison.
This is quite a bizarre claim, as there is famously an entire category of problems that are believed to be hard to solve but easy to verify; that asymmetry is exactly what the P vs NP question is about.
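To make the solve/verify asymmetry concrete, here is a tiny subset-sum example (an NP-complete problem; the numbers are arbitrary): checking a proposed certificate takes a couple of lines and runs in polynomial time, while the only general way we know to find one is exhaustive search.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Polynomial-time check: the certificate is drawn from `numbers` and sums to target."""
    pool = list(numbers)
    return all(c in pool for c in certificate) and sum(certificate) == target

def solve(numbers, target):
    """Brute-force search: exponential in len(numbers)."""
    for r in range(len(numbers) + 1):
        for combo in combinations(numbers, r):
            if sum(combo) == target:
                return list(combo)
    return None

nums = [3, 34, 4, 12, 5, 2]
certificate = solve(nums, 9)                      # the hard direction: finding a solution
print(certificate, verify(nums, 9, certificate))  # the easy direction: checking it
```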