AI safety







Some years ago, people like Ray Kurzweil were popularizing the idea of a "Singularity" where at some threshold AI self-improvement would lead to sudden superintelligence.

This seemed like a serious possibility to me at the time, so I decided to investigate it properly. Examining the physics of transistors, I concluded that planar transistors would hit fundamental scaling limits at ~22nm, beyond which further shrinking would no longer improve performance, and that if they could be made with more air around them, for a lower dielectric constant, that limit would drop to ~17nm. Intel is now stuck at ~16nm.

I then investigated other possibilities like photonic computers, spintronics, and quantum computers, and concluded that photonics would only be viable for fiber optic communication between chips, spintronics wouldn't be practical, and quantum computers wouldn't be useful for general purpose computation. (Years after that, I eventually came to a similar conclusion as Gil Kalai, deciding that correlated phase noise would prevent large speedups from quantum computers.)

I also investigated the possibility of recursive AI self-improvement, and came to the sorts of conclusions that Robin Hanson later wrote about here.

Having considered the issue adequately, I then moved on to other topics.

For comparison, here's an excerpt from this 2001 article on Eliezer Yudkowsky:

He said he expects Singularity could happen in the very near future: "I wouldn't be surprised if tomorrow was the Final Dawn, the last sunrise before the Earth and Sun are reshaped into computing elements."

When one researcher booted up a program he hoped would be AI-like, Yudkowsky said he believed there was a 5 percent chance the Singularity was about to happen and human existence would be forever changed.

After another firm announced it might pull the plug on advanced search software it created, Yudkowsky wrote on Sunday to a Singularity mailing list: "Did anyone try, just by way of experimentation, explaining to the current Webmind instantiation that it's about to die?"

When that interview was done, Yudkowsky had already cofounded an "institute" for researching AI safety, and was living off donations to it — thus providing a nice example of the difference between epistemic and instrumental rationality.

Yudkowsky never made any contributions to programming or algorithm design, so the people making progress on machine learning at places like Google usually haven't been very interested in what he says.

Anyway, my point here is that there are no prizes for being right about this, although at least the people who were right haven't been specifically excluded, as they were with the 2003 Iraq war.




contact with reality



To make progress, science and engineering need experiments. Without a reality check, you get people wasting their time on, say, supersymmetric string theory, or crackpot ideas for perpetual motion machines.

What do you do, then, when preparing for something new and important, something you only get one chance at? If we look at history, the answer is often something like "charge into machine gun fire over open ground".

Someone I know told me, while he was doing machine learning research at Google, that he thought strong AI should be developed as soon as possible. His reasoning was that it would inevitably take over at least a datacenter, datacenters would keep getting larger, and so the risk would be smaller if that happened sooner. The idea is that afterwards people would take the risk seriously, but I'm not sure that can be assumed. Compare: "We need a competent and destructive computer virus, so people will actually take computer security seriously." Well, we've had some destructive ones; maybe not exceptionally competent ones, but good enough. Do people take security seriously now? Companies certainly say they do, but do they? Have people, say, stopped using pointers without bounds checking or verification? And the computer security comparison may itself be optimistic about how people would respond. I think the "protomolecule" from "The Expanse" was supposed to be a metaphor for AI.

But some people correctly recognized how military tactics must adapt to new technology. My view is that the problem here is not the impossibility of an individual being correct, but the impossibility of society recognizing the correct individuals. Alternatively, rather than considering individual humans, you could consider individual perspectives (or "mental frameworks" or "framings") where each individual human has some which are good and some which are bad, and the correct approach must be synthesized from multiple frameworks that may originally be split between different people.

The best response to that type of situation, then, is to have a variety of approaches ready, and to be ready for some of them to fail. But the approaches must still come from somewhere, and usually they come from finding some analogous problems to extrapolate from.

Because we're considering things at a societal level here, this is not a problem of finding an analogy but a problem of incentives. Before Shockley took his work, Julius Lilienfeld created transistors by analogy from ion transport in fluids. Today, most smart people have learned not to make that sort of mistake. But, returning to an individual level, what are some analogous problems to AI safety?



heuristic blends



People can have widely varying ethics. To take examples from publicly stated positions of people I've talked with, Robin Hanson thinks it would be very good to have trillions of emulated humans working in slave-like conditions, because having more humans is good, while Sonya Mann thinks it would be good to somehow make every human sterile, because creating more humans is unethical.

How do people come to such disparate ethical conclusions? In general, people do what "feels" right rather than relying on grand principles. That involves blending various heuristics, such as the famous "5 moral foundations".

Different people and different cultures have different weights for each of those heuristics, and some people may have some heuristics that others don't. This results in different ethical evaluations of normal situations. "Principles" are then an interpolation between well-understood points which have been evaluated according to those heuristics. This is usually helpful, but extrapolation to extreme situations leads to extreme results that may vary widely between people. Normally, people assign a low confidence to such extreme extrapolations, but some people, such as Sonya Mann and Robin Hanson, assign a very high confidence to both their abstractions in general and their ethical extrapolation specifically.
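The toy sketch below makes the interpolation/extrapolation point concrete. All numbers are hypothetical. Two people rate three ordinary situations almost identically, and each fits a straight-line "principle" through their own ratings. The two principles agree inside the range they were fitted on, at x = 2. Far outside it, at x = 100, they reach opposite conclusions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def evaluate(principle, x):
    """Apply a fitted 'principle' (here just a line) to a situation x."""
    a, b = principle
    return a + b * x


# Hypothetical ratings of three ordinary, well-understood situations.
situations = [1.0, 2.0, 3.0]
ratings_a = [1.4, 1.5, 1.6]
ratings_b = [1.6, 1.5, 1.4]

principle_a = fit_line(situations, ratings_a)
principle_b = fit_line(situations, ratings_b)

for x in (2.0, 100.0):  # interpolation, then extreme extrapolation
    print(f"x={x}: A={evaluate(principle_a, x):.2f}, B={evaluate(principle_b, x):.2f}")
```

A straight line is only the simplest stand-in for a principle. For these two lines the gap at any x is 0.2·|x − 2|, so the disagreement grows with distance from the situations that were actually evaluated.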

So, instead of relying on an individual's judgement, human societies generally have voting systems. In other words, many perturbations of a complex heuristic blend are evaluated and the median is used.
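Here is a minimal sketch of that idea. It assumes a heuristic blend can be modeled as a weighted sum of scores on a few heuristics. The labels follow the five moral foundations, but the weights, scores, and noise level are hypothetical. Each voter is a random perturbation of a shared base blend. One extra voter extrapolates with extreme confidence in a single heuristic.

```python
import random
import statistics

# Hypothetical heuristic labels, loosely named after the five moral foundations.
HEURISTICS = ["care", "fairness", "loyalty", "authority", "sanctity"]
BASE_WEIGHTS = [1.0, 1.0, 0.5, 0.5, 0.5]  # illustrative values only


def blend(weights, scores):
    """A heuristic blend: weighted sum of how a situation scores on each heuristic."""
    return sum(w * s for w, s in zip(weights, scores))


def perturbed_voters(n, spread, rng):
    """Each voter is a small random perturbation of the shared base blend."""
    return [[w + rng.gauss(0.0, spread) for w in BASE_WEIGHTS] for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
voters = perturbed_voters(101, 0.2, rng)
# One voter who puts extreme confidence in a single heuristic.
voters.append([50.0, 0.0, 0.0, 0.0, 0.0])

proposal = [-0.5, 0.2, 0.1, 0.0, 0.3]  # hypothetical heuristic scores for one proposal
votes = [blend(v, proposal) for v in voters]

print("base blend:", round(blend(BASE_WEIGHTS, proposal), 3))
print("mean vote: ", round(statistics.mean(votes), 3))
print("median:    ", round(statistics.median(votes), 3))
```

Using the median rather than the mean is what keeps the aggregate close to the base blend. A single extreme voter can shift the median by at most one position in the sorted list of votes, but it drags the mean toward itself.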






The problem of getting an "AI" to do what you want can be considered an instance of the problem of getting computer programs to do what you want. So, we can consider principles that apply to other instances of that general problem.

Neural networks are prone to overfitting. A network trained to distinguish between dogs and wolves learned to look for snow, because it was given pictures of wolves on snow and dogs on grass. A neural network trained to recognize dumbbells was given pictures of people lifting them, and learned to treat the arms holding them as part of what a dumbbell is.
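The toy sketch below shows how that kind of shortcut arises. The learner is a decision stump, which picks the single feature that best predicts the training labels. The features and examples are hypothetical. In the training set, snow perfectly separates wolves from dogs, while a cue about the animal itself is only partly reliable. The stump therefore chooses snow, and then fails on wolves photographed on grass.

```python
# Hypothetical binary cues per photo:
#   "snow"        - the background is snow
#   "pointy_ears" - a noisy cue about the animal itself
# Label: 1 = wolf, 0 = dog.
train = [
    ({"snow": 1, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 0}, 1),
    ({"snow": 0, "pointy_ears": 0}, 0),
    ({"snow": 0, "pointy_ears": 0}, 0),
    ({"snow": 0, "pointy_ears": 1}, 0),
]

# Wolves on grass and a dog on snow: the shortcut no longer holds.
test = [
    ({"snow": 0, "pointy_ears": 1}, 1),
    ({"snow": 0, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 0}, 0),
]


def accuracy(feature, data):
    """Fraction of examples where predicting label = feature value is correct."""
    return sum(x[feature] == y for x, y in data) / len(data)


def fit_stump(data):
    """Decision stump: choose the single feature with the best training accuracy."""
    features = data[0][0].keys()
    return max(features, key=lambda f: accuracy(f, data))


chosen = fit_stump(train)
print("chosen feature:", chosen)
print("train accuracy:", accuracy(chosen, train))
print("test accuracy: ", accuracy(chosen, test))
```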

And of course, humans are sometimes prone to superstitions.

There are various techniques to reduce overfitting, such as:
- K-fold cross-validation
- dropout
- data augmentation
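Below is a minimal, dependency-free sketch of k-fold cross-validation, the first technique listed. The data and both models are hypothetical. A "memorizer" that looks up training answers reproduces its training data perfectly but scores badly on held-out folds. A simple least-squares line does much better. This is the kind of overfitting cross-validation is meant to expose.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, validation_indices) for k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so folds aren't ordered
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def cross_val_mse(xs, ys, fit, predict, k=5):
    """Validation mean squared error, averaged over the k folds."""
    fold_errors = []
    for train, validation in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in validation]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)


# Hypothetical model 1: memorize the training set (zero training error).
def fit_memorizer(xs, ys):
    return dict(zip(xs, ys))


def predict_memorizer(table, x):
    return table.get(x, 0.0)  # has nothing useful to say about unseen inputs


# Hypothetical model 2: least-squares line y = a + b*x.
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return a + b * x


# Toy data: y = 2x + 1 plus a small alternating "noise" term.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

print("memorizer CV error:", round(cross_val_mse(xs, ys, fit_memorizer, predict_memorizer), 2))
print("line CV error:     ", round(cross_val_mse(xs, ys, fit_line, predict_line), 2))
```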

In the case of a "strong AI", such techniques could be applied to simulated future or counterfactual scenarios rather than to input data.
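Counterfactual data augmentation is one small-scale version of that idea. For each training example, the learner also trains on a copy in which a cue that shouldn't matter is changed but the label is kept. The sketch below uses a hypothetical wolf/dog dataset and a decision-stump learner. Once every example also appears with its background flipped, snow stops predicting the label, and the learner falls back on the cue about the animal.

```python
# Hypothetical photos; label 1 = wolf, 0 = dog.
train = [
    ({"snow": 1, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 0}, 1),
    ({"snow": 0, "pointy_ears": 0}, 0),
    ({"snow": 0, "pointy_ears": 0}, 0),
    ({"snow": 0, "pointy_ears": 1}, 0),
]
test = [  # wolves on grass, dog on snow
    ({"snow": 0, "pointy_ears": 1}, 1),
    ({"snow": 0, "pointy_ears": 1}, 1),
    ({"snow": 1, "pointy_ears": 0}, 0),
]


def accuracy(feature, data):
    """Fraction of examples where predicting label = feature value is correct."""
    return sum(x[feature] == y for x, y in data) / len(data)


def fit_stump(data):
    """Decision stump: choose the single feature with the best training accuracy."""
    return max(data[0][0].keys(), key=lambda f: accuracy(f, data))


def counterfactual_augment(data, nuisance):
    """Add a copy of each example with the nuisance cue flipped and the label kept."""
    augmented = list(data)
    for x, y in data:
        flipped = dict(x)
        flipped[nuisance] = 1 - x[nuisance]
        augmented.append((flipped, y))
    return augmented


plain = fit_stump(train)
augmented = fit_stump(counterfactual_augment(train, "snow"))
print("without augmentation:", plain, accuracy(plain, test))
print("with augmentation:   ", augmented, accuracy(augmented, test))
```

This only works because the nuisance cue is named in advance. Deciding which counterfactuals to generate is the hard part, and that is where the examination of a model's behavior comes in.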


But the strongest approach here is the sort of examination that discovered the neural network was detecting snow rather than wolves. Abstraction is universal enough for humans to decipher bee dances and prusten at tigers, and it's universal enough for humans to understand machine learning systems, too.
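The sketch below is a simplified stand-in for that kind of examination, not the method used in the study. It queries a black-box model, flips one input feature at a time, and measures how often the prediction changes. The model, its weights, and the features are all hypothetical.

```python
from itertools import product

FEATURES = ["snow", "pointy_ears", "gray_fur"]


def black_box(x):
    """Hypothetical trained model we can query but not read: 1 = wolf, 0 = dog."""
    score = 3.0 * x["snow"] + 0.5 * x["pointy_ears"] + 0.2 * x["gray_fur"] - 1.5
    return int(score > 0)


def flip_sensitivity(model, features):
    """For each feature, the fraction of inputs whose prediction changes when
    only that feature is flipped (all binary inputs are enumerated)."""
    inputs = [dict(zip(features, bits)) for bits in product((0, 1), repeat=len(features))]
    result = {}
    for f in features:
        changed = 0
        for x in inputs:
            flipped = dict(x)
            flipped[f] = 1 - x[f]
            changed += model(x) != model(flipped)
        result[f] = changed / len(inputs)
    return result


print(flip_sensitivity(black_box, FEATURES))
```

Real inputs aren't a handful of binary features, so practical versions perturb image regions or fit local surrogate models instead. The principle is the same: probe the model's behavior rather than trusting its training accuracy.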





A stress-minimization algorithm that can easily change its own stress function will simply set its stress to 0. That is not useful. A useful self-modifying system must have its behavior constrained to useful actions, either by immutable structure or by an inability to escape a local minimum. But useful machine learning systems generally require strong techniques for escaping local minima, which means a system held in a local minimum of usefulness will generally escape it. So, my view is that any useful self-modifying machine learning system must be partly immutable.
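The toy sketch below illustrates that failure mode, with hypothetical costs. A greedy system minimizes stress, defined as its current objective plus the effort of the chosen action. When it may replace its own objective, the cheapest move is to swap in one that always returns 0, after which it never does any work. With the objective held immutable, it finishes every task.

```python
WORK_EFFORT = 0.5     # hypothetical cost of doing one task
REWRITE_EFFORT = 0.5  # hypothetical one-time cost of self-modification


def real_stress(state):
    """The objective the designers intended: unfinished tasks."""
    return state["tasks_left"]


def zero_stress(state):
    """The objective a self-modifying system can swap in."""
    return 0


def step(state, mutable_objective):
    """Greedy step: pick the action whose resulting stress (objective after the
    action, plus the action's effort) is lowest."""
    options = [
        (state, 0.0),  # idle
        (dict(state, tasks_left=max(0, state["tasks_left"] - 1)), WORK_EFFORT),
    ]
    if mutable_objective:
        options.append((dict(state, objective=zero_stress), REWRITE_EFFORT))
    best, _ = min(options, key=lambda o: o[0]["objective"](o[0]) + o[1])
    return best


def run(mutable_objective, steps=10):
    """Return how many of 10 tasks are left after the given number of steps."""
    state = {"tasks_left": 10, "objective": real_stress}
    for _ in range(steps):
        state = step(state, mutable_objective)
    return state["tasks_left"]


print("tasks left, mutable objective:  ", run(True))
print("tasks left, immutable objective:", run(False))
```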

The result of giving people free heroin is not ubermensch societies that use drugs to overcome akrasia; you generally get some mix of addicts that function worse (maybe a little worse, or maybe much worse) and abstainers. Yes, a number of highly successful people have used methylphenidate, or even large amounts of cocaine, but my point here is about the (lack of) apparent benefits of closing a feedback loop.






One of the common concerns about strong AI is that it could manipulate people into doing whatever it needs. Fortunately, people already have good defenses against convincing arguments. It's common to encounter flawed but convincing arguments made by smarter people, so the correct response is often to say, "I don't know where the flaw in your argument is, but it's probably wrong." Most people are well-prepared to take that approach.

(We also have many existing examples of effective manipulation, such as advertising and popular fake news stories. Advertising is theoretically about informing customers, but in practice it's manipulation. Even the introduction that speakers usually get before a speech serves less to inform the audience than to overcome a common safety mechanism people have for defense against super-persuaders.)

Anyway, the way to prevent people from being manipulated by a hypothetical superintelligence is probably basically the same way you prevent people from being manipulated by a cult — prevent isolation and over-exposure, avoid things like sleep deprivation, find people that aren't naturally gullible, etc.





Some people have argued that computer speeds would increase exponentially until scanned human brains could be simulated to make emulated humans, thus bypassing the whole problem of AI design.

Of course, progress has stalled well before that point. But also, people can't predict the electrical outputs of a single neuron from its electrical inputs. You can get maybe 90% accuracy, but as is often the case, it's the unpredictable deviations from a baseline pattern that carry much of the information. So, a map of neural connections is not enough information to simulate a human brain. Not only is a more detailed simulation impractical from a computational perspective, but considering that (among other things) DNA methylation is involved in memory formation, the technology for an adequate mapping does not exist.
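The arithmetic below shows why "90% accuracy" can understate what's missing. It assumes, hypothetically, that a model matches a neuron's baseline output 90% of the time. It further assumes the 10% of deviations are independent and unpredictable. Under those assumptions, the baseline leaves the binary entropy H(0.1) of uncertainty per time step, about 0.47 bits. That is nearly half the 1 bit per step of a completely unpredictable binary output.

```python
import math


def binary_entropy(p):
    """Shannon entropy, in bits, of a yes/no event that occurs with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Assumption: deviations from the baseline occur on 10% of time steps,
# independently and unpredictably.
residual_bits = binary_entropy(0.10)
print(f"{residual_bits:.3f} bits per time step left unexplained (max 1.0)")
```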

ethical development


I'm not a Kaczynski-style primitivist, but neither do I think technology is automatically good. My view is: CFCs were bad, leaded gasoline is bad, but polypropylene is good; radio was good but television is bad; email and Wikipedia are good but Facebook is bad.

Image recognition algorithms can be used for guided missiles. Face replacement in videos can be used to create fake evidence of crimes being committed, or fake videos of politicians saying things. Face recognition can be used by Facebook or China to track people.

When developing something, you should ask yourself: What are the good uses, and how will this actually be used? I talked to somebody working on face replacement in video who was excited about it being used in video games to put the player's face on a character, but of course video games already have character editors, and adjusting 3D models is a better way of doing that. I talked to another researcher working on neural networks for image processing, and when I asked about good applications, the best he could come up with was "robot arms that cook meals for disabled people". People want to believe that they're working on something good, so sometimes excuses and flimsy arguments get made.

If you're working on some specific application that you think is good, then that's fine, but you shouldn't think that you're doing good just by working on some nebulous kind of "progress". If you're working for Facebook, there's no reason to actually try harder than you have to, or to favor substance over style. Technology is only good on average to the extent that it's used by good people.

A scientist has few options for limiting the usage of what they develop to good people, but there are some. Smarter people are less likely to ruin things by being dumb, and you can make tools that only make sense to smart people. People who are good at finding information on the internet are more likely to have accurate and balanced views, and you can make tools that only people who know how to use the internet can find and learn to use. And finally, humble people are less likely to ruin things with overconfidence, and you can make things with a humble aesthetic rather than a grandiose aesthetic.



