Approaches to AI Values


Notes from Nell Watson

A “top-down” approach recommends coding values in a rigid set of rules that the system must comply with. It has the benefit of tight control, but does not allow for the uncertainty and dynamism AI systems are so adept at processing. The other approach is often called “bottom-up,” and it relies on machine learning (such as inverse reinforcement learning) to allow AI systems to adopt our values by observing human behavior in relevant scenarios. However, this approach runs the risk of misinterpreting behavior or learning from skewed data.

  • Top-Down is inefficient and slow, but keeps a tight rein.
  • Bottom-Up is flexible but risky and bias-prone.
  • Solution: Hybridise – Top-Down for Basic Norms, Bottom-Up for Socialization

Protect against the worst behavior with hard rules, and socialize everything else through interaction. Different cultures, personalities, and contexts warrant different behavioral approaches. Each autonomous system should have an opportunity to be socialized to some degree, but without compromising fundamental values and societal safety.
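The hybrid idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenEth's implementation: the rule list, the action fields (`harms_human`, `deceives_user`, `learned_preference`) and the function names are all hypothetical. Hard rules act as a filter, and a learned score (standing in for whatever the bottom-up process produces) only ranks what the rules allow.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical top-down rules: each returns True if the action is forbidden.
HARD_RULES: List[Callable[[Dict], bool]] = [
    lambda action: action.get("harms_human", False),
    lambda action: action.get("deceives_user", False),
]


def is_permitted(action: Dict) -> bool:
    """An action is permitted only if no hard rule forbids it."""
    return not any(rule(action) for rule in HARD_RULES)


def choose_action(
    candidates: List[Dict], learned_score: Callable[[Dict], float]
) -> Optional[Dict]:
    """Filter by hard rules first, then rank the remainder by the learned score."""
    permitted = [a for a in candidates if is_permitted(a)]
    if not permitted:
        return None  # nothing acceptable: refuse rather than break a hard rule
    return max(permitted, key=learned_score)


# Hypothetical candidates; 'learned_preference' stands in for a socialized score.
candidates = [
    {"name": "shove past", "harms_human": True, "learned_preference": 0.95},
    {"name": "wait", "learned_preference": 0.8},
    {"name": "ask to pass", "learned_preference": 0.6},
]
learned_score = lambda a: a["learned_preference"]
best = choose_action(candidates, learned_score)
print(best["name"])
```

Note that "shove past" has the highest learned score in this toy data, yet it is never chosen, because the hard rule removes it before any ranking takes place.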

Technical Processes of OpenEth

Machine learning and machine intelligence offer formidable new techniques for making sense of fuzzy or ambiguous data, for identification or prediction.

Modelling the ethical decision-making process in ways that machines can apply is very challenging, especially if those models are to possess sufficient nuance to deal adequately with real-world scenarios and with human expectations.

However, we believe that the latest developments in machine intelligence now make it feasible. Our intention is to apply the respective strengths of several pillars of machine intelligence to crack different aspects of these challenges.

• Layer 1 Supervised Learning – Explicit deontological do-not-tread rules

• Layer 2 Unsupervised Learning – Implicit dimensionality and tensions

• Layer 3 Reinforcement Learning – Socialization via carrot and stick

• Layer 4 Applied Game Theory – Optimising multiple-party interests

• Layer 5 Probabilistic Programming – Uncertainty Management

• Layer 6 Scenario Rendering – Modelling Interactions

• Layer 7 Inverse Reinforcement Learning – Modelling Intent


Layer 1 – Supervised learning

This is the initial component of OpenEth, which sets deontological rules and rates ethical preferences. OpenEth is designed with a philosophy of participatory ethics in mind, whereby the public can, acting alone as well as in aggregate, describe their convictions.

OpenEth’s first layer has an advantage in that it does not assume or require a utility function for optimisation, unlike AI agents, which are assumed to require such a function.

Contemplation: Fast and Slow

We currently have a prototype of a Slow method: careful and methodical explication through a long-winded process. Being quite in-depth and involved, it is not yet easy enough for a casual visitor to pick up and use.


To capture more engagement, we aim to roll out a Fast version that can:

(a) Provide an immediate ‘toy’ for a site visitor to engage with

(b) Collect Ethical Intuitions


People often vote with their feet more truthfully than in a poll; that is, their actions are the true demonstration of their actual ethical decisions. Nevertheless, this ‘fast’ method may still be useful for sanity-checking some ethical explications, or for filling in some of the gaps that the unsupervised methods have difficulty with.

Understanding ethics through dilemmas isn’t ideal for generalizability, because eventually one runs out of dilemmas. Furthermore, dilemmas may only partially tell you something about how and why people actually make ethical decisions.

Users of the Fast system will, on occasion, be invited to view the Slow version of the same dilemma, as a more gentle introduction to the ethical analysis process.

In the Fast system, users can enter potential actions and also rank actions against each other (two are shown on the screen, and the user picks the better one). Reaction times may also be weighted, as may the demographics of the user.
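One way such pairwise picks could be aggregated into a ranking is a Bradley-Terry model, sketched below. This is an assumption for illustration, not a description of how OpenEth aggregates votes: the function name, the example actions and the per-comparison weights (imagined here as derived from reaction time) are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=100):
    """Estimate a strength per action from (winner, loser, weight) triples.

    Uses the standard iterative update for the Bradley-Terry model, with each
    comparison counted according to its weight.
    """
    items = sorted({x for winner, loser, _ in comparisons for x in (winner, loser)})
    wins = defaultdict(float)         # weighted wins per action
    pair_counts = defaultdict(float)  # weighted comparisons per unordered pair
    for winner, loser, weight in comparisons:
        wins[winner] += weight
        pair_counts[frozenset((winner, loser))] += weight

    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and strength[i] + strength[j] > 0
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: v / total for i, v in updated.items()}
    return strength


# Hypothetical votes: (preferred action, rejected action, weight).
# The weight is an assumed stand-in for, e.g., confidence inferred from reaction time.
votes = [
    ("tell truth", "white lie", 1.0),
    ("tell truth", "stay silent", 1.0),
    ("white lie", "stay silent", 1.0),
    ("white lie", "tell truth", 0.5),
]
strengths = bradley_terry(votes)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

An action that never wins a comparison ends up with a strength of zero in this simple version; a real system would likely add a prior or smoothing so that sparse data does not produce such hard conclusions.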

The goal for this initial layer is not to provide answers to complex situations, but instead to provide general heuristics. Risk and uncertainty are different things; risk relates to managing decisions given known alternative outcomes and their relative probabilities of happening, along with their impact. 

Uncertainty involves being obliged to make a decision despite having a lack of data.

Risk can be managed using complex mathematical models in well-understood situations, but uncertainty cannot, especially within a dynamic or chaotic environment. Transitivity or set theory will often not suffice in the complex and unbounded physical real world.

In this layer we aim to provide rules of thumb that give a robust approximation of the gut instincts typical of the ‘man on the Clapham omnibus’, which are based upon the ecology of various stimuli and responses in a given environment.

This layer will also include a Toulmin model system, to examine priors and check for inconsistencies that may indicate error (first-stage error detection and correction).


Layer 2 – Unsupervised learning

For the practical implementation of ethics in the real world, rules are not enough. The ethical world is not polar, but rather may be described as a tensegrity network, whereby proximal and immediate goal-driven concerns compete in tension with matters of principle and integrity.

Ethics also involves implicit dimensionality. Multiple potential solutions may be acceptable or equitable, and no particular path may seem preferable over another in this instance. Rather than simply prevent an agent from doing something, this layer attempts to answer ‘what choice within a range of freedom may be optimal’. 

The figure above, courtesy of Google, provides an example of AI-generated gradations between some quite disparate concepts. This illustrates why we believe that, if we map the general edge cases where a certain rule starts to apply, we can apply machine learning to work out the rest in an appropriate manner, even across many dimensions.


Layer 3 – Reinforcement Conditioning

Pro-sociality and ethics can sometimes conflict in troublesome ways: for example, the tension between telling the truth and telling a white lie that preserves the status quo.

This module provides pro-social reinforcement effects that result in an awareness of politeness and common custom. Whilst not ethics per se, machines with this layer become more relatable for human beings. It also attempts to resolve conflicts and tensions between absolutes and limited contextually-appropriate bending of rules.

This reinforcement may be gathered from harvested ethical intuitions, or potentially from humans wearing electroencephalogram (EEG) monitors, by looking for Error-Related Negativity (ERN) in Event-Related Potentials (ERPs).

The plan is that autonomous systems will be able to infer when they may have alarmed someone or caused anxiety, without necessarily needing to be told explicitly. They can therefore dial up or down their speed of operation accordingly. Over time such social conditioning will produce a different range of behaviour for different people, which will be perceived as more pro-social.
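As a rough illustration of this kind of per-person conditioning, the sketch below nudges an operating speed down when a discomfort signal is observed and cautiously back up otherwise. It is a simple feedback rule rather than a full reinforcement learning algorithm, and every name and number in it (`update_speed`, the step size, the speed bounds, the default starting speed) is a hypothetical choice for the example.

```python
def update_speed(current_speed, discomfort, step=0.1, min_speed=0.2, max_speed=1.0):
    """Slow down after a discomfort signal; otherwise speed up more cautiously."""
    if discomfort:
        new_speed = current_speed - step
    else:
        new_speed = current_speed + step / 2
    # Keep the speed within assumed safe operating bounds.
    return max(min_speed, min(max_speed, new_speed))


# One learned speed per person, starting from an assumed default of 0.6.
speeds = {}


def observe(person, discomfort):
    """Record one interaction and update that person's preferred speed."""
    speeds[person] = update_speed(speeds.get(person, 0.6), discomfort)


# Hypothetical interaction history: Alice is twice alarmed, Bob never is.
for signal in (True, True, False):
    observe("alice", signal)
for signal in (False, False, False):
    observe("bob", signal)

print(speeds)
```

After these interactions the system operates more slowly around Alice than around Bob, which is the "different range of behaviour for different people" described above, in miniature.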


Layer 4 – Applied Game Theory

OE’s first layers consider the ethical decisions of one single agent. Considering the possible ethical decision space of multiple agents requires game theory.

The examples we have right now involve multiple people, but always only one decision maker. If two people need to make a decision, it gets far more complicated (and interesting). An oracle would ideally get multiple parties to agree on something better than a pure Nash equilibrium, encouraging collaboration rather than mutual defection.

The DeepMind paper Multi-agent Reinforcement Learning in Sequential Social Dilemmas illustrates that agents may compete or cooperate depending on which strategies best fit their utility functions. We can use such models to validate game theory implementations.

Working towards a Pareto ideal would seem to be beneficial. We therefore intend to find methods for implementing the Kaldor–Hicks criterion for better real-world implementations of Pareto improvements and Pareto efficiencies. This should help to model better outcomes for solving differences between multiple agents. There are a few flaws in this, but it still seems very valuable.
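The concepts in this layer can be made concrete with a toy two-player game. The sketch below uses the classic prisoner's dilemma payoffs (an assumed example, not taken from OpenEth) to find the pure Nash equilibrium and to compare outcomes under the Pareto and Kaldor–Hicks criteria. The Kaldor–Hicks test here assumes utility is transferable, so that an increase in total payoff means winners could in principle compensate losers.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

# Assumed payoffs: PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def is_nash(profile):
    """True if neither player can gain by changing only their own action."""
    row, col = profile
    row_payoff, col_payoff = PAYOFFS[profile]
    row_ok = all(PAYOFFS[(alt, col)][0] <= row_payoff for alt in ACTIONS)
    col_ok = all(PAYOFFS[(row, alt)][1] <= col_payoff for alt in ACTIONS)
    return row_ok and col_ok


def pareto_improves(new, old):
    """True if nobody is worse off under 'new' and somebody is strictly better off."""
    pairs = list(zip(PAYOFFS[new], PAYOFFS[old]))
    return all(n >= o for n, o in pairs) and any(n > o for n, o in pairs)


def kaldor_hicks_improves(new, old):
    """True if total payoff rises, so winners could compensate losers."""
    return sum(PAYOFFS[new]) > sum(PAYOFFS[old])


nash_equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
print(nash_equilibria)
print(pareto_improves(("cooperate", "cooperate"), ("defect", "defect")))
print(kaldor_hicks_improves(("defect", "cooperate"), ("defect", "defect")))
print(pareto_improves(("defect", "cooperate"), ("defect", "defect")))
```

Mutual defection is the only pure Nash equilibrium here, while mutual cooperation is a Pareto improvement on it. Moving to ("defect", "cooperate") passes the Kaldor–Hicks test but not the Pareto test, because one player ends up worse off unless compensation is actually paid.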


Layer 5 – Probabilistic Programming

Probabilistic Programming introduces new ways of making sense of limited amounts of data, learning from as little as a single prior example (as human beings can). These capabilities mean that ethical deliberations by machines, and human interactions with them, can become much more personal, nuanced, and subtle. An agent can create a new solution for a wholly unforeseen or unimagined situation at a moment’s notice, if required (with a confederated consensus from other agents as a redundancy).

We also intend to experiment with sensitivity analysis (to explore how amending specific features might affect consequential outcomes), Monte Carlo simulations (to estimate probability distributions), and full-blown Bayesian inference.
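A very small example of these ideas, using only the Python standard library rather than a probabilistic programming framework, is a Beta-Bernoulli model of whether people approve of an action. The sketch below updates a belief from a single observation, uses Monte Carlo sampling to estimate how likely it is that approval exceeds 50%, and repeats the calculation under different priors as a crude sensitivity analysis. The function names, the priors and the 50% threshold are assumptions made for illustration.

```python
import random


def update_beta(alpha, beta, approvals, disapprovals):
    """Bayesian update of a Beta(alpha, beta) belief about an approval rate."""
    return alpha + approvals, beta + disapprovals


def posterior_mean(alpha, beta):
    """Expected approval rate under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


def prob_above(alpha, beta, threshold, samples=10_000, seed=0):
    """Monte Carlo estimate of P(approval rate > threshold)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(alpha, beta) > threshold for _ in range(samples))
    return hits / samples


# Start from a uniform prior and observe a single approval.
alpha, beta = update_beta(1.0, 1.0, approvals=1, disapprovals=0)
mean_after_one = posterior_mean(alpha, beta)
estimate = prob_above(alpha, beta, threshold=0.5)
print(mean_after_one, estimate)

# Sensitivity analysis: how much does the conclusion depend on the assumed prior?
for prior in [(1.0, 1.0), (2.0, 2.0), (1.0, 5.0)]:
    a, b = update_beta(*prior, approvals=1, disapprovals=0)
    print(prior, round(posterior_mean(a, b), 3))
```

With so little data the prior dominates the result, which is exactly what the sensitivity loop makes visible: a sceptical prior such as Beta(1, 5) is barely moved by one approval.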


Layer 6 – Scenario Rendering

This layer involves modelling of interactions prior to ethical decision-making or updates to a ruleset. This layer will suggest black swan scenarios and bizarre (but true to life) combinations of elements and sequences in advance, to better prepare for strange situations, or to conceptualise more ideal ways to communicate difficult news.

It will also provide methods for estimating the consequential outcomes of certain actions for individuals, as well as potential societal externalities.


Layer 7 – Inverse Reinforcement Learning

Stuart Russell's Inverse Reinforcement Learning (IRL) offers further techniques. IRL involves a machine intelligence agent observing the behaviour of another entity. Rather than simply emulating the behaviour, the agent attempts to infer the underlying intent behind the action.

This mechanism can provide an ‘If I did x would it fit with known models of human morality?’ query as an Ethical Sanity Check. 

This requires capabilities to make sense of legal and fictional data, and so seems best used as a method of polishing and idiot-proofing decision mechanisms, rather than serving as the basis for ethical decision-making. Furthermore, media may include bizarre tropes and biases that would not be ideal for training a moral engine, but can greatly assist in teaching machines about cultural expectations and implied intent.
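The core idea of this layer, inferring intent from observed behaviour rather than copying it, can be illustrated with a heavily simplified sketch. It is not a full IRL algorithm: instead of learning a reward function from scratch, it assumes a fixed set of hypothetical candidate intents and uses Bayes' rule with a Boltzmann-rational choice model to decide which one best explains what was observed. The candidate rewards, action names and the `rationality` parameter are all assumptions for the example.

```python
import math

# Hypothetical candidate intents, each assigning a reward to the available actions.
CANDIDATE_REWARDS = {
    "values_honesty": {"tell_truth": 1.0, "white_lie": 0.0},
    "values_harmony": {"tell_truth": 0.0, "white_lie": 1.0},
}


def choice_likelihood(reward, action, rationality=2.0):
    """Boltzmann-rational model: P(action) is proportional to exp(rationality * reward)."""
    weights = {a: math.exp(rationality * r) for a, r in reward.items()}
    return weights[action] / sum(weights.values())


def infer_intent(observed_actions):
    """Posterior over candidate intents given observed actions, from a uniform prior."""
    posterior = {name: 1.0 / len(CANDIDATE_REWARDS) for name in CANDIDATE_REWARDS}
    for action in observed_actions:
        for name, reward in CANDIDATE_REWARDS.items():
            posterior[name] *= choice_likelihood(reward, action)
        total = sum(posterior.values())
        posterior = {name: p / total for name, p in posterior.items()}
    return posterior


# Observed behaviour: mostly truthful, with one white lie.
posterior = infer_intent(["tell_truth", "tell_truth", "white_lie"])
print(posterior)
```

Because the choice model allows for occasional deviations, a single white lie does not overturn the inference that the observed person mostly values honesty; it only lowers the confidence in that conclusion.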