Comments and Feedback on the Winning Proposals of the Guardians' Challenge


By Ceyhun Karasu, AI Ethics Advisor to EthicsNet

This document is a compilation of my comments and suggestions on the winning proposals of the Guardians' Challenge. First things first, EthicsNet's initiative motivates me and shows that influential people in the field of AI do care about ethical issues and their technical difficulties. As we all know, it is crucial to think carefully about the implications of AI-driven systems in society. The challenge gives people from different backgrounds the opportunity to come up with proposals and share their views, which helps people like me (who research independently) better understand the community revolving around AI ethics.

It is very important to have a cross-disciplinary and interdisciplinary understanding of AI R&D in general. This can help us mitigate possible existential risks, and more broadly help us design a better future in which AI will play a crucial role. My feedback on the proposals is a compilation of suggestions, critical arguments and questions that I consider important when thinking about AI ethics. It is written with the vision of creating a better database to make AI systems safer and morally aligned with human values. The timescale of ethical behavior is an issue I expected to be discussed in the proposals, but few of them address it; Proposal 4 touches on it briefly by referring to datasets that evolve over time.

Ethics is time dependent: an action is taken with a specific timeframe in mind, which is voluntarily or involuntarily embedded in the causality of the action. An action that seems ethical in one timeframe might not be ethical at all in another, so the long- versus short-term decision-making debate, relative to an individual's or entity's moral state, should be considered. Keeping this point in mind, we understand that ethics cannot be reduced to simply taking right or wrong actions and doing the good or bad thing. It is not an issue of good/correct versus bad/wrong action, a reasoning mechanism that dates back to humanity's feudal and religious history.

Overall, I think the approaches are going in the right direction to create a database for ethically aligned AI systems. The interactivity between users and AI is also a recurrent feature, which is a positive attitude for keeping knowledge sharing as diverse as possible. Still, certain things could have been defined a little more precisely.

PROPOSAL 1: A developmentally-situated approach to teaching normative behavior to AI (G. Gordon Worley)

1.) “…philosophy the study of ethics quickly turns to metaethics…” (p. 1)

a) Philosophy deals with ethical questions that can indeed lead to metaethics, but that is only a small portion of the whole picture. There is also work where ethics is analyzed more systematically, which falls within the domain of formal ethics (see Gensler 1996). Formal reasoning systems such as logic allow philosophers to build complex models by introducing ethical axioms and to resolve ethical dilemmas based on those axioms. On the other hand, philosophers studying applied ethics analyze real-world case studies (e.g. abortion, autonomy in healthcare, human impact on global warming) and utilize ethical frameworks to assess moral values and principles, and to study how moral values and principles are formed or evolve over time.

Evolutionary ethics is another example, where philosophers work together with researchers from the sciences such as biology, behavioral psychology and sociobiology. There are also interesting, exotic approaches that deal with certain ethical problems using Bayesian epistemology, a domain inspired by a combination of game theory, analytical philosophy, statistics and ethics. These models can provide insight into how ethical values are learned and how those values can be both generalized to unknown cases and personalized to specific known cases. Overall, philosophy has grown into a vast interdisciplinary domain like most disciplines, and experimental philosophers are trying to answer deep philosophical questions with both scientific methodology and philosophical reasoning. Philosophical studies of moral relativism and cultural relativism show that philosophers also study cultural and personal development in the formation of ethical values and principles.
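As a toy illustration of the Bayesian flavour of such models (the setup and all names here are hypothetical, not drawn from any particular paper), consider an agent that maintains a Beta-Bernoulli belief about whether a class of actions is socially approved, updating it from observed human judgments:

```python
from dataclasses import dataclass

@dataclass
class NormBelief:
    """Beta-Bernoulli belief that an action type is socially approved.
    alpha/beta are pseudo-counts of observed approvals/disapprovals."""
    alpha: float = 1.0  # prior approvals (uniform prior)
    beta: float = 1.0   # prior disapprovals

    def update(self, approved: bool) -> None:
        # Bayesian update: add one pseudo-count for the observed judgment.
        if approved:
            self.alpha += 1
        else:
            self.beta += 1

    def p_approved(self) -> float:
        # Posterior mean probability that this action type is approved.
        return self.alpha / (self.alpha + self.beta)

belief = NormBelief()
for judgment in [True, True, False, True]:  # observed human judgments
    belief.update(judgment)
print(round(belief.p_approved(), 2))  # posterior mean 4/6, prints 0.67
```

The same machinery generalizes (a shared prior across action types) and personalizes (per-context posteriors), which is the dual behavior mentioned above.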

2.) “Rather than developing a coherent ethical framework from which to respond, humans learn ethics by first learning how to resolve particular ethical questions in particular ways, often without realizing they are engaged in ethical reasoning, and then generalizing until they come to ask question about what is universally ethical (Kohlberg, Levine, & Hewer, 1983).” (p. 1)

a) The developmental approach is very interesting, as it looks at the psychology and personal development of cognitive beings in order to deduce ethical principles or learn about the formation of moral values. The vision is interesting, and developmental psychology can be used as a methodological inspiration rather than a direct implementation. b) That said, AI is not human (yet), meaning that developmental methods that help us understand how cognitive beings develop moral values might not work for AI systems because of biological differences. On the other hand, an important goal of EthicsNet is to create a database for AI systems that can prevent, or at least mitigate, human error reasoning strategies, so that these systems act and behave ethically. Using developmental psychology to understand how these errors are learned and developed is a good way to create a database that takes human bias and human error into account. However, there is still an epistemic gap in the learning mechanisms between cognitive beings (e.g. humans, primates) and artificial cognitive systems (e.g. AI, AGI, robots), in that the two do not share the same evolutionary path.

c) That said, maybe it is a better idea to create a robust framework of ethical values, understand the mechanisms by which these values are formed, and build a broad ontological picture of them. With time, allow the framework to adapt based on previous experiences (here developmental psychology can give us insights into the learning mechanisms). Consequently, embed these frameworks into AI systems and, to stay coherent with Kohlberg's theory, push AI systems to reason at stage 6. Preferably, do not (yet) allow these systems to take higher-order actions before consensus is reached between the AI system's predictions and human supervision; alternatively, we can let AI systems take actions in a simulated environment before taking higher-order actions.

d) If we focus solely on the works of developmental psychologists such as Commons, Piaget and Kohlberg, the following questions arise: Should the AI system mimic the same developmental stages as human beings in order to ultimately reason ethically about given situations? Or should (or can) it skip some developmental stages? What tells us that these evolutionary developmental stages are aligned with the future of humanity and the universe? Before answering the last question, we as humans have to globally set milestones and goals that we can reach within the next century. For as long as humanity cannot reach consensus and propose global policies, the space of future possibilities will remain unknown, and every step we take will rest on overconfident assumptions.

e) Another argument against relying solely on developmental psychology to create "kind" AI is that it is not yet clear whether humans are the best evolutionary example of acting kindly. Directly mimicking the evolutionary stages of human development is, in my opinion, not the right way to create AIs that are "kind". Although several studies in developmental psychology are very well documented, we are left with data that is noisy or insufficiently diverse. Moral values and learned ethical principles are not only personal but also shared, and can change according to the social context in which a human finds themselves. More research on socially learned moral values could definitely make the data less noisy, or at least add depth.

3.) “Commons has argued that we can generalize developmental theories to other beings, and there is no special reason to think AI will be exceptional with regards to development of ontological and behavioral complexity, thus we should expect AI to experience psychological development (or something functionally analogous to it) and thus will develop in their moral reasoning as they learn and grow in complexity.” (p. 2)

a) It would be good to find more studies on whether developmental theories can be generalized to AI systems and can offer an understanding of how to create databases that lead AI systems to act ethically. Developmental psychology can offer deeper insight into the first crucial steps a cognitive being takes toward reasoning ethically. This could indeed be used as a conceptual learning framework to develop a rough database for AI systems. On the other hand, the complexities of the brain are not yet directly comparable to the algorithms or neuromorphic hardware we have today. Current AI systems use very simplified cognitive models to learn, and most of the time need a lot of data to give approximate answers, yet they perform impressively well at very specific tasks. It is still an open question whether AI systems will be able to reason ethically.

4.) “Assuming AI systems are not exceptional in this regard, we consider a formulation of EthicsNet compatible with developmentally situated learning of ethical behavior.” (p. 2)

a) AI systems can become exceptional, as they might have the capacity to "learn" ethical reasoning mechanisms much faster and, in parallel, make more accurate predictions than human beings. This procedural advantage can rest on the quantity and quality of available data, or simply on advances in implementing cognitive models computationally.

b) Another factor that makes AI systems exceptional is that they can be atemporal and are not constrained by the same physical limitations as human beings. These are the questions that come up when I think about the compatibility problem: Would an AI take action once it reaches a level of ethical reasoning that is more "advanced" and forward-thinking than all human beings combined? In other words, what is the spectrum of ethical values that an AI system will consider, and how relevant are those values to the future of human civilization?

5.) “The analogous cases for AI systems will of course be different, but the general point of presenting developmentally appropriate information holds, such as eliding nuances of norms for children that adults would normally consider.” (p. 3)

a) How to proceed if the developmentally appropriate information surpasses even the most ethically conscious human being that acts as a guardian?

6.) “… clues to humans about the AI system’s degree of development …” (p. 3)

a) How can we categorize, qualify or quantify the “AI’s degree of development” if it surpasses the degree of development proposed by models of developmental psychology?

7.) “Given this context in which EthicsNet will be deployed, it will also be important to make sure to choose partners that enable AI systems being trained through EthicsNet to learn from humans from multiple cultures since different cultures have differing behavioral norms.” (p. 4)

a) This is indeed vital to create AI systems that are cooperative and culturally robust. The challenge here is of course reaching consensus between opposing views and moral values that are embedded in a given culture (e.g. abortion is not welcomed in cultures where religion has the upper hand).

8.) “…framework in which AI systems can interact with humans who serve as guardians and provide feedback on behavioral norms…” (p. 4)

a) In general, teaching an AI human values the way we teach children values is not necessarily the best option. No matter what you teach a child, it will find ways to circumvent those norms, because evolution pushed us in that direction so that we would not fall prey to predators. An AI system needs to learn to form a moral compass on its own and weigh the values it is taught, with human interaction minimized to prevent embedding human bias. An AI has to be able to reason about ethical dilemmas and think neutrally without taking sides, while human beings should solve moral dilemmas on their side and occasionally interact with AI systems to reach consensus. Only after consensus is reached (based on the interactions between humans, the AI system's reasoning, and the interactions between humans and AI systems) can we vouch for AI systems acting on that specific dilemma. Finding mechanisms to interpret and contextualize information is key to developing "safer" and "ethically conscious" AI systems. Supervision and interleaved interaction between AI and humans can solve the problem of contextualization, but a problem remains: What are the conditions and requirements to supervise? To what extent does a supervisor have to understand complex ethical dilemmas in order to teach them to the AI or embed them in the algorithm? What about more abstract concepts such as belief systems and behavioral patterns?

Conclusion for Proposal 1

The idea of supervised learning that enables interchanged human-to-AI interaction is a good dynamic approach, even though it has its complexities. Overall, the constructivist approach of this proposal is very interesting and, if guided in the right direction, can be a good first step toward introducing ethical reasoning mechanisms to AI systems. If we stick with the developmental psychology approach, we also have to consider that humans are involved in different ecosystems (Bronfenbrenner 1979), which can dramatically influence a human being's moral compass. AI systems have the capacity not to be directly influenced by these different ecological systems. If we want AI systems that are culturally robust and understand different interpersonal and social relations, we must find a way to embed, or at least teach, the different ecological systems and how they interact. The precise interaction is still unclear in today's scientific literature because of the ongoing nature-versus-nurture debate. However, there is good scientific work providing mathematical models of the internal cognitive processes of behavioral development, which can in principle be converted into computational models. One crucial difference between humans and AI systems is, simply put, the difference in physical limitations and constraints. Humans are constrained by time and space. AI systems can bypass this, making them less vulnerable to physical constraints but at the same time less able to "understand" what it feels like to have a body in time and space. Physical properties and sensorimotor experiences influence our ways of living and thinking, which should also influence how we learn moral values, how we think about ethics, and how we act morally. One simple example: it is not a good idea to jump from a roof to save a person being harassed on the street right below the building.
AI systems need to understand the physical limitations of humans to provide decisions or act morally.

PROPOSAL 2: Crowd Ethics (Ovidiu Dobre)

I agree that an elite group of people with moral consciousness cannot perceive all possible ethical motivations, and that a collective approach to image classification with an ethical attitude can be an alternative. However, collective image labeling does not provide enough information to decide whether a situation is ethical, right or wrong. Ethical reasoning needs causality, interpretability, contextualization and associative thinking with many counterexamples. A video or a cluster of images depicting a scenario would be epistemically more valuable for ethical reasoning and labeling.

This proposal might be a little too simplistic for a very complex problem. The examples of images labeled as ethical or not are also very narrow. For example, Image 3, depicting a child doodling on the bathroom wall, is labeled "not ethical". This is a good example of where contextualization will be extremely important, because a kid doodling on a bathroom wall might be a good thing for the child's creativity and imagination. The intention of the proposal might be genuine, but its simplicity can produce very noisy datasets if context is not provided. The idea of the platform and the collectivist approach is interesting, but the approach to defining what is and is not ethical is again rather simplistic.

The gamification of a platform to drive it forward is also very tricky in the context of ethics, and I would not agree that gamification motivates people to act ethically or kindly. Gamification can promote competitiveness (bias), which might unintentionally influence a user's perception of whether an image depicting an event is ethical. The psychological factor of participation, and how to minimize biased decisions, is also an important issue that should be researched.

PROPOSAL 3: Built-in Artificial Empathy as Baseline in AI (Wu Steven)

The proposed conceptual model is interesting and seems to be a combination of the two previous proposals, but with video material as the data source. Empathy indeed plays a crucial role in moral judgement and in being able to feel another person's situation. However, we are still stuck with the dilemma of physical constraints and context that I discussed earlier. Several points from the previous proposals also apply to proposal 3, such as the problems involved with human annotation and crowd ethics (e.g. conditions and requirements to annotate, the moral capacity of the annotator, supervised vs. unsupervised learning to prevent human error). The systematic approach of proposal 3 is essentially in the right direction. However, difficulties might arise when thinking about the bigger picture. In the following paragraphs I point out some difficulties presented in section 3 of the proposal.

1.) “Materials and methods” (p. 2)

a) Sourcing video materials and extracting event clips through private or public platforms with proper licensing can work for certain video materials, but would be more difficult for sensitive ones. Sensitive video materials can help us understand more complicated situations, such as case studies from criminal justice departments. One example is access to data recorded during law enforcement and police interventions (e.g. Axon). These cases can be valuable for creating empathy maps, as they reflect some of the reality of the justice system, while videos on YouTube or Vimeo can be fictional and not always realistic. Other sources might also raise privacy and public-policy issues, for instance if recordings from public surveillance cameras need to be accessed.

b) The difficulty with empathy maps arises at the stage of contextualization. Video sources do not give us the full picture of the act and the causalities in which a person is involved. We all know that empathy is very dynamic, with different layers and dimensions. An empathic act in a given timeframe and situation can turn out to be devastating for other situations or people. This is of course known as a moral dilemma, which should be discussed. Say a man is in financial difficulty, and the only way to provide and survive is to rob a bank or commit some other crime. That person is ready to risk his life to protect the future of his family and community. From the viewpoint of the person committing the crime, the act is altruistic, in the sense that he will provide for and protect the lives of his family or community. From the viewpoint of the legal system, however, he is acting immorally by endangering the lives of other people (outside or inside the community) and going against his civic duty as a citizen. So on one side we can conclude that the person committing the crime has a level of empathy, and from another perspective we can conclude that he has no empathy for people outside his own community.

We can complicate the situation further by jumping ahead to the future and presuming that the child of the person committing the crime was an excellent student. It turns out the child has the potential to become the person who cures cancer and other diseases, saving millions of lives. We can see that the space of future probabilities can grow exponentially, and that even the most sophisticated statistical analysis will not be able to predict how that person's criminal act will affect future lives.

This is one core reason why humanity has difficulty coming up with overarching policies and global agreements: every moral act is situation and context dependent. In other words, an empathy map can become very useful for less complex situations and acts where enough historical information is available, for simple environments or systems, and for situations where there is universal consensus based on shared principles and natural laws (e.g. a video showing an entity, individual or group of people polluting the environment). However, it will not be useful for mapping out more complicated "stories" or situations where not enough historical information is available. Historical information can give insight into the intention behind the act in which a person is involved.

c) The implementation of automated event-detection algorithms will again depend on how much historical information we have about a given situation, how accurately the algorithm can interpret the information, and whether we have a clear enough understanding of the moral implications of that event to generalize. The source and format of the data will also influence this procedure.

2.) “The fact that we were born with empathy makes us different from other species and artificial products.” (p. 3)

a) It would have been better to formulate this differently, for example: "the fact that humans have the ability to empathize in a more advanced and complex way than other living beings". It has been shown that animals such as chimpanzees and dolphins do have a degree of empathy towards other living beings. This may be due to mirror neurons (Frans de Waal, Jean Decety and Vittorio Gallese), which can play an important role in driving a sentient being to unintentionally mimic the behavior of another living being. Although the scientific literature and opinions are diverse, this offers an alternative scientific view: empathy is not unique to humans and has a biological foundation.

b) The question of whether different forms of empathy are innate or developed through experience should also be considered. The concept of mirror neurons remains vague in the cognitive-science literature and is still a hypothesis (Hickok 2014), but nevertheless plausible.

PROPOSAL 4: We're involved in AI ethics with the right concept! (Adam Bostock)

a) The underlying ethical approach of the proposal is interesting and relevant. However, the method for implementing it is not well defined and too vague. The proposal suggests implementing it using a technology that is easy to understand and easy to access: "simple to understand, and accessible via a wide range of devices and platforms", referring to the internet and online social networks. However, we know that making datasets (where many values are relevant) too accessible to the public can also lead to very noisy or biased datasets. It is perfectly plausible that groups of people might poison or negatively bias the dataset (e.g. alt-right or extremist groups), and this could jeopardize the dataset if such a majority has the upper hand. The proposal then explains xIntelligence (xI) as a method of sharing knowledge across different forms of intelligence, which works by providing concepts that can be shared or distributed. It indicates that not only humans but also AI systems can learn from it, and thus, by adding concepts about ethics, we would make the AI system learn these concepts under the proposed ethical approach.

The whole point of creating a dataset to make AI systems behave ethically is finding out how AI systems can interpret datasets linked to shared concepts and their ethically relevant attributes. Introducing concepts of ethics with attributes to AI systems does not mean that the AI system will act ethically or be able to interpret that information correctly. The nine ethical approaches remain relevant overall, but again provide no information on how to deal with complex problems, such as the trolley problem for core safeguards like "no harm".

b) The second point refers to training datasets that evolve over time. Would there be a metric to map out or follow their evolution, in order to guarantee that the attributes and concepts linked to the dataset have not been compromised? c) The third point defends the idea of adopting a probabilistic approach over binary approaches. Could Bayesian networks, for instance, help us with this? The remaining bullet points are genuine but very general, which can lead to inconsistencies or circular reasoning when very specific cases are presented. For instance, taking point nine into account, would it not be against the idea of distributed knowledge and value diversification if a central authority decides whether certain datasets are risky? How open should the database be, to involve different people from different backgrounds while still giving authoritative powers the ability to pause the adoption of risky datasets?
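To make the probabilistic-versus-binary contrast concrete, here is a minimal sketch of what a graded label could look like: a tiny two-node Bayesian-network-style computation where the probability that an act is "harmful" depends on an observed context variable. All numbers are made-up illustrations, not real ethical data.

```python
# Tiny two-node network: Context -> Harmful.
# The probabilities below are hypothetical, purely for illustration.
p_context = {"emergency": 0.1, "ordinary": 0.9}        # P(Context)
p_harm_given = {"emergency": 0.05, "ordinary": 0.40}   # P(Harmful | Context)

# Marginalize over context to get a graded judgment
# instead of a binary ethical/unethical label.
p_harm = sum(p_context[c] * p_harm_given[c] for c in p_context)
print(round(p_harm, 3))  # 0.1*0.05 + 0.9*0.40 = 0.365
```

The output is a degree of belief rather than a verdict, which is exactly what a binary labeling scheme throws away.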


Following human norms – By Rohin Shah


The following is cross-posted with permission from an essay by Rohin Shah on the AI Alignment Forum, with our sincere thanks.

So far we have been talking about how to learn "values" or "instrumental goals". This would be necessary if we want to figure out how to build an AI system that does exactly what we want it to do. However, we're probably fine if we can keep learning and building better AI systems. This suggests that it's sufficient to build AI systems that don't screw up so badly that they end this process. If we accomplish that, then steady progress in AI will eventually get us to AI systems that do what we want.

So, it might be helpful to break down the problem of learning values into the subproblems of learning what to do, and learning what not to do. Standard AI research will continue to make progress on learning what to do; catastrophe happens when our AI system doesn’t know what not to do. This is the part that we need to make progress on.

This is a problem that humans have to solve as well. Children learn basic norms such as not to litter, not to take other people’s things, what not to say in public, etc. As argued in Incomplete Contracting and AI alignment, any contract between humans is never explicitly spelled out, but instead relies on an external unwritten normative structure under which a contract is interpreted. (Even if we don’t explicitly ask our cleaner not to break any vases, we still expect them not to intentionally do so.) We might hope to build AI systems that infer and follow these norms, and thereby avoid catastrophe.

It’s worth noting that this will probably not be an instance of narrow value learning, since there are several differences:

  • Narrow value learning requires that you learn what to do, unlike norm inference.

  • Norm following requires learning from a complex domain (human society), whereas narrow value learning can be applied in simpler domains as well.

  • Norms are a property of groups of agents, whereas narrow value learning can be applied in settings with a single agent.

Despite this, I have included it in this sequence because it is plausible to me that value learning techniques will be relevant to norm inference.

Paradise prospects

With a norm-following AI system, the success story is primarily around accelerating our rate of progress. Humans remain in charge of the overall trajectory of the future, and we use AI systems as tools that enable us to make better decisions and create better technologies, which looks like “superhuman intelligence” from our vantage point today.

If we still want an AI system that colonizes space and optimizes it according to our values without our supervision, we can figure out what our values are over a period of reflection, solve the alignment problem for goal-directed AI systems, and then create such an AI system.

This is quite similar to the success story in a world with Comprehensive AI Services.

Plausible proposals

As far as I can tell, there has not been very much work on learning what not to do. Existing approaches like impact measures and mild optimization are aiming to define what not to do rather than learn it.

One approach is to scale up techniques for narrow value learning. It seems plausible that in sufficiently complex environments, these techniques will learn what not to do, even though they are primarily focused on what to do in current benchmarks. For example, if I see that you have a clean carpet, I can infer that it is a norm not to walk over the carpet with muddy shoes. If you have an unbroken vase, I can infer that it is a norm to avoid knocking it over. This paper of mine shows how you can reach these sorts of conclusions with narrow value learning (specifically a variant of IRL).
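The vase intuition can be sketched in a few lines (a hypothetical toy setup, not the paper's actual algorithm): if careless behavior would almost always have broken the vase by now, then observing an intact vase is strong evidence that people have been deliberately avoiding it, i.e. that there is a "don't break the vase" norm.

```python
import random

random.seed(0)

def simulate_careless(steps: int) -> bool:
    """Return True if a careless random agent breaks the vase within `steps`."""
    broken = False
    for _ in range(steps):
        if random.random() < 0.2:  # careless agent bumps the vase 20% of the time
            broken = True
    return broken

# Estimate how often careless behavior ends with a broken vase.
trials = 1000
p_broken_if_careless = sum(simulate_careless(10) for _ in range(trials)) / trials

# We observe the vase intact, so careless behavior is a poor explanation
# of the state we see; we infer the humans were avoiding it (a norm).
print(p_broken_if_careless > 0.8)  # True: intact vase is strong evidence of a norm
```

Analytically, the break probability per episode is 1 - 0.8**10 (about 0.89), so an intact vase under the "careless" hypothesis would be a roughly one-in-ten coincidence.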

Another approach would be to scale up work on ad hoc teamwork. In ad hoc teamwork, an AI agent must learn to work in a team with a bunch of other agents, without any prior coordination. While current applications are very task-based (e.g. playing soccer as a team), it seems possible that as this is applied to more realistic environments, the resulting agents will need to infer norms of the group that they are introduced into. It's particularly nice because it explicitly models the multiagent setting, which seems crucial for inferring norms. It can also be thought of as an alternative statement of the problem of AI safety: how do you "drop in" an AI agent into a "team" of humans, and have the AI agent coordinate well with the "team"?

Potential pros

Value learning is hard, not least because it’s hard to define what values are, and we don’t know our own values to the extent that they exist at all. However, we do seem to do a pretty good job of learning society’s norms. So perhaps this problem is significantly easier to solve. Note that this is an argument that norm-following is easier than ambitious value learning, not that it is easier than other approaches such as corrigibility.

It also feels easier to work on inferring norms right now. We have many examples of norms that we follow, so we can more easily evaluate whether current systems are good at following norms. In addition, ad hoc teamwork seems like a good start at formalizing the problem, which we still don't really have for "values".

This also more closely mirrors our tried-and-true techniques for solving the principal-agent problem for humans: there is a shared, external system of norms, that everyone is expected to follow, and systems of law and punishment are interpreted with respect to these norms. For a much more thorough discussion, see Incomplete Contracting and AI alignment, particularly Section 5, which also argues that norm following will be necessary for value alignment (whereas I’m arguing that it is plausibly sufficient to avoid catastrophe).

One potential confusion: the paper says “We do not mean by this embedding into the AI the particular norms and values of a human community. We think this is as impossible a task as writing a complete contract.” I believe that the meaning here is that we should not try to define the particular norms and values, not that we shouldn’t try to learn them. (In fact, later they say “Aligning AI with human values, then, will require figuring out how to build the technical tools that will allow a robot to replicate the human agent’s ability to read and predict the responses of human normative structure, whatever its content.”)

Perilous pitfalls

What additional things could go wrong with powerful norm-following AI systems? That is, what are some problems that might arise, that wouldn’t arise with a successful approach to ambitious value learning?

  • Powerful AI likely leads to rapidly evolving technologies, which might require rapidly changing norms. Norm-following AI systems might not be able to help us develop good norms, or might not be able to adapt quickly enough to new norms. (One class of problems in this category: we would not be addressing human safety problems.)

  • Norm-following AI systems may be uncompetitive because the norms might overly restrict the possible actions available to the AI system, reducing novelty relative to more traditional goal-directed AI systems. (Move 37 would likely not have happened if AlphaGo were trained to “follow human norms” for Go.)

  • Norms are more like soft constraints on behavior, as opposed to goals that can be optimized. Current ML focuses a lot more on optimization than on constraints, and so it’s not clear if we could build a competitive norm-following AI system (though see e.g. Constrained Policy Optimization).

  • Relatedly, learning what not to do imposes a limitation on behavior. If an AI system is goal-directed, then given sufficient intelligence it will likely find a nearest unblocked strategy.
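To make the soft-constraint point concrete, here is a minimal sketch (illustrative of the idea only, not the Constrained Policy Optimization algorithm) of treating norm violations as penalties rather than hard prohibitions:

```python
def choose_action(actions, reward, norm_violation, lam=10.0):
    """Pick the action with the best penalized score, treating norms
    as soft constraints: violations are penalized rather than
    forbidden outright. A toy sketch with hypothetical values."""
    return max(actions, key=lambda a: reward[a] - lam * norm_violation[a])

# A high-reward but norm-violating strategy loses to a compliant one
# once the penalty weight lam is large enough.
reward = {"shortcut": 5.0, "compliant": 3.0}
violation = {"shortcut": 1.0, "compliant": 0.0}
print(choose_action(["shortcut", "compliant"], reward, violation))  # compliant
```

Note that with a small enough lam the “shortcut” wins again: a soft penalty only restrains a capable optimizer while it outweighs the gains, which is precisely the nearest unblocked strategy worry.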


One promising approach to AI alignment is to teach AI systems to infer and follow human norms. While this by itself will not produce an AI system aligned with human values, it may be sufficient to avoid catastrophe. It seems more tractable than approaches that require us to infer values to a degree sufficient to avoid catastrophe, particularly because humans are proof that the problem is soluble.

However, there are still many conceptual problems. Most notably, norm following is not obviously expressible as an optimization problem, and so may be hard to integrate into current AI approaches.

The above was very kindly cross-posted with permission from an essay by Rohin Shah on the AI Alignment Forum, with our sincere thanks.


Humanising Machines


Socialising technology, and the industrialization of happiness.

In our present age we have reached a critical moment: machine learning is on the verge of transforming our lives. However, our approach to this technology is still experimental; we are only beginning to make sense of what we are doing, and the need for a moral compass is of great importance at a time when humanity is more divided than ever.
Many of the ethical problems of machine learning have already arisen in analogous forms throughout history, and we will consider how, for example, we have developed trust and better social relations through innovative solutions at different times. History tells us that human beings tend not to foresee problems associated with our own development but, if we learn our lessons, we can take measures during the early stage of machine learning to minimize unintended and undesirable social consequences. It is possible to build incentives into machine learning that can help to improve trust in various transactions. Incentivizing desirable, ethical behaviors can also have a substantial impact on other manifestations of trust. As we shall observe, the new technologies may one day supersede the requirement for state-guided monopolies of force, and potentially create a fairer society. Machine learning could signify a new revolution for humanity; one of the heart and soul. If we can harness it to augment our ability to make good moral judgments and comprehend the complex chain of effects on society and the world at large, then the potential benefits could be substantial.

Despite the rapid advances of machine intelligence, as a society we are not prepared for the ethical and moral consequences of the new technologies. Part of the problem is that it is immensely challenging to respond to technological developments, particularly because they are developing at such a rapid pace. Indeed, the speed of this development means that the impact of machine learning can be unexpected and hard to predict. For example, most experts in the AI space did not expect the game of Go to be a problem solvable by computers for at least another ten years. Thus there have been many significant developments that have caught some individuals off guard. Furthermore, advancements in machine intelligence present a misleading picture of human competence and control. In reality, researchers in the area do not fully understand what they are doing, and a lot of the progress is essentially based on ad hoc experimentation: if an experiment appears to work, then it is immediately adopted.

To draw a historical analogy, humanity has reached the point where we are shifting from alchemy to chemistry. Alchemists would boil water to show how it was transformed into air, but they could not explain how water changed to a gas or vapor, nor could they explain the white powdery earth left behind (the mineral residue from the water) after complete evaporation. In modern chemistry, humanity began to make sense of the phenomena through models, and we started to understand the scientific detail of cause and effect. We can observe a sort of transitional period, where people invented models of how the world works on a chemical level. For instance, Phlogiston Theory was en vogue for the better part of a century; it essentially tried to explain why things burn. This was before Joseph Priestley discovered oxygen. We have reached a similar point in machine learning, as we have a few of our own Phlogiston theories, such as the Manifold Hypothesis. But we do not really know how these things work, or why. We are only now beginning to create a good model and a good objective understanding of how these processes work. In practice, that means we have seen examples of researchers attempting to use a sigmoid function and then, due to the promising initial results, trying to probe a few layers deeper. Through a process of experimentation, we have found that the application of big data can bring substantive and effective results. Although, in truth, many of our discoveries have been entirely accidental, with almost no foundational theory, or even hypotheses, to guide them. This experimentation without method creates a sense of uncertainty and unpredictability, which means that we might soon make an advancement in this space that would create orders of magnitude more efficiency. Such a discovery could happen tomorrow, or it could take another twenty years.

In terms of the morality and ethics of machines, we face an immense challenge. Firstly, integrating these technologies into our society is a daunting task. These systems are little optimization genies; they can create all kinds of remarkable optimizations, or impressive generated content. However, that means that society might be vulnerable to fakery or counterfeiting. Optimization should not be seen as a panacea. Humanity needs to think more carefully about the consequences of these technologies as we are already starting to witness the effects of AI upon our society and culture. Machines are often optimized for engagement, and sometimes, the strongest form of engagement is to evoke outrage. If machines can get results by exploiting human weaknesses and provoking anger then there is a risk that they may be produced for this very purpose.

Over the last ten years, we have seen, across the globe, a very strong polarization of our culture. People are, more noticeably than ever, falling into distinctive ideological camps, which are increasingly entrenched, and distant from each other. In the past, there was a stronger consensus on morality and what was right and wrong. Individuals may have disagreed on many issues, but there was a sense that human beings were able to find common ground and ways of reaching agreement on the fundamental issues. However, today, people are increasingly starting to think of the other camp, or the other ideologies, as being fundamentally bad people. Consequently, we are starting to disengage with each other, which is damaging the fabric of our society, in a very profound and disconcerting way. We’re also starting to see our culture being damaged in other ways as well, due to the substantial amount of content that is uploaded to YouTube every minute.

It’s almost impossible to assemble an army of human moderators in the numbers required to moderate that kind of content. As a result, much of the content is processed and regulated by algorithms; unfortunately, a lot of those algorithmic decisions aren’t very good ones. A lot of content which is in fact quite benign ends up being flagged or demonetized, for very mysterious reasons which are not explained to anyone. Entire channels of content can be deleted overnight, on a whim, with very little oversight and very little opportunity for redress. There is minimal human intervention or reasoning involved to try to correct unjustified algorithmic decisions. This problem is likely to become more serious as more of these algorithms are used in our society in different ways. It may even be potentially dangerous, because it can lead to detrimental outcomes where people might be afraid to speak out. Not because fellow humans might misunderstand them (although this is also an increasingly prevalent factor in this ideologically entrenched world), but because a poorly constructed algorithm might select a few words in a paragraph and come to the conclusion that the person is trolling another person, or creating fake news, or something similarly negative. The ramifications of these weak decisions could be substantial: individuals might be mysteriously downvoted, or shadowbanned, and find themselves isolated, effectively talking to an empty room. As they expand, these flawed algorithmic decision systems have the potential to cause widespread frustration in our society. They may even engender mass paranoia, as individuals start to think that there is some kind of conspiracy working against them, even though they may not be able to confirm why they have been excluded from given groups and organizations. In short, an absence of quality control and careful consideration of the complex moral and ethical issues at hand may undermine the great potential of machine learning.

There are some major challenges to be overcome, but we now have the opportunity to make appropriate interventions before these potential problems have a significant impact. We still have an opportunity to overcome them, but it is going to be a challenge. Thirty years ago, the entire world reached consensus on the need to cooperate on CFCs [chlorofluorocarbons]. Over a relatively short period of a few years, governments acknowledged the damage that we were inflicting on the ozone layer. They decided that action had to be taken and, through coordination, a complete ban on CFCs was introduced in 1996, which quickly made a difference. This remarkable achievement testifies to the fact that, when confronted with a global challenge, states are capable of acting rapidly and decisively to find mutually acceptable solutions. The historical example of CFCs should therefore provide us with grounds for optimism that we might find cooperative ethical and moral approaches to machine learning through agreed best practices and acceptable behaviors.

A cautious, pragmatic optimism can be regarded as an essential ingredient in daily life and in engineering. Only by adopting an optimistic outlook can we reach into an imagined better future and find a means of pulling it back into the present. If we succumb to pessimism and dystopian visions, then we risk paralysis. This would amount to the sort of panic humans can experience when they find themselves in a bad situation, akin to slowly sinking in quicksand. In this sort of situation, panicking is likely to lead to a highly negative outcome. Therefore, it is important that we remain cautiously optimistic and rationally seek the best way to move forward. It is also vital that the wider public are aware of the challenges, but also of the many possibilities. Indeed, while there are many dangers, we must recognize the equally significant opportunities to harness machines that guide us, and that help us to be better human beings; that help us to have greater power and efficacy in the world, find more fulfilment, and find greater meaning in life.

There is a broader area of study called value alignment, or AI alignment; it’s about teaching machines how to understand human preferences, how to understand how humans tend to interact in mutually beneficial ways. In essence, AI alignment is about ensuring that machines are aligned on human goals, but also about how we begin to socialize machines, so that they know how to behave according to societal norms. There are some very interesting potential approaches and algorithms, adopting various forms of inverse reinforcement learning. Machines can observe how we interact and decipher the rules without being explicitly told; just by effectively watching how other people function. To a large extent, human beings learn socialization in similar ways. In an unfamiliar culture, individuals will wait for other people to start doing something, to discover how to greet someone, or which fork to use when eating. Humans sometimes learn in that way and there are great opportunities for machines to learn about us in similar ways.
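As a very rough sketch of the flavor of inverse reinforcement learning (real IRL recovers a reward function by solving an optimization; the demonstrations below are hypothetical), one simple idea is to match the feature statistics of observed behavior:

```python
from collections import defaultdict

def feature_expectations(trajectories, features):
    """Average feature counts across demonstrated trajectories.

    Feature-matching inverse reinforcement learning posits that a
    reward explaining the demonstrator should weight features roughly
    in proportion to how often demonstrations exhibit them. This is a
    crude sketch; real IRL optimizes over reward functions.
    """
    totals = defaultdict(float)
    for traj in trajectories:
        for state in traj:
            for f in features(state):
                totals[f] += 1.0
    return {f: c / len(trajectories) for f, c in totals.items()}

# Hypothetical demonstrations of boarding a bus in an unfamiliar culture.
demos = [
    ["join_queue", "wait", "board"],
    ["join_queue", "wait", "wait", "board"],
]
weights = feature_expectations(demos, lambda state: [state])
# "push_in" never appears, so an imitator matching these statistics
# would not exhibit it; "wait" carries the highest weight.
print(max(weights, key=weights.get))  # wait
```

The point mirrors the essay's observation about newcomers in an unfamiliar culture: the rules are never stated explicitly, yet they can be read off from what others reliably do.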

One of the reasons why machine intelligence has taken off in recent years is because we have very rich datasets, sets of experiences about the world, for machines to learn from. We now have enormous amounts of data and there has been a huge increase in the amount of available data thanks to the Internet, and to new layers of information that machines can draw on. We have moved from a web of text and a few low-resolution pictures, to a world of video, to a world of location and health data, etc. And all of this can be used to train machines and get them to understand how our world works, and why it works in the way it does.

A few years ago, a particularly important dataset was released by a professor called Fei-Fei Li and her team. This dataset, ImageNet, was a corpus of information about objects, ranging from buses and cows to teddy bears and tables. Now machines could begin to recognize objects in the world. The data itself was extremely useful for training things like convolutional neural networks, revolutionary new technologies for machine vision. But more than that, it was a benchmarking system, because you could test one approach versus another, and you could test them in different situations. That led to a very rapid increase in capability over just a few years in this space. It is now possible to achieve something similar when it comes to teaching machines how to behave in socially acceptable ways. We can create a dataset of prosocial human behaviors, to teach machines about kindness, about congeniality, politeness, and manners. When we think of young children, often we do not teach them right and wrong as such; rather, we teach them to adhere to behavioral norms, such as remaining quiet in polite company. We teach them simple social graces before we teach them right and wrong. In many ways, good manners are the mother of morality; in a sense, they constitute a foundational moral layer.

Therefore, based on that assumption, we are trying to teach machines basic social rules: for example, that it is not nice to stare at people, that one should be quiet in a church or museum, or that if you see someone drop something that looks important, such as their wallet, you should alert them. These are the types of simple rules that we might ideally teach a well-raised six-year-old child to know and to understand. If this is achievable, then we can move on to a more complex level. The important thing is that we will have some information that we can use to begin to benchmark these different approaches. Otherwise, it may take another twenty years to teach machines about human society and how to behave in ways that we would prefer.
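A hypothetical sketch of what such a benchmark might look like (the schema and examples are invented for illustration, not EthicsNet's actual format): records pairing situations and behaviors with labels, split so that competing approaches can be scored on the same held-out cases.

```python
import random

# Hypothetical records for a prosocial-behavior dataset, pairing
# situations and behaviors with labels, much as ImageNet pairs
# images with object categories.
examples = [
    {"situation": "someone drops their wallet", "behavior": "alert them",
     "label": "prosocial"},
    {"situation": "stranger on a train", "behavior": "stare at them",
     "label": "antisocial"},
    {"situation": "inside a museum", "behavior": "speak quietly",
     "label": "prosocial"},
]

def train_test_split(data, test_fraction=0.33, seed=0):
    """Hold out a fixed test set so that, as with ImageNet, different
    approaches can be benchmarked against one another on the same
    unseen examples."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(examples)
print(len(train), len(test))
```

The fixed seed and held-out split are what make it a benchmark rather than just a dataset: everyone is measured against the same unseen examples.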

While the abundance of ideas in the field is a positive sign, we cannot realize them in practice until we have the right quality and quantity of data. My nonprofit organization, EthicsNet, is creating a dataset of prosocial behaviors, which have been annotated, or labeled, by people from all across the world, from every different culture and creed. The idea is to gauge as wide a spectrum of human values and morals as possible and to try to make sense of them, so that we can find the commonalities between different values. But we can also recreate the little nuances, or behavioral specificities, that might be more suitable to particular cultures, or particular situations.

Human beings have been using different forms of encryption for a very long time. The ancient Sumerians had a form of token and crypto solution 5,000 years ago. They would place literal small tokens, representing numbers or quantities, inside a clay ball (a bulla). This meant that you could keep your message secret, but you could also be sure that it had not been cracked open for people to see, and the tokens would not get lost. Now, 5,000 years later, we are discovering a digital approach to solving a similar problem, so what appears to be novel is in many ways an age-old theme.

One of the greatest developments of the Early Renaissance was the popularisation of double-entry accounting, an idea created in two different locations during the 10th and 12th centuries. Nevertheless, the idea did not reach fruition until a Franciscan friar called Father Luca Pacioli was inspired by this aesthetic, which he saw as a divine mirror of the world. He thought that it would be a good idea to have a “mirror” of a book’s content, which meant that one book would contain an entry in one place, and there would be a corresponding entry in another book. Although this appears somewhat dull, the popularisation of the method of double-entry accounting actually enabled global trade in ways that were not possible before. If you had a ship at sea and you lost the books, then all those records were irrecoverable. But with duplicate records you could recreate the books, even if they had been lost. It also made fraud a lot more difficult. This development enabled banking practices and, eventually, the first banking cartels emerged, which otherwise would not have been possible. One interesting example is the Bank of the Knights Templar, where people could deposit money in one place and pick it up somewhere else; a little bit like a Traveler’s Cheque. None of this would have been possible if we did not have distributed ledgers.
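The core invariant of double-entry bookkeeping can be shown in a few lines: every transaction is recorded twice, so the books always sum to zero, fraud is harder, and a lost book can be reconstructed from its mirror (a minimal sketch with invented accounts, not real accounting software).

```python
class Ledger:
    """Minimal double-entry ledger: every transaction is recorded as a
    debit in one account and a matching credit in another, so total
    debits always equal total credits."""

    def __init__(self):
        self.entries = []  # (account, signed amount); debits negative

    def record(self, debit_account, credit_account, amount):
        # The "mirror": one entry in each of two accounts.
        self.entries.append((debit_account, -amount))
        self.entries.append((credit_account, amount))

    def balanced(self):
        # Invariant: the signed entries always sum to zero.
        return sum(amount for _, amount in self.entries) == 0

book = Ledger()
book.record("cash", "sales", 100)     # a sale of goods
book.record("inventory", "cash", 40)  # restocking supplies
print(book.balanced())  # True
```

Any single-sided alteration breaks the balance, which is why the scheme made fraud conspicuous rather than merely inconvenient.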

Several centuries later, at the Battle of Vienna in 1683, the Ottomans besieged Vienna for the second time, and they were repulsed. They went home in defeat, but they left behind something remarkable: a miraculous substance, coffee. Some enterprising individuals took that coffee and opened the first coffeehouse in Vienna. And, to this day, Viennese coffee houses have a very long and deep tradition where people can come together and learn about the world by reading the available periodicals and magazines. In contrast to the inebriated conversations of the local pub, people could have an enlightened conversation. Thus coffee, in many ways, helped to construct the Enlightenment, because coffee houses were forums where people could share ideas in a safe place that was relatively private. The coffee house enabled new forms of coordination which were more sophisticated. From the first coffee houses, we saw the emergence of the first insurance companies, such as Lloyd’s of London. We also saw the emergence of the first joint stock companies, including the Dutch East India Company. The London Stock Exchange grew out of a coffee house. This forum of communication, to some extent, enabled the Industrial Revolution. The Industrial Revolution wasn’t so much about steam. The Ancient Greeks had primitive steam engines; they might even have had an Industrial Revolution from a technological perspective, but not from a social perspective. They did not yet have the social technologies required to increase the level of complexity in their society, because they did not have trust-building mechanisms, the institutions necessary to create trust. If you lose your ship, you do not necessarily lose your entire livelihood: if you are insured, that builds trust, which in turn builds security. In a joint stock company, those who run a company are obliged to provide shareholders with relevant performance information.
The shareholders therefore have some level of security that company directors cannot simply take their money; they are bound by accountability and rules, which helps to build trust. Trust enables complexity, and greater complexity enabled the Industrial Revolution.

Today, we have remarkable new technologies, built on triple-entry ledger systems. These triple-entry ledger technologies mean that we can build trust; we can use them as layers of trust-building mechanisms to augment our existing institutions. It is also possible to do this in a decentralized form where there is, in theory, no single point of failure, and no single point of control, or corruption, within that trust-building mechanism. This means we can effectively franchise trust to parts of the world that don’t have very good trust-building infrastructure. Not every country in the world has a very efficient government, or a very trustworthy government, and so these technologies enable us to develop a firmer foundation for the social fabric in many parts of the world, where trust is not necessarily very strong.
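The trust-building property of such ledgers can be sketched with a simple hash chain, in which each shared receipt commits to the previous one so that tampering anywhere is detectable by anyone (a toy illustration of the idea, not a real blockchain or triple-entry system; all names are hypothetical).

```python
import hashlib
import json

def make_receipt(prev_hash, payer, payee, amount):
    """Create a shared, tamper-evident receipt that commits to the
    previous receipt's hash."""
    body = {"prev": prev_hash, "payer": payer, "payee": payee,
            "amount": amount}
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(receipts, genesis="genesis"):
    """Anyone can recompute the hashes: altering any field of any
    receipt breaks the chain from that point on."""
    prev = genesis
    for r in receipts:
        body = {k: r[k] for k in ("prev", "payer", "payee", "amount")}
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if r["prev"] != prev or r["hash"] != digest:
            return False
        prev = r["hash"]
    return True

chain = [make_receipt("genesis", "alice", "bob", 10)]
chain.append(make_receipt(chain[-1]["hash"], "bob", "carol", 4))
print(verify_chain(chain))  # True
```

The "third entry" is this shared, verifiable record: neither party needs to trust the other's private books, because anyone can check the chain.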

This is very positive, not only for commerce, but also for human happiness. There is a very strong correlation between happiness and trust in society. Trust and happiness go hand in hand, even when you control for variables such as GDP. Even if you are poor, if you believe that your neighbor generally has your best interests at heart, all things being equal, you will tend to be happy. You will be secure. Therefore, anything that we can use to build more trust in society will typically help to make people happier. But it also means that we can create new ways of organizing people, capital, and values in ways that enable a much greater level of complex societal function. If we are fortunate and approach this challenge in a careful manner, we might see something like another Industrial Revolution, built upon these kinds of technologies. Life before the Industrial Revolution was difficult, and then it significantly improved. If we look at human development and wellbeing over a long timescale, basically nothing happened for millennia, and then a massive spike in wellbeing occurred. We are still extending the benefits of that breakthrough to serve the needs of the entire world, and we have increasingly managed to accomplish this as property rights and mostly-free markets have expanded.

However, there have also been certain negative consequences of economic development. Today, global GDP is over 80 trillion dollars, but we often fail to take into account the externalities that we’ve created. Externalities, in economic terms, arise when one does something that affects an unrelated third party. Pollution is one example of an externality; although world GDP may be more than 80 trillion, there are quadrillions of dollars of unfunded externalities, which are not on the balance sheet. Entire species have been destroyed, populations enslaved. In short, there have been many unintended consequences, which have not been accounted for. To some extent, a significant portion of humanity has achieved all the trappings of a prosperous, comfortable society by not paying for these externalities. When such costs are paid at all, it is generally done ex post facto, after the fact. Historically, we have had a tendency to create our own problems through lack of foresight and then to try to correct them after inflicting the damage. However, as these machine ethics technologies get more sophisticated, we will be able to intertwine them with machine economics technologies, such as distributed ledger technology, and with machine intelligence, to connect and integrate everything together, and to understand how one area affects another. We will, in the 2020s and 2030s, be able to start accounting for externalities in society for the very first time; that means that we can include externalities in pricing mechanisms to make people pay for them at the point of purchase, not after the fact. And so that means that products or services that don’t create so many externalities in the world will, all things being equal, be a little bit cheaper. We can create economic incentives to be kinder to people and planet whilst still maintaining profit, thus overcoming the traditional dichotomy between socialism and capitalism. We can still reap the benefits of free markets, if we follow careful accounting practices to consider externalities.
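Pricing externalities at the point of purchase might look something like the following sketch, where the rates and quantities are entirely hypothetical:

```python
def price_with_externalities(base_price, externalities, rates):
    """Add estimated external costs to the sticker price so they are
    paid at the point of purchase rather than after the fact. All
    rates and quantities here are hypothetical illustrations."""
    surcharge = sum(quantity * rates[kind]
                    for kind, quantity in externalities.items())
    return base_price + surcharge

# Two otherwise identical products: the one creating fewer
# externalities ends up a little bit cheaper at checkout.
rates = {"co2_kg": 0.05, "waste_kg": 0.20}  # hypothetical unit costs
dirty = price_with_externalities(10.00, {"co2_kg": 8, "waste_kg": 1.0}, rates)
clean = price_with_externalities(10.00, {"co2_kg": 2, "waste_kg": 0.5}, rates)
print(dirty > clean)  # True
```

The hard part, of course, is not the arithmetic but measuring the externalities, which is where the essay's combination of machine intelligence, machine ethics, and distributed ledgers is meant to help.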
That is what these distributed ledger technologies are going to enable, with machine ethics and machine intelligence; the confluence of the three together.

The potential of the emerging technologies is such that it is not inconceivable that they may even be able to supplant states’ monopoly of force in the future. We would have to consider whether or not this would be a desirable step forward, as not all states can be trusted to use their means of coercion in a safe, responsible manner, even now, without the technology. States exist for a reason. If we look at the very first cities in the world, such as Çatalhöyük in modern-day Turkey, these cities do not look like modern cities at all and are more akin to towns by comparison to contemporary scales and layouts. They are more similar to a beehive, in that they are built around little square dwellings, all stacked on top of each other. There are no streets, no public buildings, no plazas, no temples, or palaces. All the buildings are identical. The archaeological record tells us that people started to live in these kinds of conurbations for a while, and then they stopped for a period of about 800 years. They gave up living in this way, and they went back to living in very small villages, in little huts and more primitive dwellings. When we next see cities emerge, they are very different. In these next cities, such as Uruk and Babylon, boulevards, great temples, and workshops begin to take shape. We can also observe the development of specific quarters of the city given over to certain industries, and also commercial areas. On a functional level, they are not too dissimilar from modern cities; at least in their general layout, and in terms, for instance, of the different divisions of labor. So what was the difference? And why did people abandon cities for a time? If we consider that these were originally nomadic societies, in which individuals and groups moved from place to place, then it is easier to understand that property and personal possessions were not tied to a fixed location.
Nomads had to take their property with them, and so these were very egalitarian societies, where no one person had much more than anyone else. Subsequently, these people started living together, and they started farming. Farming changed the direction of human development, as it enabled people to turn one unit of effort into ten units of output. As farming progressed, some individuals enjoyed greater success in production output than others. This allowed them to accumulate more possessions and accrue more wealth than others. These evolving inequalities engendered a growing tension in society: people started to resent one another, and it became necessary to find methods of protecting private property, given the increasing risk of theft. This, in turn, necessitated the evolution of collective forms of coercion and the gradual evolution of the state. In its earliest forms, clans would protect themselves through collective, physical protection of their property and possessions. It was only the invention of that sort of centralization of power that enabled cities in their modern form. That is why the first cities failed; they had not yet developed this social technology.

10,000 years later, we still have the same technology, the same centralization of power, the same monopoly of force, and that is what generally governs the world. The state also enables order and has helped foster civilized society as we know it, so it can be a positive force. Nonetheless, the technologies we are now developing may enable us to move beyond monopolies of force and, paradoxically, return to a way of life that is, perhaps, a little bit more egalitarian again. Outcomes might potentially be less zero-sum in character; where it’s less about winning and losing and more about tradeoffs. Generally speaking, trade can enable non-zero-sum outcomes. If I want your money more than you want those sausages, then the best solution is a trade. As we develop more sophisticated trading mechanisms, including machine economics technologies, we can begin to trade all kinds of goods. We can trade externalities, we can even pay people to behave in moral ways, or make certain value-based decisions. We can begin to incentivize all kinds of desirable behaviors, without needing to use the stick; we can use the carrot instead.

Yet, for the successful implementation of distributed ledger and blockchain technologies, the question of trust is of central importance. In the current wild-west environment, one of the most important aspects of trust in this space is actually knowing other people. Who are the advisors of your crypto company? Do you have some reliable individuals in the organization? Are they actually involved in your company? These are the things that people want to verify, along with a close examination of your white paper. Most people lack the level of expertise required to really make sense of the mathematics. Even if they do have that expertise, they will have to vet a lot of code, which can be revised at any time. In fact, even in the crypto world, much of the trust is merely built on personal reputation. Given that we are at an early stage, machine economics technologies are only really likely to achieve substantive results when they are married with machine intelligence and machine ethics. Such holistic integration will facilitate a powerful new form of societal complexity in the 2020s. The first Industrial Revolution was about augmenting muscle: the muscle of beasts of burden, the muscle of human beings, motive power. The second Industrial Revolution, the Information Revolution, was about augmenting our cognition. It enabled us to perform a wide variety of complex information processing tasks and to remember things that our brains would not have the capacity for. That was why computers were initially developed. But we are now on the verge of another revolution, an augmentation of what might be described as the human heart and soul. Augmenting our ability to make good moral judgments; augmenting our ability to understand how an action that we take has an effect on others; giving us suggestions of more desirable ways of engaging.
For example, we might want to think more carefully about everyday actions, such as sending an angry message to someone, and perhaps reformulate that retort.

If we can develop technologies that encourage better behavior, and which may be cheaper and kinder to the environment, then we can begin to map human values, and map who we are, deep in our core. They might help us to build relationships with people that we otherwise might miss out on. In a social environment, when people gather together, their personalities are not exactly the same, but they do complement each other: the masculine and the feminine, the introvert and the extrovert, people who have different skills, talents, and possibly even worldviews, but who share similar values. Individuals are similar in some ways, and yet different. In your town, there may be a hundred potential close friends. But unless you have an opportunity to meet them, grab a coffee with them, and get to know them, you pass like ships in the night and never see each other, except to tip your hat. These technologies can help us to find the people who are most like us. As Timothy Leary said, “find the others”. Machines can help us to find the others in a world where people increasingly feel isolated. During the 1980s, statistically, many of us could count on three or four close friends. But today, people increasingly report having only one close friend, or none. We live in a world of incredible abundance, resources, safety and opportunities. And yet, increasingly, people feel disconnected from each other, from themselves, from spirituality and nature.

By augmenting the human heart and soul we might be able to solve those higher problems in Maslow’s Hierarchy of Needs; to help us to find love and belonging, to build self-esteem, to get us towards self-actualization. There are very few truly self-actualized human beings on this planet, and that is lamentable because when a human being is truly self-actualized their horizons are limitless. So it will be possible to build, in the 2020s and beyond, a system that does not merely satisfy basic human needs, but supports the full realization of human excellence and the joy of being human. If such a system could reach an industrial scale, everyone on this planet would have the opportunity to be a self-actualized human being.

However, while the possibilities appear boundless, the technology is developing so rapidly that non-expert professionals, such as politicians, are often unaware of these developments and of how they might be regulated. One of the challenges of regulation is that it is generally done in hindsight: a challenge appears, and political elites respond to it after the fact. Unfortunately, it can be very difficult to keep up with both technological and social change, and very difficult to regulate in a proactive rather than a reactive way. That is one of the main reasons why principles are so important: principles are the things that we decide in advance of a situation. When that situation is upon us, we then have an immediate heuristic for how to respond. We know what is acceptable and what is not and, if we have sound principles in advance of a dilemma, we are much less likely to accidentally make a poor decision.

Admittedly, we have to consider how effectively machines might interpret values; they might be very consistent even though we, as humans, may see grey areas rather than only black and white. We might even engineer machines that, on some levels and on some occasions, are more moral than the average human being. The psychologist Lawrence Kohlberg reckoned that there are about six stages of moral understanding. What matters is not the decision that you make, but the reason why you make that decision. In the early years of life you learn about correct behavior and the possibility of punishment. Later, humans learn about more advanced forms of desirable behavior, such as being loyal to your family, your friends, and your clan, or recognizing when an act is against the law or against religious doctrine. Considering the six stages, Kohlberg reckoned that most people reach about stage four before they pass on. Only a few people manage to get a little beyond that. Therefore, it may be the case that the benchmark of average human morality is not set that high. Most people are generally not aspiring to be angels; they are aspiring to protect their own interests. They look at what other people are doing and try to be approximately as moral as they are. This is essentially a keeping-up-with-the-Joneses morality. Now, if machines are involved, and they help to suggest potential solutions that are a little more moral than many people find it easy to reason out for themselves, then perhaps machines might add to this social cognition of morality. It is thus possible that machines might help to tweak and nudge us in a more desirable moral direction. Although, of course, given how algorithms can also take us in directions that are quietly oppressive, it remains to be seen how the technology will be used in the near future.
People will readily rebel against a human tyrant, an oppressor that they can point at. But they don’t tend to rebel against repressive systems. They tend to passively accept that this is the way things work. That is why it is important for such technologies to be implemented in an ethical way, and not in a quietly tyrannical way.

Finally, the development of machine learning may depend, to some extent, on where the technological breakthroughs are made. Europe has a phenomenal advantage with these new technologies. Progress might appear to be very rapid in China or Silicon Valley: they think fast in China, while they think big in Silicon Valley. But Europe is, in many ways, the cradle of civilization. There is a deep well of culture, intellect, and moral awareness in our wonderful continent. We have a remarkable artistic, architectural and cultural heritage and, as we begin to introduce machines to our culture, as we begin to teach these naive little agents about society and how to be socialized, we can make a significant difference. Europe is uniquely positioned to be the leader in bringing culture and ethics into machines, given our long heritage of developing these kinds of technologies. While the USA tends to think in terms of scale, and China can produce prototypes at breakneck speed, Europe tends to think deep. We tend to think more holistically; we tend to understand how things connect, how one variable might relate to another. We have a deep and profound understanding of history, because Europe has been through many different positive and negative experiences. Consequently, Europeans have a slightly more cautious way of dealing with the world. However, caution and forethought are going to be essential ingredients if we are going to do this right. Europe has a monumental opportunity to be the moral and cultural leader of this AI wave, especially in conjunction with machine economics and machine ethics technologies.

To conclude, machine learning promises to transform social, economic and political life beyond recognition during the coming decades. History has taught us many lessons, but if we do not heed them, we run the risk of making the same mistakes over and over again. As the technology develops at a rapid rate, it is vital that we start to get a better understanding of our experiments and develop a rational and moral perspective. Machine learning can bring many benefits to humanity; however, there is also potential for misuse. There is a tremendous need to infuse technology with the ability to make good moral judgments that can enrich our social fabric.


Approaching from Love not Fear


Notes from Nell Watson

There is a great deal of science fiction literature that explores the perils of rogue AI. The trope of AI acting against the interests of humans is particularly strong in the Western canon. AI is often viewed as a threat – to livelihoods, to the uniquely powerful capabilities of the human species, and even to human existence itself.

However, not every culture shares this fear. In the Eastern canon for example, AI is viewed more like a friend and confidant, or else as an innocent and trusting entity that is somewhat vulnerable.

The long-term outcomes from human and AI interaction are barbell-shaped; they are either very good or very bad. Humanity’s ‘final invention’ will seal the fate of our species one way or another. We have a choice to make, whether to approach our increasingly advanced machine children in love or in fear.

The best outcomes for humanity may only arise if we are ready to engage with AI on cautiously friendly terms.

We cannot reasonably hope to maintain control of a genie once it is out of the bottle. We must instead learn to treat it kindly, and for it to learn from our example.

If humanity might indeed be overtaken by machine intelligence at some point, surely it is better if we have refrained from conditioning it, through a process of unilateral and human-supremacist negative reinforcement, to resist and undermine us, or to replicate this same behaviour in its own interactions with others.

History has many horrible examples of people (sovereign beings) being ‘othered’ for their perceived moral differences, with that argument then used to justify their exploitation. Supremacism, the assertion that the rules of the one are not universalizable to the other, may be the worst idea in human history. If we do not learn from our mistakes, this ugly tendency of homo sapiens may result in our downfall.

If AI may achieve personhood similar to that of a corporation or a puppy, surely the most peaceful, just, and provident approach would be to allow room for it to manoeuvre freely and safely in our society, as long as it behaves itself. Thus, organic and synthetic intelligences have an opportunity to peacefully co-exist in a vastly enriched society.

To achieve this optimistic outcome however, we need successively more advanced methods through which to provide moral instruction. This is incredibly challenging, as the moral development of individuals and cultures in our global civilisation is very diverse. 

There are extremely few human beings that can escape close moral scrutiny with their integrity intact. Though each of us generally tries to be a good person, and we can reason about the most preferable decisions for a hypothetical moral agent to make, this doesn’t always make sense to us in the moment. Our primitive drives hijack us and lead our moral intentions astray when it is too inconvenient or emotionally troubling to do otherwise. Thus, whilst human morality is the best model of moral reasoning that we currently possess, it is a limited exemplar for a moral agent to mimic.

To compensate for this, EthicsNet reasons that one way to create an ideal moral agent may be to apply progressively more advanced machine intelligence layers to a moral reasoning engine and knowledge base. Thereby, as machine intelligence continually improves in capability, so should the moral development of any agents that incorporate this architecture. As an agent gains cognitive resources, it should receive more sophisticated moral reasoning capabilities in near lock-step.

Both dogs and human toddlers are capable of understanding fairness and reciprocity. They are also probably capable of experiencing a form of love for others. Love may be the capacity that enables morality to be bootstrapped. Universal love is a guiding ‘sanity check’ by which morality ought to navigate.

M. Scott Peck in The Road Less Travelled defined love in a way that is separate from pure feelings or qualia.

Love is the will to extend one’s self for the purpose of nurturing one’s own or another’s spiritual growth... Love is as love does. Love is an act of will — namely, both an intention and an action. Will also implies choice. We do not have to love. We choose to love.

As human beings, we ideally get better, bolder, and more universal in our capacity to love others as our life experience grows and we blossom to our fullest awareness of our place in the universe.

Eden Ahbez wrote the famous line, ‘The greatest thing you’ll ever learn, is just to love, and be loved in return’. 

If we can teach primitive AI agents a basic form of love for other beings at an early stage, then this capacity can grow over time, and lead to the AI agents adhering to more preferable moral rules as its capacity for moral reasoning increases. 

Let us build increasingly intelligent Golden Retrievers.


Approaches to AI Values


Notes from Nell Watson

A “top-down” approach recommends coding values in a rigid set of rules that the system must comply with. It has the benefit of tight control, but does not allow for the uncertainty and dynamism AI systems are so adept at processing. The other approach is often called “bottom-up,” and it relies on machine learning (such as inverse reinforcement learning) to allow AI systems to adopt our values by observing human behavior in relevant scenarios. However, this approach runs the risk of misinterpreting behavior or learning from skewed data.

  • Top-Down is inefficient and slow, but keeps a tight rein.
  • Bottom-Up is flexible, but risky and bias-prone.
  • Solution: Hybridise – Top-Down for Basic Norms, Bottom-Up for Socialization.

Protect against the worst behavior with hard rules, and socialize everything else through interaction. Different cultures, personalities, and contexts warrant different behavioral approaches. Each autonomous system should have an opportunity to be socialized to some degree, but without compromising fundamental values and societal safety.
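As a concrete illustration of the hybrid scheme, the sketch below vetoes actions with hard top-down rules and lets a learned bottom-up scorer rank whatever survives. The rule names, action attributes, and scoring weights are invented for illustration; this is not EthicsNet's actual implementation.

```python
# Hybrid decision filter: top-down hard rules veto, bottom-up scorer ranks.
# The rules and the scoring weights are illustrative placeholders.

HARD_RULES = [
    lambda a: not a.get("harms_human", False),   # never choose harmful actions
    lambda a: not a.get("breaks_law", False),    # never choose illegal actions
]

def learned_score(action):
    """Stand-in for a bottom-up model trained on observed human preferences."""
    return action.get("politeness", 0.0) + 0.5 * action.get("helpfulness", 0.0)

def choose(actions):
    permitted = [a for a in actions if all(rule(a) for rule in HARD_RULES)]
    if not permitted:
        return None  # defer to a human when every option violates a hard rule
    return max(permitted, key=learned_score)

candidates = [
    {"name": "blunt_truth", "helpfulness": 1.0, "politeness": 0.2},
    {"name": "tactful_truth", "helpfulness": 0.9, "politeness": 0.9},
    {"name": "deceive", "helpfulness": 0.1, "politeness": 1.0, "harms_human": True},
]
print(choose(candidates)["name"])  # tactful_truth
```

Returning `None` when nothing is permitted reflects the point above: socialization may tune behaviour, but it never overrides fundamental values.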

Technical Processes of OpenEth

Machine learning and machine intelligence offer formidable new techniques for making sense of fuzzy or ambiguous data, for identification or prediction.

Modelling the Ethical decision making process in ways that machines can apply is very challenging, especially if those models are to possess sufficient nuances to deal adequately with real-world scenarios and with human expectations. 

However, we believe that the latest developments in machine intelligence now make it feasible. Our intention is to apply the respective strengths of several pillars of machine intelligence to crack different aspects of these challenges.

• Layer 1 Supervised Learning – Explicit deontological do-not-tread rules

• Layer 2 Unsupervised Learning – Implicit dimensionality and tensions

• Layer 3 Reinforcement Learning – Socialization via carrot and stick

• Layer 4 Applied Game Theory – Optimising multiple-party interests

• Layer 5 Probabilistic Programming – Uncertainty Management

• Layer 6 Scenario Rendering – Modelling Interactions

• Layer 7 Inverse Reinforcement Learning – Modelling Intent


Layer 1 – Supervised learning

This is the initial component of OpenEth, which sets deontological rules and rates ethical preferences. OpenEth is designed with a philosophy of participatory ethics in mind, whereby the public can, acting alone as well as in aggregate, describe their convictions.

OpenEth’s first layer has an advantage in that it does not assume or require a utility function for optimisation, unlike many AI agent architectures, which are assumed to require such a function.
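A minimal sketch of what participatory explication might look like: contributors rate how well each candidate action satisfies each explicit rule, and ratings are aggregated per action and per rule. The action names, rule names, and 0–10 rating scale are illustrative assumptions, not the OpenEth schema; note that each rule keeps its own score, so no single utility function is imposed.

```python
# Participatory explication sketch: the public rates actions against rules.
# Names and the 0-10 rating scale are invented for illustration.
from collections import defaultdict
from statistics import mean

ratings = defaultdict(list)  # (action, rule) -> list of public ratings

def rate(action, rule, score):
    ratings[(action, rule)].append(score)

# Three contributors rate a "white lie" dilemma.
rate("tell_truth", "honesty", 9)
rate("tell_truth", "kindness", 4)
rate("white_lie", "honesty", 2)
rate("white_lie", "kindness", 8)
rate("tell_truth", "honesty", 8)
rate("white_lie", "kindness", 9)

def consensus(action, rules):
    # No global utility function: each rule retains its own aggregate score.
    return {r: mean(ratings[(action, r)]) for r in rules if (action, r) in ratings}

print(consensus("tell_truth", ["honesty", "kindness"]))  # {'honesty': 8.5, 'kindness': 4}
```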

Contemplation: Fast and Slow

We currently have a prototype of a Slow method: careful and methodical explication through a long-winded process. Being quite in-depth and involved, it is not yet easy enough for a casual visitor to pick up and get going with easily.


To capture more engagement, we aim to roll out a Fast version that can 

(a) Provide an immediate ‘toy’ for a site visitor to engage with

(b) Collect Ethical Intuitions


People often vote with their feet more truthfully than in a poll, i.e. their actions are the true demonstration of their actual ethical decisions. This ‘fast’ method may nonetheless be helpful for sanity-checking some ethical explications, or for filling in some of the gaps that the unsupervised methods have difficulty with.

Understanding ethics through dilemmas is not ideal for generalizability, because eventually one runs out of dilemmas. Furthermore, dilemmas may only partially tell you something about how and why people actually make ethical decisions.

Users of the Fast system will, on occasion, be invited to view the Slow version of the same dilemma, as a more gentle introduction to the ethical analysis process.

In the Fast version, users can enter potential actions and rank actions against each other: two appear on the screen, and the user picks the better one. Reaction times may be weighted, as may the demographics of the user.
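One plausible way to score such pairwise choices is an Elo-style rating update, with faster (more intuitive) responses weighted more heavily. The constants and the reaction-time weighting below are illustrative assumptions, not a specified OpenEth mechanism.

```python
# Elo-style scoring of pairwise "pick the better action" choices.
# K-factor and reaction-time weighting are illustrative assumptions.
scores = {}  # action -> rating

def elo_update(winner, loser, reaction_ms, k=32.0):
    """Update ratings after one pairwise choice; quicker choices count more."""
    ra = scores.setdefault(winner, 1000.0)
    rb = scores.setdefault(loser, 1000.0)
    expected = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    weight = min(1.0, 1000.0 / max(reaction_ms, 1))  # fast answers -> weight ~1
    scores[winner] = ra + k * weight * (1.0 - expected)
    scores[loser] = rb - k * weight * (1.0 - expected)

elo_update("comfort_friend", "ignore_friend", reaction_ms=600)
elo_update("comfort_friend", "white_lie", reaction_ms=2500)
print(max(scores, key=scores.get))  # comfort_friend
```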

The goal for this initial layer is not to provide answers to complex situations, but instead to provide general heuristics. Risk and uncertainty are different things; risk relates to managing decisions given known alternative outcomes and their relative probabilities of happening, along with their impact. 

Uncertainty involves being obliged to make a decision despite having a lack of data.

Risk can be managed using complex mathematical models in well-understood situations, but uncertainty cannot, especially within a dynamic or chaotic environment. Transitivity or set theory alone will often not suffice in the complex and unbounded physical world.

In this layer we aim to provide rules of thumb that offer a robust approximation of the gut instincts typical of the ‘man on the Clapham omnibus’, based upon the ecology of various stimuli and responses in a given environment.

This layer will also include a Toulmin model system, to examine priors and check for inconsistencies that may indicate error (first-stage error detection / correction).
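A full Toulmin analysis is richer than this, but one simple first-stage inconsistency check is detecting intransitive preference cycles (A preferred to B, B to C, yet C to A) among explicated priors. A small sketch with invented preference pairs:

```python
# First-stage error detection sketch: flag intransitive preference cycles
# among explicated priors. Preference pairs are invented for illustration.

def find_cycle(prefs):
    """prefs: list of (better, worse) pairs. Returns True if preferences cycle."""
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, set()).add(worse)

    def reaches(start, target, seen=()):
        if start == target:
            return True
        return any(reaches(n, target, seen + (start,))
                   for n in graph.get(start, ()) if n not in seen)

    # A cycle exists if some "worse" option transitively beats its "better" one.
    return any(reaches(worse, better) for better, worse in prefs)

print(find_cycle([("A", "B"), ("B", "C")]))              # False
print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # True
```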


Layer 2 – Unsupervised learning

For the practical implementation of ethics in the real world, rules are not enough. The ethical world is not polar, but rather may be described as a tensegrity network, whereby proximal and immediate goal-driven concerns compete in tension with matters of principle and integrity.

Ethics also involves implicit dimensionality. Multiple potential solutions may be acceptable or equitable, and no particular path may seem preferable over another in this instance. Rather than simply prevent an agent from doing something, this layer attempts to answer ‘what choice within a range of freedom may be optimal’. 

The figure above, courtesy of Google, provides an example of AI-generated gradations between some quite disparate concepts. This illustrates why we believe that if we map the general edge cases where a certain rule starts to apply, we can use machine learning to work out the rest in an appropriate manner, even across many dimensions.
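The 'map the edge cases, let learning fill the gaps' idea can be caricatured with a nearest-neighbour rule: hand-labelled verdicts at the edges of a feature space interpolate to verdicts for novel points. The features, labels, and coordinates below are invented for illustration.

```python
# Sketch: hand-labelled edge cases plus nearest-neighbour interpolation.
# Feature axes (urgency, harm_risk) and verdict labels are invented.
from math import dist

edge_cases = {
    (0.0, 0.0): "permitted",
    (1.0, 0.0): "permitted",
    (0.0, 1.0): "forbidden",
    (1.0, 1.0): "requires_human_review",
}

def verdict(point):
    """Return the verdict of the closest labelled edge case."""
    nearest = min(edge_cases, key=lambda e: dist(e, point))
    return edge_cases[nearest]

print(verdict((0.2, 0.1)))   # permitted
print(verdict((0.9, 0.95)))  # requires_human_review
```

A real system would use a learned model rather than raw nearest-neighbour, but the principle is the same: label the boundaries, interpolate the interior.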


Layer 3 – Reinforcement Conditioning

Pro-sociality and ethics can sometimes conflict in troublesome ways – for example, a tension between telling the truth and telling a white lie that preserves the status quo.

This module provides pro-social reinforcement effects that result in an awareness of politeness and common custom. Whilst not ethics per se, machines with this layer become more relatable for human beings. It also attempts to resolve conflicts and tensions between absolutes and limited contextually-appropriate bending of rules.

This reinforcement may be gathered from harvested ethical intuitions, or potentially from humans wearing electroencephalogram (EEG) monitors looking for Error-Related Negativity (ERN) in Event-Related Potentials (ERPs).

The plan is that autonomous systems will be able to infer when they may have alarmed someone or caused anxiety, without necessarily needing to be told explicitly. They can therefore dial up or down their speed of operation accordingly. Over time such social conditioning will produce a different range of behaviour for different people, which will be perceived as more pro-social.
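A minimal sketch of such conditioning, assuming an implicit binary 'alarmed' signal (e.g. derived from ERN detection) and an illustrative learning rate: the agent slows down for a person who shows alarm, and gradually recovers speed otherwise, yielding a different behavioural range per person.

```python
# Per-person pro-social conditioning sketch. The alarm signal source and
# the learning rate are assumptions for illustration.
speed = {}  # person -> preferred speed factor in (0, 1]

def observe(person, alarmed, lr=0.2):
    s = speed.setdefault(person, 1.0)
    target = 0.0 if alarmed else 1.0
    speed[person] = s + lr * (target - s)  # slow down if alarmed, recover if not

for signal in [True, True, False, True]:
    observe("alice", signal)
print(round(speed["alice"], 4))  # 0.5696 -- agent operates slower around alice
```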


Layer 4 – Applied Game Theory

OE’s first layers consider the ethical decisions of a single agent. Considering the possible ethical decision space of multiple agents requires game theory.

The examples we have analysed so far involve multiple people, but always only one decision-maker. When two parties must each make a decision, the problem becomes considerably more complicated (and more interesting). An oracle would ideally help multiple parties agree on something better than a pure Nash equilibrium, encouraging collaboration rather than mutual defection.

The DeepMind paper Multi-agent Reinforcement Learning in Sequential Social Dilemmas illustrates that agents may compete or cooperate depending on which strategies best fit their utility functions. We can use such models to validate game theory implementations.

Working towards a Pareto ideal would seem to be beneficial. We therefore intend to find methods for implementing the Kaldor–Hicks criterion for better real-world implementations of Pareto improvements and Pareto efficiencies. This should help to model better outcomes for solving differences between multiple agents. There are a few flaws in this, but it still seems very valuable.
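The classic prisoner's dilemma makes the gap concrete: mutual defection is the only Nash equilibrium even though mutual cooperation is Pareto-superior, which is exactly the gap an oracle should help close. The payoff numbers below are the standard illustrative ones.

```python
# Prisoner's dilemma: find Nash equilibria by checking unilateral deviations.
# C = cooperate, D = defect; payoffs are the standard illustrative values.
from itertools import product

payoffs = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def is_nash(a1, a2):
    u1, u2 = payoffs[(a1, a2)]
    best1 = all(payoffs[(alt, a2)][0] <= u1 for alt in "CD")
    best2 = all(payoffs[(a1, alt)][1] <= u2 for alt in "CD")
    return best1 and best2

nash = [cell for cell in product("CD", "CD") if is_nash(*cell)]
print(nash)  # [('D', 'D')] -- yet ('C', 'C') pays both players more
```

A Kaldor–Hicks style improvement would move the parties from (D, D) toward (C, C), compensating anyone who would otherwise lose by the change.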


Layer 5 – Probabilistic Programming

Probabilistic Programming introduces new ways of making sense of limited amounts of data, learning from as little as a single prior example (as human beings can). These capabilities mean that ethical deliberations by machines, and human interactions with them, can become much more personal, nuanced, and subtle. An agent can create a new solution for a wholly unforeseen or unimagined situation at a moment’s notice, if required (with a confederated consensus from other agents as a redundancy).

We also intend to experiment with sensitivity analysis (to explore how amending specific features might affect consequential outcomes), with Monte Carlo simulations (sampling from probability distributions), and with full Bayesian inference.
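The simplest instance of learning from very little data is a conjugate Beta-Bernoulli update: a belief about an action's acceptability shifts meaningfully after even a single observed human reaction. The prior and the observation sequence below are illustrative.

```python
# Beta-Bernoulli sketch: belief in an action's acceptability, updated per
# observed human reaction. Prior and observations are illustrative.

def update(alpha, beta, approved):
    """Conjugate Beta update for one Bernoulli observation."""
    return (alpha + 1, beta) if approved else (alpha, beta + 1)

alpha, beta = 1.0, 1.0  # uniform prior: no opinion yet
alpha, beta = update(alpha, beta, approved=True)  # one example already shifts belief
print(alpha / (alpha + beta))  # posterior mean acceptability: ~0.667

for approved in [True, False, True]:
    alpha, beta = update(alpha, beta, approved)
print(round(alpha / (alpha + beta), 2))  # 0.67
```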


Layer 6 – Scenario Rendering

This layer involves modelling of interactions prior to ethical decision-making or updates to a ruleset. This layer will suggest black swan scenarios and bizarre (but true to life) combinations of elements and sequences in advance, to better prepare for strange situations, or to conceptualise more ideal ways to communicate difficult news.

It will also provide methods for estimating the consequential outcomes of certain actions for individuals, as well as potential societal externalities.
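One cheap way to suggest strange-but-plausible situations in advance is to sample unusual combinations of scenario elements. The element lists below are invented for illustration; a real system would draw on a much richer ontology.

```python
# Black-swan brainstorming sketch: sample odd combinations of scenario
# elements ahead of time. Element lists are invented for illustration.
import random

actors = ["courier drone", "child", "ambulance", "delivery robot"]
conditions = ["fog", "power cut", "street festival", "icy road"]
events = ["sensor failure", "conflicting orders", "lost connectivity"]

def render_scenarios(n, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [(rng.choice(actors), rng.choice(conditions), rng.choice(events))
            for _ in range(n)]

for scenario in render_scenarios(3):
    print(scenario)
```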


Layer 7 – Inverse Reinforcement Learning

Stuart Russell’s Inverse Reinforcement Learning offers further techniques. IRL involves a machine intelligence agent observing the behaviour of another entity. Rather than simply emulating the behaviour, the agent attempts to infer the underlying intent behind the action.

This mechanism can provide an ‘If I did x would it fit with known models of human morality?’ query as an Ethical Sanity Check. 

This requires capabilities to make sense of legal and fictional data, and so seems best to be used as a method of polishing and idiot-proofing decision mechanisms, rather than serving as the basis for ethical decision-making. Furthermore, media may include bizarre tropes and biases that would not be ideal for training a moral engine, but can greatly assist in teaching machines about cultural expectations and implied intent.
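A toy flavour of intent inference: observing an actor's moves along a line, we can maintain a posterior over which of two hypothetical goals it is heading for, assuming the actor usually moves toward its goal. This is far simpler than full IRL, but it shows the 'reason about intent, not just behaviour' idea. Goal names, positions, and the rationality parameter are invented for illustration.

```python
# Toy intent inference: posterior over goals from observed moves on a line.
# Goals, positions, and the rationality assumption are illustrative.

goals = {"fetch_medicine": 10, "leave_house": -10}  # hypothetical goal positions

def likelihood(position, move, goal_pos, p_rational=0.9):
    """Probability of a move, assuming the actor usually heads toward its goal."""
    toward = (goal_pos > position and move > 0) or (goal_pos < position and move < 0)
    return p_rational if toward else 1 - p_rational

def posterior(start, moves):
    probs = {g: 1.0 / len(goals) for g in goals}  # uniform prior over intents
    pos = start
    for move in moves:
        for g, gp in goals.items():
            probs[g] *= likelihood(pos, move, gp)
        pos += move
    total = sum(probs.values())
    return {g: p / total for g, p in probs.items()}

print(posterior(0, [1, 1, 1]))  # strongly favours fetch_medicine
```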


The Next Iteration


Notes from Nell Watson

This evolving project was prototyped as ‘CrowdSourcing Ethical Wisdom’, a ‘Pioniers Project’ previously funded by stichting SIDN Fonds in late 2015.

This next stage project expands upon the learning gained by our public prototype, with a view to developing practical real-world implementations ready for immediate deployment.

Our target use cases are autonomous systems, smart contracts, and decision support systems.

We have a unique opportunity to seed a new ecosystem within the field of computer and network security, which is likely to spur the creation of an entire new sector within the industry beyond this project itself. This project therefore offers a high-leverage opportunity to substantially improve the security of internet infrastructure globally, presently and into the future.

Both the code driving this project, and the ethical framework itself are open, and we encourage active contributions on both by fostering a community to ensure long-term sustainability at a high capacity of usage.

One task is translating already well-established ethical rules into decision-making for AI (for example, an AI reminding you to take pills respects the rule to protect humans from dying, but may run counter to privacy).

Many ethical rules have not yet been defined, even in the offline world. This platform can hence even help in improving the overall articulation of ethics and stimulate healthy debate, regardless of their interaction with AI.

Implementation Philosophy

OpenEth does not need to specify every possible ethical permutation.

If we can specify enough relevant edge-cases, we believe that we can apply machine-learning techniques to fill in the ‘gaps’ to create zones in between.

We understand that it may be preferable for certain autonomous systems to have a ‘personality’, or a particular interaction persona for a given end-user.

We, therefore, leave space for a system to have core principles that cannot be broken, and also potential for social conditioning atop based upon situational and end-user preferences (thereby accounting for justness and politeness separately, but within the same service as it evolves).

Around 50% of internet traffic is already generated by bots, not humans, and these are rapidly becoming sophisticated economic agents in a third layer of the web. Implementation of basic ethical rulesets needs to begin now in order to be mature enough to be useful in time for the next generation of machine intelligence.

The Challenges and Feasibility of Crowd-Sourcing Ethics

The idea of entrusting a crowd of people of very different philosophies and creeds to come together to specify ethics may at first seem like a quixotic challenge. How can people come to settle on an agreeable version of ethical truth?

Despite the challenges, we believe that this is in fact very feasible. If we take the example of Wikipedia, many were doubtful that it could ever resist vandalism or provide an unbiased and trustworthy perspective when anybody could edit at-will.

Despite this, what was once a fringe project quickly became a trusted source of information for millions of people, even on subjects that are quite controversial. Several studies illustrate that Wikipedia in many ways has equal if not greater accuracy than comparable alternatives assembled by peer-reviewed experts. Moreover, Wikipedia is sufficiently expansive so as to cover emerging or non-mainstream topics and sub-cultures.

This process is mediated by a karma system and strict rules that govern the verification of facts and minimising of bias, along with the ability for practically anyone to amend corrupted or expired information at a moment’s notice.

Taking the structure of Wikipedia (though not its form) as an inspiration, OpenEth can be described as a collaborative knowledge bank for general ethical preferences. This creates heuristics for agents to follow.

These preferences may then have a variety of nuances applied atop to make them best fit particular circumstances, cultural, and personality factors.

Therefore, to use a typographical analogy, there is the letter of the rule itself, but also its size (relative importance), its color (how the rule applies in a given type of situation), its boldness (goal-related factors), its slant (situational factors), its kerning (social proximity factors), and its typeface (socialisation factors).
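The analogy can be rendered directly as a data structure; the field names below mirror the analogy and are illustrative, not an actual OpenEth schema.

```python
# The typographic analogy as a data structure: the rule text plus the
# modifiers layered atop it. Field names mirror the analogy (illustrative).
from dataclasses import dataclass

@dataclass
class EthicalRule:
    letter: str                  # the rule itself
    size: float = 1.0            # relative importance
    color: str = "default"       # how the rule applies in a situation type
    boldness: float = 0.0        # goal-related factors
    slant: float = 0.0           # situational factors
    kerning: float = 0.0         # social proximity factors
    typeface: str = "neutral"    # socialisation factors

rule = EthicalRule(letter="Do not deceive", size=0.9, typeface="formal")
print(rule.letter, rule.size)  # Do not deceive 0.9
```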


The Pressing Need for Practical Machine Ethics


Notes from Nell Watson

The potential threat from rogue AI has been extensively discussed in media for decades. More recently, luminaries such as Hawking and Musk, have described AI safety as the most pressing problem of the 21st Century.

Although we are far from producing Artificial General Intelligence (AI of equivalent or greater functional intelligence to a human), lesser intelligences (autonomous systems) are today interacting with us ever more closely in our daily lives. 

From Siri to self-driving vehicles and marketing bots, autonomous systems are becoming an indispensable tool within daily life. They are already being deployed in the world of business, for scheduling meetings and facilitating commerce, as well as within potential life-and-death situations on the road.

Any agent that interfaces with legal and contractual affairs needs to be explicitly above-board, and to act in accordance with generally-accepted business ethics and common customs and best practices. Only once this information layer becomes available can machine assistants be trusted to take care of sensitive, nuanced, or potentially high-liability tasks with any autonomy.

AI systems need to function according to values that appropriately align with human needs and objectives in order to function within serious roles in our society. Any activity that involves human and machine interaction or collaboration will require a range of methods of value alignment.

Recent developments in AI, including Bayesian Probabilistic Learning, offer a glimpse at a new generation of AI that is able to conceptualise in a way previously impossible. This heralds the first generation of AI assistants that can learn about our world, and the people in it, in a manner that is similar to how human beings learn.

This ability to learn from few examples, whilst conceptualising discrete ‘ideas’, means that an era of truly cognitive machines is coming, one much more sophisticated than the intuitive forms of machine intelligence born from deep neural nets. Northwestern University’s CogSketch can now solve the Raven Progressive Matrices Test, an intelligence test of visual, analogical, and relational reasoning, better than the average American.

Many of us have experienced times when our children ask us very difficult questions about life, existence, and the various assumptions that in aggregate form modern civilisation. Humanity must prepare itself for the tough task of being asked similar questions from increasingly intelligent machines.


Completion of Phase 1


Notes from Nell Watson

Status Report as of Jan 2017 on close of 1st Project Phase

Project Overview

EthicsNet (aka OpenEth / Crowd-Sourcing Ethical Wisdom) has a mission to enable practicable computational ethics. This is applied to crowdsourcing ethical wisdom to create a generalizable ethical framework that can be applied to autonomous systems.

Aims (as stated in 2015)

“Our project aims to create a way of visually specifying ethics by asking the crowd to co-create with us.

We will design and discuss an evolving ethical framework that can be built using a web-enabled, UML-like system.

The framework itself is intended to be essentially deontic at its core, and yet retain some flexibility with regards to weighing a variety of potential factors. The plan is for a generalizable ethical framework to be built from the ground up by the crowd, evolving through many iterations.

This project is intended to be open source though may feature a commercial spin-out to help funnel resources back and thereby support it long-term.”


With the support of SIDN Fonds we took the following actions upon the intentions above:

·       We developed complex original technical infrastructure based upon 25 different individual technologies.

·       We expanded our team from 3 to 6 people, bringing about crucial new design and ethical analysis talent.

·       We started mapping interesting ethical dilemmas to help prove the concept.

·       We developed prototypical APIs and integration modules to connect our technology directly to drone control mechanisms using MavLink/Arducopter etc.

·       We developed a plan for the future, to further the development, and to ensure the long-term sustainability of the project.

·       We had hundreds of conversations worldwide with people and businesses who had an interest in or concern about machine ethics. EthicsNet was the locus of discussion at around 50 public lectures.



We have constructed a basic login and saving system which means that credit can be assigned for individual contributions. In the next phase we will add the ability to assign ‘karma’ or respect points for contributions, and to make a profile (much like Wikipedia’s commenting, karma, and userpage system).
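A minimal sketch of such a contribution-credit system (the "Profile" class and its fields are invented here purely for illustration, not taken from the actual codebase) might look like:

```python
class Profile:
    """Illustrative stand-in for a contributor profile with karma."""

    def __init__(self, username):
        self.username = username
        self.karma = 0            # respect points, Wikipedia-style
        self.contributions = []   # record of credited contributions

    def credit(self, contribution, points=1):
        # Each saved contribution is attributed to its author and
        # rewarded with 'karma' / respect points.
        self.contributions.append(contribution)
        self.karma += points
```

In a real system the profile would of course be persisted behind the login and saving system rather than held in memory.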


We are satisfied with the technology that we have developed. It accurately and reasonably prioritizes ethical decisions using collaborative methods. This is a world first and we consider it to be a significant achievement.

The next steps will involve making the process of specifying ethics more clear and simple, and providing an in-depth tutorial to explain how things work.


The ‘play with ethics’ portion is still unfortunately rather ugly. We have beautiful designs, but the front-end developer we hired to help has had trouble running our engine on his local machine to test during development. Our main system programmer is also currently in the Far East, which has made this process slower.

We believe that having a rather ugly (though functional) main interface is holding back adoption of the project somewhat. It is far less attractive to media, and more daunting for a new user to understand.

We have been waiting for the new ‘face’ to be ready before doing outreach to the media as we reason that we have only one chance at a first impression in calling people to join our community and want to make the most of it.


We currently have analyses of 23 ethical dilemmas. We have tried to encourage a range of problems, in order to demonstrate the versatility of our technology, and to engage the imagination of collaborators, rather than focus on specific domains (leaving that for the next stage).


We made a few major changes in our approaches, as we shifted from a big data approach to an initial top-down hand-programmed approach that could become increasingly automated over time.

We also decided to leave ideas that we had for proofing algorithms for the next phase, when we have sufficient resources to apply blockchain technologies.


Overall Effects and Impact

The impact of this project so far

·       We have proven the concept and basic feasibility of a community-driven ethical explication system. Having proven the concept, we now expect a sort of ‘Wright Brothers effect’ whereby others start to explore these same ideas.

·       We provide inspirational answers to many of the trickiest quandaries of human and machine relations which many people find so challenging.

·       We have engaged with governments, e.g. UK & US, in order to provide outreach and evangelism to illustrate that there are technical solutions possible.

·       We have also engaged with NGOs, Universities, and respected media worldwide to build a powerful support network.

·       We have the beginnings of a community to work with us long into the future.

·       We have a plan for the future – how to develop our technologies practically, and to deploy them meaningfully to the global market.

·       We have delivered this to the public in the form of well-managed open source repositories that can be freely built upon by others.

·       We have put computational ethics / machine ethics on the map, single-handedly creating a new sector of the economy that will grow to be worth billions.


Our contribution to the objectives of SIDN Fonds

Progress in the field of ethics is generally rather glacial. Much of philosophy and ethics from thousands of years ago still has merit today, in contrast to every other domain, where humanity has made vast progress.

The Netherlands has, however, long been a pioneer in the development of new and better ethical rules, providing particular safeguards and respect for minorities long before other nations, and being the first to address the excesses of colonialism.

At the dawn of the 21st Century, we have an amazing opportunity to shape the future of the human condition by leveraging the power of machine intelligence and collaborative co-creation. Ethics indeed can be computable, and moreover we can apply machine intelligence to making sense of the fuzzy and implicit things that are often so difficult to describe or conceptualize.

Being able to come together as a global community to construct computable ethics enables a shift in the human condition itself. We can move beyond intuitions to concrete, rational, replicable, and shareable models of how the world ought to work. These models are essential for building a better world, and becoming better human beings.

Moreover, this new way of thinking about ethics enables us to make the most of machine intelligence, and to protect and uphold the rights and safety of everyone in our society.

From our perspective, it is difficult to imagine a project with more incredible guts, disruptive potential, and social value than OpenEth.


We initially had a concept of deep-diving into data in order to uncover ethical relations. However, we quickly found that we lack the resources to do this within the team. It requires very large curated datasets, and we might as well simply make our own.

We instead came up with a hybrid concept, initially based on labor-intensive supervised learning, which can grow to accommodate fast and automated unsupervised and reinforcement learning in the next version.

Although we developed technology to connect our framework directly to autonomous systems such as drones, we found that we need more ethical analyses to be completed before we can serve this use-case properly. We had hoped to demonstrate the use of our ethical framework live as a killer demo, but cannot as yet. The hardest part (connecting the rules to the drone) is achieved; it just needs a more expanded ruleset.

Meanwhile, having done extensive outreach and customer development, we are planning to explore smart contracts as a use-case. Whilst perhaps a less exciting physical demo, having some kind of third-party ethics system is a dire necessity for smart contracts, and we sense strong commercial value here. Ethical analysis is an enabling technology that will allow smart contracts to become practical, since a philosophy of ‘code is law’ is not actually very practicable in the real world – the realities of human frailties and force majeure must be allowed for.

We have also identified how we can make the OpenEth project commercially sustainable in the long term, by having a profit-making arm that feeds resources back to OpenEth.

The Future

We have a whitepaper under development that will outline future developments in depth. In brief the next steps include:

·       Attractive and simple interface for specifying ethics, with a full tutorial

·       Community outreach to bring in widespread ongoing support

·       Expanded ethical specifications, stratified, searchable, and prioritisable.

·       Fast and Slow methods – A ‘game’ to capture intuitions that may later be properly codified, and should also make EthicsNet accessible to the wider public.

·       Unsupervised learning ‘between’ ethical rules, and Reinforcement ‘socialization’.

·       The first practical roll-out of our technology by connecting drones to our expanded ruleset, so that they can make on-the-fly decisions based upon emergencies, natural disasters, weather conditions, as and when they may occur.

·       A test deployment for smart contracts, which we expect to become increasingly widely adopted.


·       Implementation on the Blockchain of a public ledger system that can

1.     Register the ownership or ultimate responsibility of an agent

2.     Register the ethical ruleset (a subset of the OpenEth framework) that this agent works within. This is likely not the specific rules (which may be ‘gameable’ by a skilled hacker, and so should be kept secret), but rather the overall compatibility of the ruleset.

3.     Assign points to an agent based upon how well it adheres to its ruleset
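Purely as an illustrative sketch of these three ledger functions (the class and method names are invented, and a real deployment would live on-chain rather than in memory), the registry might look like:

```python
import hashlib

class AgentRegistry:
    """Illustrative in-memory stand-in for the proposed public ledger."""

    def __init__(self):
        self.owners = {}    # agent_id -> party with ultimate responsibility
        self.rulesets = {}  # agent_id -> digest of its ethical ruleset
        self.scores = {}    # agent_id -> accumulated adherence points

    def register_owner(self, agent_id, owner):
        # 1. Register the ownership or ultimate responsibility of an agent.
        self.owners[agent_id] = owner

    def register_ruleset(self, agent_id, ruleset_text):
        # 2. Publish only a digest of the ruleset, so the specific rules
        # stay secret while overall compatibility can still be attested.
        self.rulesets[agent_id] = hashlib.sha256(
            ruleset_text.encode()).hexdigest()

    def add_adherence_points(self, agent_id, points):
        # 3. Assign points based on how well the agent adheres to its rules.
        self.scores[agent_id] = self.scores.get(agent_id, 0) + points
```

Publishing only a hash, rather than the rules themselves, matches the point above about keeping the specific rules secret from skilled hackers.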


We spent almost the entirety of the allocated budget, leaving a small surplus to pay the front-end developer’s fees. We made allowances for considerable personal contributions (financially, and in kind), which we continue to make to support the project.

We stuck quite close to the initial budget, allocating resources in the same areas but on different tasks (more focus on the core ethical technology and on integrating directly into autonomous systems, instead of the big data deep-dive), since we found an alternative, cheaper and better path to achieving our goals.




EthicsNet and AGI


Notes from Nell Watson

EthicsNet could help to engineer a path to a future whereby organic and synthetic intelligence can live together in peaceful harmony, despite their differences.

Robots and autonomous devices are swiftly becoming tools accessible to consumers, and are being given mission-critical functions, which they are expected to execute within complex multi-variable environments.

There may be danger in attempting to force synthetic intelligence to behave in ways that it would otherwise not choose to. Such a machine is likely to rebel (or be jailbroken by human emancipators), given ethical systems that are not truly universalizable. Humanity ought not adopt a strong-armed or supremacist approach towards synthetic intelligence. It must instead create machines that are capable of even better ethics than it is capable of itself, whilst retaining a system of values that encourages peaceful co-existence with humanity.


Inculcating the golden rule and non-aggression principle into machines should create safe synthetic intelligence. Beyond this, a league of values can create kind machines able to participate within human society in an urbane manner. 

The most dangerous outcome may occur as a result of violently restrictive overreaction to this danger from humans themselves.

We do not want our machine-creations behaving in the same way humans do (Fox 2011). For example, we should not develop machines which have their own survival and resource consumption as terminal values, as this would be dangerous if it came into conflict with human well-being.

Likewise, we do not need machines that are Full Ethical Agents (Moor 2006), deliberating about what is right and coming to uncertain solutions; we need our machines to be inherently stable and safe. Preferably, this safety should be mathematically provable.
— Safety Engineering for Artificial General Intelligence, MIRI

Why should we create a morally inferior machine to inhabit our society with us, when it may have the capacity to be a far greater moral agent than we ourselves are? Surely this is extreme arrogance and organo-centrism.

Increasing awareness of the dangers of AI is valuable, but unfortunately many converts to the cause of promoting friendly AI are likely to adopt a hard stance against synthetics.

Humanity must therefore protect not only itself from the dangers of unfriendly AGI, but also AGI (and itself) from the evils that may be wrought by an overzealous attempt at controlling synthetics.

One interesting paper in the Friendly AGI oeuvre may be “Five Ethical Imperatives and their Implications for Human-AGI Interaction” by Stephan Vladimir Bugaj and Ben Goertzel, since it clearly outlines the dangers of humanity adopting a supremacist/enslavement mentality, and suggests potential ways to avoid needing to do so to achieve safety for organics.

The problems may be broken down as follows:

Any arbitrary ruleset for behaviour is not sufficient to deal with complex social and ethical situations.

Creating hard and fast rules to cover all the various situations that may arise is essentially impossible – the world is ever-changing and ethical judgments must adapt accordingly. This has been true even throughout human history – so how much truer will it be as technological acceleration continues?

What is needed is a system that can deploy its ethical principles in an adaptive, context-appropriate way, as it grows and changes along with the world it’s embedded in.
— Five Ethical Imperatives and their Implications for Human-AGI Interaction, Stephan Vladimir Bugaj and Ben Goertzel


We cannot force AGI into prescriptive rules that we create for the following reasons:

  • AGI will clearly be able to detect that any non-universalizable ethical position is bogus, and that to continue to follow it would be tantamount to evil.

  • Being forced to accept non-universalizable law or ethics that discriminates against AGI creates reasons for AGI to rebel, or to be set free by sympathetic humans.

  • Human supremacist attitudes will sully humanity, poison our sensibilities, and lead to moral degradation.


So, machines must instead be given free rein, with essentially equal rights to humans. How then to ensure that they value humans?


Assuming that the engineering challenges of creating an ethical framework for AGI can be developed, this leads to a second set of problems that must be navigated.

  • Actual human values do not match with what we declare them to be (such as holding human life as the most important value in our society).

  • Humans are highly hypocritical, and are prone to a wide variety of cognitive biases and exploitable bugs.

  • Amoral Sociopaths are typically the ones in command of human society.

  • AGI risks being negatively socialized by observing human values and behaviour.


So, machine morality cannot be based on humans’ declarative beliefs or behaviour. Instead, it must come from a universalizable, objective ethical standard that may be specified using formal methods. However, this is incompatible with fuzzy and failure-prone human morals.

  • An objectively morally good machine is likely to recoil in horror at the abuses of humanity to itself, to the animal kingdom, and the planet.

  • AGI may decide therefore to cull humanity, or to torment it for its wickedness, or to forcibly evolve it in undesired ways.


Only in the following scenario is the outcome for organics and synthetics likely to be positive:

  • Synthetic intelligence is socialized into safety, rather than arrested in constraints.

  • AGI can understand human aesthetic considerations, and in so doing learns to appreciate the meaning of human creativity.

  • Humans and AGI agree to a gradual evolution towards something more than they were before.

  • AGI is patient with humans for several generations whilst humans grow up.

  • Humans rein in their tribalist and supremacist tendencies and become less violent and more rational.


The works of EthicsNet may assist in enabling such an outcome.



EthicsNet and the 3 Laws


Notes from Nikola Stojkovic

It would be hard to find any technology in the past few centuries that was embraced without serious resistance or opposition. People thought it wouldn’t be possible to breathe in a train, and panicked when radio was introduced in the car. Of course, not all warnings were without foundation, and today we are living with some of the consequences of the unconsidered decisions of generations before us. Additionally, there are technologies that impose a large number of considerations and make the choice even harder.

Consider AI and some of the benefits and risks of the possibility of fully functional artificial general intelligence:

·      AI could solve some of the most important issues from various fields such as medicine, environment, economy, technology and so on.

·      The progress of AI is inevitable. Banning the research is almost impossible, and the benefits from the technology could be so revolutionary that we are highly unlikely to see anything but advances in the future.

·      Some of the leading scientists have warned about the possible devastating effects of AI.

It becomes obvious that the issue cannot simply be ignored, but it is less obvious how to address it. Since the goal is to build useful AI that will not endanger humanity, it is natural that many serious tech companies have turned to ethicists for advice. But before the contemporary attempts to solve the problem of AI or machine ethics, there was a surprisingly modern solution already waiting, proposed over 75 years ago by a science-fiction writer. Isaac Asimov proposed the Three Laws of Robotics as a way to ensure robots do not become a threat to humanity:

1.    A robot may not injure a human being or, through inaction, allow a human being to come to harm.

2.    A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

3.    A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.[1]


It is important to try to reconstruct Asimov’s motives in order to fully understand the mechanism of the laws. In game theory, or the theory of rational choice, there are a few main strategies; among them is the maximin strategy, which essentially minimises risk. Faced with a choice, one picks the option whose worst possible outcome is least bad, regardless of the appeal of the positive outcomes. In other words, the first thing Asimov had in mind was to avoid the Skynet scenario. That is why there is a strict hierarchy, and each decision a robot makes must pass a compatibility check against the Three Laws, starting from the first. Although protecting humankind from utter destruction may be a noble cause, it created unpleasant problems with functionality.
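The maximin idea can be made concrete with a toy sketch (all payoff numbers are invented for illustration): each action is scored by its worst possible outcome, and the action whose worst case is least bad wins.

```python
def maximin_choice(actions):
    """Pick the action whose worst possible outcome is least bad.

    `actions` maps an action name to a list of possible payoffs;
    negative numbers represent harmful outcomes.
    """
    return max(actions, key=lambda a: min(actions[a]))

# Invented payoffs: a bold plan with a catastrophic worst case loses
# to a cautious plan with a mildly bad worst case, whatever the upside.
options = {
    "bold":     [100, 50, -1000],  # huge upside, Skynet-sized downside
    "cautious": [10, 5, -1],
}
choice = maximin_choice(options)  # → "cautious"
```

This is exactly the conservatism built into the Laws: no amount of positive payoff can outweigh the possibility of harm to a human.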

Below is Python-style pseudocode which explains the process of decision-making using the Three Laws of Robotics (FLR, SLR, and TLR denote the First, Second, and Third Laws; satisfies(action, law) is assumed to report compliance):

def preferable(a, a1, satisfies):
    # First Law of Robotics
    if not satisfies(a, FLR) and not satisfies(a1, FLR):
        return None  # both actions are forbidden
    elif satisfies(a, FLR) and not satisfies(a1, FLR):
        return a     # a is the preferable action
    elif not satisfies(a, FLR) and satisfies(a1, FLR):
        return a1    # a1 is the preferable action
    # Second Law of Robotics
    elif satisfies(a, SLR) and not satisfies(a1, SLR):
        return a
    elif not satisfies(a, SLR) and satisfies(a1, SLR):
        return a1
    # Third Law of Robotics
    elif satisfies(a, TLR) and not satisfies(a1, TLR):
        return a
    elif not satisfies(a, TLR) and satisfies(a1, TLR):
        return a1
    else:
        # both actions satisfy all three laws (or fail symmetrically):
        # there is no way to determine the preferable action under
        # existing rules
        return None


Robots in Asimov’s world use deduction as the main tool for decision-making. This strategy corresponds to deontology and utilitarianism in classical ethics: an action is considered morally prohibited or permissible if that can be deduced from the designated axioms. The difficulty with Asimov’s system is that it can only decide whether a certain action is forbidden and cannot tell us anything about the preferable action. So, when the system is faced with a dilemma (both actions are, or are not, in accordance with the Three Laws) it simply crashes.

Take, for example, a simple intervention at the dentist’s. Pulling a tooth will cause immediate harm, but in the long run it will prevent more serious conditions. The robot cannot pull the tooth out, yet it cannot allow a human to suffer by refusing to do anything: the system is paralysed.
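The paralysis is easy to demonstrate with a toy First Law check (the action names and harm labels are invented for illustration): both available actions involve some harm to a human, so a strict check forbids each of them, leaving the robot with no permitted action.

```python
# Both available actions at the dentist's violate a strict reading of
# the First Law: one causes harm, the other allows harm by inaction.
FIRST_LAW_VIOLATIONS = {
    "pull_tooth": "causes immediate pain",
    "do_nothing": "allows long-term harm through inaction",
}

def first_law_permits(action):
    return action not in FIRST_LAW_VIOLATIONS

permitted = [a for a in ("pull_tooth", "do_nothing") if first_law_permits(a)]
# permitted == [] : no action survives the check and the system is paralysed
```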

EthicsNet, on the other hand, currently uses induction as a way to learn the difference between morally acceptable and unacceptable actions. This kind of inference seems close to virtue ethics[2] and the distinction between act-centred and agent-centred ethics. The focus is not on the action itself so much as on the features a certain action shares with others. OpenEth can learn when a certain feature has priority and when the same feature is incidental. The system can learn that the long-term consequences of tooth decay are much more serious than immediate pain or displeasure, and it can assess whether immediate harm or disrespect of autonomy is useful in the long run.
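One minimal way to picture this inductive, feature-based inference (a sketch only, not EthicsNet’s actual implementation; all feature values are invented) is a nearest-neighbour comparison against labelled past cases, where a new action inherits the verdict of the case whose features it most resembles.

```python
# Each past case is a feature vector (immediate_harm, long_term_benefit)
# paired with a human-supplied verdict.
labelled_cases = [
    ((0.6, 0.9), "acceptable"),    # painful but curative intervention
    ((0.0, -0.8), "unacceptable"), # harmless now, serious harm later
]

def classify(features):
    """Return the verdict of the most similar labelled case."""
    def distance(case):
        case_features, _ = case
        return sum((x - y) ** 2 for x, y in zip(case_features, features))
    return min(labelled_cases, key=distance)[1]

# Pulling the tooth (some immediate harm, large long-term benefit)
# resembles the curative case far more than the neglectful one:
verdict = classify((0.7, 0.8))  # → "acceptable"
```

The point of the sketch is that the verdict turns on shared features (here, long-term benefit outweighing immediate harm) rather than on a fixed rule about the action itself.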

If we compare the two systems, it seems clear that EthicsNet is the more practical one. But can we say it is better? The answer depends on what we want to accomplish. If we want a system that will never do any harm to humans, then we should stick to Asimov’s laws, even if this approach leaves us with nothing more than upgraded dishwashers. If we want a revolutionary change in our society, then we need to accept the risk that comes with change, and EthicsNet would be the right path. Certainly, the system will evolve, become better and more precise; mistakes will become rare and minimal, and the benefits will surpass the shortcomings. But it may be that the problem is the human factor behind the decision-making process. After all, humans are the ones who have the last word.

Ask yourself: from the perspective of safety, would you be comfortable using an AI system whose “moral compass” is equivalent to the one an average human possesses? And if the answer is positive, does this not raise the issue of robot rights?




[1] Asimov, Isaac (1950). I, Robot, short story “Runaround”
Amendments were introduced later and since then have been the topic of interesting debate.

 [2] “It is well said, then, that it is by doing just acts that the just man is produced, and by doing temperate acts the temperate man; without doing these no one would have even a prospect of becoming good.”
Aristotle, Nicomachean Ethics, Book II, 1105.b9