The AI Playbook

Chapter 6: Regulation and Ethics

Summary

  • As consideration of data privacy grows, and with the General Data Protection Regulation (GDPR) in force across the European Union (EU), it is vital to ensure you are using data appropriately. The GDPR applies to all companies processing the personal data of people in the EU, regardless of a company’s location.
  • Companies that are ‘controllers’ or ‘processors’ of personal information are accountable for their handling of individuals’ personal information. Demonstrate compliance with GDPR data handling requirements and the principles of protection, fairness and transparency.
  • Minimise the personal data you require, to reduce regulatory risk, and pseudonymise all personal data through anonymisation, encryption or tokenisation.
  • In addition to standardising data handling requirements and penalties for misuse, the GDPR introduced considerations that can impact AI systems specifically. Verify that automated systems meet GDPR stipulations. Article 22 of the GDPR prohibits decisions with legal effects that result solely from automated processing undertaken without an individual’s explicit consent, when consent is required. Several legislative terms are subject to interpretation at this time. It may be prudent to make your system advisory only, and include a human check, if you are developing a system that could materially impact an individual’s life.
  • ‘Explainability’ – explaining how the outputs of your AI system are derived – is growing in importance. Convention 108 of the Council of Europe, adopted into UK and EU law in May 2018, provides individuals with the right to obtain knowledge of the reasoning underlying data processing systems applied to them. Explainability can be challenging in relation to deep learning systems. Explore varying approaches to explainability including Inferred Explanation, Feature Extrapolation and Key Variable Analysis. Each offers trade-offs regarding difficulty, speed and explanatory power.
  • Develop a framework for ethical data use to avoid reputational and financial costs. The ALGOCARE framework, developed by the Durham Police Constabulary in partnership with academics, highlights issues you should consider when managing data. It incorporates: the nature of system output (Advisory); whether data is gathered lawfully (Lawful); whether you understand the meaning of the data you use (Granularity); who owns the IP associated with the data (Ownership); whether the outcomes of your system need to be available for individuals to challenge (Challengeable); how your system is tested (Accuracy); whether ethical considerations are deliberated and stated (Responsible); and whether your model has been explained accessibly to as great an extent as possible (Explainable).

Regulation & Ethics: The Checklist

Comply with regulations and license requirements

  • Review your compliance with current legislation including the UK Data Protection Act, the EU GDPR and the Council of Europe’s Convention 108
  • Monitor proposed legislation to anticipate implications
  • Review permissions for the customer data you collect
  • Check that the data sets you use are available for commercial use
  • Validate licenses for open source models you use

Deliver explainable, ethical AI

  • Understand industry-specific explainability requirements
  • Define an explainability framework
  • Select and apply an approach to explainability
  • Update your framework documentation as your models and data change
  • Validate that your use of data is ethical

As consideration of data privacy grows, and with the General Data Protection Regulation (GDPR) in force across the European Union, it is important to ensure you are using data appropriately. Today, data permissioning and management are critical aspects of any AI-driven company. Below, we describe regulatory and ethical considerations to help you safely manage the data you use to build your models. Seek legal advice to ensure compliance with any applicable legislation; the information below is introductory in nature and will not reflect your company’s individual circumstances.

Ensure compliance with GDPR data handling requirements

The GDPR came into force across the European Union on 25th May 2018. It applies to all companies processing the personal data of people in the EU, regardless of a company’s location. Among other considerations, it standardises data handling requirements and penalties for data misuse. Article 83 of the GDPR specifies fines of up to 4% of a company’s annual global turnover or €20m – whichever is greater – for non-compliance.

Individuals, organisations and companies which, according to the GDPR, are either “controllers” or “processors” of personal information are accountable for their handling of individuals’ personal information. Companies must “implement measures which meet the principles of data protection by design and by default”. Transparency and fairness are also key concepts within the GDPR. You must be clear and honest regarding how you will use individuals’ personal data – and must not use personal data in a way that is unduly detrimental, unexpected or misleading for the individuals concerned.

Demonstrate compliance with the GDPR principles of protection, fairness and transparency in multiple ways, including by:

  • Collecting only the data you require
  • Being transparent regarding why data is collected, what it will be used for and who will have access to it
  • Ensuring you have appropriate permissions to store and process your data
  • Removing unnecessary personal data
  • Deleting data when its agreed purpose has been fulfilled
  • Anonymising data, where possible, to remove personal identifiers
  • Encrypting personal data
  • Securing physical access to your data storage
  • Limiting access to your data
  • Monitoring data access and saving an audit trail of individuals who have viewed or changed data
  • Using data only for the purposes agreed
  • Implementing a process to provide an individual with a copy of all the data you hold about him or her
  • Implementing a process to remove all the data you hold about a specific individual.
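
The last three items – an audit trail, subject access and erasure – lend themselves to simple tooling. The sketch below is a minimal illustration only, assuming an in-memory key-value store keyed by user id; all names are hypothetical, and a production system would use durable, access-controlled storage.

```python
# Minimal sketch: subject access requests, erasure requests and an audit
# trail. The in-memory store and all names are hypothetical placeholders.
import json
from datetime import datetime, timezone

AUDIT_LOG = []  # append-only record of who viewed or changed which data

def _audit(actor: str, action: str, user_id: str) -> None:
    AUDIT_LOG.append({"actor": actor, "action": action, "user_id": user_id,
                      "at": datetime.now(timezone.utc).isoformat()})

def export_user_data(store: dict, user_id: str, actor: str) -> str:
    """Provide an individual with a copy of all data held about them."""
    _audit(actor, "export", user_id)
    return json.dumps(store.get(user_id, {}), indent=2)

def erase_user_data(store: dict, user_id: str, actor: str) -> None:
    """Remove all data held about a specific individual."""
    _audit(actor, "erase", user_id)
    store.pop(user_id, None)
```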

The GDPR has expanded the definition of personal data, which broadly refers to information relating to an identified or identifiable person, to include information “specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person”. This includes job titles, employers and social media handles – even if the individual has made these public on selected websites. While some information may directly identify an individual, other information may do so indirectly if combined with other information. In both circumstances, the data is deemed personal information. Further, if you are inferring personal information – such as gender, age or salary – using your system, you must treat the information as if it were gathered from the user directly.

“Demonstrate compliance with GDPR principles of protection, fairness and transparency in multiple ways.”

Certain personal data – in categories including racial origin, religious beliefs, genetic data and data concerning a person’s sexual orientation – may be considered “sensitive data” and require special care. Pseudonymise all personal data through anonymisation, encryption or tokenisation:

  • Anonymisation: Remove or replace personal data with random information. Even if unauthorised individuals read the data, they will be unable to identify the data subject.
  • Encryption: Encrypt personal data fields. The decryption key will be required to identify an individual from the encrypted data, so it must be stored safely. AI techniques remain effective on encrypted data, enabling you to identify patterns even if the input data is not human-readable. This offers a way to incorporate personal attributes more safely.
  • Tokenisation: Remove personal data from the main data set and replace it with numerical tokens that relate to each aspect of personal data. The process may be as simple as providing each individual with a unique identifier. The personal data and corresponding token are stored on a separate, more secure system, allowing the data to be reconnected at a later date. Tokenisation is effective when one party has permission to view personal data and needs to interpret the results of the AI system, but the company providing the AI system does not need to view the personal data – for example, a medical analysis system.
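
To make these options concrete, the sketch below illustrates one possible implementation of anonymisation (via a keyed hash) and tokenisation (via a separate token vault). It is a minimal sketch under stated assumptions: the field names, key handling and in-memory vault are hypothetical simplifications of what a production system would require.

```python
# Minimal sketch of anonymisation and tokenisation. Field names, key
# handling and the in-memory vault are hypothetical simplifications.
import hashlib
import hmac
import os
import uuid

SECRET_KEY = os.urandom(32)  # in practice, manage keys in a secrets vault

def anonymise(value: str) -> str:
    """Keyed hash: stable enough for joins, not reversible without the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()

class TokenVault:
    """Maps tokens back to personal data; deploy on a separate, more
    secure system than the main data set."""
    def __init__(self):
        self._store = {}

    def tokenise(self, value: str) -> str:
        token = str(uuid.uuid4())
        self._store[token] = value
        return token

    def detokenise(self, token: str) -> str:
        return self._store[token]

# Usage: replace the identifying field before the record enters the main
# data set; only the vault's host can reconnect it later.
vault = TokenVault()
record = {"name": vault.tokenise("Jane Doe"), "diagnosis": "..."}
```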

“Consider the security of data not just when it is stored but when it enters, moves within and leaves your environment.”

Even with security best practices in place, holding personal data remains a risk. Minimise the personal data you require. If you can fully anonymise your data and avoid the need to store any personal information, do so. If you must store personal data, consider the security of data not just when it is stored but when it enters, moves within and leaves your environment. Examine every point where personal data could be read by an employee or malicious third party and ensure you have pursued every measure within your control to protect it. Delete data when it has been processed according to its agreed purpose.

Verify that automated systems meet GDPR stipulations

In addition to standardising data handling requirements and penalties for misuse, the GDPR introduced considerations that can impact AI solutions:

  • Article 22 (Paragraph 1): “The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.”
  • Article 22 (Paragraph 2): “Paragraph 1 shall not apply if the decision:
    • is necessary for entering into, or performance of, a contract…[or]
    • is based on the data subject’s explicit consent.”
  • Article 22 (Paragraph 3): “In the cases referred to in [Paragraph 2], the data controller shall implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision … and, in Recital 71 only, the right to an explanation of the decision.”

These articles are yet to be comprehensively tested in court. However, they explicitly prohibit legal effects – such as sentencing and parole decisions – that result solely from automated processing undertaken without the individual’s explicit consent, when consent is required.

What constitutes “similarly significant” effects, “explicit consent” (beyond acceptance of an extensive set of online conditions containing a relevant paragraph) and whether something is “necessary” to perform a contract are subject to interpretation at this time.

If you are developing an AI system that could materially impact an individual’s life, therefore, it is prudent to consider making your system advisory only and including a human check. Once case law has better established the meanings of the terms above, there will be greater clarity regarding the implications of the legislation.

The GDPR and Convention 108 impose obligations of explainability

Article 22 (Paragraph 3) of the GDPR, which requires companies to protect the data they control and allows individuals to challenge an automated system they believe is treating them unfairly, demands a robust explanatory framework for the outputs of your systems. Convention 108 of the Council of Europe (https://bit.ly/2n6POrT), adopted into UK and EU law in May 2018, imposes related requirements:

  • Convention 108 (Article 8): “Every individual shall have a right… to obtain, on request, knowledge of the reasoning underlying data processing where the results of such processing are applied to him or her.”

Convention 108 affords individuals the right to understand how decisions made about them using data processing were reached. Because every individual possesses this right, you must be able to explain, in lay terms, how decisions that affect individuals are made.

Explore varying approaches to explainability

Beyond the stipulations of Convention 108 of the Council of Europe, there is a broader, growing demand for greater explainability of AI systems. Improved explainability was a recommendation, for example, from the UK Parliamentary Science and Technology Select Committee on AI. Regulatory or pragmatic demands may force you to consider the explainability of your systems.

For a system that uses a decision tree, it will be straightforward to explain how data maps to the system’s decision. For many other machine learning-based systems, and particularly deep learning-based systems, this will not be possible. There may be thousands of abstract numbers corresponding to the connections in the network that contribute to its output. These numbers will be meaningless to individuals who seek to understand the system, which is why many AI systems are considered ‘black boxes’ and inexplicable.

There are, however, means of explanation that do not involve a system’s mathematics. These approaches consider the influence of the variables input to a system on its output. There are several techniques you can apply, including Inferred Explanation, Feature Extrapolation and Key Variable Analysis (Fig. 41). Which you favour will depend on the level of explainability you require and the challenge of providing it.

Fig. 41. Three approaches to explainability
Approach              | Difficulty | Speed     | Advantages         | Disadvantages
Inferred Explanation  | Low        | Fast      | Easy to understand | Limited explanatory power
Feature Extrapolation | Moderate   | Slow      | Easy to understand | Limited applicability
Key Variable Analysis | Very high  | Very slow | Thorough           | Challenging to understand

Source: MMC Ventures

1. Inferred Explanation: Inferred Explanation is the easiest way to explain AI. The algorithm is not described and a ‘black box’ is retained around it. Correlations are considered between system inputs and outputs, without explaining the steps between.

By presenting examples of decisions, you can show individuals the correlations between input data and output decisions (Fig. 42), without detail regarding how the inputs and outputs are connected. Inferred explanation does not provide complete clarity regarding a model, but will demonstrate how decisions relate to inputs in a manner that will be satisfactory in many situations.

Fig. 42. Inferred Explanation

Source: MMC Ventures
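
As a minimal sketch of this approach – assuming a black-box predict function and a numeric feature matrix, both illustrative – inferred explanation can be as simple as ranking input features by their correlation with the model’s outputs:

```python
# Minimal sketch of inferred explanation: rank each input feature by its
# correlation with the model's output, treating the model as a black box.
# `predict`, `X` and `feature_names` are illustrative placeholders.
import numpy as np

def inferred_explanation(predict, X, feature_names):
    outputs = predict(X)  # no access to the model's internals is needed
    correlations = {
        name: float(np.corrcoef(X[:, j], outputs)[0, 1])
        for j, name in enumerate(feature_names)
    }
    # Strongest input-output relationships first
    return dict(sorted(correlations.items(), key=lambda kv: -abs(kv[1])))
```

Correlation is not causation, of course – which is precisely the limited explanatory power noted in Fig. 41.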

“There are means of explanation that do not involve a system’s mathematics.”

2. Feature Extrapolation: Some systems, including financial models and systems that materially impact individuals, require an explanation – beyond correlation – of how models reach their conclusions. While this takes more effort, it is possible to evaluate the features in data that activate parts of a network. This is particularly fruitful for image classification systems. Using test data, and reversing the flow of data in a network, you can create images that demonstrate the features that activate a particular layer in the network (Fig. 43). Further, Google recently released a library for TensorFlow to undertake this visualisation automatically, within a browser, during training (bitly.com/2R6XeZu). While not suitable for all AI systems, feature extrapolation provides a degree of explainability in a manner that non-technical individuals can appreciate.
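
One common route to this kind of visualisation is gradient ascent on the input image to maximise a chosen filter’s activation. The sketch below assumes TensorFlow 2.x and a trained Keras convolutional model; the model, layer name and image size are illustrative placeholders, and this is a simplification of the technique rather than the specific method used by the library above.

```python
# Hedged sketch: visualise the features that activate one filter of a
# convolutional network via gradient ascent on the input image.
# Assumes TensorFlow 2.x; `model`, `layer_name` and the image size are
# illustrative placeholders.
import tensorflow as tf

def visualise_filter(model, layer_name, filter_index, steps=100, lr=10.0):
    layer_output = model.get_layer(layer_name).output
    extractor = tf.keras.Model(inputs=model.inputs, outputs=layer_output)
    # Start from a low-contrast random image and repeatedly ascend the
    # gradient of the chosen filter's mean activation.
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3)) * 0.2 + 0.4)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = extractor(image)
            loss = tf.reduce_mean(activation[:, :, :, filter_index])
        grads = tape.gradient(loss, image)
        image.assign_add(lr * grads / (tf.norm(grads) + 1e-8))
    return image[0].numpy()  # an image of what the filter responds to
```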

“Some systems require an explanation – beyond correlation – of how models reach their conclusions.”

Fig. 43. Feature Extrapolation

Source: Zeiler and Fergus, https://bit.ly/2JjF4R0

“It is possible to evaluate features in data that are activating parts of a network.”

3. Key Variable Analysis: If you wish to provide the most precise explanation of your system, you must analyse the impact of each input on the system’s decision-making process and overall decision. This will require a full statistical analysis and is a significant undertaking. For each output decision, you will require an example of input data that strongly results in that decision. Change each variable, in turn, from the value that leads to Decision 1 to the value that leads to Decision 2. Then, change the variables in combination. The effort required will increase exponentially with the number of variables you have, and the process will be time-consuming. However, you will be able to determine whether any system inputs, singly or in combination, have a disproportionate effect on your system’s output decision (“variable bias”). You may find, for example, that your model places a high importance on gender when providing an output decision. This is possible, even if gender is not explicit in your data, if you have other closely associated variables (such as vocation in a gender-biased industry).

Key variable analysis has drawbacks as well as advantages. In addition to being complex and resource-intensive, it can be difficult to explain results accessibly. Further, if you explain your model in a high degree of detail, malicious third parties can use this information to force results from your model that they wish to see.
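
A minimal sketch of the single-variable pass – assuming a black-box predict function that scores a single input row, and two illustrative rows that strongly yield Decision 1 and Decision 2 respectively – might look like this; extending it to combinations of variables is what drives the exponential cost noted above.

```python
# Hedged sketch of key variable analysis: swap each input, one at a time
# (then in pairs), from the value that yields Decision 1 to the value that
# yields Decision 2, and record how far the model's output moves.
# `predict` and both input rows are illustrative placeholders.
from itertools import combinations

def single_variable_effects(predict, x_decision1, x_decision2, feature_names):
    base = predict(x_decision1)
    effects = {}
    for j, name in enumerate(feature_names):
        probe = list(x_decision1)
        probe[j] = x_decision2[j]  # change one variable in turn
        effects[name] = predict(probe) - base
    return effects  # disproportionately large values flag "variable bias"

def paired_variable_effects(predict, x_decision1, x_decision2, feature_names):
    base = predict(x_decision1)
    effects = {}
    # The number of combinations grows rapidly with the variable count
    for i, j in combinations(range(len(feature_names)), 2):
        probe = list(x_decision1)
        probe[i], probe[j] = x_decision2[i], x_decision2[j]
        effects[(feature_names[i], feature_names[j])] = predict(probe) - base
    return effects
```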

“If you wish to provide the most precise explanation of your system, you must analyse the impact of each input on the system’s decision-making process and overall decision.”

Fig. 44. How to select an approach to explainability

Inferred Explanation
Use this approach if you:
– Seek a high-level overview of your AI system
– Believe correlation offers sufficient explainability
Avoid this approach if you:
– Require detail regarding how variables lead to decisions

Feature Extrapolation
Use this approach if you:
– Require detail from within the network
– Have a network type (e.g. images) where abstractions can be mapped onto input data
Avoid this approach if you:
– Have limited time
– Require the precise impact of input variables, not general features
– Are not using an assignment-based or generative AI network

Key Variable Analysis
Use this approach if you:
– Require detail about the importance of variables
– Seek to prevent unwanted bias in your variables
Avoid this approach if you:
– Have limited time
– Seek to publish your results
– Wish to offer a layperson’s guide to your model

Source: MMC Ventures

Develop a framework for ethical data use

When developing and deploying AI systems, it is important not only to provide sufficient explainability but also to use data ethically. “Plan for ethical AI from the outset and underpinning all initiatives. It’s got to be foundational, not an afterthought” (Steven Roberts, Barclays). Beyond the intrinsic importance of doing so, a growing number of companies are incurring reputational and financial costs by failing to do so.

The Durham Police Constabulary, in conjunction with computer science academics, is trialling a framework – ALGOCARE – to ensure its AI system uses data transparently within an explainable, ethical process. Many companies with AI systems also have frameworks in place, albeit privately and often loosely defined. While every company’s framework differs, ALGOCARE highlights issues you should consider when managing data.

  • Advisory: Is the output of the algorithm used as a suggestion or a fact? How to interpret the output of a system is a key consideration. Does a human, or an automated system, act on the output without further thought or investigation? Do you wish your car, for example, to brake automatically if danger is perceived (even when the system is incorrect) or to warn you so you can make the decision? To an extent, what is optimal may be determined by the domain of the problem. Too often, the numbers returned alongside a label are interpreted and used as a confidence score. Usually, however, they are a probability distribution that has been tuned to give an output above a specific level for the predicted result. Even incorrect results can have high probabilities. Decisions based on the “confidence” of the network decision, therefore, can be disastrous. While there are tuning techniques that better align a network’s prediction probabilities with confidence levels (Guo et al, https://bit.ly/2JiRNTS; see the calibration sketch after this list), they are rarely used, given the time required and teams’ focus on measures of accuracy.
  • Lawful: Is your data gathered lawfully and do you have the right to use it? Under the GDPR, individuals may not have consented to their data being processed in the way you intend. Data gathered without the informed consent of the individual should not be used.
  • Granularity: Do you understand the meaning of the data you feed into your model? To avoid biased results and models that fail when exposed to real-world data, it is important to do so. What variables are missing, combined or inferred? How varied is your data? Do you have sufficient time series data? Data scientists can excel in this regard, questioning data and anticipating problems before building models.
  • Ownership: Who owns the intellectual property associated with the data and algorithms you use? Many companies use academic data sets and models to validate concepts in the early stages of development. Confirm that the licenses, for both the data and models you use, are suitable for commercial use. The availability of something on the internet does not confer on your company the right to use it.
  • Challengeable: Do the outcomes of your system need to be available for individuals to challenge – for example, under the GDPR? In some sectors there will be a greater need than in others to be open about the basis of your results. If you undertake all projects assuming that your results will be challenged, you will be prepared.

“Plan for ethical AI from the outset and underpinning all initiatives. It’s got to be foundational, not an afterthought.”

Steven Roberts, Barclays
  • Accuracy: How is your system tested? How are changing data or inaccuracies fed back into the system? Is dated data removed? Many companies report ‘accuracy’ by cherry-picking precision or recall (Chapter 5), which will overstate model performance. Continuous testing with real-world data is the only way to verify your model’s performance.
  • Responsible: Are ethical considerations deliberated and stated? The impact of your system will depend on its domain and the consequences of false positives and false negatives. If your system could adversely impact an individual’s life, you must understand and consider the implications.
  • Explainable: Has your model been explained, in jargon-free language, to as great an extent as possible without exposing your intellectual property? Explanations that avoid technical terminology will enable everyone in your business to understand what you are creating and to challenge it. “Having a framework to explain your models is valuable for all stakeholders” (Dr Janet Bastiman, Chief Science Officer, StoryStream). What does your model do? How do inputs correlate with outputs?
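
On the calibration point raised under “Advisory” above, the sketch below shows one such tuning technique – temperature scaling, in the spirit of Guo et al – fitted on a held-out validation set. The variable names and search bounds are illustrative assumptions.

```python
# Hedged sketch of temperature scaling (in the spirit of Guo et al): fit a
# single temperature on held-out validation logits so that the network's
# output probabilities better reflect its empirical accuracy.
# `val_logits` (n x classes) and `val_labels` (n,) are assumed inputs.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import logsumexp

def negative_log_likelihood(temperature, logits, labels):
    scaled = logits / temperature
    log_probs = scaled - logsumexp(scaled, axis=1, keepdims=True)
    return -log_probs[np.arange(len(labels)), labels].mean()

def fit_temperature(val_logits, val_labels):
    # Search a bounded range for the temperature minimising validation NLL
    result = minimize_scalar(negative_log_likelihood, bounds=(0.05, 10.0),
                             args=(val_logits, val_labels), method="bounded")
    return result.x  # divide future logits by this value before the softmax
```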

“EU and UK Parliamentary committees are engaged on the issues of AI and explainability.”

EU and UK Parliamentary committees, including the UK Science and Technology Select Committee and the House of Lords Select Committee on Artificial Intelligence, are engaged on issues of AI, explainability and data privacy. The UK Science and Technology Select Committee, for example, launched an inquiry into the use of algorithms in public and business decision-making. Further, more specific, legislation is probable. Ensure that a senior member of your team (Chief Science Officer, Head of AI or Head of Data) is responsible for staying up-to-date regarding proposed legislation and the impact it could have on your business.

“Ensure that a senior member of your team is responsible for staying up-to-date regarding proposed legislation.”