The AI Playbook

Summary

Chapter 1: Strategy

  • Recognise AI’s potential for value creation. While you should not add AI to your initiatives for the sake of doing so, you risk losing competitive advantage if you fail to explore what AI can offer.
  • Identify appropriate problems for AI to solve. AI is particularly effective at: assignment (identifying what something is, or the extent to which items are connected); grouping (determining correlations and subsets in data); generation (creating images or text based on inputs); and forecasting (predicting changes in time series data). All businesses will have challenges where the above apply and, therefore, where AI can be fruitful.
  • Prioritise projects according to value and viability. Ensure you have a clear, concise specification of the problem and desired outcomes. Assessing viability includes considering whether your training data is balanced (free from bias), exhaustive (captures all relevant variables), diverse (captures rare situations) and is of sufficient volume.
  • Timescales for creating AI are less certain than for traditional software development – and typically extend non-linearly with desired accuracy. Timescales vary according to the problem type, subject domain and data availability. Frequently, a prototype with limited accuracy can be developed within three months.
  • Align your budget with your goals and deployment strategy. The budget an AI initiative requires will depend on multiple factors including the complexity of the problem, the availability and quality of training data, and the deployment strategy you select.
  • AI deployment strategies include: calling third party Application Programming Interfaces (APIs); using managed AI services from third parties; building a small in-house AI team; and building an extensive in-house AI team. A large, in-house team is a multi-million-pound annual investment. Many companies develop a proof-of-concept using their existing development teams, and third-party APIs or paid services. Then, they create a budget proposal and begin with a small, in-house AI team.
  • Seek sponsorship from senior executives. Support from management will be important for new AI initiatives to succeed. To build support, educate senior management regarding the benefits of AI while setting realistic expectations regarding timescales and results.
  • Anticipate and mitigate cultural concerns about AI. To some, AI will be unfamiliar. Others will see their workflows change. Many people may be concerned about the impact of AI on job security. Frequently, AI will enhance an individual’s role by offering ‘augmented intelligence’. Address concerns proactively by highlighting the ways in which AI will support individuals’ goals and enable team members to redirect their time to engaging aspects of their roles.
  • Expect non-traditional security considerations. Protect against malicious activity via thorough system testing and exception handling.
  • When your first project is underway, anticipate the longer-term aspects of your AI strategy. Consider: maintenance; data (budget to retrain your system as data evolves and increases); evolving algorithms (new techniques will offer better results in the future); scaling (extending useful AI systems to additional business units and geographies); innovation (a roadmap for new AI initiatives); and regulation (a strategy to comply with new legislation as it emerges).
To engage effectively with AI, separate AI myths from reality
Myth: “AI is a distant dream.”
Reality: While general, human-level artificial intelligence will not be available for many years, there are many applications for AI that are viable today and offer companies cost savings and revenue growth.

Myth: “We don’t have the budget to implement AI.”
Reality: While a large, in-house AI team will require extensive investment, third parties offer access to AI services (via API) for as little as several hundred pounds. Further, as AI democratises, growing libraries of pre-trained models offer results at low cost. If you have a software engineering team, you can validate benefit from AI at minimal cost.

Myth: “AI is dominated by the big technology companies. There’s no point in my company trying to compete.”
Reality: While companies including Amazon, Google, IBM and Microsoft have developed extensive AI services, they lack the strategic desire, data advantage or domain expertise to tackle the many sector- or function-specific applications for AI. Today, a rich ecosystem of startups, scale-ups and corporates are deploying AI for competitive advantage.

Myth: “We can’t use AI because our business requires explainable processes.”
Reality: There are several ways to explain what is occurring inside an AI system (see Chapter 6). Some AI is directly explainable. With deep learning systems, where explainability is a challenge, it is possible to explain how input variables influence output.

Myth: “I can throw AI at my data and it will offer efficiencies.”
Reality: AI is a tool that requires a structured problem and appropriate data to be effective.

Source: MMC Ventures

Chapter 2: People

  • In AI, job titles vary and can be difficult to interpret. We describe characteristics and salaries for six key roles: Data/Machine Learning Engineer; Data Scientist; Machine Learning Researcher; Head of Data; Head of Research/AI; and Chief Scientist/Chief Science Officer. For each, individuals’ capabilities vary across competencies in research, engineering, production and strategy.
  • The composition of your team should depend upon the problem being solved and your approach to doing so. It is advisable, however, to avoid hiring solo talent. Begin with a small team, and ensure you have a robust AI strategy in place before expanding your AI personnel.
  • We suggest team structures, first hires and next steps for six scenarios: “I want insights into internal data”; “I want to implement third party AI APIs”; “I want to outsource AI development”; “I want to create bespoke AI models”; “I want to use a combination of bespoke and third party AI”; and “I have an idea that’s cutting edge.”
  • Recruiters, conferences and universities are primary sources of talent. Traditional recruitment agents find it difficult to screen AI candidates, so engage with specialist recruiters. Conferences and meetups are powerful vehicles for talent acquisition; be active in the AI community, attend and speak at conferences, and grow your network to discover capable candidates. Engage with universities; post on their job boards, establish partnerships and pay for projects to engage students who may seek future opportunities with you.
  • Diversity delivers economic value and competitive advantage. Review the culture in your company, AI team and hiring practices to ensure diversity, representation and inclusion.
  • An effective job description should emphasise projects (the nature of the engagements on which the successful candidate will work), skills and impact. Most data scientists seek work that will ‘make a difference’. To attract talent, demonstrate how the successful candidate’s work will do so.
  • When hiring, prioritise adaptable problem-solvers. In addition to having role-specific and technical skills, a strong AI candidate will: understand available tools to enable rapid research and development; appreciate when to release an imperfect solution and when to wait; and communicate and collaborate well.
  • Optimise every stage of your recruitment funnel. We provide best practices for: CV screening; phone screening; technical testing; face-to-face interviews; and post-interview follow-up.
  • AI talent is in short supply. Challenge, culture and company are key for retention. In addition to an attractive financial package, consider: offering flexible working hours; offering challenging problems and minimising drudgery through automation; creating a culture in which diverse ideas are shared; avoiding ‘lone workers’; ensuring your AI team receives recognition for its work; and supporting team members’ publishing and presentation of work.

“Most data scientists seek work that will ‘make a difference’. To attract talent, demonstrate how the successful candidate’s work will do so.”

Chapter 3: Data

  • For effective AI, develop a data strategy. A data strategy spans: data acquisition & processing; quality; context; storage; provisioning; and management & security. Define your data strategy at the outset of your AI initiative.
  • Accelerate data acquisition by using multiple sources. Developers draw on several sources including: free resources (such as dataset aggregators); partnerships with third parties (companies, universities, data providers and government departments); and new, proprietary data.
  • A high-quality data set has appropriate characteristics to address your business challenge, minimises bias and offers training data labelled with a high degree of accuracy. Develop a balanced data set – if you possess significantly more samples of one type of output than another, your system will exhibit bias.
  • Primary forms of bias are: unwarranted correlations (between inputs and output classifications); erroneous assumptions which cause relationships to be missed (‘underfitting’); and modelling noise instead of valid outputs (‘overfitting’). Adjust for overfitting and underfitting by using different data volumes and model structures (a brief sketch follows this list). Remove unwarranted correlations through testing.
  • Ensure that the results of your internal testing will be maintained when applied to real-world data. Test early, and frequently, on real-world data.
  • Managing ‘dirty data’ is data scientists’ most significant challenge (Kaggle). Smaller volumes of relevant, well-labelled data will typically enable better model accuracy than large volumes of poor-quality data. To label data effectively: consider developing a supporting system to accelerate data labelling and improve accuracy; draw on existing AI and data techniques; and seek data labelled by multiple individuals to mitigate mislabelling.
  • Understand the data you use. Ensure you capture the human knowledge regarding how your data was gathered, so you can make downstream decisions regarding its use. Capture data provenance (where your data originated and how it was collected). Define your variables (differentiate between raw data, merged data, labels and inferences). Understand the systems and mappings through which your data pass to retain detail.
  • Store and structure data optimally to support your objectives. Storage options include basic file-based, relational, NoSQL or a combination. When selecting storage, plan for growth in data volume, updates, resilience and recoverability.
  • One in three data scientists report that access to data is a primary inhibitor of productivity (Kaggle). Develop a provisioning strategy that: ensures data is accessible across your organisation when needed; contains safeguards to protect your company against accidents; optimises system input/output; and maintains data freshness.
  • Implement robust data management and security procedures consistent with local and global regulations. Personal data is protected by UK and EU law and you must store it securely. Draw on principles of appropriate storage, transmission and minimum required access.
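As referenced above, the following is a minimal sketch, using scikit-learn (an assumed library, not one prescribed by this playbook), of two of the checks described in this chapter: whether a labelled data set is balanced across classes, and whether a model is underfitting or overfitting. The synthetic data set and decision-tree model are illustrative placeholders.

    # Minimal sketch: class-balance check and an overfitting/underfitting check.
    from collections import Counter
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Illustrative, deliberately imbalanced data set (90% of one class).
    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

    # Balance check: a heavy skew towards one class is a warning sign for bias.
    print("Samples per class:", Counter(y))

    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
    for depth in (1, 5, None):  # vary model capacity
        model = DecisionTreeClassifier(max_depth=depth).fit(X_train, y_train)
        print(f"max_depth={depth}: train={model.score(X_train, y_train):.2f} "
              f"validation={model.score(X_val, y_val):.2f}")
    # A low training score suggests underfitting; a high training score with a
    # much lower validation score suggests overfitting.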
The six components of an effective data strategy

Source: MMC Ventures

Chapter 4: Development

  • There are many ways your company can engage with AI. Use third party AI APIs; outsource; use a managed service; build an in-house team; or adopt a ‘hybrid’ approach combining an in-house team with third party resources.
  • Third party AI APIs fulfil specific functions to a moderate or high standard at low cost. Most solve problems in the domains of vision and language. Numerous APIs are available from Amazon, Google, IBM, Microsoft and other, smaller companies. Features vary; we provide a summary. APIs offer immediate results without upfront investment, at the expense of configurability and differentiation. Use an API if you seek a solution to a generic problem for which an API is available. APIs are unsuitable if you seek solutions to narrow, domain-specific problems, wish to configure your AI, or seek long-term differentiation through AI. (A brief sketch of a vision API call follows this list.)
  • Managed services enable you to upload your data, configure and train models using a simple interface, and refine the results. Managed services abstract away much of the difficulty of developing AI and enable you to develop a custom solution rapidly. They offer greater flexibility and control than APIs, though less than an in-house team; they also require you to transfer data to a third party and may create dependencies.
  • If a third-party solution is unavailable and an in-house team is too expensive, you can outsource your AI development. Whether outsourcing is appropriate will depend upon your domain, expertise, required time to value and data sensitivity. If outsourcing, specify desired frameworks and standards, who will provide training data, costs, timescales and deployment considerations. Outsource if you require trusted expertise quickly and a cheaper option than permanent employees. Avoid outsourcing if your data permissions prohibit it, you require domain or sector knowledge that an outsourcer lacks, or you wish to build knowledge within your own company.
  • An in-house AI team offers maximum control, capability and competitive differentiation – at a price. A small in-house team will cost at least £250,000 to £500,000 per year. A large team requires a multi-million-pound annual investment. To develop an in-house team your company must also: attract, manage and retain AI talent; select development frameworks and techniques; gather and cleanse data; learn how to productise AI into real-world systems; and comply with regulatory and ethical standards. Build an in-house team if you have a problem that cannot be solved with existing solutions, seek differentiation in the market, or seek to maintain control over your data.
  • A ‘hybrid’ approach is ideal for many companies. Plan for an in-house team that will address your requirements to a high standard over time, but use third party APIs to solve an initial, simpler version of your challenge. A hybrid approach can be attractive if you seek rapid initial results, wish to limit spend until a business case is proven and want greater differentiation and resilience over time.
  • To develop AI yourself you have choices to make regarding your AI ‘technology stack’. The stack comprises six layers: hardware; operating systems; programming languages; libraries; frameworks; and abstractions. Not all problems require the full stack.
  • Ensure your team has hardware with graphics processing units (GPUs) that support NVIDIA’s CUDA libraries. Laptops with high performance graphics cards offer flexibility. For greater power, desktop machines with powerful GPUs are preferable. To train large models, use dedicated servers. Cloud-based servers offered by Amazon, Google or Microsoft are suitable for most early stage companies.
  • Apply AI techniques suited to your problem domain. For assignment problems consider: Support Vector Classification; Naïve Bayes; K-Nearest Neighbour Classification; Convolutional Neural Networks; Support Vector Regression; or ‘Lasso’ techniques. We describe each and explain their advantages and limitations. For grouping problems, explore: Meanshift Clustering; K-Means; and Gaussian Mixture Models. For generation, consider: Probabilistic Prediction; Variational Auto-Encoders; and Generative Adversarial Networks (GANs). (A brief sketch applying several of these techniques follows this list.)
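The sketch below applies a few of the techniques named above, using scikit-learn (an assumed library) and the standard iris data set as a stand-in for a real assignment and grouping problem.

    # Minimal sketch: assignment (classification) and grouping (clustering).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Assignment: three of the classifiers named in the chapter.
    for model in (SVC(), GaussianNB(), KNeighborsClassifier()):
        model.fit(X_train, y_train)
        print(type(model).__name__, "accuracy:", model.score(X_test, y_test))

    # Grouping: cluster the same data without using the labels.
    print("K-Means clusters:", KMeans(n_clusters=3, n_init=10).fit_predict(X)[:10])
    print("GMM components:  ", GaussianMixture(n_components=3).fit_predict(X)[:10])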
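The sketch below illustrates the third party API route discussed earlier in this list, using AWS Rekognition via the boto3 client as one example provider. The local image file name and the presence of configured AWS credentials are assumptions for illustration only.

    # Minimal sketch: labelling an image with a third party vision API.
    import boto3

    def label_image(path: str, max_labels: int = 10, min_confidence: float = 75.0):
        """Return the labels Rekognition assigns to a local image file."""
        client = boto3.client("rekognition")
        with open(path, "rb") as f:
            image_bytes = f.read()
        response = client.detect_labels(
            Image={"Bytes": image_bytes},
            MaxLabels=max_labels,
            MinConfidence=min_confidence,
        )
        return [(label["Name"], label["Confidence"]) for label in response["Labels"]]

    if __name__ == "__main__":
        # 'product_photo.jpg' is a placeholder file name.
        for name, confidence in label_image("product_photo.jpg"):
            print(f"{name}: {confidence:.1f}%")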
With one network, GANs generate output from random noise; a second network serves as a discriminator

Source: https://medium.freecodecamp.org/an-intuitive-introduction-to-generative-adversarial-networks-gans-7a2264a81394
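The sketch below is a minimal, illustrative PyTorch rendering (an assumed framework) of the two-network arrangement in the figure above: a generator maps random noise to synthetic samples, while a discriminator learns to separate real samples from generated ones. Layer sizes, optimiser settings and the random ‘real’ batch are placeholders.

    # Minimal sketch of a GAN training step: generator vs discriminator.
    import torch
    import torch.nn as nn

    LATENT_DIM, DATA_DIM = 16, 64  # illustrative sizes

    generator = nn.Sequential(
        nn.Linear(LATENT_DIM, 128), nn.ReLU(),
        nn.Linear(128, DATA_DIM), nn.Tanh(),
    )
    discriminator = nn.Sequential(
        nn.Linear(DATA_DIM, 128), nn.LeakyReLU(0.2),
        nn.Linear(128, 1), nn.Sigmoid(),
    )

    loss = nn.BCELoss()
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    def train_step(real_batch: torch.Tensor) -> None:
        batch = real_batch.size(0)
        noise = torch.randn(batch, LATENT_DIM)
        fake_batch = generator(noise)

        # Discriminator: label real samples 1, generated samples 0.
        d_opt.zero_grad()
        d_loss = (loss(discriminator(real_batch), torch.ones(batch, 1))
                  + loss(discriminator(fake_batch.detach()), torch.zeros(batch, 1)))
        d_loss.backward()
        d_opt.step()

        # Generator: try to make the discriminator label its output as real.
        g_opt.zero_grad()
        g_loss = loss(discriminator(fake_batch), torch.ones(batch, 1))
        g_loss.backward()
        g_opt.step()

    # Usage: call train_step with batches of real data shaped (batch, DATA_DIM).
    train_step(torch.randn(32, DATA_DIM))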

Chapter 5: Production

  • An unused AI system delivers no value. Develop a production process that smoothly transitions AI systems you have in development to live use.
  • AI production follows a conventional development process and requires you to undertake research, develop a prototype and create a minimum viable product (MVP). Once in production, undertake cycles of ideation, research, development and quality assurance.
  • Effective R&D requires rapid iteration. Initially, optimise for speed over quality. Releasing an early model into production for feedback is preferable to waiting until a research model is perfect.
  • During the R&D phase, solicit feedback about prototypes from beyond the AI and production teams to minimise expensive redevelopment later.
  • When moving from MVP to production, select an appropriate hosting environment. On-premise hosting is suitable for those with highly sensitive data and existing on-premise hardware, but is rarely preferred by early stage companies given high upfront costs, unpredictable activity levels and required security expertise. Hosting your own hardware in a data centre offers control and value over the long term. Upfront costs can be high, however, and managing a data centre can prove a distraction for young companies. Cloud hosting, which offers low upfront costs and high levels of flexibility, is well suited to many early stage companies – although annual costs can be double that of a self-managed data centre and cloud hosting may be unsuitable for highly sensitive data. Consider the physical location in which your cloud servers are hosted. Different countries have varying rules regarding data and you may be required to keep your data within its area of origin.
  • Proving that AI systems are effective differs from the typical software quality assurance (QA) process. Test your AI system at multiple stages – during training, validation and continuously through its life. Efficiency is critical; automate testing to as great an extent as possible.
  • Understand the three common measures of ‘accuracy’ in AI – recall, precision and accuracy – and monitor all three to capture performance. Balancing precision and recall is challenging. Whether you elect to minimise false positives or false negatives should depend upon the nature of your sector and the problem you are solving. (A worked example follows this list.)
  • An effective maintenance programme will sustain your AI’s intelligence. Beyond the maintenance you would typically perform on a software system, you should verify and update your AI system on an ongoing basis. AI technology is developing at pace. Invest in continual improvement to ensure your system avoids obsolescence.
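As a worked example of the three measures above, the sketch below computes accuracy, precision and recall with scikit-learn’s metrics module (an assumed library) on illustrative labels and predictions for a binary problem.

    # Minimal sketch: the three common measures of 'accuracy'.
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # illustrative ground truth
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # illustrative model output

    print("Accuracy: ", accuracy_score(y_true, y_pred))   # share of all predictions that are correct
    print("Precision:", precision_score(y_true, y_pred))  # of predicted positives, how many were real
    print("Recall:   ", recall_score(y_true, y_pred))     # of real positives, how many were found
    # Monitoring all three exposes the trade-off: reducing false negatives
    # (higher recall) often increases false positives (lower precision).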
The AI production pipeline is similar to a normal development practice

Source: MMC Ventures

“Understand the three common measures of ‘accuracy’ in AI – recall, precision and accuracy – and monitor all three to capture performance.”

Chapter 6: Regulation & Ethics

  • As consideration of data privacy grows, and with the General Data Protection Regulation (GDPR) in force across the European Union (EU), it is vital to ensure you are using data appropriately. The GDPR applies to all companies processing the personal data of people in the EU, regardless of a company’s location.
  • Companies that are ‘controllers’ or ‘processors’ of personal information are accountable for their handling of individuals’ personal information. Demonstrate compliance with GDPR data handling requirements and the principles of protection, fairness and transparency.
  • Minimise the personal data you require, to reduce regulatory risk, and pseudonymise all personal data through anonymisation, encryption or tokenisation (a brief sketch follows this list).
  • In addition to standardising data handling requirements and penalties for misuse, the GDPR introduced considerations that can impact AI systems specifically. Verify that automated systems meet GDPR stipulations. Article 22 of the GDPR restricts decisions that produce legal effects on an individual from being based solely on automated processing without that individual’s explicit consent, where consent is required. Several legislative terms are subject to interpretation at this time. It may be prudent to make your system advisory only, and include a human check, if you are developing a system that could materially impact an individual’s life.
  • ‘Explainability’ – explaining how the outputs of your AI system are derived – is growing in importance. Convention 108 of the Council of Europe, adopted into UK and EU law in May 2018, provides individuals with the right to obtain knowledge of the reasoning underlying data processing systems applied to them. Explainability can be challenging in relation to deep learning systems. Explore varying approaches to explainability including Inferred Explanation, Feature Extraction and Key Variable Analysis. Each offers trade-offs regarding difficulty, speed and explanatory power.
  • Develop a framework for ethical data use to avoid reputational and financial costs. The ALGOCARE framework, developed by Durham Constabulary in partnership with academics, highlights issues you should consider when managing data. It incorporates: the nature of system output (Advisory); whether data is gathered lawfully (Lawful); whether you understand the meaning of the data you use (Granularity); who owns the intellectual property (IP) associated with the data (Ownership); whether the outcomes of your system need to be available for individuals to challenge (Challenge); how your system is tested (Accuracy); whether ethical considerations are deliberated and stated (Responsible); and whether your model has been explained accessibly to as great an extent as possible (Explainable).
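The sketch below, referenced above, shows one common way to pseudonymise a personal identifier: a keyed hash (HMAC-SHA256) from Python’s standard library. The secret key and example record are illustrative; the key must be stored separately and securely, and this is only one of the techniques (alongside encryption and tokenisation) mentioned in this chapter.

    # Minimal sketch: pseudonymising a personal identifier with a keyed hash.
    import hmac
    import hashlib

    SECRET_KEY = b"replace-with-a-securely-stored-key"  # placeholder; manage out of band

    def pseudonymise(identifier: str) -> str:
        """Map a personal identifier to a stable, non-reversible token."""
        return hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

    record = {"email": "jane@example.com", "purchase_value": 42.50}
    record["email"] = pseudonymise(record["email"])
    print(record)  # the token is stable, so records can still be joined per person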

“Companies that are ‘controllers’ or ‘processors’ of personal information are accountable for their handling of individuals’ personal information. Demonstrate compliance with GDPR data handling requirements and the principles of protection, fairness and transparency.”

How to select an approach to explainability
Inferred Explanation
Use this approach if you:
– Seek a high-level overview of your AI system
– Believe correlation offers sufficient explainability
Avoid this approach if you:
– Require detail regarding how variables lead to decisions

Feature Extraction
Use this approach if you:
– Require detail from within the network
– Have a network type (e.g. images) where abstractions can be mapped onto input data
Avoid this approach if you:
– Have limited time
– Require the precise impact of input variables, not general features
– Are not using an assignment-based or generative AI network

Key Variable Analysis
Use this approach if you:
– Require detail about the importance of variables
– Seek to prevent unwanted bias in your variables
Avoid this approach if you:
– Have limited time
– Seek to publish your results
– Wish to offer a layperson’s guide to your model

Source: MMC Ventures
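As an illustration of the Key Variable Analysis approach in the table above, the sketch below uses scikit-learn’s permutation importance (one possible stand-in for the technique, not necessarily the exact method intended here) to measure how much each input variable contributes to a trained model’s validation accuracy. The data set and model are placeholders.

    # Minimal sketch: ranking input variables by their importance to a model.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    data = load_breast_cancer()
    X_train, X_val, y_train, y_val = train_test_split(data.data, data.target, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
    result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

    # Rank variables by how much shuffling them degrades validation accuracy.
    ranked = sorted(zip(data.feature_names, result.importances_mean), key=lambda kv: -kv[1])
    for name, importance in ranked[:5]:
        print(f"{name}: {importance:.3f}")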