The AI Playbook

Chapter 5: Production


  • An unused AI system delivers no value. Develop a production process that smoothly transitions AI systems you have in development to live use.
  • AI production follows a conventional development process and requires you to undertake research, develop a prototype and create a minimum viable product (MVP). Once in production, undertake cycles of ideation, research, development and quality assurance.
  • Effective R&D requires rapid iteration. Initially, optimise for speed over quality. Releasing an early model into production for feedback is preferable to waiting until a research model is perfect.
  • During the R&D phase, solicit feedback about prototypes from beyond the AI and production teams to minimise expensive, later redevelopment.
  • When moving from MVP to production, select an appropriate hosting environment. On-premise hosting is suitable for those with highly sensitive data and existing on-premise hardware, but is rarely preferred by early stage companies given high upfront costs, unpredictable activity levels and required security expertise. Hosting your own hardware in a data centre offers control and value over the long term. Upfront costs can be high, however, and managing a data centre can prove a distraction for young companies. Cloud hosting, which offers low upfront costs and high levels of flexibility, is well suited to many early stage companies – although annual costs can be double that of a self-managed data centre and cloud hosting may be unsuitable for highly sensitive data. Consider the physical location in which your cloud servers are hosted. Different countries have varying rules regarding data and you may be required to keep your data within its area of origin.
  • Proving that AI systems are effective differs from the typical software quality assurance (QA) process. Test your AI system at multiple stages – during training, validation and continuously through its life. Efficiency is critical; automate testing to as great an extent as possible.
  • Understand the three common measures of ‘accuracy’ in AI – recall, precision and accuracy – and monitor all three to capture performance. Balancing precision and recall is challenging. Whether you elect to minimise false positives or false negatives should depend upon the nature of your sector and the problem you are solving.
  • An effective maintenance programme will sustain your AI’s intelligence. Beyond the maintenance you would typically perform on a software system, you should verify and update your AI system on an ongoing basis. AI technology is developing at pace. Invest in continual improvement to ensure your system avoids obsolescence.

Production: The Checklist

Optimise Research & Development

  • Clarify the required characteristics of your planned system
  • Limit abstract research – focus resources on solutions to business problems
  • Identify system demands and associated RAM/GPU requirements
  • Leverage existing infrastructure, if present

Prototype effectively

  • Identify relative system priorities (speed, precision, recall)
  • Create production-ready code, even for prototypes
  • Solicit feedback from people outside the AI team
  • Understand the additional development required for an MVP

Develop efficient Production and Deployment workflows

  • Establish a controlled release process
  • Plan for rapid deployment of code and models
  • Co-deploy models and related code
  • Anticipate continual cycles of improvement
  • Select an appropriate hosting environment
  • Establish increasing automation over time
  • Ensure problematic deployments can be reversed

Create a rigorous testing process

  • Report against all measures of accuracy (precision, recall, accuracy)
  • Establish definitions for ‘better’ models
  • Automate testing to as great an extent as possible

Establish an effective maintenance programme

  • Validate live results frequently
  • Test edge cases
  • Establish defined downtime and update procedures

An unused AI system delivers no value. It’s essential to develop a production process that smoothly transitions the AI systems you have in development to live use. Below, we describe an optimal production pipeline – in which rapid iteration, appropriate hardware, suitable hosting, rigorous testing and ongoing maintenance deliver high quality results.

AI production follows a conventional development process

By the time you are considering how to take your solution live you should:

  • Understand how you are going to solve the business problem
  • Possess the required data
  • Know the development languages and frameworks you will use
  • Appreciate whether hardware with Graphics Processing Units (GPUs) will be required
  • Understand whether your live system will be powered by local hardware or the cloud.

With these points settled, you can determine an optimal process to move from development to a live solution – the production environment.

Progressing an AI system from idea to reality should follow a broadly conventional development practice – although timescales may be less certain. After ideation, undertake research, prototype and then develop a minimum viable product (MVP). Once in production, undertake cycles of ideation, research, development and quality assurance.

Fig. 37. The AI production pipeline is similar to a normal development practice

Source: MMC Ventures

For effective R&D, iterate rapidly and use appropriate hardware

Whether you have an in-house team or are outsourcing, ensure the team understands the characteristics required from the end system. Which considerations are flexible? Which are not? Different decisions will be made if speed is more important than accuracy – and vice versa.

Even in the research phase, ensure that development is undertaken in the language used for deployment. If you plan to deploy in Python, for example, avoid the overhead of rewriting models created in MATLAB or R.

Initially, optimise for speed over quality. It’s better to release an early version of a model from the research environment into production, and then to solicit feedback in the live environment, than to wait until the research model is perfect. “Spend a month to get a weak model and then iterate to make it great” (Eddie Bell, Director of Machine Learning, Ravelin). Isolating models within the research environment will push considerations of deployment, usability, performance and scalability to the end of the project instead of addressing them early. In addition, it increases the risk of a model performing poorly with unexpected real-world data. Many data scientists resist releasing models that are not ‘good enough’. Overcome this hurdle by developing a culture in which the dynamics of AI development are understood and people are not blamed for early, poor quality results.

Effective research & development requires appropriate hardware – see page 55 for guidance.

Ensure your AI team has its code in a source control system – Git, Mercurial and Subversion are popular – and update it regularly. The size of trained models can exceed file size limits on these systems. If file size is a constraint, find an alternative way of versioning and storing your files. A simple solution (for example, creating zip files on a shared drive) can be effective but ensure these files are regularly backed up to prevent accidental deletion or changes breaking your AI models.
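
One possible way to version model files that are too large for source control is to store each file under a name derived from its contents. The sketch below is a minimal illustration, not a recommended tool: the function names and the layout of the storage directory are hypothetical, and it assumes the shared drive is reachable as an ordinary file path.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_model(model_path: Path, store_dir: Path) -> Path:
    """Copy a model file into store_dir under a name that includes its hash.

    Returns the path of the stored copy. store_dir is a hypothetical
    shared location that should itself be backed up.
    """
    store_dir.mkdir(parents=True, exist_ok=True)
    short_hash = file_sha256(model_path)[:12]
    target = store_dir / f"{model_path.stem}-{short_hash}{model_path.suffix}"
    if not target.exists():  # identical content is stored only once
        shutil.copy2(model_path, target)
    return target
```

Because the stored name changes whenever the contents change, an existing copy is never overwritten, and the short file name can be recorded in source control alongside the code that uses the model.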

Your research team may find that it is creating many similar models – for comparable problems or multiple clients. Automate repetitive tasks to as great an extent as possible, with your research team validating the results and using their specialised skills to adjust the network architectures.

Develop prototypes and solicit feedback beyond the AI team

During the research and development phase, your non-AI development and production teams should take the AI models you have in development and insert them into environments in which the models may be used.

These early prototypes will be incomplete and unfriendly for users, but will show the capacity for AI to solve the problem. Before your system can become a minimum viable product (MVP), prototypes will highlight related development work required – including website changes, the creation of database connections, mobile application modifications or development of application programming interfaces (APIs). Prototyping will engage stakeholders, allow other applications to call the model, enable initial scrutiny of results, and serve as a starting point for improvement.

During the prototype phase it is critical to solicit feedback from people outside the AI and production teams. Begin with internal stakeholders and, with each improvement, move closer to feedback from end users. Are your models:

  • adequately performing their intended function?
  • as fast as they need to be?
  • scaling with usage as required?

Answering these questions early will avoid expensive redevelopment later. As with developing non-AI systems, frequent and iterative changes offer flexibility to address difficulties as they emerge.

Before your team completes the research and development iterations that feed your prototypes, finalise plans for a release process and for deploying code to its final environment. The number of stages in this process, and its complexity, will depend on factors including: the importance of controlling the code (processes for code review, testing, code merging, build, and versioning); the implications of system downtime; and the level of automation you require.

Considerations are company-specific – but evaluate:

  • During development, will your code and models be tested with every update or only when there is a viable release candidate?
  • Will testing be automated? How frequently will tests be updated?
  • Does a successful test trigger a live deployment? What manual steps are required for a system to be made live?
  • Will you deploy code directly or create containers? How will large AI model files be deployed? Will system downtime be required to make new versions live?

If you have existing development practices, adopt these to the extent possible to ensure that AI is not considered separately from the rest of your team’s development efforts.

While automating release based on certain metrics may be straightforward, understanding whether a new AI system is an improvement overall may be difficult. A new version of your AI system may offer improved accuracy at the expense of speed, or vice versa. Whether you are automating deployment or verifying it manually, prioritise what is important to your use case.
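
The trade-off between accuracy and speed can be made explicit in an automated release check. The following is a minimal sketch under stated assumptions: the `Evaluation` fields, the function name and the idea of a fixed latency budget are all hypothetical, and your own priorities may call for different rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    accuracy: float    # fraction of correct predictions, 0.0 to 1.0
    latency_ms: float  # typical time to answer one request


def is_release_candidate(current: Evaluation, candidate: Evaluation,
                         max_latency_ms: float,
                         min_accuracy_gain: float = 0.0) -> bool:
    """Accept the candidate only if it stays within the latency budget
    and is at least min_accuracy_gain more accurate than the current model."""
    within_budget = candidate.latency_ms <= max_latency_ms
    gain = candidate.accuracy - current.accuracy
    return within_budget and gain >= min_accuracy_gain


current = Evaluation(accuracy=0.90, latency_ms=40.0)
slower_but_better = Evaluation(accuracy=0.93, latency_ms=55.0)
print(is_release_candidate(current, slower_but_better, max_latency_ms=60.0))
```

Here a slower model is accepted because it is more accurate and still within the budget. A use case that prioritises speed would set a tighter `max_latency_ms` and reject it.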

When moving from MVP to Production, select an appropriate hosting environment

With an initial model, supporting code and an established deployment process you should have a minimum viable product (MVP) ready for release to your production (live) environment. The MVP is distinct from your prototypes – while imperfect, it will contain all the elements required to solve the problem including peripheral components (web interfaces, APIs, storage and versioning control).

Having a model in production does not mean it needs to be publicly visible or impact live results. It should, however, be exposed to live data so your team can make refinements until it meets the requirements for a major release. With live data you can undertake longer-running tests and provide your data science team with feedback on what is working well and what is not. At this stage, prioritise establishing a controlled release process with thorough code testing, and the stability of your solution. You should also monitor the performance and scalability of your system.
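
Exposing a model to live data without letting it affect live results is sometimes called running it in shadow mode. The sketch below illustrates the idea only. It assumes each model is a plain Python callable and that an in-memory list stands in for real logging, and all of the names are hypothetical.

```python
from typing import Any, Callable, List, Tuple

Model = Callable[[Any], str]


def serve_with_shadow(request: Any, live_model: Model, candidate_model: Model,
                      shadow_log: List[Tuple[Any, str, str]]) -> str:
    """Answer with the live model; record the candidate's answer for review."""
    live_answer = live_model(request)
    try:
        candidate_answer = candidate_model(request)
    except Exception as error:  # a failing candidate must never affect users
        candidate_answer = f"error: {error}"
    shadow_log.append((request, live_answer, candidate_answer))
    return live_answer
```

Users only ever see the live model's answer, while the log gives your data science team a side-by-side record of how the candidate behaves on real requests.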

Plan continual cycles of improvement – investigate and implement ideas for iterating the model, changing the interface and responding to feedback. New models must be demonstrably superior to old ones. Test all changes before updates are released to the production environment, allocating work between the AI team and the general development team. These cycles will continue for the life of the system.

If you’ve yet to decide where your system will run – on premise, in a data centre or in the cloud – at this point you will have the information you need to select an environment, and hardware, that are suitable for your needs.

On-premise: If your data is highly sensitive and cannot leave your network, or you wish to keep data and inferencing entirely under your control, you may wish to host your AI systems within your own premises. Usually, this is possible only for companies that have their own internal hardware infrastructure already. This can be a highly cost-effective option if the volume of requests to manage is known and relatively stable. However, all new hardware must be ordered and provisioned, which will limit scalability. Further, security will be entirely your responsibility. As such, on-premise deployment is a less preferred option for early stage companies that will lack these specialised skills.

Fig. 38. When to use on-premise deployment

Use on-premise if you:

  • Need to fix your costs
  • Have existing on-premise hardware
  • Are working on highly sensitive data

Avoid on-premise if you:

  • Do not have robust in-house security expertise
  • Cannot guarantee volumes of requests
  • Need your models to be accessed from outside your network

Source: MMC Ventures

Data centre: If you can afford the capital expense of buying servers, and have limited need to scale rapidly, hosting your own hardware in a data centre – either your own or a third party’s – can be an attractive option. The cost, even over a year, can be far lower than using a cloud service and you will maintain control over your system’s performance. Using a data centre can also be sensible when you already have large volumes of data on your own servers and wish to avoid the cost of uploading the data to a cloud service.

The capital expense of establishing and managing a data centre can, however, be high – although for early stage companies there are programmes, such as NVIDIA Inception, which offer discounted hardware. As with the on-premise option, only consider a data centre approach if your company already has its own data centre for other servers, as well as staff with the skills to install and configure your hardware. In addition to the upfront cost, managing a data centre may prove an unwelcome distraction for an early stage company focused on its core initiatives.

Cloud: For good reason, many early stage companies choose cloud hosting from the outset. Amazon Web Services (AWS), Google Cloud, Microsoft Azure and Rackspace are popular cloud providers. The majority of cloud providers offer specialised options so you can begin quickly and need set up little more than a security layer and links to other systems in their cloud. Further, time-based pricing allows rapid upscaling and downscaling of resources as required. For companies without dedicated system administrators, cloud may be an easy choice. You will, however, pay a premium for the service. A year of cloud hosting can cost twice as much as hosting in a data centre. You will also pay to transfer data in and out. Nonetheless, costs are payable monthly, rather than as a single large capital expenditure, and you will also avoid the cost of staff to manage the hardware.

Fig. 39. When to use a data centre for deployment

Use this approach if you:

  • Wish to fix your costs
  • Have existing data centre hardware
  • Seek control over your data

Avoid this approach if you:

  • Require flexibility in your resourcing
  • Wish to avoid high up-front capital costs

Source: MMC Ventures

Unless there is a compelling reason to do so – cost, location or you are replacing a supplier – it is usually desirable to use the same cloud provider for your AI hosting that you use for your other infrastructure. This will limit data transfer costs and provide a single infrastructure to manage for security and resilience.

Although cloud systems offer extensive physical security, be aware that you are placing your software and data on the internet. If you do not secure the cloud servers that you establish, you will be at risk. Your cloud provider should ensure that: access to their data centre is secure; their data centre has multiple power sources and internet connections; and there is resilience in all supporting infrastructure such that the provider can resist any direct security challenge either in person or via attempted hacks into their network. They should also ensure that the data images they provide to you are secured from the rest of their infrastructure and other customers’.

It is your responsibility, however, to ensure that your systems on their infrastructure are secure. Direct access to your account should only be via multi-factor authentication – not a simple username and password. Data stored should be private and any external data access or calls to your AI must be established using best practices for authentication. There are many malicious individuals who scan the IP addresses registered to cloud providers, looking for unsecured systems they can exploit.

Also consider the physical location in which your cloud servers are hosted. Different countries have varying rules regarding data and hardware. You may need to keep your data within its area of origin. Be aware of local laws that could allow access to the cloud servers to be restricted. US law, for example, allows hardware from a cloud provider to be seized if authorities suspect its use for criminal activity. If you are unlucky enough to have data on the same physical system, you could lose access to your systems without notice. This risk can readily be mitigated with appropriate monitoring of your remote systems and images of your servers that you can start in other zones if required. Finally, different regions may have varying performance at different times of day – a dynamic you can use to your advantage.

Fig. 40. When to use cloud deployment

Use this approach if you:

  • Need flexibility in resource
  • Have existing systems and data in the cloud
  • Have limited capital to get started

Avoid this approach if you:

  • Already have systems and personnel established in a data centre
  • Use highly sensitive data

Source: MMC Ventures

Test for precision, recall and accuracy at multiple stages

Proving that new AI releases are effective, and an improvement on prior versions, differs from the typical software quality assurance (QA) process. Test your AI system at multiple stages:

  • During training: While your model is being trained, constantly test it against a subset of training data to validate its accuracy. The results will not fully represent the performance of the model, because the randomly selected test data will already have influenced the model during training. As a result, this testing will overstate the model’s accuracy.
  • During validation: Set aside a part of your training data for validation. This test set – known as the validation set – is never used for training. Accordingly, the predictions your AI system makes from the validation set will better represent the predictions it makes in the real world. Validation accuracy is usually lower than training accuracy. If your data set does not represent real world data well, however, validation accuracy will still over-report the accuracy of your model.
  • Continuously: Once your model has been created, test it against live data for a more appropriate measure of accuracy.
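
Setting aside a validation set can be as simple as shuffling the data once and holding back a fixed share. The sketch below shows one way to do this using only the Python standard library. The function name, the 20% default and the fixed seed are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_validation(examples: Sequence[T],
                           validation_fraction: float = 0.2,
                           seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle once, then hold back a fixed share that training never sees."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = max(1, round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = split_train_validation(list(range(100)))
print(len(train), len(validation))  # 80 20
```

Using a fixed seed keeps the same examples in the validation set from run to run, so results from different models remain comparable.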

“Accuracy” has a specific meaning in AI – but, confusingly, is also used as a general term to cover several measures. There are three commonly-used measures of accuracy in AI: recall, precision and accuracy. Understand these measures to decide which are important for your systems so you can validate them appropriately.

Consider an AI that determines whether an apple is ‘good’ or ‘bad’ based on a picture of the apple. There are four possible outcomes:

1. True positive: The apple is good – and the AI predicts ‘good’.
2. True negative: The apple is bad – and the AI predicts ‘bad’.
3. False positive: The apple is bad – but the AI predicts ‘good’.
4. False negative: The apple is good – but the AI predicts ‘bad’.

Using the example above:

  • Recall: What proportion of the good apples did I find correctly?
    The number of correctly identified good apples divided by the total number of good apples (whether correctly identified or not).
  • Precision: What proportion of the apples I said are good, did I get right?
    The number of correctly identified good apples divided by the total number of apples labelled as good (whether correctly identified or not).
  • Accuracy: What proportion of the apples did I label correctly?
    The number of apples correctly identified as good or bad, divided by the total number of apples.
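
The three definitions above can be written directly in terms of the four outcomes. The following sketch treats 'good' as the positive class, as in the apple example. The function names and the sample labels are made up for illustration.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                     positive: str = "good") -> Dict[str, int]:
    """Count the four outcomes, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, guess in zip(actual, predicted):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def recall(c: Dict[str, int]) -> float:
    total_good = c["tp"] + c["fn"]
    return c["tp"] / total_good if total_good else 0.0


def precision(c: Dict[str, int]) -> float:
    labelled_good = c["tp"] + c["fp"]
    return c["tp"] / labelled_good if labelled_good else 0.0


def accuracy(c: Dict[str, int]) -> float:
    total = sum(c.values())
    return (c["tp"] + c["tn"]) / total if total else 0.0


# Ten hypothetical apples: six are good, four are bad.
actual = ["good"] * 6 + ["bad"] * 4
predicted = ["good", "good", "good", "bad", "bad", "bad",
             "good", "bad", "bad", "bad"]
counts = confusion_counts(actual, predicted)
print(recall(counts), precision(counts), accuracy(counts))  # 0.5 0.75 0.6
```

In this example the same set of predictions scores 0.5, 0.75 and 0.6 on the three measures, which is why reporting only one of them can mislead.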

Avoid the temptation to use a single measure that flatters results. You will obtain a truer picture by using all three measures.

Balancing precision and recall can be difficult. As you tune your system for higher recall – fewer false negatives – you will increase false positives, and vice versa. Whether you elect to minimise false negatives or false positives will depend on the problem you are solving and your domain. If developing a marketing solution, you may wish to minimise false positives. To avoid the embarrassment of showing an incorrect logo, missing some marketing opportunities may be acceptable. If developing medical diagnostics, on the other hand, you may wish to minimise false negatives to avoid missing a diagnosis.
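
One common way this trade-off appears in practice is through a decision threshold. The sketch below assumes, purely for illustration, that the model outputs a score for each apple and labels it 'good' when the score reaches a threshold. The scores and labels are invented.

```python
from typing import Sequence, Tuple


def precision_recall_at(scores: Sequence[float], is_good: Sequence[bool],
                        threshold: float) -> Tuple[float, float]:
    """Label an apple 'good' when its score is at or above the threshold,
    then return (precision, recall) for the 'good' class."""
    pairs = list(zip(scores, is_good))
    tp = sum(1 for s, g in pairs if s >= threshold and g)
    fp = sum(1 for s, g in pairs if s >= threshold and not g)
    fn = sum(1 for s, g in pairs if s < threshold and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores (higher means 'more likely good') and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30]
is_good = [True, True, False, True, False, True]

for threshold in (0.8, 0.5, 0.2):
    p, r = precision_recall_at(scores, is_good, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
# threshold=0.8 precision=1.00 recall=0.50
# threshold=0.5 precision=0.75 recall=0.75
# threshold=0.2 precision=0.67 recall=1.00
```

Lowering the threshold finds more of the good apples (higher recall) but lets more bad apples through (lower precision). A marketing system might choose the higher threshold and a diagnostic system the lower one.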

Automate testing to as great an extent as possible. Every new model should be tested automatically. “Efficiency is critical. If you have to do something more than once, automate it.” (Dr. Janet Bastiman, Chief Science Officer, Storystream). If all measures of accuracy are higher, the decision to deploy the new model will be straightforward. If measures of accuracy decrease, you may need to verify the new model manually. A decrease in one measure of accuracy may not be problematic – you might have re-tuned your model for precision or recall, or decided to change the entire model to improve performance. If your models produce results that are concerning, speak to your AI team to discuss why. It may be that your training data set does not contain enough appropriate data. If you encounter problems, add examples of these types of data to your test set so you can monitor improvements.
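
The deployment rule described above can be expressed as a small automated check. This is a sketch only: the metric dictionaries and the function name are hypothetical, and it assumes all three measures have already been computed on the same test set for both models.

```python
from typing import Dict, List, Tuple

METRICS = ("precision", "recall", "accuracy")


def compare_models(current: Dict[str, float],
                   candidate: Dict[str, float]) -> Tuple[str, List[str]]:
    """Return ('deploy', []) if every measure is higher for the candidate.

    Otherwise return ('manual review', measures that are not higher) so a
    person can decide whether the change was an intended re-tuning.
    """
    not_higher = [m for m in METRICS if candidate[m] <= current[m]]
    if not not_higher:
        return "deploy", []
    return "manual review", not_higher


current = {"precision": 0.80, "recall": 0.70, "accuracy": 0.75}
retuned = {"precision": 0.90, "recall": 0.65, "accuracy": 0.76}
print(compare_models(current, retuned))  # ('manual review', ['recall'])
```

Returning the list of measures that did not improve gives the reviewer a starting point for the conversation with the AI team.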

An effective maintenance programme will sustain your AI’s intelligence

A deployed AI solution reflects a point in time; available data, business requirements, market feedback and available techniques will change. Beyond the typical maintenance you would perform on any software system, you need to verify and update your AI system on an ongoing basis. Once your solution is live, ensure it continues to perform well by:

  • Continuously sampling its results and verifying that the outcome from your model is as you expect from live data.
  • Adding problematic data to your test set to ensure your team has addressed issues.
  • Exploring whether new, third-party APIs are available which outperform your model, which will enable you to focus your AI team’s efforts onto harder problems.
  • Ensuring your system issues alerts if incoming data fails, so problems can be addressed.
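
Two of the checks above, sampling live results and alerting on bad incoming data, can be sketched in a few lines. The record fields, the function names and the interpretation of 'failing' data as a missing required field are all assumptions made for this example.

```python
import random
from typing import Any, Dict, List, Optional, Sequence

REQUIRED_FIELDS = ("image_id", "score")  # hypothetical input schema


def sample_for_review(predictions: Sequence[Any], sample_size: int,
                      seed: Optional[int] = None) -> List[Any]:
    """Pick a random subset of live predictions for a person to verify."""
    rng = random.Random(seed)
    return rng.sample(list(predictions), min(sample_size, len(predictions)))


def incoming_data_alerts(records: Sequence[Dict[str, Any]]) -> List[str]:
    """Return one alert message per record with a missing required field."""
    alerts = []
    for index, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if record.get(f) is None]
        if missing:
            alerts.append(f"record {index}: missing {', '.join(missing)}")
    return alerts


records = [{"image_id": "a1", "score": 0.9}, {"image_id": "a2"}]
print(incoming_data_alerts(records))  # ['record 1: missing score']
```

In a live system the alert messages would be sent to whatever monitoring or messaging tool your team already uses, and any sampled predictions found to be wrong would be added to the test set.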

AI technology is developing at pace. Further, the varieties and volume of available training data continue to evolve. Invest in continual improvement to ensure the system you develop today avoids obsolescence.

“Available data, business requirements and techniques will change over time. Invest in continual improvement to avoid obsolescence.”