Centre for eResearch machine learning service
Sina Masoud-Ansari, Research IT Specialist, Centre for eResearch
Figure 1. Machine Learning as a Service (MLaaS) prototype website. It currently creates a Jupyter Notebook environment on request, with all the tools required for popular deep learning workflows.
The initiative
There has been significant growth in interest around machine learning and its applications to research. This has largely been driven by advances in algorithms, particularly in neural networks where the use of Graphics Processing Units (GPUs) has reduced the time to train complex or ‘deep’ neural networks by orders of magnitude. With the rise of ‘big data’, researchers are increasingly finding opportunities to apply these methods to deliver new insights.
Machine learning is often useful when explicit formal models are difficult to develop, for example a model to identify human faces. Whereas in the past researchers would ‘tell’ the computer what features to look for, machine learning takes the opposite approach: it attempts to ‘learn’ appropriate features by repeatedly guessing the correct answer and adjusting to reduce mistakes.
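The guess-and-adjust loop described above can be illustrated with a toy sketch. This is not one of the deep networks discussed in this article: it is a single artificial neuron (a perceptron) that learns weights for inputs it is given, rather than discovering features itself. The training data (the logical OR truth table), the learning rate and the function names are all illustrative assumptions.

```python
# Toy illustration only: a single artificial neuron (perceptron) learning
# the logical OR function by guessing, checking, and adjusting.

def predict(weights, bias, inputs):
    """Guess 1 if the weighted sum of the inputs is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(examples, epochs=20, learning_rate=0.1):
    """Repeatedly guess each answer and nudge the weights after mistakes."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # error is 0 for a correct guess, +1 or -1 for a wrong one.
            error = target - predict(weights, bias, inputs)
            # Adjust each weight in the direction that reduces the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Assumed training data: the truth table for logical OR.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_EXAMPLES]
print(predictions)
```

No rule for OR is ever written down; the weights are found purely by correcting wrong guesses. Deep neural networks apply the same idea to millions of weights, which is where GPUs become important.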
Examples of research in this area that the Centre for eResearch (CeR) is supporting include using machine learning to improve modelling of financial markets, developing intelligent question/answer systems for understanding text, and developing predictors of heart disease based on the shape of coronary arteries. To support these and future projects, CeR has initiated a Machine Learning Service that caters to a wide range of needs and experience levels at the University.
Powerful graphics cards are crucial to this type of research and can be difficult to obtain in sufficient quantities. In addition, the software stack required to run these workflows can be a barrier for researchers who are curious about using these tools but do not have the expertise or time to set up the required environment.
Self-service booking portal
To make it easier for researchers to experiment with machine learning, and ‘deep learning’ in particular, we are developing a self-service booking portal where researchers can request time on computers with powerful hardware and have a fully capable Jupyter Notebook environment available to them with all the required tools (Figure 1). Our self-service notebooks are hosted in the University’s Docker containers, which allow each researcher to have an isolated computing environment on the shared system. This system was successfully trialled at the Winter Boot Camp in July, where we ran an introductory machine learning workshop with 40 participants working concurrently on the system (Figure 2). Docker containers also benefit research through reproducibility: each instance has the same container operating system and libraries, which helps when researchers need to publish or replicate their workflows.
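To make the idea of one isolated container per researcher concrete, here is a minimal sketch of how such containers might be launched from a single shared image. It does not describe how the portal is actually implemented: the image name, host port range and usernames are assumptions chosen for illustration, and the script only builds the `docker run` commands rather than executing them.

```python
# Sketch only: builds (but does not run) one `docker run` command per researcher.
# The image name, port range, and usernames are illustrative assumptions,
# not details of the Centre's actual portal.
import shlex

IMAGE = "jupyter/tensorflow-notebook"  # assumed public Jupyter image
BASE_PORT = 8800                       # assumed first host port

def notebook_command(username, index):
    """Return the docker CLI arguments for one researcher's notebook container."""
    return [
        "docker", "run",
        "--detach",                                # run in the background
        "--rm",                                    # remove the container when it stops
        "--name", f"notebook-{username}",          # one container per researcher
        "--publish", f"{BASE_PORT + index}:8888",  # Jupyter listens on 8888 inside
        IMAGE,                                     # same image for everyone
    ]

researchers = ["alice", "bob"]  # hypothetical usernames
commands = [notebook_command(name, i) for i, name in enumerate(researchers)]
for command in commands:
    print(shlex.join(command))
```

Because every container starts from the same image, each researcher sees the same operating system and libraries, which is the reproducibility property described above.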
What more can CeR help with
For more experienced researchers, or for those who require more control over their computing environment, we offer remote access to a shared Linux server with powerful GPUs and fast SSD-based storage for demanding workflows. Researchers can customise this environment to suit their needs, and the Centre provides support in installing the libraries and additional software required for machine learning. For researchers looking to run large-scale projects using multiple GPUs, we are planning to offer access to the Nectar research cloud, where we have a large-memory server with four GPUs. Our plan is to allow researchers to request virtual machines on the Nectar system and select the number of GPUs they would like to use.
Figure 2. An example of a Jupyter Notebook running via the MLaaS prototype. The Notebook covers an introduction to neural networks for image recognition, used as part of the Winter Boot Camp machine learning workshop in July this year.
For the most demanding workflows, we offer training and advice to help researchers make use of the NeSI Pan cluster and its array of 20 GPU-ready machines. The Centre also has the capacity to support researchers’ development of machine learning approaches to solving research problems. For example, working with Dr. Susann Beier from the Faculty of Medical and Health Sciences, the Centre has started a project exploring the application of machine learning to improve segmentation of coronary arteries from medical images, and the potential of assessing coronary disease risk from the resultant 3D structures.