Big funding for big data follows Prime Minister’s AI speech

May 28, 2018

Funding of £1 million has been announced to help University of Dundee scientists tackle the challenges of dealing with big data, just 24 hours after Theresa May heralded the potential of artificial intelligence to transform healthcare.

Professor Geoff Barton, Head of Computational Biology in the University’s School of Life Sciences, has received the funding from the Biotechnology and Biological Sciences Research Council (BBSRC). The award will enable the development of advanced computational methods that allow researchers to organise, compare, understand and exploit the vast amounts of data produced in modern scientific experiments.

The tools that Professor Barton and his colleagues at Dundee have created are already used around 500,000 times each month by scientists in over 200 countries, helping to further global scientific understanding and develop new therapies for devastating diseases.

Speaking last week, the Prime Minister said that the development of smart technologies to analyse great quantities of data quickly and more accurately than humans was hugely exciting, but experts were quick to point out the many obstacles that must be overcome if AI’s potential was to be realised. It is these barriers that Professor Barton and his team look to break down.

“One of the big challenges in biology today is that we are generating huge amounts of information about biological systems and diseases, but we don’t always know to what end,” said Professor Barton. “Instead of these data sets existing in silos, what we are doing is integrating that vast array of data so that researchers can make the most of it in relation to their particular area of interest. Our tools specifically make predictions about the structure and function of proteins, the machines that drive almost all biological processes in the body.
This builds on the ability to sequence the genetic code of individuals and examine when proteins go wrong due to mutations that give rise to disease.

“The tools developed here at Dundee are some of the best available anywhere in the world at integrating all available data and using AI techniques called machine learning to make predictions about those proteins that we are unable to measure via traditional experimentation. The techniques we have developed are very general, so they can be applied to any disease or biological question in plants, animals or people. As a consequence, it is very widely used across the world. In a way it has been funny for me to see the way big data has been discussed across the world recently. It is presented in some quarters as an end to all human suffering, and as evil in the context of Cambridge Analytica. We have been using AI and machine learning in our research for 20 years and have been very successful getting information from biological data for the good but there is no doubt that the technology underpinning big data raises serious ethical questions.”

Resources developed by Professor Barton and his colleagues include JPred, which predicts protein secondary structure using machine learning, and ProIntVar, which analyses differences in proteins at the amino acid residue level. Both feed into Jalview, one of the most widely used tools for visualising sequences.

The BBSRC funding lasts for five years and will allow the team to further develop their tools in step with rapidly evolving data sets and technological advances. The next generation of resources will also integrate an enormous body of data from the 100,000 Genomes Project, which is sequencing genomes from tens of thousands of NHS patients with rare diseases or cancer, plus their families.

Professor Barton is one of a number of researchers working on AI projects at Dundee.
These include members of GRE such as Professor Jason Swedlow, who works on a new system for publishing scientific datasets, and Professor Angus Lamond, who launched spin-out company Platinum Informatics earlier this month to provide state-of-the-art solutions for the management, visualisation and analysis of large and complex data sets.