The rapid advance in scientific knowledge has created such vast and complex quantities of information, or big data, that scientists and researchers have had to develop new infrastructure, tools and approaches to capture, analyse and share it.
The Inter-University Institute for Data Intensive Astronomy (IDIA) has taken a giant leap forward in this regard by becoming the first African institute to launch a cloud-based data centre. This marks the first step in a three-year pilot project to establish the African Research Cloud (ARC) for data-intensive research.
IDIA is a partnership between the University of Cape Town, North-West University (NWU), the University of Pretoria and the University of the Western Cape. It was specifically formed to deal with the big data challenges that will arise from the Square Kilometre Array (SKA).
The SKA project to build the world’s biggest radio telescope in the Karoo is the catalyst for the new radio astronomy big data revolution in Africa. The first phase of the project is already underway with the construction of the MeerKAT telescope.
There are currently two proof-of-concept (POC) projects being run at the ARC – radio astronomy (UCT) and genomics (NWU). These projects are instrumental in developing the technical expertise required to optimise the underlying cloud platform and big data infrastructure for scientific researchers who face the ever-growing big-data challenges.
The ultimate goals of the projects are to:
- Create a sustainable technological platform with the capacity to engender collaborative research with big data in radio astronomy among partners in South Africa, as well as SKA partners in Africa and Europe.
- Provide a platform for research in other sciences challenged by big data.
- Create a research and development platform for innovation around big data research challenges.
- Train groups of researchers from partners in advanced radio astronomy research (on the pathway to the SKA).
- Train a new generation of data scientists in analysis, as they will be key to extracting scientific knowledge from big data.
The Astronomy POC, ARCADE, showcases the flexibility and capability of the ARC for training and teaching, data processing and management, and analysis of radio astronomy data. Researchers have used the ARC to deploy a large-scale radio astronomy training exercise with undergraduate students.
Technologies such as next-generation sequencing have left researchers with molecular or biological backgrounds, and limited prior exposure to computing, stranded in terms of expertise and research infrastructure for analysing the much larger and more complex datasets these technologies generate. Only a handful of universities in South Africa currently boast bioinformatics units with adequate infrastructure and academic training programmes that support researchers moving into data-driven ‘omics’ projects.
The aim of the Genomics POC is to provide a turnkey desktop solution for microbial metagenomics projects through the implementation of Galaxy, an open-source web-based platform for genomics data analysis tools. The POC addresses the critical need for adequate data analysis environments for researchers at institutions where sufficient bioinformatics support is not available. In South Africa that includes most academic and research institutions. North-West University is collaborating with the South African National Bioinformatics Institute at the University of the Western Cape to develop the genomics POC.
"The initiative is a first for Africa and will be a real benefit to researchers on the continent," says Sakkie Janse van Rensburg, UCT’s executive director of Information Communication Technology Services (ICTS). "Big data has added a new dimension to the research process across all disciplines. Before the launch of this data centre, researchers struggled to manage data-heavy information, with significant challenges when it came to storing it in a way that could be quickly accessed, analysed, visualised and shared."
"Having a research cloud means that researchers and universities no longer need to have their own on-site servers," says Professor Francis Petersen, deputy vice-chancellor for institutional innovation. "It is critical for the researchers to be ahead of the technology curve and have easy access to well-managed high-powered ICT infrastructure on demand."
IDIA researchers are the first users of the new cloud-enabled data centre. They are working on the key science project with the South African MeerKAT telescope – the first step toward the SKA. "The African Big Data Research Cloud will build the capacity for South African researchers to work with the data from MeerKAT and to make the scientific breakthroughs in South Africa," says Professor Russ Taylor, Director of IDIA, Director of the ARC Project and the UCT/UWC SKA Research Chair. Prof Taylor is a leading radio astronomer and founding international scientist of the SKA project.
"The cloud gives researchers the ability to develop within their disciplines collaborative research environments, which share data, compute and tools and which are free of institutional ICT borders," says Professor Danie Visser, deputy vice-chancellor for research and internationalisation. "This will help researchers all over Africa accelerate and advance their research practice to a level that rivals institutions around the globe."
Large-scale analytical techniques
The South African "anchor" for the ARC network is a collaboration of universities, government institutions and industry. Industry partners Dell and Canonical have been instrumental in the development of the first phase of the ARC. The industrial interest doesn’t stop there – SAP will soon join IDIA.
SAP will today formally sign an agreement to become an associate member of IDIA. The partnership will allow researchers to collaborate on big data challenges, illustrating how data-intensive problems are shaping the academic and industrial worlds alike.
Elke Simon-Keller, SAP Africa Innovation Lead, says: "SAP has a long history and proven track record of innovation and delivery in South Africa. We are delighted to become a member of IDIA, as we will be able to collaborate with different institutions and private sector partners on leading-edge technology to drive the research agenda for South Africa in astronomy, big data, and other technological fields. Findings from the project will not only help us to enhance SAP technology, but could also be related to other fields such as genomics analysis, health care, manufacturing, and mining."
"Modern research in science and the humanities is greatly enhanced by sophisticated large-scale analytical techniques," said Mark Baker, Program Manager for OpenStack at Canonical. "We are delighted to support the African Research Cloud and the emergence of a distributed regional cloud infrastructure, with capacity provided by many institutions for collaboration between experts in a wide range of fields. From the Square Kilometre Array to genomics, from physics to the study of language and history, cloud computing is set to reshape higher education and research on the continent, with the ARC leading the way."
The first African Research Cloud Workshop took place in Pretoria on 27-28 October 2016. Experts in research and development came together to discuss technical and strategic plans for the next phase of the ARC project.
Image by Stephen Williams.