Data Science platform

The Strategic Science Investment Fund (SSIF) Data Science platform intends to significantly lift New Zealand’s capability, support and encourage dynamic and world class data science research, and deliver on the Government’s data science investment goals.

The Government’s data science investment goals are to:

  • make sure New Zealand has sufficient advanced data science capability to develop useful and transformative data science techniques
  • create benefits for New Zealand.

The following programmes are currently funded under the SSIF Data Science platform:

A data-science driven evolution of aquaculture for building the blue economy

The Research Trust of Victoria University of Wellington is receiving $13 million over 7 years to deliver the Data Science programme 'A data-science driven evolution of aquaculture for building the blue economy'. 

The following is the public statement for this programme from our contract with the Research Trust of Victoria University of Wellington.

Aquaculture is New Zealand’s best opportunity to sustainably grow its Blue Economy, yet the industry is facing significant challenges in achieving its $1 billion revenue target by 2025. We now have more mussel farms, but the total annual yield hasn’t increased. Could it be due to climate change? The industry depends on natural sources of spat (mussel larvae) found in springtime off Ninety Mile Beach. Little is known about where it comes from – or how best to get spat to adhere to the growing ropes. Mussel farming relies on experience, but conditions change from farm to farm and season to season.

The aquaculture industry thinks that data science could be the answer. By bringing marine scientists together with specialists in machine learning, modelling, and data visualisation, we will develop new science to support decision-making, so farm managers can respond to climate challenges, manage disease, improve production yields, and farm sustainably at scale. Applying data science to marine farming makes sense.

In this programme, we will develop innovative data science techniques that will enable the aquaculture industry to produce efficiently and at large scale, producing high-quality, low-carbon protein for New Zealand and the world without compromising the environment. To do this, we will build data science knowledge in the industry.

Māori own large aquaculture assets. With our partners Whakatōhea and the Wakatū Incorporation, we will co-design a programme to educate young Māori in data science and create the next generation of industry leaders. All our students, from undergraduate to PhD, will work with industry within the research programme and through internships and summer projects.

Aquaculture data poses immense challenges for the researchers, so our programme is led by our best data scientists, guided by distinguished international researchers. Our students will learn from the world’s best.

A language platform for a multilingual Aotearoa

Te Hiku Media will receive $13 million (GST exclusive) over 7 years for the data science programme 'A language platform for a multilingual Aotearoa'. 

This is the public statement from our contract with Te Hiku Media.

Led by Te Hiku Media in partnership with Dragonfly Data Science, our research will lead the revitalisation of minority and indigenous languages and the indigenisation of digital devices worldwide. Over the next seven years, our research programme will bring together world-leading data scientists from New Zealand, Cambridge and Oxford Universities, Māori communities and Mozilla in a unique collaboration to tackle this challenge.

Our proposal aims to establish a multilingual language platform to develop natural language processing tools and methods that will enable New Zealanders to engage with technology in the language they use or aspire to use every day. Starting with te reo Māori and New Zealand English, our programme will ensure a New Zealand identity is firmly embedded in the digital world. We will also extend into the Pacific and work with Samoan and Hawaiian communities. Our tools will make it possible to switch between these languages, so people can speak into their devices to “find a choice as kai of panipopo”. Most importantly, however, we will secure the future for these languages in a changing, dynamic, digital world.

Currently, minority languages do not have datasets big enough for existing methods to work, so these languages and their communities are largely invisible and unheard in these contexts. Their existence is under threat because, as digital technologies further permeate our day-to-day life, engaging with the language and transmitting it between generations becomes more and more difficult. Our research programme will make it possible for ‘low resource’ languages and their speakers to participate fully in a digital context by creating cutting-edge technology.

Read more about "A language platform for a multilingual Aotearoa" (Te Hiku Media)

Time-Evolving Data Science/Artificial Intelligence for Advanced Open Environmental Science

University of Waikato is receiving $13 million over 7 years to deliver the Data Science programme 'Time-Evolving Data Science/Artificial Intelligence for Advanced Open Environmental Science'. 

The following is the public statement for this programme from our contract with University of Waikato.

Data are essential for researching, understanding, setting policy for and managing New Zealand’s environment. However, environmental data present many challenges. Overcoming them requires new data science methods and a substantial increase in the capability of environmental researchers, governors and managers to use data science in their work. This programme will develop those new methods and build the required capability.

In particular, we will focus on developing methods to deal with environmental datasets that are collected in large volumes over time, and must therefore be dealt with as streams that are analysed incrementally, as they are measured, rather than as collections of data that can be analysed all at once. These methods will address underlying characteristics of the data that evolve over time (e.g. due to climatic or ecological changes), and data that are collected at a range of time intervals and spatial scales, from broad-scale satellite images to single-point measurements on the ground, in the water or in the air. The methods we develop will be interpretable and explainable (to help users understand why an algorithm produces some particular output), identify and understand anomalies (to distinguish 'normal' from 'unusual' measurements) and quantify uncertainty in algorithm output (to help decision-makers understand how confident they can be in conclusions drawn from the data science methods).
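As an illustration only, the idea of analysing a stream incrementally and separating 'normal' from 'unusual' measurements can be sketched in a few lines of Python. This is not the programme's method or software: the class `RunningStats`, the function `flag_anomalies`, the `threshold` and `warmup` parameters and the sample readings are all hypothetical. The sketch uses Welford's online algorithm to keep a running mean and variance, so each measurement is processed once, as it arrives, without storing the whole dataset.

```python
import math


class RunningStats:
    """Incrementally tracks mean and variance (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new measurement into the running statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Sample standard deviation; 0.0 until two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly) for each measurement as it arrives.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the values seen before it. Nothing is
    flagged until `warmup` earlier values are available.
    """
    stats = RunningStats()
    for x in stream:
        std = stats.std()
        is_anomaly = (
            stats.n >= warmup
            and std > 0
            and abs(x - stats.mean) > threshold * std
        )
        yield x, is_anomaly
        # Assumption: every value, flagged or not, updates the statistics.
        stats.update(x)


if __name__ == "__main__":
    # Hypothetical sensor readings with one obvious outlier.
    readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]
    for value, unusual in flag_anomalies(readings):
        print(value, "unusual" if unusual else "normal")
```

This simple version assumes the 'normal' behaviour of the data stays the same. It does not handle characteristics that evolve over time, explain its output or quantify uncertainty, which are the harder problems the statement describes.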

To deliver the methods we develop in a form that environmental scientists and managers can use, we will build a new open source framework to do machine learning on time series data, and provide an open access repository of environmental datasets to improve reproducibility in environmental data science. Through workshops and undergraduate and postgraduate research projects within the programme, we will build New Zealand’s capability in fundamental and applied data science relevant to environmental data, from introductory to postdoctoral level.

Read more about Machine Learning for Streams (Time-Evolving Data Science and Artificial Intelligence for Advanced Open Environmental Science, TAIAO)

Beyond Prediction: explanatory and transparent data science

University of Auckland will receive $10 million (GST exclusive) over 7 years for the data science programme 'Beyond Prediction: explanatory and transparent data science'.

This is the public statement from our contract with University of Auckland.

Data science facilitates new approaches to longstanding problems in healthcare, policy, ecology and economy. But to make the most effective use of it, we need analytical methods that are straightforward to apply, open to review and audit, and produce results that can be correctly interpreted by practising researchers and policymakers. Furthermore, we need methods that discover, gather and integrate potentially useful data with minimal human intervention, and ensure that the most suitable analysis methods are used with such data. Finally, we need to empower a whole generation of researchers, across all fields, to use these new methods in robust and defensible ways.

Our team comprises researchers from the Universities of Auckland, Otago, Canterbury and Massey. In it, computer scientists and statisticians will work alongside domain scientists in fields such as computational biology, ecology and public health. Over the next 7 years, we will improve the application of data science methods in complex research settings, make processing more efficient, and create transparent and computationally reproducible workflows that are published, open and easily reused. We will commit the majority of our budget to training and equipping the doctoral and post-doctoral researchers who will go on to apply data science methods successfully to improve our environment, economy and society.

Last updated: 05 October 2023