Data Science platform
The Strategic Science Investment Fund (SSIF) Data Science platform intends to significantly lift New Zealand’s capability, support and encourage dynamic, world-class data science research, and deliver on the Government’s data science investment goals.
The Government’s data science investment goals are to:
- make sure New Zealand has sufficient advanced data science capability to develop useful and transformative data science techniques
- create benefits for New Zealand.
MBIE funding
The Government is committed to investing $49 million between 2020 and 2027 in four Strategic Science Investment Fund (SSIF) Data Science research programmes.
Research programmes funded
The research programmes being funded by the SSIF platform for Data Science are:
Read the public statement
Aquaculture is New Zealand’s best opportunity to sustainably grow its Blue Economy, yet the industry is facing significant challenges in achieving its $1 billion revenue target by 2025. We now have more mussel farms, but the total annual yield hasn’t increased. Could it be due to climate change? The industry depends on natural sources of spat (mussel larvae) found in springtime off Ninety Mile Beach. Little is known about where it comes from – or how best to get spat to adhere to the growing ropes. Mussel farming relies on experience, but conditions change from farm to farm and season to season.
The aquaculture industry thinks that data science could be the answer. By bringing marine scientists together with specialists in machine learning, modelling, and data visualization, we will develop new science to support decision-making, so farm managers can respond to climate challenges, manage disease, improve production yields, and farm sustainably at scale. Applying data science to marine farming makes sense.
In this programme, we will develop innovative data science techniques that will enable the aquaculture industry to produce efficiently and at large scale, producing high-quality, low-carbon protein for New Zealand and the world without compromising the environment. To do this, we will build data science knowledge in the industry.
Māori own large aquaculture assets. With our partners Whakatōhea and the Wakatū Incorporation, we will co-design a programme to educate young Māori in data science and create the next generation of industry leaders. All our students, from undergraduate to PhD, will work with industry within the research programme and through internships and summer projects.
Aquaculture data poses immense challenges for the researchers, so our programme is led by our best data scientists, guided by distinguished international researchers. Our students will learn from the world’s best.
Read the public update from the 2022/23 annual report
Over the third year, we have achieved significant progress in the four case study themes on shellfish production, finfish breeding, fish health prediction, and Vision Mātauranga, as well as the fundamental research in data science and AI.
In terms of research, each case study has progressed well through collaboration between aquaculture and data science experts. New aquaculture-related data and applications have also been explored and added, strengthening the case studies. The team has published a good number of high-quality papers in international journals and conferences. Our team members have also been invited to present our work at international conferences, workshops and summer schools through plenary and keynote talks and specialised tutorials, and have been recognised through best paper awards and competitions.
We have successfully built a pipeline with Māori scholarships/internships, including Undergraduate Scholarships/internships, Summer scholarships, Graduate Awards, Master by Research Scholarships and PhD Scholarships, to attract, support and train our young Māori students and researchers in data science and application to aquaculture. We have successfully offered Māori Undergraduate Scholarships to three Māori students, and expect to offer more soon.
To improve our impact, we have delivered different kinds of talks/speeches, organised different special sessions/issues and workshops, and published newsletters to promote our programme and achievements. We have also invited world-leading researchers to visit us and to deliver distinguished talks and panel discussions. Our team members also hold leading roles in the data science and aquaculture disciplines, such as associate editorships of top journals and membership of advisory boards for international conferences. We have been actively engaging with industry partners, and attending NZ Aquaculture and Seafood industry conferences/meetings/forums. Finally, Te Whiri Kawe, our Centre for Data Science and AI (https://www.wgtn.ac.nz/cdsai), was successfully launched by Minister Dr Ayesha Verrall in June 2023.
Launch of Te Whiri Kawe, Centre for Data Science and Artificial Intelligence (Victoria University of Wellington)
Read the public update from the 2021/22 annual report
The goal of this programme is to support decision-making in the aquaculture industry by bringing marine and aquaculture scientists together with specialists in machine learning, modelling, and data visualisation to develop new science. In this programme, we aim to develop innovative data science techniques that will enable the aquaculture industry to produce efficiently and at a large scale, producing high-quality, low-carbon protein for NZ and the world without compromising the environment.
To do this, we will build data science knowledge in the industry. Māori own large aquaculture assets. With our partners Whakatōhea and the Wakatū Incorporation, we will co-design a programme to educate young Māori in data science and create the next generation of industry leaders. All our students, from undergraduate to PhD, will work with industry within the research programme and through undergraduate internships, postgraduate scholarships and summer projects.
Over the past year, we have made significant achievements in different aspects. We have established four application case study themes which focus on shellfish production, finfish breeding, fish performance and health prediction, and Vision Mātauranga, respectively. Each theme is run by experts from both industry and academia. We have published 59 papers including fundamental research in data science and applications to aquaculture. To improve our impact, we have delivered 19 talks/presentations, organised 21 special sessions/issues and workshops, and published 1 newsletter to make our achievements more visible. We have also started building a pipeline which provides scholarships and internships to help young Māori in data science. In addition, we have been collaborating with top international researchers to carry out research, and engaged with world-leading researchers by inviting talks and discussions. More details can be found on our programme website:
Progress Report 2022 - Groups/DataScienceForAquaculture | ECS (Victoria University of Wellington)
More information:
Data science for aquaculture (Victoria University of Wellington)
Read the public statement
Led by Te Hiku Media, our research will lead the revitalisation of minority and indigenous languages and the indigenisation of digital devices worldwide. Over the next seven years, our research programme will bring together world-leading data scientists from New Zealand, Cambridge and Oxford Universities, Māori communities and Mozilla in a unique collaboration to tackle this challenge.
Our proposal aims to establish a multilingual language platform to develop natural language processing tools and methods that will enable New Zealanders to engage with technology in the language they use or aspire to use every day. Starting with te reo Māori and New Zealand English, our programme will ensure a New Zealand identity is firmly embedded in the digital world. We will also extend into the Pacific and work with Samoan and Hawaiian communities. Our tools will make it possible to switch between these languages, so people can speak into their devices to “find a choice as kai of panipopo”. Most importantly, however, we will secure the future for these languages in a changing, dynamic, digital world.
Currently, minority languages do not have datasets big enough for existing methods to work, so these languages and their communities are largely invisible and unheard in these contexts. Their existence is under threat because, as digital technologies further permeate our day-to-day lives, engaging with the language and transmitting it intergenerationally becomes more and more difficult. Our research programme will create cutting-edge technology that makes it possible for ‘low resource’ languages and their speakers to fully participate in a digital context.
Read the public update from the 2022/23 annual report
"Kua tawhiti kē tō haerenga mai kia kore e haere tonu. He nui rawa ō mahi kia kore e mahi tonu. You have come too far not to go further; you have done too much not to do more." (Tā Hemi Henare).
Having successfully built a strong research team and a set of foundation tools for te reo Māori, Te Hiku Media and the Papa Reo project focussed on maximising the impact of the tools and their use and relevance to Aotearoa. This meant tackling the bilingual research problem, ensuring the speech recognition tools will accurately transcribe both te reo Māori and NZ English. The new bilingual model was released in early 2023 and, when benchmarked against models released by the likes of Meta and OpenAI, is at least 50% more accurate for both languages and performs better across subsets such as contemporary speakers of NZ English and archival native speakers of te reo Māori.
As Big Tech launched Large Language Models like ChatGPT into the market with abandon and little care for the consequences, Te Hiku Media has advanced its advocacy for ethical approaches to data collection, training of machine learning models and indigenous data sovereignty. This has earned global recognition, with members of the team invited to participate in discussions hosted by the Stanford Institute for Human-Centered Artificial Intelligence and the Computing Research Association of America, and to become members of the Partnership in AI Task Force for Inclusive AI. On the home front, Te Hiku Media continued to make an impact in the Māori language community by getting tools such as Rongo and Kaituhi into the hands of users, further claiming the technological landscape for te reo Māori.
Read the public update from the 2021/22 annual report
“Iti pioke nō Rangaunu, he au tōna. Small as the dog shark of Rangaunu may be, great is its wake.”
The second year of Papa Reo has been about building the momentum of the project. Despite being a small team, the impact of Papa Reo has been felt extensively across the language technology and the indigenous data sovereignty spaces. Papa Reo worked closely with University-based research teams providing access to tools and data. We also supported undergraduate internships and postgraduate research projects.
Papa Reo also invested in growing the capacity and capability of our team and in our tools. Kaitāia now houses one of the fastest AI servers in the country. Ōrongonui, named for the moon phase when it was turned on, means the team has been able to experiment and innovate without constraint. Our new team of data scientists, developers and engineers can now experiment with novel approaches quickly, creating and improving our tools.
With this state-of-the-art infrastructure, Papa Reo continues to develop a multilingual language platform. Papa Reo deployed a new Māori text-to-speech model. We launched the Rongo app on Apple using a model that provides real-time feedback to improve pronunciation. The first Māori speech-to-text model developed by Te Hiku Media continues to be refined with a reduced word error rate while we prepare to train a new model using novel approaches. With these models in place, over the coming years, we will reduce the work required for other under-resourced languages.
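For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of words in the reference. The sketch below illustrates that standard definition only; the source does not describe how Te Hiku Media computes or reports the figure, and the function name and example phrases are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from output (deletion)
                curr[j - 1] + 1,     # extra word in output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative transcripts only: one word of four is dropped.
wer = word_error_rate("kia ora e hoa", "kia ora hoa")
```

A lower value is better, and the rate can exceed 1.0 when the output contains many inserted words.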
We remain an active voice in the recognition of indigenous data sovereignty, contributing to global conversations, for example at APEC, and on the ground in national forums. The growing awareness of Te Hiku Media’s position on data sovereignty continues to create opportunities to educate and drive our decisions when choosing tools and partners.
Read more:
A language platform for a multilingual Aotearoa(external link) — Te Hiku Media
Read the public statement
Data are essential to research, understand, set policy for and manage New Zealand’s environment, but environmental data presents many challenges that require new data science methods to overcome them, and a substantial increase in the capability of environmental researchers, governors and managers to use data science in their work. This programme will develop those new methods and build the required capability.
In particular, we will focus on developing methods to deal with environmental datasets that are collected in large volumes over time and must therefore be dealt with as streams that are analysed incrementally, as they are measured, rather than as collections of data that can be analysed all at once. These methods will address underlying characteristics of the data that evolve over time (e.g. due to climatic or ecological changes), and data that are collected at a range of time intervals and spatial scales, from broad-scale satellite images to single-point measurements on the ground, in the water or in the air. The methods we develop will be interpretable and explainable (to help users understand why an algorithm produces some particular output), identify and understand anomalies (to distinguish 'normal' from 'unusual' measurements) and quantify uncertainty in algorithm output (to help decision-makers understand how confident they can be in conclusions drawn from the data science methods).
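To make the idea of incremental analysis concrete, the following is a minimal sketch, not the programme's own method: it keeps a running mean and variance with Welford's online algorithm, so each reading is processed once and never stored, and flags readings that sit far from the running baseline. The class and function names, the threshold, the warm-up length and the sample readings are all assumptions chosen for illustration. A simple baseline like this does not by itself handle the evolving data characteristics described above.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; reported as 0.0 until two readings exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(stream, threshold=3.0, warmup=5):
    """Yield (value, is_anomaly) for each reading as it arrives."""
    stats = RunningStats()
    for x in stream:
        is_anomaly = False
        if stats.n >= warmup and stats.std > 0:
            is_anomaly = abs(x - stats.mean) / stats.std > threshold
        # Assumed policy: flagged readings are kept out of the baseline.
        if not is_anomaly:
            stats.update(x)
        yield x, is_anomaly


# Hypothetical sensor readings with one obvious spike.
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 25.0, 10.0]
flags = [is_anomaly for _, is_anomaly in flag_anomalies(readings)]
```

Because `flag_anomalies` is a generator, it works equally well on a list, a file iterator or a live feed; only the three running totals are held in memory.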
To deliver the methods we develop in a form that environmental scientists and managers can use, we will build a new open source framework to do machine learning on time series data, and provide an open access repository of environmental datasets to improve reproducibility in environmental data science. Through workshops, undergraduate and postgraduate research projects within the programme, we will build New Zealand’s capability in fundamental and applied data science relevant to environmental data, from introductory to postdoctoral level.
Read the public update from the 2022/23 annual report
The vision of TAIAO continues to be to enable the next level of data science to provide robust and fit-for-purpose tools and methods that are accessible and useful to researchers and practitioners across all areas of the New Zealand environment. In the third year of our project, we've achieved significant milestones, including the continued expansion of the community formed during the initial two years. Additionally, our team has gained deeper insights into the needs of potential end-users within Aotearoa New Zealand.
The TAIAO community platform (taiao.ai) received a new look and feel that was launched in November 2022. With this fresh launch, we introduced “Categories”, which facilitate filtering of Datasets, Notebooks, Software, and Tutorials. Aligned with the essence of TAIAO, the platform maintains its status as both data sovereign and open source. These foundational principles are instrumental in upholding our dedication to Vision Mātauranga and in cultivating not only the data and environmental science communities but also the software development community.
In addition to the two existing case studies, we have begun developing a new case study in collaboration with the Department of Conservation, with the goal of creating a flexible and extensible annotation platform. This platform aims to tackle the intricate challenges associated with annotating under-sea habitats.
The TAIAO Machine Learning Course for Flood Practitioners took place in June 2023 at the University of Waikato, drawing attendees from the Waikato Regional Council, including flood practitioners, hydrologists, and environmental scientists. This event has generated interest from various other regional councils for potential future editions of the course. Efforts have been initiated to encourage greater engagement on the TAIAO community platform. In line with this, we are planning to establish a new dedicated category within the platform that aligns with this objective.
Read the public update from the 2021/22 annual report
The vision of TAIAO continues to be to enable the next level of data science to provide robust and fit-for-purpose tools and methods that are accessible and useful to researchers and practitioners across all areas of the New Zealand environment. Achievements for year two are continued strong growth of the community established during the first year, and increased team understanding of the needs of potential end-users in Aotearoa New Zealand.
The TAIAO community platform (taiao.ai) has continued its iterative improvement, culminating in a public launch in November 2021. Feedback from the environmental data science community after the launch was strongly supportive of our intent to develop the TAIAO community through the sharing of datasets, notebooks, kōrero and resources. True to the intent of TAIAO, the platform will remain data sovereign and open source. These two fundamentals drive both the commitment to Vision Mātauranga and the ability to build the data and environmental science communities alongside the software development community.
We have been working on two case studies for the TAIAO data platform based on a data mesh architecture. For the first case study, in collaboration with the Waikato Regional Council and MetService, we created an operational live archive and API for MetService's rain radar data. The archive has been populated with all the historical surveillance radar scans from the Auckland and the Bay of Plenty radars and is progressively being backfilled with data from the other New Zealand rain radars.
The second case study is ongoing work with SCION, which takes advantage of data from the ForestFlows system, which monitors forests in real-time. This work will help to increase the information flow to interested entities regarding the forests of New Zealand.
Read more:
Machine Learning for Streams, Time-Evolving Data Science and Artificial Intelligence for Advanced Open Environmental Science (TAIAO)
Read the public statement
Data science facilitates new approaches to longstanding problems in healthcare, policy, ecology and economy. But to make the most effective use of it, we need analytical methods that are straightforward to apply, open to review and audit, and produce results that can be correctly interpreted by practicing researchers and policymakers. Furthermore, we need methods that discover, gather and integrate potentially useful data with minimal human intervention, and ensure that the most suitable analysis methods are used with such data. Finally we need to empower a whole generation of researchers, across all fields, to use these new methods in robust and defensible ways.
Our team comprises researchers from the Universities of Auckland, Otago, Canterbury and Massey. In it, computer scientists and statisticians will work alongside domain scientists in fields such as computational biology, ecology and public health. Over the next 7 years, we will improve the application of data science methods in complex research settings, make processing more efficient, and create transparent and computationally-reproducible workflows that are published, open and easily reused. We will commit the majority of our budget to training and equipping the doctoral and post-doctoral researchers who will go on to successfully apply data science methods to making improvements to our environment, economy and society.
Read the public update from the 2022/23 annual report
The team has achieved significant research progress this year in four distinct areas: (i) phylogenetic analysis, studying the relationships between genomics and disease using advanced data science methods; (ii) ecological modelling, contrasting geographical and genomic differences within endangered species; (iii) live science publishing, creating a new way of publishing science experiments embedded within traditional research articles, which is now gaining traction with other researchers; and (iv) AI tools that can validate or refute scientific claims, for example: "Covid vaccines cause sterility in males". In each of these areas we have developed new methods and related software that are being shared openly with the science community.
Collaborations within our team are now leading to additional grant funding in the areas of AI and genomics, and we have strengthened our collaborations with other research partners based in Aotearoa (notably ESR and Genomics Aotearoa with plans to extend to the national RNA Platform once it is underway).
We sponsored a significant national outreach & training event (ResBaz 2022: https://resbaz.auckland.ac.nz/sessions/) enabling several hundred applied researchers throughout the country to learn new data science skills that they can apply in their own research. We are also now training a future generation of NZ data scientists via our PhD and post-doctoral scholarships and via related AI Carpentry work.
Taken together, this progress is producing the new methods, code, collaborations and workforce that will ensure Aotearoa can take better advantage of data science in its future research endeavours in key industries such as genomics and AI and branches of government such as public health and debate on science and policy.
For more information, contact Prof. Mark Gahegan, Professor of Computer Science at the University of Auckland, email: m.gahegan@auckland.ac.nz
Read the public update from the 2021/22 annual report
The team has achieved significant research progress this year in three distinct areas: (i) phylogenetic analysis—studying the relationships between genomics and disease, (ii) ecological modelling—contrasting geographical and genomic differences within endangered species and (iii) live science publishing—creating a new way of publishing science experiments embedded within traditional research articles. In each of these areas we have developed new methods and related software that has been shared openly with the science community.
Collaborations within our team are now leading to additional grant proposals in the areas of AI and genomics, and we have strengthened our collaborations with other research partners based in Aotearoa (notably ESR and Genomics Aotearoa).
We sponsored a significant national outreach & training event (ResBaz 2021: https://resbaz.auckland.ac.nz/sessions/) enabling hundreds of applied researchers throughout the country to learn new data science skills that they can apply in their own research. We are also now training a future generation of NZ data scientists via our PhD and post-doctoral scholarships.
Taken together, this progress is producing the new methods, code, collaborations and workforce that will ensure Aotearoa can take full advantage of data science in its future research endeavours in key industries and branches of government.
For more information, contact Prof. Mark Gahegan, Professor of Computer Science at the University of Auckland, email: m.gahegan@auckland.ac.nz