Whale Seeker

Building Better AI: The Importance of Data Quality in Marine Mammal Detection

Updated: Sep 19

Image description: Aerial image featuring complex ice patterns, with a spotlight on a seal detected by Möbius. Photo credit: National Oceanic and Atmospheric Administration.


Are you about to conduct a marine mammal survey and looking for guidance on providing the best possible data? At Whale Seeker, we understand the challenges you face and are here to support you with our advanced image annotation tools. Our tools offer precise detection of marine mammals, helping you achieve accurate and reliable results in your surveys. 


The key lies in preparation: whether you're using aerial photos, satellite images, drones, or ship-mounted systems, data quality is essential to obtaining the answers you need for your research question, management decision, or compliance work.


What Constitutes High-Quality Data?

As my former process improvement students know, when it comes to data, I love to repeat: Garbage In, Garbage Out. AI models need accurate and abundant data, but the data must also be collected with the end goal in mind. At Whale Seeker, we work with our clients to determine the exact research question or problem they are trying to solve.


  • The data our clients collect needs to be relevant to the specific problem. For instance, data for detecting marine mammals must include clear images with identifiable features and avoid irrelevant data that could introduce noise and make it impossible to detect animals. In some contracts, part of the survey was almost unusable because of glare caused by the sun position or fog at the time of collection. Being aware of this before processing the data greatly speeds up both image processing and model training.


  • Our clients’ data needs to cover the full range of scenarios the AI might encounter when analyzing the images. The complexity of ocean environments is hard on detection models. Having that diversity helps our models accurately detect marine mammals in varied conditions, such as different sea states and different seasons.


  • For complex and diverse datasets, consistency in the metadata (type, format) and in the acquisition process is key. In previous contracts, missing or partial metadata slowed down the identification process because we had to investigate why it was missing, and it delayed processing overall because we had to re-process entire surveys. This does not mean that historic data are unusable; they will simply take more time to process, depending on their quality. Consistent metadata translates into consistent model training and predictions, and it allows smooth data processing, which speeds up the analysis and gets results to our clients faster.
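To make the metadata point concrete, here is a minimal sketch of a consistency check that could be run before processing a survey. It is an illustration only, not Whale Seeker's actual tooling: the field names (`image_id`, `timestamp`, `latitude`, `longitude`, `altitude_m`) and the function `find_metadata_issues` are hypothetical, and a real survey would define its own schema.

```python
# Hypothetical schema: required fields and the types we assume they should have.
NUMBER = (int, float)
REQUIRED_FIELDS = {
    "image_id": str,
    "timestamp": str,   # assumed to be a text timestamp, e.g. ISO 8601
    "latitude": NUMBER,
    "longitude": NUMBER,
    "altitude_m": NUMBER,
}


def find_metadata_issues(records):
    """Return (image_id, problem) tuples for missing or mistyped fields."""
    issues = []
    for index, record in enumerate(records):
        # Fall back to the record position when the identifier itself is missing.
        image_id = record.get("image_id", f"<record {index}>")
        for field, expected_type in REQUIRED_FIELDS.items():
            value = record.get(field)
            if value is None:
                issues.append((image_id, f"missing {field}"))
            elif not isinstance(value, expected_type):
                issues.append(
                    (image_id, f"{field} has unexpected type {type(value).__name__}")
                )
    return issues


# Made-up example records: one complete, one missing altitude,
# one with latitude stored as text.
records = [
    {"image_id": "img_001", "timestamp": "2024-06-01T10:00:00Z",
     "latitude": 60.1, "longitude": -64.2, "altitude_m": 305.0},
    {"image_id": "img_002", "timestamp": "2024-06-01T10:00:05Z",
     "latitude": 60.1, "longitude": -64.2},
    {"image_id": "img_003", "timestamp": "2024-06-01T10:00:10Z",
     "latitude": "60.1", "longitude": -64.2, "altitude_m": 300},
]

for image_id, problem in find_metadata_issues(records):
    print(f"{image_id}: {problem}")
```

Running a check like this at delivery time, rather than mid-processing, is one way to surface gaps early enough to ask for the missing information before a whole survey has to be re-processed.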


Why Prioritize Data Quality?

Investing in data quality brings several key benefits, as summarised in this article by Hugging Face:


Benefits and their impact:

  • Enhanced model outcomes: Eliminates noise and inaccuracies, leading to better performance.

  • Robustness and generalisation: Diverse data helps prevent overfitting, ensuring models are reliable across different real-world scenarios and not only on benchmark datasets.

  • Efficiency: Leads to more efficient models that do not need to be run multiple times, so fewer resources are required.

  • Representation and inclusivity: Inclusion of different groups helps address biases and promotes equity.

  • Governance and accountability: Transparency in data management builds trust with our clients and is key to AI governance.

  • Scientific reproducibility: Increased validity of findings and support for further research.


The Role of Data Quality in Marine Mammal Detection

For our marine mammal detection solutions, high-quality data is crucial. It ensures our models can accurately identify and monitor marine mammals, aiding in conservation efforts and environmental impact assessments. By prioritizing data quality, we uphold ethical standards, mitigate inter- and intra-observer detection biases, and ensure our AI systems operate responsibly.


Data Quality in Practice at Whale Seeker

At Whale Seeker, we implement rigorous data quality practices to support our AI solutions and ultimately deliver the best results for our customers. According to Chloé Benko-Prieur, Whale Seeker’s intern, this includes the following elements:


“Meticulous data curation

We ensure relevance and accuracy through careful preprocessing, such as deduplication and content filtering. We also extract, standardize and validate metadata such as camera specs and geospatial transforms to discard any problematic data and optimize the creation of high-quality training datasets. For example, we look at the spread of altitude values as shown in the figure below.


Participatory data collection

Involving stakeholders in data creation enhances representation and inclusivity. Our data validation algorithm efficiently identifies any missing spatio-temporal information our biologists need, allowing for immediate and precise communication with our clients. This proactive approach facilitates collaborative improvements and ensures that our datasets meet the specific needs for effective marine mammal detection.


Robust data governance framework and documentation

Clear policies and standards ensure consistent data management and accountability. Detailed documentation, including dataset cards, improves usability and transparency. 


Regular quality assessments

Metrics like accuracy and completeness help us identify and address issues early. Our algorithm automates the process of checking for requirements like correct geospatial alignment, image resolution and image quality. By automating these assessments, we can promptly identify anomalies and inaccuracies, enabling quicker adjustments and ensuring high-quality datasets for AI training.”
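As an illustration of the kinds of automated checks mentioned in the quote above (deduplication and looking at the spread of altitude values), here is a minimal sketch. It is not Whale Seeker's algorithm: the function names, the use of SHA-256 hashes to find exact duplicates, and the median absolute deviation (MAD) threshold for altitude are all assumptions chosen for the example.

```python
import hashlib
import statistics


def find_duplicate_images(images):
    """Group image names that share the same SHA-256 hash of their bytes.

    `images` maps a name to raw bytes. Only exact byte-for-byte copies
    are caught; near-duplicates would need a different technique.
    """
    names_by_hash = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        names_by_hash.setdefault(digest, []).append(name)
    return [names for names in names_by_hash.values() if len(names) > 1]


def flag_altitude_outliers(altitudes_m, max_deviations=3.0):
    """Return indices of altitudes that sit far from the survey median.

    Distance is measured in units of the median absolute deviation (MAD);
    the default threshold of 3.0 is an arbitrary choice for this sketch.
    """
    median = statistics.median(altitudes_m)
    deviations = [abs(a - median) for a in altitudes_m]
    mad = statistics.median(deviations)
    if mad == 0:
        # All typical values are identical: anything different is an outlier.
        return [i for i, d in enumerate(deviations) if d != 0]
    return [i for i, d in enumerate(deviations) if d / mad > max_deviations]


# Made-up inputs: two identical files, and one frame taken far too low.
images = {"a.jpg": b"abc", "b.jpg": b"abc", "c.jpg": b"xyz"}
altitudes_m = [300, 302, 298, 301, 299, 150]

print(find_duplicate_images(images))
print(flag_altitude_outliers(altitudes_m))
```

A robust measure such as the MAD is used here because a handful of bad altitude readings would otherwise inflate a standard deviation and hide themselves; other spread measures would work as well.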


Guidance Tool for Optimal Data Collection

Are you ready to conduct your marine mammal survey? We have developed a comprehensive guidance tool: a checklist designed to help you obtain the best data possible. It covers all critical aspects of data collection, so that you capture high-quality, relevant, and comprehensive data for your surveys. Armed with this checklist, you can confidently gather the data needed to make informed decisions and achieve accurate results with Whale Seeker’s AI-assisted tools.


At Whale Seeker, we are committed to leveraging the best data quality practices to develop AI solutions that not only meet the needs of our clients but also contribute to advancing marine mammal protection. By focusing on data quality, we build better AI systems that are reliable, efficient, and ethical.


For more information on our image annotation tools or for data collection guidance for your specific project, please contact us today. Together, we can make a positive impact on marine mammal conservation and environmental management.


