NCEI Archiving
Project 1: National PAM Archival Storage: NCEI Archiving
Team leads
Carrie Wall Bell (NCEI), Jason Gedamke (OST)
Goal
Build efficient workflows to archive large volumes of NMFS PAM data and meet the projected data growth needs
Summary
The NCEI Passive Acoustic Data (PAD) Archive team will work collaboratively with each NMFS FMC to develop configurable tools that will facilitate FMCs to prepare numerous datasets in a more automated way for archiving at NCEI. This will allow this collective group to meet the PAM SI goal of archiving 900 TB of data in three years (Objective 3) and make significant progress towards meeting NOAA PARR requirements.
Accomplishments
Successfully archived 22TB of passive acoustic data. This volume will continue to rise as the team continues to ingest the packaged data.
Developed a solid ingest process to handle mobile marine platforms.
Released a new PAM data packaging software, PACE, to support efficient data archiving of large datasets.
Initiated the development of a cloud pipeline to efficiently ingest data and associated metadata from the PAM GCP to the NCEI Passive Acoustic Data Archive.
Project 2: Native to FLAC Conversion
Storing and processing audio data as FLAC files, a lossless compression format for audio data, saves substantial storage space and processing time. Fully integrating FLAC files into the PAM processing and storage workflow would require additional investment beyond the current PAM SI funding to develop software to convert from native to FLAC and ensure that common processing softwares accept FLAC formats. The PAM SI team will leverage existing effort to make progress towards this project as possible, but it is currently unfunded.
Target outcome
A more time-efficient workflow that omits steps in the file conversion process prior to archiving
Summary
The goal of this effort will be to eliminate the time-consuming step of converting native audio files into wav and then into FLAC by creating a workflow that directly converts native files into a FLAC format. It will also make progress on ensuring FLAC files will be accepted inputs from some of the most commonly used processing software.
Action Items
- Convert to FLAC: Develop processing pipeline to save native data formats to FLAC. This will include three sub tasks:
- Establish a list of what the native file formats are that we want to evaluate going from native to FLAC.
- Determine the level of effort to convert the formats listed in subtask 1 to FLAC and who will complete that work.
- Create processing code to convert identified native file types to FLAC.
- Process FLAC: Update common analysis software to read FLAC. This will include three sub tasks:
- Establish a list of standard software to develop this capacity.
- Determine what is needed to adjust the software list in subtask 1 to input FLAC files.
- Issue contract to complete the work identified in subtask 2.