NCEI Archiving
Project 1: National PAM Archival Storage: NCEI Archiving
Team leads
Carrie Wall Bell (NCEI), Jason Gedamke (OST)
Goal
Build efficient workflows to archive large volumes of NMFS PAM data and meet projected data growth needs
Summary
The NCEI Passive Acoustic Data (PAD) Archive team will work collaboratively with each NMFS FMC to develop configurable tools that help FMCs prepare numerous datasets for archiving at NCEI in a more automated way. This will allow the group as a whole to meet the PAM SI goal of archiving 900 TB of data in three years (Objective 3) and make significant progress towards meeting NOAA PARR requirements.
Action items
Archive Breakdown: FMCs are asked to update this spreadsheet with revised information based on the original table of datasets to archive found in the PAM SI proposal / slide 12. The request is to provide more detail on prioritized datasets and target month(s) for data submission, identify POCs to work with the archive team, and describe how metadata are stored (e.g., spreadsheet, Oracle database).
FLAC Check: Confirm with Peter Dugan that FLAC conversion doesn’t alter audio values.
Kourtney’s Kode: Kourtney will share her Passive Packer process, which uses semi-automated code to convert a deployment metadata spreadsheet into a Passive Packer-ready database.
Metadata Madness: Meet with the NCEI PAD team, Kourtney, Jeff Walker, and Genevieve Davis to review metadata fields and ensure alignment among NCEI Passive Packer, Kourtney’s code, and PACM.
Bracing for the Data Tsunami: The NCEI PAD team will assess the data prioritization and metadata information provided by the FMCs (Action 1) and outline a plan for building the necessary capabilities into Passive Packer and the archive pipeline.
Call Me Maybe: Establish bimonthly meetings between the NCEI PAD team and POCs to ensure regular communication on the progress of archive development tasks and FMC dataset preparation.
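For the FLAC Check action item, one possible way to test that conversion leaves audio values unchanged is to decode a FLAC file back to WAV (for example with `flac -d`) and compare its PCM data against the original WAV. The sketch below is an illustration only, not the team's agreed procedure. It assumes both files are plain PCM WAV files that Python's standard `wave` module can open, and the function names `pcm_fingerprint` and `is_bit_exact` are hypothetical.

```python
import hashlib
import wave


def pcm_fingerprint(wav_path):
    """Summarize a PCM WAV file as (channels, bytes per sample, rate, frames, SHA-256 of samples)."""
    with wave.open(str(wav_path), "rb") as wav:
        digest = hashlib.sha256()
        # Read in chunks so multi-gigabyte recordings do not need to fit in memory.
        while True:
            chunk = wav.readframes(65536)
            if not chunk:
                break
            digest.update(chunk)
        return (
            wav.getnchannels(),
            wav.getsampwidth(),
            wav.getframerate(),
            wav.getnframes(),
            digest.hexdigest(),
        )


def is_bit_exact(original_wav, round_trip_wav):
    """True when both files hold identical sample data and audio parameters."""
    return pcm_fingerprint(original_wav) == pcm_fingerprint(round_trip_wav)
```

A call such as `is_bit_exact("original.wav", "decoded_from_flac.wav")` (hypothetical file names) compares only the audio parameters and sample bytes, so differences in non-audio WAV header chunks would not be reported.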
Project 2: Native to FLAC Conversion
Storing and processing audio data as FLAC files, a lossless compression format for audio data, saves substantial storage space and processing time. Fully integrating FLAC files into the PAM processing and storage workflow would require additional investment beyond the current PAM SI funding to develop software that converts native formats to FLAC and to ensure that common processing software accepts FLAC files. The PAM SI team will leverage existing efforts to make progress on this project where possible, but it is currently unfunded.
Target outcome
A more time-efficient workflow that omits steps in the file conversion process prior to archiving
Summary
The goal of this effort is to eliminate the time-consuming step of converting native audio files into WAV and then into FLAC by creating a workflow that converts native files directly into FLAC. It will also make progress on ensuring that FLAC files are accepted as inputs by some of the most commonly used processing software.
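As a rough illustration of skipping the intermediate WAV step, the reference `flac` command-line encoder can read headerless raw PCM directly when it is told the sample layout. The sketch below only builds such a command. It assumes the native file is headerless PCM with known parameters, which will not hold for every recorder format, and the helper name `build_flac_raw_command` and its defaults are hypothetical.

```python
def build_flac_raw_command(raw_path, flac_path, sample_rate, channels=1,
                           bits_per_sample=16, little_endian=True, signed=True):
    """Build a `flac` encoder command for headerless raw PCM input (assumed layout)."""
    return [
        "flac",
        "--force-raw-format",  # treat the input as raw samples, not WAV/AIFF
        f"--endian={'little' if little_endian else 'big'}",
        f"--sign={'signed' if signed else 'unsigned'}",
        f"--channels={channels}",
        f"--bps={bits_per_sample}",
        f"--sample-rate={sample_rate}",
        "--verify",  # decode while encoding and compare against the input
        "-o", str(flac_path),
        str(raw_path),
    ]
```

With the `flac` tool installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Native formats that carry their own headers or compression would need a format-specific reader in front of this step.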
Action items
- Convert to FLAC: Develop a processing pipeline to save native data formats to FLAC. This will include three subtasks:
  - Establish a list of the native file formats to evaluate for direct conversion to FLAC.
  - Determine the level of effort needed to convert the formats listed in subtask 1 to FLAC and who will complete that work.
  - Create processing code to convert the identified native file types to FLAC.
- Process FLAC: Update common analysis software to read FLAC. This will include three subtasks:
  - Establish a list of standard software in which to develop this capability.
  - Determine what is needed to adjust the software listed in subtask 1 to accept FLAC files as input.
  - Issue a contract to complete the work identified in subtask 2.
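The format inventory in the first Convert to FLAC subtask could start from a simple scan of existing data holdings. The sketch below counts files by container: it recognizes FLAC streams by their leading `fLaC` marker and WAV files by their `RIFF`/`WAVE` header, and groups everything else by file extension. It is a minimal example under those assumptions; the function names are hypothetical, and WAV variants such as RF64 would fall into the "other" group.

```python
from collections import Counter
from pathlib import Path


def sniff_container(path):
    """Classify a file as 'flac', 'wav', or 'other(<extension>)' from its first bytes."""
    with open(path, "rb") as handle:
        header = handle.read(12)
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    suffix = Path(path).suffix.lower()
    return f"other({suffix or 'no extension'})"


def inventory_formats(root):
    """Count the files under root by detected container or extension."""
    return Counter(
        sniff_container(p) for p in Path(root).rglob("*") if p.is_file()
    )
```

Running `inventory_formats` on a deployment directory returns a `Counter`, so `.most_common()` lists the formats that account for the most files first.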