
Documentation for aalibrary.utils

The utils submodule provides functions for interacting with cloud providers. These include obtaining metadata about the data files, such as how many files exist for a particular cruise in NCEI.

Modules:

Name Description
cloud_utils

This file contains all utility functions for Active Acoustics.

discrepancies

This file is used to identify discrepancies between what data exists locally versus what exists on the cloud.

frequency_data

This module contains the FrequencyData class.

gcp_utils

This file contains code pertaining to auxiliary functions related to parsing

helpers

For helper functions.

ices
nc_reader

This file is used to get header information out of a NetCDF file.

ncei_cache_daily_script

Script to get all objects in the NCEI S3 bucket and cache them to BigQuery.

ncei_utils

This file contains code pertaining to auxiliary functions related to parsing

sonar_checker
timings

"This script deals with the times associated with ingesting/preprocessing

cloud_utils

This file contains all utility functions for Active Acoustics.

Functions:

Name Description
bq_query_to_pandas

Takes a SQL query and returns the end result as a DataFrame.

check_existence_of_supplemental_files

Checks the existence of supplemental files (idx, bot, etc.) for a raw file. Will check for existence in all data sources.

check_if_file_exists_in_gcp

Checks whether a particular file exists in GCP using the file path (blob).

check_if_file_exists_in_s3

Checks to see if a file exists in an s3 bucket. Intended for use with NCEI, but will work with other s3 buckets as well.

check_if_netcdf_file_exists_in_gcp

Checks if a netcdf file exists in GCP storage. If the bucket location is not specified, it will use the helpers to parse the correct location.

count_objects_in_s3_bucket_location

Counts the number of objects within a bucket location.

count_subdirectories_in_s3_bucket_location

Counts the number of subdirectories within a bucket location.

create_s3_objs

Creates the s3 objects needed for using boto3 for a particular bucket.

delete_file_from_gcp

Deletes a file from the storage bucket.

download_file_from_gcp

Downloads a file from the blob storage bucket.

download_file_from_gcp_as_string

Downloads a file from the blob storage bucket as a text string.

get_data_lake_directory_client

Creates a data lake directory client. Returns an object of type DataLakeServiceClient.

get_object_key_for_s3

Creates an object key for a file within s3 given the parameters above.

get_service_client_sas

Gets an Azure service client using an SAS (shared access signature) token.

get_subdirectories_in_s3_bucket_location

Gets a list of all the subdirectories in a specific bucket location (prefix).

list_all_folders_in_gcp_bucket_location

Lists all of the folders in a GCP storage bucket location.

list_all_objects_in_gcp_bucket_location

Gets all of the files within a GCP storage bucket location.

list_all_objects_in_s3_bucket_location

Lists all of the objects in a s3 bucket location denoted by prefix.

setup_gbq_client_objs

Sets up Google BigQuery client objects used to execute queries, along with a GCS file-system object.

setup_gcp_storage_objs

Sets up Google Cloud Platform storage objects for use in accessing and modifying storage buckets.

upload_file_to_gcp_bucket

Uploads a file to the blob storage bucket.

bq_query_to_pandas(client=None, query='')

Takes a SQL query and returns the end result as a DataFrame.

Source code in src\aalibrary\utils\cloud_utils.py
def bq_query_to_pandas(client: bigquery.Client = None, query: str = ""):
    """Takes a SQL query and returns the end result as a DataFrame."""

    job = client.query(query)
    return job.result().to_dataframe()
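
A minimal usage sketch (not part of the library source): the client comes from setup_gbq_client_objs, and the table reference in the query is a placeholder.

from aalibrary.utils.cloud_utils import bq_query_to_pandas, setup_gbq_client_objs

gcp_bq_client, _ = setup_gbq_client_objs()
# Hypothetical table reference, used for illustration only.
df = bq_query_to_pandas(
    client=gcp_bq_client,
    query="SELECT * FROM `ggn-nmfs-aa-dev-1.example_dataset.example_table` LIMIT 10",
)
print(df.head())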

check_existence_of_supplemental_files(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', debug=False)

Checks the existence of supplemental files (idx, bot, etc.) for a raw file. Will check for existence in all data sources.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (do not include the dot "."). Defaults to "raw".

'raw'
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
RawFile RawFile

Returns a RawFile object; file existence can be checked via its boolean attributes. Ex. rf.idx_file_exists_in_ncei

Source code in src\aalibrary\utils\cloud_utils.py
def check_existence_of_supplemental_files(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    debug: bool = False,
) -> RawFile:
    """Checks the existence of supplemental files (idx, bot, etc.) for a raw
    file. Will check for existence in all data sources.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        RawFile: Returns a RawFile object; file existence can be checked via
            its boolean attributes.
            Ex. rf.idx_file_exists_in_ncei
    """

    # Create connection vars
    gcp_stor_client, gcp_bucket_name, gcp_bucket = setup_gcp_storage_objs()
    _, s3_resource, _ = create_s3_objs()

    # Create the RawFile object.
    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        debug=debug,
        gcp_bucket=gcp_bucket,
        gcp_bucket_name=gcp_bucket_name,
        gcp_stor_client=gcp_stor_client,
        s3_resource=s3_resource,
    )

    return rf
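
A usage sketch (not part of the library source); the file name and echosounder are placeholders, while the ship and survey names reuse the NCEI examples found elsewhere on this page.

from aalibrary.utils.cloud_utils import check_existence_of_supplemental_files

rf = check_existence_of_supplemental_files(
    file_name="example.raw",  # placeholder file name
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",  # placeholder echosounder
)
# Existence flags are exposed as boolean attributes on the RawFile object.
print(rf.idx_file_exists_in_ncei)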

check_if_file_exists_in_gcp(bucket=None, file_path='')

Checks whether a particular file exists in GCP using the file path (blob).

Parameters:

Name Type Description Default
bucket Bucket

The bucket object used to check for the file. Defaults to None.

None
file_path str

The blob file path within the bucket. Defaults to "".

''

Returns:

Name Type Description
bool bool

True if the file already exists, False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_file_exists_in_gcp(
    bucket: storage.Bucket = None, file_path: str = ""
) -> bool:
    """Checks whether a particular file exists in GCP using the file path
    (blob).

    Args:
        bucket (storage.Bucket, optional): The bucket object used to check for
            the file. Defaults to None.
        file_path (str, optional): The blob file path within the bucket.
            Defaults to "".

    Returns:
        bool: True if the file already exists, False otherwise.
    """

    return bucket.blob(file_path).exists()
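
A usage sketch (not part of the library source); the blob path is a placeholder.

from aalibrary.utils.cloud_utils import (
    check_if_file_exists_in_gcp,
    setup_gcp_storage_objs,
)

_, _, gcp_bucket = setup_gcp_storage_objs()
exists = check_if_file_exists_in_gcp(
    bucket=gcp_bucket,
    file_path="NCEI/Reuben_Lasker/RL2107/EK80/data/example.raw",  # placeholder
)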

check_if_file_exists_in_s3(object_key='', s3_resource=None, s3_bucket_name='')

Checks to see if a file exists in an s3 bucket. Intended for use with NCEI, but will work with other s3 buckets as well.

Parameters:

Name Type Description Default
object_key str

The object key (location of the object). Defaults to "".

''
s3_resource resource

The boto3 resource for this particular bucket. Defaults to None.

None
s3_bucket_name str

The bucket name. Defaults to "".

''

Returns:

Name Type Description
bool bool

True if the file exists within the bucket. False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_file_exists_in_s3(
    object_key: str = "",
    s3_resource: boto3.resource = None,
    s3_bucket_name: str = "",
) -> bool:
    """Checks to see if a file exists in an s3 bucket. Intended for use with
    NCEI, but will work with other s3 buckets as well.

    Args:
        object_key (str, optional): The object key (location of the object).
            Defaults to "".
        s3_resource (boto3.resource, optional): The boto3 resource for this
            particular bucket. Defaults to None.
        s3_bucket_name (str, optional): The bucket name. Defaults to "".

    Returns:
        bool: True if the file exists within the bucket. False otherwise.
    """

    try:
        s3_resource.Object(s3_bucket_name, object_key).load()
        return True
    except Exception:
        # object key does not exist.
        # print(e)
        return False
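
A usage sketch against the public NCEI bucket (not part of the library source); the object key follows the data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name} layout used by get_object_key_for_s3, with placeholder values.

from aalibrary.utils.cloud_utils import check_if_file_exists_in_s3, create_s3_objs

_, s3_resource, _ = create_s3_objs()
exists = check_if_file_exists_in_s3(
    object_key="data/raw/Reuben_Lasker/RL2107/EK80/example.raw",  # placeholder key
    s3_resource=s3_resource,
    s3_bucket_name="noaa-wcsd-pds",
)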

check_if_netcdf_file_exists_in_gcp(file_name='', ship_name='', survey_name='', echosounder='', data_source='', gcp_storage_bucket_location='', gcp_bucket=None, debug=False)

Checks if a netcdf file exists in GCP storage. If the bucket location is not specified, it will use the helpers to parse the correct location.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".

''
gcp_storage_bucket_location str

The string representing the blob's location within the storage bucket. Defaults to "".

''
gcp_bucket Bucket

The bucket object used for downloading.

None
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
bool bool

True if the file exists in GCP, False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_netcdf_file_exists_in_gcp(
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    gcp_storage_bucket_location: str = "",
    gcp_bucket: storage.Bucket = None,
    debug: bool = False,
) -> bool:
    """Checks if a netcdf file exists in GCP storage. If the bucket location is
    not specified, it will use the helpers to parse the correct location.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        gcp_storage_bucket_location (str, optional): The string representing
            the blob's location within the storage bucket. Defaults to "".
        gcp_bucket (storage.Bucket): The bucket object used for downloading.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        bool: True if the file exists in GCP, False otherwise.
    """

    # Parse the correct location when one was not provided.
    if gcp_storage_bucket_location == "":
        gcp_storage_bucket_location = (
            helpers.parse_correct_gcp_storage_bucket_location(
                file_name=file_name,
                file_type="netcdf",
                survey_name=survey_name,
                ship_name=ship_name,
                echosounder=echosounder,
                data_source=data_source,
                is_metadata=False,
                debug=debug,
            )
        )
    netcdf_gcp_storage_bucket_location = (
        get_netcdf_gcp_location_from_raw_gcp_location(
            gcp_storage_bucket_location=gcp_storage_bucket_location
        )
    )
    # check if the file exists in gcp
    return check_if_file_exists_in_gcp(
        bucket=gcp_bucket, file_path=netcdf_gcp_storage_bucket_location
    )

count_objects_in_s3_bucket_location(prefix='', bucket=None)

Counts the number of objects within a bucket location. NOTE: This DOES NOT include folders, as those do not count as objects.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
bucket resource

The bucket resource object. Defaults to None.

None

Returns:

Name Type Description
int int

The count of objects within the location.

Source code in src\aalibrary\utils\cloud_utils.py
def count_objects_in_s3_bucket_location(
    prefix: str = "", bucket: boto3.resource = None
) -> int:
    """Counts the number of objects within a bucket location.
    NOTE: This DOES NOT include folders, as those do not count as objects.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.

    Returns:
        int: The count of objects within the location.
    """

    count = sum(1 for _ in bucket.objects.filter(Prefix=prefix).all())
    return count

count_subdirectories_in_s3_bucket_location(prefix='', bucket=None)

Counts the number of subdirectories within a bucket location.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
bucket resource

The bucket resource object. Defaults to None.

None

Returns:

Name Type Description
int int

The count of subdirectories within the location.

Source code in src\aalibrary\utils\cloud_utils.py
def count_subdirectories_in_s3_bucket_location(
    prefix: str = "", bucket: boto3.resource = None
) -> int:
    """Counts the number of subdirectories within a bucket location.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.

    Returns:
        int: The count of subdirectories within the location.
    """

    subdirs = set()
    for obj in bucket.objects.filter(Prefix=prefix):
        # Use the object's parent "folder" as the subdirectory key.
        subdir = "/".join(obj.key.split("/")[:-1])
        if subdir:
            subdirs.add(subdir)
    return len(subdirs)
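
A usage sketch covering both counting helpers above (not part of the library source); the prefix is illustrative.

from aalibrary.utils.cloud_utils import (
    count_objects_in_s3_bucket_location,
    count_subdirectories_in_s3_bucket_location,
    create_s3_objs,
)

_, _, s3_bucket = create_s3_objs()
n_objects = count_objects_in_s3_bucket_location(
    prefix="data/raw/Reuben_Lasker/RL2107/", bucket=s3_bucket
)
n_subdirs = count_subdirectories_in_s3_bucket_location(
    prefix="data/raw/Reuben_Lasker/RL2107/", bucket=s3_bucket
)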

create_s3_objs(bucket_name='noaa-wcsd-pds')

Creates the s3 objects needed for using boto3 for a particular bucket.

Parameters:

Name Type Description Default
bucket_name str

The bucket you want to refer to. The default points to the NCEI bucket. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Name Type Description
Tuple Tuple

The s3 client (used for certain portions of the boto3 api), the s3 resource (newer, more used object for accessing s3 buckets), and the actual s3 bucket itself.

Source code in src\aalibrary\utils\cloud_utils.py
def create_s3_objs(bucket_name: str = "noaa-wcsd-pds") -> Tuple:
    """Creates the s3 objects needed for using boto3 for a particular bucket.

    Args:
        bucket_name (str, optional): The bucket you want to refer to. The
            default points to the NCEI bucket. Defaults to "noaa-wcsd-pds".

    Returns:
        Tuple: The s3 client (used for certain portions of the boto3 api), the
            s3 resource (newer, more used object for accessing s3 buckets), and
            the actual s3 bucket itself.
    """

    # Setup access to S3 bucket as an anonymous user
    s3_client = boto3.client(
        "s3",
        aws_access_key_id="",
        aws_secret_access_key="",
        config=Config(signature_version=UNSIGNED),
    )
    s3_resource = boto3.resource(
        "s3",
        aws_access_key_id="",
        aws_secret_access_key="",
        config=Config(signature_version=UNSIGNED),
    )

    s3_bucket = s3_resource.Bucket(bucket_name)

    return s3_client, s3_resource, s3_bucket

delete_file_from_gcp(gcp_bucket, blob_file_path)

Deletes a file from the storage bucket.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required

Raises: AssertionError: If the file does not exist in GCP. Exception: If there is an error deleting the file.

Source code in src\aalibrary\utils\cloud_utils.py
def delete_file_from_gcp(
    gcp_bucket: storage.Client.bucket, blob_file_path: str
):
    """Deletes a file from the storage bucket.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
    Raises:
        AssertionError: If the file does not exist in GCP.
        Exception: If there is an error deleting the file.
    """

    file_exists_in_gcp = check_if_file_exists_in_gcp(
        gcp_bucket, blob_file_path
    )
    assert (
        file_exists_in_gcp
    ), f"File does not exist in GCP at `{blob_file_path}`."

    blob = gcp_bucket.blob(blob_file_path)
    try:
        blob.delete()
        return
    except Exception:
        print(traceback.format_exc())
        raise

download_file_from_gcp(gcp_bucket, blob_file_path, local_file_path, debug=False)

Downloads a file from the blob storage bucket.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required
local_file_path str

The local file path you wish to download the blob to.

required
debug bool

Whether or not to print debug statements.

False
Source code in src\aalibrary\utils\cloud_utils.py
def download_file_from_gcp(
    gcp_bucket: storage.Client.bucket,
    blob_file_path: str,
    local_file_path: str,
    debug: bool = False,
):
    """Downloads a file from the blob storage bucket.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
        local_file_path (str): The local file path you wish to download the
            blob to.
        debug (bool): Whether or not to print debug statements.
    """

    blob = gcp_bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Download from blob
    try:
        blob.download_to_filename(local_file_path)
        if debug:
            print(f"New data downloaded to {local_file_path}")
    except Exception:
        print(traceback.format_exc())
        raise
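
A usage sketch (not part of the library source); the blob path reuses the example from the docstring and the local destination is a placeholder.

from aalibrary.utils.cloud_utils import download_file_from_gcp, setup_gcp_storage_objs

_, _, gcp_bucket = setup_gcp_storage_objs()
download_file_from_gcp(
    gcp_bucket=gcp_bucket,
    blob_file_path="data/itds/logs/execute_rasp_ii/temp.csv",
    local_file_path="temp.csv",  # placeholder local destination
    debug=True,
)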

download_file_from_gcp_as_string(gcp_bucket, blob_file_path)

Downloads a file from the blob storage bucket as a text string.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required
Source code in src\aalibrary\utils\cloud_utils.py
def download_file_from_gcp_as_string(
    gcp_bucket: storage.Client.bucket,
    blob_file_path: str,
):
    """Downloads a file from the blob storage bucket as a text string.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
    """

    blob = gcp_bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Download from blob
    try:
        return blob.download_as_text(encoding='utf-8')
    except Exception:
        print(traceback.format_exc())
        raise

get_data_lake_directory_client(config_file_path='')

Creates a data lake directory client. Returns an object of type DataLakeServiceClient.

Parameters:

Name Type Description Default
config_file_path str

The location of the config file. Needs a [DEFAULT] section with a azure_connection_string variable defined. Defaults to "".

''

Returns:

Name Type Description
DataLakeServiceClient DataLakeServiceClient

An object of type DataLakeServiceClient, with connection to the connection string described in the config.

Source code in src\aalibrary\utils\cloud_utils.py
def get_data_lake_directory_client(
    config_file_path: str = "",
) -> DataLakeServiceClient:
    """Creates a data lake directory client. Returns an object of type
    DataLakeServiceClient.

    Args:
        config_file_path (str, optional): The location of the config file.
            Needs a `[DEFAULT]` section with a `azure_connection_string`
            variable defined. Defaults to "".

    Returns:
        DataLakeServiceClient: An object of type DataLakeServiceClient, with
            connection to the connection string described in the config.
    """

    config = configparser.ConfigParser()
    config.read(config_file_path)

    azure_service = DataLakeServiceClient.from_connection_string(
        conn_str=config["DEFAULT"]["azure_connection_string"]
    )

    return azure_service
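
A usage sketch (not part of the library source); the config file path is a placeholder, and the expected INI contents are shown in the comment.

from aalibrary.utils.cloud_utils import get_data_lake_directory_client

# azure.ini (placeholder path) is expected to contain:
#   [DEFAULT]
#   azure_connection_string = <connection string copied from the Azure portal>
azure_service = get_data_lake_directory_client(config_file_path="azure.ini")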

get_object_key_for_s3(file_url='', file_name='', ship_name='', survey_name='', echosounder='')

Creates an object key for a file within s3 given the parameters above.

Parameters:

Name Type Description Default
file_url str

The entire url to the file resource in s3. Starts with "https://" or "s3://". Defaults to "". NOTE: If this is specified, there is no need to provide the other parameters.

''
file_name str

The file name (includes extension). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
Source code in src\aalibrary\utils\cloud_utils.py
def get_object_key_for_s3(
    file_url: str = "",
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
):
    """Creates an object key for a file within s3 given the parameters above.

    Args:
        file_url (str, optional): The entire url to the file resource in s3.
            Starts with "https://" or "s3://". Defaults to "".
            NOTE: If this is specified, there is no need to provide the other
            parameters.
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
    """

    if file_url:
        # We replace the beginning of common file paths
        file_url = file_url.replace(
            "https://noaa-wcsd-pds.s3.amazonaws.com/", ""
        )
        file_url = file_url.replace("s3://noaa-wcsd-pds/", "")
        return file_url
    else:
        # We default to using the parameters to create an object key according
        # to NCEI standards.
        object_key = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        return object_key
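
A usage sketch showing both input styles (not part of the library source); the file name and echosounder are placeholders.

from aalibrary.utils.cloud_utils import get_object_key_for_s3

# From a full URL:
key = get_object_key_for_s3(
    file_url="s3://noaa-wcsd-pds/data/raw/Reuben_Lasker/RL2107/EK80/example.raw"
)
# Or from the individual parameters:
key = get_object_key_for_s3(
    file_name="example.raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
)
# Both return "data/raw/Reuben_Lasker/RL2107/EK80/example.raw".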

get_service_client_sas(account_name, sas_token)

Gets an azure service client using an SAS (shared access signature) token. The token must be created in Azure.

Parameters:

Name Type Description Default
account_name str

The name of the account you are trying to create a service client with. This is usually a storage account that is attached to the container.

required
sas_token str

The complete SAS token.

required

Returns:

Name Type Description
DataLakeServiceClient DataLakeServiceClient

An object of type DataLakeServiceClient, with connection to the container/file the SAS allows access to.

Source code in src\aalibrary\utils\cloud_utils.py
def get_service_client_sas(
    account_name: str, sas_token: str
) -> DataLakeServiceClient:
    """Gets an azure service client using an SAS (shared access signature)
    token. The token must be created in Azure.

    Args:
        account_name (str): The name of the account you are trying to create a
            service client with. This is usually a storage account that is
            attached to the container.
        sas_token (str): The complete SAS token.

    Returns:
        DataLakeServiceClient: An object of type DataLakeServiceClient, with
            connection to the container/file the SAS allows access to.
    """
    account_url = f"https://{account_name}.dfs.core.windows.net"

    # The SAS token string can be passed in as credential param or appended to
    # the account URL
    service_client = DataLakeServiceClient(account_url, credential=sas_token)

    return service_client

get_subdirectories_in_s3_bucket_location(prefix='', s3_client=None, return_full_paths=False, bucket_name='noaa-wcsd-pds')

Gets a list of all the subdirectories in a specific bucket location (called a prefix). The return can be with full paths (root to folder inclusive), or just the folder names.

Parameters:

Name Type Description Default
prefix str

The bucket folder location. Defaults to "".

''
s3_client client

The bucket client object. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
bucket_name str

The bucket name. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a subdirectory. Whether these are full paths or just folder names is specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\cloud_utils.py
def get_subdirectories_in_s3_bucket_location(
    prefix: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
    bucket_name: str = "noaa-wcsd-pds",
) -> List[str]:
    """Gets a list of all the subdirectories in a specific bucket location
    (called a prefix). The return can be with full paths (root to folder
    inclusive), or just the folder names.

    Args:
        prefix (str, optional): The bucket folder location. Defaults to "".
        s3_client (boto3.client, optional): The bucket client object.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
        bucket_name (str, optional): The bucket name. Defaults to
            "noaa-wcsd-pds".

    Returns:
        List[str]: A list of strings, each being a subdirectory. Whether
            these are full paths or just folder names is specified by the
            `return_full_paths` parameter.
    """
    if not s3_client:
        s3_client, _, _ = create_s3_objs(bucket_name)

    subdirs = set()
    result = s3_client.list_objects(
        Bucket=bucket_name, Prefix=prefix, Delimiter="/"
    )
    # "CommonPrefixes" is absent when the prefix contains no subdirectories.
    for o in result.get("CommonPrefixes", []):
        subdir_full_path_from_prefix = o.get("Prefix")
        if return_full_paths:
            subdir = subdir_full_path_from_prefix
        else:
            subdir = subdir_full_path_from_prefix.replace(prefix, "")
            subdir = subdir.replace("/", "")
        subdirs.add(subdir)
    return list(subdirs)
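
A usage sketch listing the ship folders under the NCEI raw-data prefix (not part of the library source); an anonymous client is created internally when none is passed.

from aalibrary.utils.cloud_utils import get_subdirectories_in_s3_bucket_location

ships = get_subdirectories_in_s3_bucket_location(
    prefix="data/raw/",
    return_full_paths=False,
)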

list_all_folders_in_gcp_bucket_location(location='', gcp_bucket=None, return_full_paths=True)

Lists all of the folders in a GCP storage bucket location.

Parameters:

Name Type Description Default
location str

The blob location you would like to get the folders of. Defaults to "".

''
gcp_bucket bucket

The gcp bucket to use. Defaults to None.

None
return_full_paths bool

Whether or not to return full paths. Defaults to True.

True

Returns:

Type Description
List[str]

List[str]: A list of strings containing the folder names or full paths.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_folders_in_gcp_bucket_location(
    location: str = "",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = True,
) -> List[str]:
    """Lists all of the folders in a GCP storage bucket location.

    Args:
        location (str, optional): The blob location you would like to get the
            folders of. Defaults to "".
        gcp_bucket (storage.Client.bucket, optional): The gcp bucket to use.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not to return full
            paths. Defaults to True.

    Returns:
        List[str]: A list of strings containing the folder names or full paths.
    """

    if location and not location.endswith("/"):
        location += "/"

    blobs_iterator = gcp_bucket.list_blobs(prefix=location, delimiter="/")

    folder_prefixes = []
    # We MUST iterate through all blobs, since this is a lazy-loading iterator.
    for _ in blobs_iterator:
        ...

    if blobs_iterator.prefixes:
        for p in blobs_iterator.prefixes:
            folder_prefixes.append(p)

    if return_full_paths:
        return folder_prefixes
    else:
        return [b.split("/")[-2] for b in folder_prefixes]

list_all_objects_in_gcp_bucket_location(location='', gcp_bucket=None)

Gets all of the files within a GCP storage bucket location.

Parameters:

Name Type Description Default
location str

The location to search for files. Defaults to "". Ex. "NCEI/Reuben_Lasker/RL2107"

''
gcp_bucket bucket

The gcp bucket to use. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings containing all URIs for each file in the bucket.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_objects_in_gcp_bucket_location(
    location: str = "", gcp_bucket: storage.Client.bucket = None
) -> List[str]:
    """Gets all of the files within a GCP storage bucket location.

    Args:
        location (str, optional): The location to search for files. Defaults
            to "".
            Ex. "NCEI/Reuben_Lasker/RL2107"
        gcp_bucket (storage.Client.bucket, optional): The gcp bucket to use.
            Defaults to None.

    Returns:
        List[str]: A list of strings containing all URIs for each file in the
            bucket.
    """

    all_blobs_in_this_location = []
    for blob in gcp_bucket.list_blobs(prefix=location):
        all_blobs_in_this_location.append(blob.name)
    return all_blobs_in_this_location
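
A usage sketch (not part of the library source), reusing the location example from the docstring.

from aalibrary.utils.cloud_utils import (
    list_all_objects_in_gcp_bucket_location,
    setup_gcp_storage_objs,
)

_, _, gcp_bucket = setup_gcp_storage_objs()
blob_names = list_all_objects_in_gcp_bucket_location(
    location="NCEI/Reuben_Lasker/RL2107",
    gcp_bucket=gcp_bucket,
)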

list_all_objects_in_s3_bucket_location(prefix='', s3_resource=None, return_full_paths=False, bucket_name='noaa-wcsd-pds')

Lists all of the objects in a s3 bucket location denoted by prefix. Returns a list containing str. You get full paths if you specify the return_full_paths parameter.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
s3_resource resource

The bucket resource object. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
bucket_name str

The bucket name. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Type Description
List[str]

List[str]: A list of strings containing either the object names or full paths, depending on the return_full_paths parameter.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_objects_in_s3_bucket_location(
    prefix: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
    bucket_name: str = "noaa-wcsd-pds",
) -> List[str]:
    """Lists all of the objects in a s3 bucket location denoted by `prefix`.
    Returns a list containing str. You get full paths if you specify the
    `return_full_paths` parameter.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        s3_resource (boto3.resource, optional): The bucket resource object.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
        bucket_name (str, optional): The bucket name. Defaults to
            "noaa-wcsd-pds".

    Returns:
        List[str]: A list of strings containing either the object names or
            full paths, depending on the `return_full_paths` parameter.
    """
    if not s3_resource:
        _, s3_resource, _ = create_s3_objs(bucket_name)

    object_keys = set()
    bucket = s3_resource.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if return_full_paths:
            object_keys.add(obj.key)
        else:
            object_keys.add(obj.key.split("/")[-1])

    return list(object_keys)

setup_gbq_client_objs(location='US', project_id='ggn-nmfs-aa-dev-1')

Sets up Google BigQuery client objects used to execute queries, along with a GCS file-system object.

Parameters:

Name Type Description Default
location str

The location of the big-query tables/database. This is usually set when creating the database in big query. Defaults to "US".

'US'
project_id str

The project id that the big query instance belongs to. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'

Returns:

Name Type Description
Tuple Tuple[Client, GCSFileSystem]

The big query client object, along with an object for the Google Cloud Storage file system.

Source code in src\aalibrary\utils\cloud_utils.py
def setup_gbq_client_objs(
    location: str = "US", project_id: str = "ggn-nmfs-aa-dev-1"
) -> Tuple[bigquery.Client, gcsfs.GCSFileSystem]:
    """Sets up Google Big Query client objects used to execute queries and
    such.

    Args:
        location (str, optional): The location of the big-query
            tables/database. This is usually set when creating the database in
            big query. Defaults to "US".
        project_id (str, optional): The project id that the big query instance
            belongs to. Defaults to "ggn-nmfs-aa-dev-1".

    Returns:
        Tuple: The big query client object, along with an object for the Google
            Cloud Storage file system.
    """

    gcp_bq_client = bigquery.Client(location=location)

    gcp_gcs_file_system = gcsfs.GCSFileSystem(project=project_id)

    return gcp_bq_client, gcp_gcs_file_system

setup_gcp_storage_objs(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data')

Sets up Google Cloud Platform storage objects for use in accessing and modifying storage buckets.

Parameters:

Name Type Description Default
project_id str

The project id of the project you want to access. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The name of the exact bucket you want to access. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'

Returns:

Type Description
Tuple[Client, str, bucket]

Tuple[storage.Client, str, storage.Client.bucket]: The storage client, followed by the GCP bucket name (str) and then the actual bucket object itself (which will be executing the commands used in this api).

Source code in src\aalibrary\utils\cloud_utils.py
def setup_gcp_storage_objs(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
) -> Tuple[storage.Client, str, storage.Client.bucket]:
    """Sets up Google Cloud Platform storage objects for use in accessing and
    modifying storage buckets.

    Args:
        project_id (str, optional): The project id of the project you want to
            access. Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The name of the exact bucket you want
            to access. Defaults to "ggn-nmfs-aa-dev-1-data".

    Returns:
        Tuple[storage.Client, str, storage.Client.bucket]: The storage client,
            followed by the GCP bucket name (str) and then the actual bucket
            object itself (which will be executing the commands used in this
            api).
    """

    gcp_stor_client = storage.Client(project=project_id)

    gcp_bucket = gcp_stor_client.bucket(gcp_bucket_name)

    return (gcp_stor_client, gcp_bucket_name, gcp_bucket)

upload_file_to_gcp_bucket(bucket, blob_file_path, local_file_path, debug=False)

Uploads a file to the blob storage bucket.

Parameters:

Name Type Description Default
bucket bucket

The bucket object used for uploading.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_code_files/temp.csv" NOTE: This must include the file name as well as the extension.

required
local_file_path str

The local file path you wish to upload to the blob.

required
debug bool

Whether or not to print debug statements.

False
Source code in src\aalibrary\utils\cloud_utils.py
def upload_file_to_gcp_bucket(
    bucket: storage.Client.bucket,
    blob_file_path: str,
    local_file_path: str,
    debug: bool = False,
):
    """Uploads a file to the blob storage bucket.

    Args:
        bucket (storage.Client.bucket): The bucket object used for uploading.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_code_files/temp.csv"
            NOTE: This must include the file name as well as the extension.
        local_file_path (str): The local file path you wish to upload to the
            blob.
        debug (bool): Whether or not to print debug statements.
    """

    if not bucket:
        _, _, bucket = setup_gcp_storage_objs()

    blob = bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Upload a new blob
    try:
        blob.upload_from_filename(local_file_path)
        if debug:
            print(f"New data uploaded to {blob.name}")
    except Exception:
        print(traceback.format_exc())
        raise
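
A usage sketch (not part of the library source); the blob path reuses the docstring example and the local source file is a placeholder.

from aalibrary.utils.cloud_utils import setup_gcp_storage_objs, upload_file_to_gcp_bucket

_, _, gcp_bucket = setup_gcp_storage_objs()
upload_file_to_gcp_bucket(
    bucket=gcp_bucket,
    blob_file_path="data/itds/logs/execute_code_files/temp.csv",
    local_file_path="temp.csv",  # placeholder local source file
    debug=True,
)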

discrepancies

This file is used to identify discrepancies between what data exists locally versus what exists on the cloud. It considers the following when comparing:

* Number of files per cruise
* File names/types
* File sizes
* Checksums

Functions:

Name Description
compare_local_cruise_files_to_cloud

Compares the locally stored cruise files (per echosounder) to what

get_local_file_size

Gets the size of a local file in bytes.

get_local_sha256_checksum

Calculates the SHA256 checksum of a file.

compare_local_cruise_files_to_cloud(local_cruise_file_path='', ship_name='', survey_name='', echosounder='')

Compares the locally stored cruise files (per echosounder) to what exists on the cloud by number of files, file sizes, and checksums. Reports any discrepancies in the console.

Parameters:

Name Type Description Default
local_cruise_file_path str

The folder path for the locally stored cruise data. Defaults to "".

''
ship_name str

The ship name that the cruise falls under. Defaults to "".

''
survey_name str

The survey/cruise name. Defaults to "".

''
echosounder str

The specific echosounder you want to check. Defaults to "".

''
Source code in src\aalibrary\utils\discrepancies.py
def compare_local_cruise_files_to_cloud(
    local_cruise_file_path: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
):
    """Compares the locally stored cruise files (per echosounder) to what
    exists on the cloud by number of files, file sizes, and
    checksums. Reports any discrepancies in the console.

    Args:
        local_cruise_file_path (str, optional): The folder path for the locally
            stored cruise data. Defaults to "".
        ship_name (str, optional): The ship name that the cruise falls under.
            Defaults to "".
        survey_name (str, optional): The survey/cruise name. Defaults to "".
        echosounder (str, optional): The specific echosounder you want to
            check. Defaults to "".
    """

    # Create vars for use later
    _, s3_resource, _ = create_s3_objs()

    # Get all local files paths in cruise directory
    all_raw_file_paths = glob.glob(local_cruise_file_path + "/*.raw")
    all_idx_file_paths = glob.glob(local_cruise_file_path + "/*.idx")
    all_bot_file_paths = glob.glob(local_cruise_file_path + "/*.bot")
    # Check file numbers & types
    num_local_raw_files = len(all_raw_file_paths)
    num_local_idx_files = len(all_idx_file_paths)
    num_local_bot_files = len(all_bot_file_paths)
    num_local_files = (
        num_local_raw_files + num_local_idx_files + num_local_bot_files
    )
    # Get file names along with file paths
    # [(local_file_path, file_name_with_extension), (...)]
    all_raw_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_raw_file_paths
    ]
    all_idx_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_idx_file_paths
    ]
    all_bot_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_bot_file_paths
    ]

    # Compare number of files in cruise, local vs cloud
    # The helper returns file names; take its length for the S3 count.
    num_files_in_s3 = len(
        get_all_file_names_in_a_surveys_echosounder_folder(
            ship_name=ship_name,
            survey_name=survey_name,
            echosounder=echosounder,
            s3_resource=s3_resource,
            return_full_paths=False,
        )
    )
    if num_files_in_s3 == num_local_files:
        print(
            "NUMBER OF FILES MATCH FOR"
            f" {ship_name}/{survey_name}/{echosounder}"
        )
    else:
        print(
            "NUMBER OF FILES DO NOT MATCH FOR"
            f" {ship_name}/{survey_name}/{echosounder}"
        )
        print(
            f"NUMBER OF FILES IN S3: {num_files_in_s3} | NUMBER OF LOCAL "
            f"FILES: {num_local_files}"
        )

    # Go through each local file, and compare file existence, size, checksum
    for local_file_path, file_name in all_raw_file_paths:
        # Create s3 object key
        s3_object_key = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        # Get existence of file in s3
        file_exists_in_s3 = check_if_file_exists_in_s3(
            object_key=s3_object_key,
            s3_resource=s3_resource,
            s3_bucket_name="noaa-wcsd-pds",
        )
        # If file exists in s3, get size and checksum
        if file_exists_in_s3:
            # Get file size for s3 object key
            s3_file_size = get_file_size_from_s3(
                object_key=s3_object_key, s3_resource=s3_resource
            )
            # Get checksum for object key
            s3_checksum = get_checksum_sha256_from_s3(
                object_key=s3_object_key, s3_resource=s3_resource
            )

        # Get local file size
        local_file_size = get_local_file_size(local_file_path)
        # Get local file checksum
        local_file_checksum = get_local_sha256_checksum(local_file_path)

        # Compare existence
        if not file_exists_in_s3:
            print(
                f"LOCAL FILE {local_file_path} DOES NOT EXIST IN S3:"
                f" {s3_object_key}"
            )
        elif file_exists_in_s3:
            # Compare file sizes
            if local_file_size != s3_file_size:
                print(
                    f"FILE SIZE MISMATCH FOR {local_file_path} | LOCAL: "
                    f"{local_file_size} | S3: {s3_file_size}"
                )
            # Compare checksums
            if local_file_checksum != s3_checksum:
                print(
                    f"CHECKSUM MISMATCH FOR {local_file_path} | LOCAL: "
                    f"{local_file_checksum} | S3: {s3_checksum}"
                )
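
A usage sketch (not part of the library source); the local folder path and echosounder are placeholders.

from aalibrary.utils.discrepancies import compare_local_cruise_files_to_cloud

compare_local_cruise_files_to_cloud(
    local_cruise_file_path="/data/cruises/RL2107/EK80",  # placeholder local folder
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",  # placeholder echosounder
)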

get_local_file_size(local_file_path)

Gets the size of a local file in bytes.

Parameters:

Name Type Description Default
local_file_path str

The local file path.

required

Returns:

Name Type Description
int int

The size of the file in bytes.

Source code in src\aalibrary\utils\discrepancies.py
def get_local_file_size(local_file_path: str) -> int:
    """Gets the size of a local file in bytes.

    Args:
        local_file_path (str): The local file path.

    Returns:
        int: The size of the file in bytes.
    """
    return os.path.getsize(local_file_path)

get_local_sha256_checksum(local_file_path, chunk_size=65536)

Calculates the SHA256 checksum of a file.

Parameters:

Name Type Description Default
local_file_path str

The path to the file.

required
chunk_size int

The size of chunks to read the file in (in bytes). Larger chunks can be more efficient for large files.

65536

Returns:

Name Type Description
str str

The SHA256 checksum of the file as a hexadecimal string.

Source code in src\aalibrary\utils\discrepancies.py
def get_local_sha256_checksum(local_file_path, chunk_size=65536) -> str:
    """
    Calculates the SHA256 checksum of a file.

    Args:
        local_file_path (str): The path to the file.
        chunk_size (int): The size of chunks to read the file in (in bytes).
                          Larger chunks can be more efficient for large files.

    Returns:
        str: The SHA256 checksum of the file as a hexadecimal string.
    """

    sha256_hash = hashlib.sha256()
    try:
        with open(local_file_path, "rb") as f:
            # Read the file in chunks to handle large files efficiently
            for chunk in iter(lambda: f.read(chunk_size), b""):
                sha256_hash.update(chunk)
        return sha256_hash.hexdigest()
    except FileNotFoundError:
        return "File not found."
    except Exception as e:
        return f"An error occurred: {e}"

frequency_data

This module contains the FrequencyData class.

Classes:

Name Description
FrequencyData

Given some dataset 'Sv', list all frequencies available. This class offers methods which help map out frequencies and channels, plus additional utilities.

Functions:

Name Description
main

Opens a sample netCDF file and constructs a FrequencyData object from it.

FrequencyData

Given some dataset 'Sv', list all frequencies available. This class offers methods which help map out frequencies and channels plus additional utilities.

Methods:

Name Description
__init__

Initializes the class object and parses the frequencies available within the echodata object (xarray.Dataset) 'Sv'.

construct_frequency_list

Parses the frequencies available in the xarray 'Sv'

construct_frequency_map

Either using a channel_list or a frequency_list, this function provides a frequency map that satisfies the requirements of this class.

construct_frequency_pair_combination_list

Returns a list of tuple elements containing frequency combinations, which is useful for the KMeansOperator class.

construct_frequency_set_combination_list

Constructs a list of available frequency set permutations.

powerset

Generates all combinations of elements of an iterable (the powerset).

print_frequency_list

Prints each frequency element available in Sv.

print_frequency_pair_combination_list

Prints frequency combination list one element at a time.

print_frequency_set_combination_list

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
class FrequencyData:
    """Given some dataset 'Sv', list all frequencies available. This class
    offers methods which help map out frequencies and channels plus additional
    utilities."""

    def __init__(self, Sv):
        """Initializes class object and parses the frequencies available
        within the echodata object (xarray.Dataset) 'Sv'.

        Args:
            Sv (xarray.Dataset): The 'Sv' echodata object.
        """

        self.Sv = Sv  # Create a self object.
        self.frequency_list = []  # Declares a frequency list to be modified.

        self.construct_frequency_list()  # Construct the frequency list.
        # TODO : This string needs cleaning up ; remove unneeded commas and
        # empty tuples.
        # Constructs a list of available frequency set permutations.
        # Example : [('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
        # ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
        # ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'),
        # ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),
        # ('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'),
        # ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]
        self.frequency_set_combination_list = (
            self.construct_frequency_set_combination_list()
        )
        # print(self.frequency_set_combination_list)
        # Constructs a list of all possible unequal permutation pairs of
        # frequencies.
        # Example : [('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'),
        # ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
        # ('120 kHz', '200 kHz')]
        self.frequency_pair_combination_list = (
            self.construct_frequency_pair_combination_list()
        )
        # print(self.frequency_pair_combination_list)
        self.construct_frequency_map()

    def construct_frequency_list(self):
        """Parses the frequencies available in the xarray 'Sv'"""
        # Iterate through the natural index associated with Sv.Sv
        for i in range(len(self.Sv.Sv)):
            # Extract frequency.
            self.frequency_list.append(
                str(self.Sv.Sv[i].coords.get("channel"))
                .split(" kHz")[0]
                .split("GPT")[1]
                .strip()
                + " kHz"
            )
        # Log the constructed frequency list.
        logger.debug(f"Constructed frequency list: {self.frequency_list}")
        # Return string array frequency list of the form [18kHz, 70kHz, 200kHz]
        return self.frequency_list

    def powerset(self, iterable):
        """Generates combinations of elements of iterables ;
        powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

        Args:
            iterable (_type_): A list.

        Returns combinations of elements of iterables.
        """
        # Make a list from the iterable.
        s = list(iterable)
        # Returns a list of tuple elements containing combinations of elements
        # which derived from the iterable object.
        return chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1)
        )

    def construct_frequency_set_combination_list(self) -> List[Tuple]:
        """Constructs a list of available frequency set permutations.
        Example : [
            ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
            ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
            ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
            ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'),
            ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'),
            ('38 kHz', '120 kHz', '200 kHz'),
            ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]


        Returns:
            list<tuple>: A list of tuple elements containing frequency
                combinations which is useful for the KMeansOperator class.
        """
        # Returns a list of tuple elements containing frequency combinations
        # which is useful for the KMeansOperator class.
        return list(self.powerset(self.frequency_list))

    def print_frequency_set_combination_list(self):
        """Prints frequency combination list one element at a time."""

        for (
            i
        ) in (
            self.frequency_set_combination_list
        ):  # For each frequency combination associated with Sv.
            print(i)  # Print out frequency combination tuple.

    def construct_frequency_pair_combination_list(self) -> List[Tuple]:
        """Returns a list of tuple elements containing frequency combinations
        which is useful for the KMeansOperator class.

        Returns:
            list<tuple>: A list of tuple elements containing frequency
                combinations which is useful for the KMeansOperator class.
        """
        # Returns a list of tuple elements containing frequency combinations
        # which is useful for the KMeansOperator class.
        return list(itertools.combinations(self.frequency_list, 2))

    def print_frequency_pair_combination_list(self):
        """Prints frequency combination list one element at a time."""

        # For each frequency combination associated with Sv.
        for i in self.frequency_pair_combination_list:
            # Print out frequency combination tuple.
            print(i)

    def print_frequency_list(self):
        """Prints each frequency element available in Sv."""
        # For each frequency in the frequency_list associated with Sv.
        for i in self.frequency_list:
            # Print out the associated frequency.
            print(i)

    def construct_frequency_map(self, frequencies_provided=True):
        """Either using a channel_list or a frequency_list this function
        provides one which satisfies all requirements of this class structure.
        In particular the channels and frequencies involved have to be known
        and mapped to one another.

        Args:
            frequencies_provided (boolean): Was a frequency_list provided at
                object creation? If so, 'True'; if a channel_list was used
                instead, 'False'.
        """
        if frequencies_provided is True:
            self.simple_frequency_list = self.frequency_list
            # Declare a frequency map to be populated with string frequencies
            # of the form [[1,'38kHz'],[2,'120kHz'],[4,'200kHz']] where the
            # first element is meant to be the channel representing the
            # frequency. This is an internal object. Do not interfere.
            self.frequency_map = []
            # For each frequency 'j'.
            for j in self.simple_frequency_list:
                # Check each channel 'i'.
                for i in range(len(self.Sv.Sv)):
                    channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                    # If the channel description contains "ES" then it is an
                    # ES channel.
                    if "ES" in channel_desc:
                        numeric_frequency_desc = (
                            str(self.Sv.Sv[i].coords.get("channel"))
                            .split("ES")[1]
                            .split("-")[0]
                            .strip()
                        )
                        if numeric_frequency_desc == j.split("kHz")[0].strip():
                            self.frequency_map.append(
                                [i, numeric_frequency_desc + " kHz"]
                            )
                    # If the channel description contains "GPT" then it is a
                    # GPT channel.
                    if "GPT" in channel_desc:
                        numeric_frequency_desc = (
                            str(self.Sv.Sv[i].coords.get("channel"))
                            .split(" kHz")[0]
                            .split("GPT")[1]
                            .strip()
                        )
                        # To see if the channel associates with the
                        # frequency 'j' .
                        if numeric_frequency_desc == j.split("kHz")[0].strip():
                            # If so append it and the channel to the
                            # 'frequency_list'.
                            self.frequency_map.append(
                                [i, numeric_frequency_desc + " kHz"]
                            )
        else:
            # A channel_list was provided instead of a frequency_list.
            self.frequency_map = []
            for i in self.channel_list:
                channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                # If the channel description contains "ES" then it is an ES
                # channel.
                if "ES" in channel_desc:
                    self.frequency_map.append(
                        [
                            i,
                            channel_desc.split(" kHz")[0]
                            .split("ES")[1]
                            .strip()
                            + " kHz",
                        ]
                    )
                # If the channel description contains "GPT" then it is a
                # GPT channel.
                if "GPT" in channel_desc:
                    self.frequency_map.append(
                        [
                            i,
                            channel_desc.split(" kHz")[0]
                            .split("GPT")[1]
                            .strip()
                            + " kHz",
                        ]
                    )

        # Remove duplicates from frequency_list.
        self.frequency_map = [
            list(t) for t in set(tuple(item) for item in self.frequency_map)
        ]

__init__(Sv)

Initializes class object and parses the frequencies available within the echodata object (xarray.Dataset) 'Sv'.

Parameters:

Name Type Description Default
Sv Dataset

The 'Sv' echodata object.

required
Source code in src\aalibrary\utils\frequency_data.py
def __init__(self, Sv):
    """Initializes class object and parses the frequencies available
    within the echodata object (xarray.Dataset) 'Sv'.

    Args:
        Sv (xarray.Dataset): The 'Sv' echodata object.
    """

    self.Sv = Sv  # Create a self object.
    self.frequency_list = []  # Declares a frequency list to be modified.

    self.construct_frequency_list()  # Construct the frequency list.
    # TODO : This string needs cleaning up ; remove unneeded commas and
    # empty tuples.
    # Constructs a list of available frequency set permutations.
    # Example : [('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
    # ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
    # ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'),
    # ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),
    # ('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'),
    # ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]
    self.frequency_set_combination_list = (
        self.construct_frequency_set_combination_list()
    )
    # print(self.frequency_set_combination_list)
    # Constructs a list of all possible unequal permutation pairs of
    # frequencies.
    # Example : [('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'),
    # ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
    # ('120 kHz', '200 kHz')]
    self.frequency_pair_combination_list = (
        self.construct_frequency_pair_combination_list()
    )
    # print(self.frequency_pair_combination_list)
    self.construct_frequency_map()

construct_frequency_list()

Parses the frequencies available in the xarray 'Sv'

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_list(self):
    """Parses the frequencies available in the xarray 'Sv'"""
    # Iterate through the natural index associated with Sv.Sv
    for i in range(len(self.Sv.Sv)):
        # Extract frequency.
        self.frequency_list.append(
            str(self.Sv.Sv[i].coords.get("channel"))
            .split(" kHz")[0]
            .split("GPT")[1]
            .strip()
            + " kHz"
        )
    # Log the constructed frequency list.
    logger.debug(f"Constructed frequency list: {self.frequency_list}")
    # Return string array frequency list of the form [18kHz, 70kHz, 200kHz]
    return self.frequency_list

construct_frequency_map(frequencies_provided=True)

Using either a channel_list or a frequency_list, this function builds a frequency map that satisfies the requirements of this class structure. In particular, the channels and frequencies involved have to be known and mapped to one another.

Parameters:

Name Type Description Default
frequencies_provided boolean

Was a frequency_list provided at object creation? If so, 'True'; if a channel_list was used instead, 'False'.

True
Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_map(self, frequencies_provided=True):
    """Either using a channel_list or a frequency_list this function
    provides one which satisfies all requirements of this class structure.
    In particular the channels and frequencies involved have to be known
    and mapped to one another.

    Args:
        frequencies_provided (boolean): was a frequency_list provided at
            object creation? If so then 'True' if a channel_list instead
            was used then 'False'.
    """
    if frequencies_provided is True:
        self.simple_frequency_list = self.frequency_list
        # Declare a frequency map to be populated with string frequencies
        # of the form [[1,'38kHz'],[2,'120kHz'],[4,'200kHz']] where the
        # first element is meant to be the channel representing the
        # frequency. This is an internal object. Do not interfere.
        self.frequency_map = []
        # For each frequency 'j'.
        for j in self.simple_frequency_list:
            # Check each channel 'i'.
            for i in range(len(self.Sv.Sv)):
                channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                # If the channel description contains "ES" then it is an
                # ES channel.
                if "ES" in channel_desc:
                    numeric_frequency_desc = (
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split("ES")[1]
                        .split("-")[0]
                        .strip()
                    )
                    if numeric_frequency_desc == j.split("kHz")[0].strip():
                        self.frequency_map.append(
                            [i, numeric_frequency_desc + " kHz"]
                        )
                # If the channel description contains "GPT" then it is a
                # GPT channel.
                if "GPT" in channel_desc:
                    numeric_frequency_desc = (
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("GPT")[1]
                        .strip()
                    )
                    # To see if the channel associates with the
                    # frequency 'j' .
                    if numeric_frequency_desc == j.split("kHz")[0].strip():
                        # If so append it and the channel to the
                        # 'frequency_list'.
                        self.frequency_map.append(
                            [i, numeric_frequency_desc + " kHz"]
                        )
    else:
        # A channel_list was provided instead; build the map directly from
        # the requested channel indices.
        self.frequency_map = []
        for i in self.channel_list:
            channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
            # If the channel description contains "ES" then it is an ES
            # channel.
            if "ES" in channel_desc:
                self.frequency_map.append(
                    [
                        i,
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("ES")[1]
                        .strip()
                        + " kHz",
                    ]
                )
            # If the channel description contains "GPT" then it is a
            # GPT channel.
            if "GPT" in channel_desc:
                self.frequency_map.append(
                    [
                        i,
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("GPT")[1]
                        .strip()
                        + " kHz",
                    ]
                )

    # Remove duplicates from frequency_list.
    self.frequency_map = [
        list(t) for t in set(tuple(item) for item in self.frequency_map)
    ]

construct_frequency_pair_combination_list()

Returns a list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Returns:

Type Description
List[Tuple]

list: A list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_pair_combination_list(self) -> List[Tuple]:
    """Returns a list of tuple elements containing frequency combinations
    which is useful for the KMeansOperator class.

    Returns:
        list<tuple>: A list of tuple elements containing frequency
            combinations which is useful for the KMeansOperator class.
    """
    # Returns a list of tuple elements containing frequency combinations
    # which is useful for the KMeansOperator class.
    return list(itertools.combinations(self.frequency_list, 2))

construct_frequency_set_combination_list()

Constructs a list of available frequency set permutations. Example : [ ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',), ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]

Returns:

Type Description
List[Tuple]

list: A list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_set_combination_list(self) -> List[Tuple]:
    """Constructs a list of available frequency set permutations.
    Example : [
        ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
        ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
        ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
        ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'),
        ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'),
        ('38 kHz', '120 kHz', '200 kHz'),
        ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]


    Returns:
        list<tuple>: A list of tuple elements containing frequency
            combinations which is useful for the KMeansOperator class.
    """
    # Returns a list of tuple elements containing frequency combinations
    # which is useful for the KMeansOperator class.
    return list(self.powerset(self.frequency_list))

powerset(iterable)

Generates all combinations of the elements of an iterable; powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

Parameters:

Name Type Description Default
iterable _type_

A list.

required

Returns combinations of elements of iterables.

Source code in src\aalibrary\utils\frequency_data.py
def powerset(self, iterable):
    """Generates combinations of elements of iterables ;
    powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

    Args:
        iterable (_type_): A list.

    Returns combinations of elements of iterables.
    """
    # Make a list from the iterable.
    s = list(iterable)
    # Returns a list of tuple elements containing combinations of elements
    # which derived from the iterable object.
    return chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)
    )

print_frequency_list()

Prints each frequency element available in Sv.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_list(self):
    """Prints each frequency element available in Sv."""
    # For each frequency in the frequency_list associated with Sv.
    for i in self.frequency_list:
        # Print out the associated frequency.
        print(i)

print_frequency_pair_combination_list()

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_pair_combination_list(self):
    """Prints frequency combination list one element at a time."""

    # For each frequency combination associated with Sv.
    for i in self.frequency_pair_combination_list:
        # Print out frequency combination tuple.
        print(i)

print_frequency_set_combination_list()

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_set_combination_list(self):
    """Prints frequency combination list one element at a time."""

    for (
        i
    ) in (
        self.frequency_set_combination_list
    ):  # For each frequency combination associated with Sv.
        print(i)  # Print out frequency combination tuple.

main()

Opens a sample netCDF file and constructs a FrequencyData object to extract frequency information from it.

Source code in src\aalibrary\utils\frequency_data.py
def main():
    """Opens a sample netCDF file and constructs a FrequencyData object to
    extract frequency information from it."""

    input_path = "/home/mryan/Desktop/HB1603_L1-D20160707-T190150.nc"
    ed = ep.open_converted(input_path)
    Sv = ep.calibrate.compute_Sv(ed)

    freq_data = FrequencyData(Sv)
    logger.debug(freq_data.frequency_map)
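
A minimal usage sketch, assuming the import path aalibrary.utils.frequency_data, that echopype is installed, and a locally available converted netCDF file (the file name below is illustrative only):

import echopype as ep

from aalibrary.utils.frequency_data import FrequencyData

ed = ep.open_converted("HB1603_L1-D20160707-T190150.nc")  # hypothetical local file
Sv = ep.calibrate.compute_Sv(ed)

freq_data = FrequencyData(Sv)
freq_data.print_frequency_list()                  # e.g. 18 kHz, 38 kHz, ...
print(freq_data.frequency_pair_combination_list)  # e.g. [('18 kHz', '38 kHz'), ...]
print(freq_data.frequency_map)                    # e.g. [[0, '18 kHz'], [1, '38 kHz'], ...]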

gcp_utils

This file contains code pertaining to auxiliary functions related to parsing through our google storage bucket.

Functions:

Name Description
get_all_echosounders_in_a_survey_in_storage_bucket

Gets all of the echosounders in a survey in a GCP storage bucket.

get_all_ship_names_in_gcp_bucket

Gets all of the ship names within a GCP storage bucket.

get_all_survey_names_from_a_ship_in_storage_bucket

Gets all of the survey names from a particular ship in a GCP storage

get_all_surveys_in_storage_bucket

Gets all of the surveys in a GCP storage bucket.

get_all_echosounders_in_a_survey_in_storage_bucket(ship_name='', survey_name='', project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the echosounders in a survey in a GCP storage bucket.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Will get normalized to GCP standards. Defaults to None.

''
survey_name str

The survey name/identifier. Defaults to "".

''
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the echosounder names that exist in a survey.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_echosounders_in_a_survey_in_storage_bucket(
    ship_name: str = "",
    survey_name: str = "",
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the echosounders in a survey in a GCP storage bucket.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Will get normalized to GCP standards. Defaults to None.
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the echosounder names that
            exist in a survey.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    # Normalize the ship name.
    ship_name = normalize_ship_name(ship_name=ship_name)
    # Search all possible directories for ship surveys
    prefixes = [
        f"HDD/{ship_name}/{survey_name}/",
        f"NCEI/{ship_name}/{survey_name}/",
        f"OMAO/{ship_name}/{survey_name}/",
        f"TEST/{ship_name}/{survey_name}/",
    ]
    all_subfolder_names = set()
    all_echosounders = set()
    # Get all subfolders from this survey, whichever directory it resides in.
    for prefix in prefixes:
        subfolder_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_subfolder_names.update(subfolder_names)
    # Filter out any folder that is not an echosounder.
    for folder_name in list(all_subfolder_names):
        if (
            ("calibration" not in folder_name.lower())
            and ("metadata" not in folder_name.lower())
            and ("json" not in folder_name.lower())
            and ("doc" not in folder_name.lower())
        ):
            # Use 'add' since each 'folder_name' is a string.
            all_echosounders.add(folder_name)

    return list(all_echosounders)
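
A minimal usage sketch, assuming the import path aalibrary.utils.gcp_utils and the default dev project and bucket documented above (the ship and survey names are illustrative):

from aalibrary.utils.gcp_utils import (
    get_all_echosounders_in_a_survey_in_storage_bucket,
)

echosounders = get_all_echosounders_in_a_survey_in_storage_bucket(
    ship_name="Reuben Lasker",  # normalized internally to "Reuben_Lasker"
    survey_name="RL2107",
)
print(echosounders)  # e.g. ['EK80'], depending on what the bucket contains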

get_all_ship_names_in_gcp_bucket(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the ship names within a GCP storage bucket.

Parameters:

Name Type Description Default
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False. NOTE: You can set this parameter to True if you would like to see which folders contain which ships. For example: Reuben Lasker can have data coming from both OMAO and local upload HDD. It will look like: {'OMAO/Reuben_Lasker/', 'HDD/Reuben_Lasker/'}

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the ship names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_ship_names_in_gcp_bucket(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the ship names within a GCP storage bucket.

    Args:
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
            NOTE: You can set this parameter to `True` if you would like to see
            which folders contain which ships.
            For example: Reuben Lasker can have data coming from both OMAO and
            local upload HDD. It will look like:
            {'OMAO/Reuben_Lasker/', 'HDD/Reuben_Lasker/'}

    Returns:
        List[str]: A list of strings containing the ship names.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )
    # Get the initial subdirs
    prefixes = ["HDD/", "NCEI/", "OMAO/", "TEST/"]
    all_ship_names = set()
    for prefix in prefixes:
        ship_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_ship_names.update(ship_names)

    return list(all_ship_names)

get_all_survey_names_from_a_ship_in_storage_bucket(ship_name='', project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the survey names from a particular ship in a GCP storage bucket.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Will get normalized to GCP standards. Defaults to None.

''
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the survey names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_survey_names_from_a_ship_in_storage_bucket(
    ship_name: str = "",
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the survey names from a particular ship in a GCP storage
    bucket.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Will get normalized to GCP standards. Defaults to None.
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the survey names.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    # Normalize the ship name.
    ship_name = normalize_ship_name(ship_name=ship_name)
    # Search all possible directories for ship surveys
    prefixes = [
        f"HDD/{ship_name}/",
        f"NCEI/{ship_name}/",
        f"OMAO/{ship_name}/",
        f"TEST/{ship_name}/",
    ]
    all_survey_names = set()
    for prefix in prefixes:
        survey_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_survey_names.update(survey_names)

    return list(all_survey_names)

get_all_surveys_in_storage_bucket(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the surveys in a GCP storage bucket.

Parameters:

Name Type Description Default
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the survey names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_surveys_in_storage_bucket(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the surveys in a GCP storage bucket.

    Args:
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the survey names.
    """

    if gcp_bucket is None:
        _, gcp_bucket_name, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    all_ship_prefixes = get_all_ship_names_in_gcp_bucket(
        project_id=project_id,
        gcp_bucket_name=gcp_bucket_name,
        gcp_bucket=gcp_bucket,
        return_full_paths=True,
    )
    all_surveys = set()
    for ship_prefix in all_ship_prefixes:
        # Get surveys from each ship prefix
        ship_surveys = list_all_folders_in_gcp_bucket_location(
            location=ship_prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_surveys.update(ship_surveys)

    return list(all_surveys)
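
The functions above compose naturally. A sketch that reuses a single bucket client to list every ship and its surveys (import paths are assumptions; project and bucket names are the documented defaults):

from aalibrary.utils.cloud_utils import setup_gcp_storage_objs
from aalibrary.utils.gcp_utils import (
    get_all_ship_names_in_gcp_bucket,
    get_all_survey_names_from_a_ship_in_storage_bucket,
)

# Create the storage client objects once and pass the bucket down.
_, _, gcp_bucket = setup_gcp_storage_objs(
    project_id="ggn-nmfs-aa-dev-1",
    gcp_bucket_name="ggn-nmfs-aa-dev-1-data",
)
for ship_name in get_all_ship_names_in_gcp_bucket(gcp_bucket=gcp_bucket):
    surveys = get_all_survey_names_from_a_ship_in_storage_bucket(
        ship_name=ship_name, gcp_bucket=gcp_bucket
    )
    print(ship_name, surveys)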

helpers

For helper functions.

Functions:

Name Description
check_for_assertion_errors

Checks for errors in the kwargs provided.

create_azure_config_file

Creates an empty config file for azure storage keys.

get_all_objects_in_survey_from_ncei

Gets all of the object keys from a ship survey from the NCEI database.

get_all_ship_objects_from_ncei

Gets all of the object keys from a ship from the NCEI database.

get_file_name_from_url

Extracts the file name from a given storage bucket url. Includes the

get_file_paths_via_json_link

This function helps in getting the links from a json request, parsing

get_netcdf_gcp_location_from_raw_gcp_location

Gets the netcdf location of a raw file within GCP.

normalize_ship_name

Normalizes a ship's name. This is necessary for creating a deterministic

parse_correct_gcp_storage_bucket_location

Calculates the correct gcp storage location based on data source, file

parse_variables_from_ncei_file_url

Gets the file variables associated with a file url in NCEI.

check_for_assertion_errors(**kwargs)

Checks for errors in the kwargs provided.

Source code in src\aalibrary\utils\helpers.py
def check_for_assertion_errors(**kwargs):
    """Checks for errors in the kwargs provided."""

    if "file_name" in kwargs:
        assert kwargs["file_name"] != "", (
            "Please provide a valid file name with the file extension"
            " (ex. `2107RL_CW-D20210813-T220732.raw`)"
        )
    if "file_type" in kwargs:
        assert kwargs["file_type"] != "", "Please provide a valid file type."
        assert kwargs["file_type"] in config.VALID_FILETYPES, (
            "Please provide a valid file type (extension) "
            f"from the following: {config.VALID_FILETYPES}"
        )
    if "ship_name" in kwargs:
        assert kwargs["ship_name"] != "", (
            "Please provide a valid ship name "
            "(Title_Case_With_Underscores_As_Spaces)."
        )
    if "survey_name" in kwargs:
        assert (
            kwargs["survey_name"] != ""
        ), "Please provide a valid survey name."
    if "echosounder" in kwargs:
        assert (
            kwargs["echosounder"] != ""
        ), "Please provide a valid echosounder."
        assert kwargs["echosounder"] in config.VALID_ECHOSOUNDERS, (
            "Please provide a valid echosounder from the "
            f"following: {config.VALID_ECHOSOUNDERS}"
        )
    if "data_source" in kwargs:
        assert kwargs["data_source"] != "", (
            "Please provide a valid data source from the "
            f"following: {config.VALID_DATA_SOURCES}"
        )
        assert kwargs["data_source"] in config.VALID_DATA_SOURCES, (
            "Please provide a valid data source from the "
            f"following: {config.VALID_DATA_SOURCES}"
        )
    if "file_download_directory" in kwargs:
        assert (
            kwargs["file_download_directory"] != ""
        ), "Please provide a valid file download directory."
        assert os.path.isdir(kwargs["file_download_directory"]), (
            f"File download location `{kwargs['file_download_directory']}` is"
            " not found to be a valid dir, please reformat it."
        )
    if "gcp_bucket" in kwargs:
        assert kwargs["gcp_bucket"] is not None, (
            "Please provide a gcp_bucket object with"
            " `utils.cloud_utils.setup_gcp_storage()`"
        )
    if "directory" in kwargs:
        assert kwargs["directory"] != "", "Please provide a valid directory."
        assert os.path.isdir(kwargs["directory"]), (
            f"Directory location `{kwargs['directory']}` is not found to be a"
            " valid dir, please reformat it."
        )
    if "data_lake_directory_client" in kwargs:
        assert kwargs["data_lake_directory_client"] is not None, (
            f"The data lake directory client cannot be a"
            f" {type(kwargs['data_lake_directory_client'])} object. It needs "
            "to be of the type `DataLakeDirectoryClient`."
        )
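
A minimal sketch of how the validation is typically used, assuming the import path aalibrary.utils.helpers and that "raw" and "EK80" are among the configured valid file types and echosounders:

from aalibrary.utils.helpers import check_for_assertion_errors

# Raises an AssertionError with a descriptive message if any value is invalid.
check_for_assertion_errors(
    file_name="2107RL_CW-D20210813-T220732.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
)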

create_azure_config_file(download_directory='')

Creates an empty config file for azure storage keys.

Parameters:

Name Type Description Default
download_directory str

The directory to store the azure config file. Defaults to "".

''
Source code in src\aalibrary\utils\helpers.py
def create_azure_config_file(download_directory: str = ""):
    """Creates an empty config file for azure storage keys.

    Args:
        download_directory (str, optional): The directory to store the
            azure config file. Defaults to "".
    """

    assert (
        download_directory != ""
    ), "Please provide a valid download directory."
    download_directory = os.path.normpath(download_directory)
    assert os.path.isdir(download_directory), (
        f"Directory location `{download_directory}` is not found to be a"
        " valid dir, please reformat it."
    )

    azure_config_file_path = os.path.join(
        download_directory, "azure_config.ini"
    )

    empty_config_str = """[DEFAULT]
azure_storage_account_name = 
azure_storage_account_key = 
azure_account_url = 
azure_connection_string = """

    with open(
        azure_config_file_path, "w", encoding="utf-8"
    ) as azure_config_file:
        azure_config_file.write(empty_config_str)

    print(
        f"Please fill out the azure config file at: {azure_config_file_path}"
    )
    return azure_config_file_path

get_all_objects_in_survey_from_ncei(ship_name='', survey_name='', s3_bucket=None)

Gets all of the object keys from a ship survey from the NCEI database.

Parameters:

Name Type Description Default
ship_name str

The name of the ship. Must be title-case and have spaces substituted for underscores. Defaults to "".

''
survey_name str

The name of the survey. Must match what we have in the NCEI database. Defaults to "".

''
s3_bucket resource

The boto3 bucket resource for the bucket that the ship data resides in. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings. Each one being an object key (path to the object inside of the bucket).

Source code in src\aalibrary\utils\helpers.py
def get_all_objects_in_survey_from_ncei(
    ship_name: str = "",
    survey_name: str = "",
    s3_bucket: boto3.resource = None,
) -> List[str]:
    """Gets all of the object keys from a ship survey from the NCEI database.

    Args:
        ship_name (str, optional): The name of the ship. Must be title-case
            and have spaces substituted for underscores. Defaults to "".
        survey_name (str, optional): The name of the survey. Must match what
            we have in the NCEI database. Defaults to "".
        s3_bucket (boto3.resource, optional): The boto3 bucket resource for
            the bucket that the ship data resides in. Defaults to None.

    Returns:
        List[str]: A list of strings. Each one being an object key (path to
            the object inside of the bucket).
    """

    assert ship_name != "", (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert " " not in ship_name, (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert survey_name != "", "Please provide a valid survey name."
    assert s3_bucket is not None, "Please pass in a boto3 bucket object."

    survey_objects = []

    for obj in s3_bucket.objects.filter(
        Prefix=f"data/raw/{ship_name}/{survey_name}"
    ):
        survey_objects.append(obj.key)

    return survey_objects
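
A hedged sketch of building the boto3 bucket resource: the NCEI water-column bucket (noaa-wcsd-pds) is public, so an unsigned client is typically sufficient, and create_s3_objs from cloud_utils can also be used to build these objects. Import paths and names here are assumptions:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

from aalibrary.utils.helpers import get_all_objects_in_survey_from_ncei

# Anonymous (unsigned) access to the public NCEI bucket.
s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
bucket = s3.Bucket("noaa-wcsd-pds")

keys = get_all_objects_in_survey_from_ncei(
    ship_name="Reuben_Lasker", survey_name="RL2107", s3_bucket=bucket
)
print(len(keys), "objects found")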

get_all_ship_objects_from_ncei(ship_name='', bucket=None)

Gets all of the object keys from a ship from the NCEI database.

Parameters:

Name Type Description Default
ship_name str

The name of the ship. Must be title-case and have spaces substituted for underscores. Defaults to "".

''
bucket resource

The boto3 bucket resource for the bucket that the ship data resides in. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings. Each one being an object key (path to the object inside of the bucket).

Source code in src\aalibrary\utils\helpers.py
def get_all_ship_objects_from_ncei(
    ship_name: str = "", bucket: boto3.resource = None
) -> List[str]:
    """Gets all of the object keys from a ship from the NCEI database.

    Args:
        ship_name (str, optional): The name of the ship. Must be title-case
            and have spaces substituted for underscores. Defaults to "".
        bucket (boto3.resource, optional): The boto3 bucket resource for the
            bucket that the ship data resides in. Defaults to None.

    Returns:
        List[str]: A list of strings. Each one being an object key (path to
            the object inside of the bucket).
    """

    assert ship_name != "", (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert " " not in ship_name, (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert bucket is not None, "Please pass in a boto3 bucket object."

    ship_objects = []

    for obj in bucket.objects.filter(Prefix=f"data/raw/{ship_name}"):
        ship_objects.append(obj.key)

    return ship_objects

get_file_name_from_url(url='')

Extracts the file name from a given storage bucket url. Includes the file extension.

Parameters:

Name Type Description Default
url str

The full url of the storage object. Defaults to "". Example: "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"

''

Returns:

Name Type Description
str str

The file name. Example: 2107RL_CW-D20210813-T220732.raw

Source code in src\aalibrary\utils\helpers.py
def get_file_name_from_url(url: str = "") -> str:
    """Extracts the file name from a given storage bucket url. Includes the
    file extension.

    Args:
        url (str, optional): The full url of the storage object.
            Defaults to "".
            Example: "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_La
                      sker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"

    Returns:
        str: The file name. Example: 2107RL_CW-D20210813-T220732.raw
    """

    return url.split("/")[-1]

get_file_paths_via_json_link(link='')

This function gets the links from a json request, parses the contents of that url into a json object, and prints the file name and the cloud path link (s3 bucket link) for each feature. Code from: https://www.ngdc.noaa.gov/mgg/wcd/S3_download.html

Parameters:

Name Type Description Default
link str

The link to the json url. Defaults to "".

''
Source code in src\aalibrary\utils\helpers.py
def get_file_paths_via_json_link(link: str = ""):
    """This function helps in getting the links from a json request, parsing
    the contents of that url into a json object. The output is a json of the
    filename, and the cloud path link (s3 bucket link).
    Code from: https://www.ngdc.noaa.gov/mgg/wcd/S3_download.html

    Args:
        link (str, optional): The link to the json url. Defaults to "".
    """

    url = requests.get(link, timeout=10)
    text = url.text
    contents = json.loads(text)
    for k in contents.keys():
        print(k)
    for i in contents["features"]:
        file_name = i["attributes"]["FILE_NAME"]
        cloud_path = i["attributes"]["CLOUD_PATH"]
        if cloud_path:
            print(f"{file_name}, {cloud_path}")

get_netcdf_gcp_location_from_raw_gcp_location(gcp_storage_bucket_location='')

Gets the netcdf location of a raw file within GCP.

Source code in src\aalibrary\utils\helpers.py
def get_netcdf_gcp_location_from_raw_gcp_location(
    gcp_storage_bucket_location: str = "",
):
    """Gets the netcdf location of a raw file within GCP."""

    gcp_storage_bucket_location = gcp_storage_bucket_location.replace(
        "/raw/", "/netcdf/"
    )
    # get rid of file extension and replace with netcdf
    netcdf_gcp_storage_bucket_location = (
        ".".join(gcp_storage_bucket_location.split(".")[:-1]) + ".nc"
    )

    return netcdf_gcp_storage_bucket_location
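
An illustrative example of the transformation (the location itself is hypothetical; the import path is an assumption):

from aalibrary.utils.helpers import get_netcdf_gcp_location_from_raw_gcp_location

raw_loc = "NCEI/Reuben_Lasker/RL2107/EK80/data/raw/2107RL_CW-D20210813-T220732.raw"
# Swaps the /raw/ path segment for /netcdf/ and the file extension for .nc.
print(get_netcdf_gcp_location_from_raw_gcp_location(raw_loc))
# NCEI/Reuben_Lasker/RL2107/EK80/data/netcdf/2107RL_CW-D20210813-T220732.nc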

normalize_ship_name(ship_name='')

Normalizes a ship's name. This is necessary for creating a deterministic file structure within our GCP storage bucket. The ship name is returned as a Title_Cased_And_Snake_Cased ship name, with no punctuation. Ex. HENRY B. BIGELOW will return Henry_B_Bigelow

Parameters:

Name Type Description Default
ship_name str

The ship name string. Defaults to "".

''

Returns:

Name Type Description
str str

The formatted and normalized version of the ship name.

Source code in src\aalibrary\utils\helpers.py
def normalize_ship_name(ship_name: str = "") -> str:
    """Normalizes a ship's name. This is necessary for creating a deterministic
    file structure within our GCP storage bucket.
    The ship name is returned as a Title_Cased_And_Snake_Cased ship name, with
    no punctuation.
    Ex. `HENRY B. BIGELOW` will return `Henry_B_Bigelow`

    Args:
        ship_name (str, optional): The ship name string. Defaults to "".

    Returns:
        str: The formatted and normalized version of the ship name.
    """

    # Lower case the string
    ship_name = ship_name.lower()
    # Un-normalize (replace `_` with ` ` to help further processing)
    # In the edge-case that users include an underscore.
    ship_name = ship_name.replace("_", " ")
    # Remove all punctuation.
    ship_name = "".join(
        [char for char in ship_name if char not in string.punctuation]
    )
    # Title-case it
    ship_name = ship_name.title()
    # Snake-case it
    ship_name = ship_name.replace(" ", "_")

    return ship_name
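
A quick sketch (the import path is an assumption):

from aalibrary.utils.helpers import normalize_ship_name

print(normalize_ship_name("HENRY B. BIGELOW"))  # Henry_B_Bigelow
print(normalize_ship_name("Reuben Lasker"))     # Reuben_Lasker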

parse_correct_gcp_storage_bucket_location(file_name='', file_type='', ship_name='', survey_name='', echosounder='', data_source='', is_metadata=False, is_survey_metadata=False, debug=False)

Calculates the correct gcp storage location based on data source, file type, and if the file is metadata or not.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (does not include the dot "."). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the data. Can be one of ["NCEI", "OMAO"]. Defaults to "".

''
is_metadata bool

Whether or not the file is a metadata file. Necessary since files that are considered metadata (metadata json, or readmes) are stored in a separate directory. Defaults to False.

False
is_survey_metadata bool

Whether or not the file is a metadata file associated with a survey. The files are stored at the survey level, in the metadata/ folder. Defaults to False.

False
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
str str

The correctly parsed GCP storage bucket location.

Source code in src\aalibrary\utils\helpers.py
def parse_correct_gcp_storage_bucket_location(
    file_name: str = "",
    file_type: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    is_metadata: bool = False,
    is_survey_metadata: bool = False,
    debug: bool = False,
) -> str:
    """Calculates the correct gcp storage location based on data source, file
    type, and if the file is metadata or not.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (not include the dot ".").
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the data. Can be one of
            ["NCEI", "OMAO"]. Defaults to "".
        is_metadata (bool, optional): Whether or not the file is a metadata
            file. Necessary since files that are considered metadata (metadata
            json, or readmes) are stored in a separate directory. Defaults to
            False.
        is_survey_metadata (bool, optional): Whether or not the file is a
            metadata file associated with a survey. The files are stored at
            the survey level, in the `metadata/` folder. Defaults to False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        str: The correctly parsed GCP storage bucket location.
    """

    assert (
        (is_metadata and is_survey_metadata is False)
        or (is_metadata is False and is_survey_metadata)
        or (is_metadata is False and is_survey_metadata is False)
    ), (
        "Please make sure that only one of `is_metadata` and"
        " `is_survey_metadata` is True. Or you can set both to False."
    )

    # Creating the correct upload location
    if is_survey_metadata:
        gcp_storage_bucket_location = (
            f"{data_source}/{ship_name}/{survey_name}/metadata/{file_name}"
        )
    elif is_metadata:
        gcp_storage_bucket_location = (
            f"{data_source}/{ship_name}/{survey_name}/{echosounder}/metadata/"
        )
        # Figure out if its a raw or idx file (belongs in raw folder)
        if file_type.lower() in config.RAW_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                gcp_storage_bucket_location + f"raw/{file_name}.json"
            )
        elif file_type.lower() in config.CONVERTED_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                gcp_storage_bucket_location + f"netcdf/{file_name}.json"
            )
    else:
        # Figure out if its a raw or idx file (belongs in raw folder)
        if file_type.lower() in config.RAW_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                f"{data_source}/{ship_name}/"
                f"{survey_name}/{echosounder}/data/raw/{file_name}"
            )
        elif file_type.lower() in config.CONVERTED_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                f"{data_source}/{ship_name}/"
                f"{survey_name}/{echosounder}/data/netcdf/{file_name}"
            )

    if debug:
        logging.debug(
            "PARSED GCP_STORAGE_BUCKET_LOCATION: %s",
            gcp_storage_bucket_location,
        )

    return gcp_storage_bucket_location
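
A minimal sketch, assuming the import path aalibrary.utils.helpers and that "raw" is listed in config.RAW_DATA_FILE_TYPES:

from aalibrary.utils.helpers import parse_correct_gcp_storage_bucket_location

location = parse_correct_gcp_storage_bucket_location(
    file_name="2107RL_CW-D20210813-T220732.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
)
print(location)
# NCEI/Reuben_Lasker/RL2107/EK80/data/raw/2107RL_CW-D20210813-T220732.raw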

parse_variables_from_ncei_file_url(url='')

Gets the file variables associated with a file url in NCEI. File urls in NCEI follow this template: data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}

NOTE: file_name will include the extension.

Source code in src\aalibrary\utils\helpers.py
def parse_variables_from_ncei_file_url(url: str = ""):
    """Gets the file variables associated with a file url in NCEI.
    File urls in NCEI follow this template:
    data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}

    NOTE: file_name will include the extension."""

    file_name = get_file_name_from_url(url=url)
    file_type = file_name.split(".")[-1]
    echosounder = url.split("/")[-2]
    survey_name = url.split("/")[-3]
    ship_name = url.split("/")[-4]

    return file_name, file_type, echosounder, survey_name, ship_name
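
A sketch using an NCEI-style url of the documented form (the import path is an assumption):

from aalibrary.utils.helpers import parse_variables_from_ncei_file_url

url = (
    "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/"
    "Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"
)
file_name, file_type, echosounder, survey_name, ship_name = (
    parse_variables_from_ncei_file_url(url=url)
)
# ('2107RL_CW-D20210813-T220732.raw', 'raw', 'EK80', 'RL2107', 'Reuben_Lasker')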

ices

Functions:

Name Description
correct_dimensions_ices

Extracts angle data from echopype DataArray.

echopype_ek60_raw_to_ices_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

echopype_ek80_raw_to_ices_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

ragged_data_type_ices

Transforms a gridded 4 dimensional variable from an Echodata object

write_ek60_beamgroup_to_netcdf

Writes echopype Beam_group ds to a Beam_groupX netcdf file.

write_ek80_beamgroup_to_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

correct_dimensions_ices(echodata, variable_name='')

Extracts angle data from echopype DataArray.

Args: echodata (echopype.DataArray): Echopype echodata object containing data. variable_name (str): The name of the variable that needs to be transformed to a ragged array representation.

Returns: np.ndarray with the correct dimensions as specified by the ICES netCDF convention.

Source code in src\aalibrary\utils\ices.py
def correct_dimensions_ices(echodata, variable_name: str = "") -> np.ndarray:
    """Extracts angle data from echopype DataArray.

    Args:
    echodata (echopype.DataArray): Echopype echodata object containing data.
    variable_name (str): The name of the variable that needs to be transformed to
    a ragged array representation.

    Returns:
    np.array that returns array with correct dimension as specified by ICES netcdf convention.
    """
    num_pings = echodata["Sonar/Beam_group1"].sizes["ping_time"]
    num_channels = echodata["Sonar/Beam_group1"].sizes["channel"]

    compliant_np = np.empty((num_pings, num_channels))

    for ping_time_val in range(num_pings):
        compliant_np[ping_time_val, :] = (
            echodata["Sonar/Beam_group1"][variable_name]
            .values.transpose()
            .astype(np.float32)
        )

    return compliant_np

echopype_ek60_raw_to_ices_netcdf(echodata, export_file)

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

Args: echodata (echopype.echodata): Echopype echodata object containing beam_group_data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def echopype_ek60_raw_to_ices_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """

    engine = "netcdf4"

    output_file = validate_output_path(
        source_file=echodata.source_file,
        engine=engine,
        save_path=export_file,
        output_storage_options={},
    )

    save_file(
        echodata["Top-level"],
        path=output_file,
        mode="w",
        engine=engine,
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
    save_file(
        echodata["Environment"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Environment",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
    save_file(
        echodata["Platform"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    save_file(
        echodata["Platform/NMEA"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform/NMEA",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    save_file(
        echodata["Sonar"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Sonar",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    # Write the Beam_group data via the dedicated helper.
    write_ek60_beamgroup_to_netcdf(echodata, output_file)

    save_file(
        echodata["Vendor_specific"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Vendor_specific",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
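
A minimal usage sketch, assuming the import path aalibrary.utils.ices, that echopype is installed, and a locally available EK60 .raw file (the file name below is illustrative only):

import echopype as ep

from aalibrary.utils.ices import echopype_ek60_raw_to_ices_netcdf

# Hypothetical local EK60 raw file; sonar_model must match the instrument.
echodata = ep.open_raw("D20160707-T190150.raw", sonar_model="EK60")
# Writes the Top-level, Environment, Platform, Sonar, Beam_group and
# Vendor_specific groups to an ICES-style netCDF file.
echopype_ek60_raw_to_ices_netcdf(echodata, "D20160707-T190150_ices.nc")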

echopype_ek80_raw_to_ices_netcdf(echodata, export_file)

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

Args: echodata (echopype.echodata): Echopype echodata object containing beam_group_data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def echopype_ek80_raw_to_ices_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """
    engine = "netcdf4"

    output_file = validate_output_path(
        source_file=echodata.source_file,
        engine=engine,
        save_path=export_file,
        output_storage_options={},
    )

    save_file(
        echodata["Top-level"],
        path=output_file,
        mode="w",
        engine=engine,
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Environment"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Environment",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Platform"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Platform/NMEA"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform/NMEA",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Sonar"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Sonar",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    write_ek80_beamgroup_to_netcdf(echodata, output_file)
    save_file(
        echodata["Vendor_specific"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Vendor_specific",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )

ragged_data_type_ices(echodata, variable_name='')

Transforms a gridded 4 dimensional variable from an Echodata object into a ragged array representation.

Args: echodata (echopype.Echodata): Echopype echodata object containing a variable in the Beam_group1. variable_name (str): The name of the variable that needs to be transformed to a ragged array representation.

Returns: An ICES-compliant np.ndarray of dtype object.

Source code in src\aalibrary\utils\ices.py
def ragged_data_type_ices(echodata, variable_name: str = "") -> np.ndarray:
    """Transforms a gridded 4 dimensional variable from an Echodata object
    into a ragged array representation.

    Args:
    echodata (echopype.Echodata): Echopype echodata object containing a variable in the Beam_group1.
    variable_name (str): The name of the variable that needs to be transformed to
    a ragged array representation.

    Returns:
    ICES complain np array of type object.
    """

    num_pings = echodata["Sonar/Beam_group1"].sizes["ping_time"]
    num_channels = echodata["Sonar/Beam_group1"].sizes["channel"]
    num_beam = echodata["Sonar/Beam_group1"].sizes["beam"]

    compliant_np = np.empty((num_pings, num_channels, num_beam), object)

    for c, channel in enumerate(
        echodata["Sonar/Beam_group1"][variable_name].coords["channel"].values
    ):

        test = echodata["Sonar/Beam_group1"][variable_name].sel(channel=channel)

        # Find the first index along 'range_sample' where all values are NaN across 'beam'
        is_nan_across_beam = test.isnull().all(dim="beam")

        # Find the first index along 'range_sample' where 'is_nan_across_beam' is True
        first_nan_range_sample_indices = xr.apply_ufunc(
            np.argmax,
            is_nan_across_beam,
            input_core_dims=[["range_sample"]],
            exclude_dims=set(("range_sample",)),
            vectorize=True,  # Apply the function row-wise for each ping_time
            dask="parallelized",
            output_dtypes=[int],
        )

        found_nan_block_mask = is_nan_across_beam.isel(
            range_sample=first_nan_range_sample_indices.clip(min=0)
        )

        sample_t = []

        # Iterate through ping_time to populate sample_t
        for i, _ in enumerate(test["ping_time"].values):
            if found_nan_block_mask.isel(ping_time=i):
                value_to_append = (
                    test["range_sample"].values[
                        first_nan_range_sample_indices.isel(ping_time=i).item()
                    ]
                    - 1
                )
                sample_t.append(value_to_append)
            else:
                # If no all-NaN block was found, append the last range_sample index
                sample_t.append(test["range_sample"].values[-1])
        sample_t = np.array(sample_t)

        all_ping_segments = []

        for i, ping_da in enumerate(test):
            segment = ping_da.isel(range_sample=slice(sample_t[i])).values.transpose()
            all_ping_segments.append(segment)

        for i in range(len(compliant_np)):
            for j in range(4):
                compliant_np[i, c, j] = all_ping_segments[i][j].astype(np.float32)

    return compliant_np
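
For orientation, a sketch of what the returned object array looks like (the import path and file name are assumptions; backscatter_r is a variable present in Beam_group1 for EK60 data):

import echopype as ep

from aalibrary.utils.ices import ragged_data_type_ices

echodata = ep.open_raw("D20160707-T190150.raw", sonar_model="EK60")  # hypothetical file
ragged = ragged_data_type_ices(echodata, "backscatter_r")
print(ragged.shape)         # (num_pings, num_channels, num_beams)
print(ragged[0, 0, 0][:5])  # first few samples of ping 0, channel 0, beam 0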

write_ek60_beamgroup_to_netcdf(echodata, export_file)

Writes echopype Beam_group ds to a Beam_groupX netcdf file.

Parameters: echodata (echopype.echodata): Echopype echodata object containing the Beam_group data to be written. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def write_ek60_beamgroup_to_netcdf(echodata, export_file):
    """
    Writes an echopype Beam_group dataset to Beam_groupX groups in a NetCDF file.

    Parameters:
    echodata (echopype.EchoData): Echopype EchoData object whose Beam_group1 data are written.
    export_file (str or Path): Path to the output NetCDF file.
    """
    ragged_backscatter_r_data = ragged_data_type_ices(echodata, "backscatter_r")
    beamwidth_receive_major_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_athwartship"
    )
    beamwidth_receive_minor_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_alongship"
    )
    echoangle_major_data = ragged_data_type_ices(echodata, "angle_athwartship")
    echoangle_minor_data = ragged_data_type_ices(echodata, "angle_alongship")
    equivalent_beam_angle_data = correct_dimensions_ices(
        echodata, "equivalent_beam_angle"
    )
    rx_beam_rotation_phi_data = (
        ragged_data_type_ices(echodata, "angle_athwartship") * -1
    )
    rx_beam_rotation_psi_data = np.zeros(
        (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
    )
    rx_beam_rotation_theta_data = ragged_data_type_ices(echodata, "angle_alongship")

    for i in range(echodata["Sonar/Beam_group1"].sizes["channel"]):

        with netCDF4.Dataset(export_file, "a", format="netcdf4") as ncfile:
            grp = ncfile.createGroup(f"Sonar/Beam_group{i+1}")
            grp.setncattr("beam_mode", echodata["Sonar/Beam_group1"].attrs["beam_mode"])
            grp.setncattr(
                "conversion_equation_type",
                echodata["Sonar/Beam_group1"].attrs["conversion_equation_t"],
            )
            grp.setncattr(
                "long_name", echodata["Sonar/Beam_group1"].coords["channel"].values[i]
            )

            # Create the VLEN type for 32-bit floats
            sample_t = grp.createVLType(np.float32, "sample_t")
            angle_t = grp.createVLType(np.float32, "angle_t")

            # Create ping_time dimension and ping_time coordinate variable
            grp.createDimension("ping_time", None)

            ping_time_var = grp.createVariable("ping_time", np.int64, ("ping_time",))
            ping_time_var.units = "nanoseconds since 1970-01-01 00:00:00Z"
            ping_time_var.standard_name = "time"
            ping_time_var.long_name = "Time-stamp of each ping"
            ping_time_var.axis = "T"
            ping_time_var.calendar = "gregorian"
            ping_time_var[:] = echodata["Sonar/Beam_group1"].coords[
                "ping_time"
            ].values - np.datetime64("1970-01-01T00:00:00Z")

            # Create beam dimension and coordinate variable
            grp.createDimension("beam", 1)

            beam_var = grp.createVariable("beam", "S1", ("beam",))
            beam_var.long_name = "Beam name"
            beam_var[:] = echodata["Sonar/Beam_group1"].coords["channel"].values[i]

            # Create backscatter_r variable
            backscatter_r = grp.createVariable(
                "backscatter_r", sample_t, ("ping_time", "beam")
            )
            backscatter_r[:] = ragged_backscatter_r_data[:, i]
            backscatter_r.setncattr(
                "long_name", "Raw backscatter measurements (real part)"
            )
            backscatter_r.units = "dB"

            # Create beam_stabilisation variable
            beam_stablisation = grp.createVariable(
                "beam_stablisation", int, ("ping_time", "beam")
            )
            beam_stablisation[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            beam_stablisation.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )

            # Create beam_type variable
            beam_type = grp.createVariable("beam_type", int, ())
            beam_type[:] = echodata["Sonar/Beam_group1"]["beam_type"].values[i]
            beam_type.setncattr("long_name", "type of transducer (0-single, 1-split)")

            # Create beamwidth_receive_major variable
            beamwidth_receive_major = grp.createVariable(
                "beamwidth_receive_major", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_major[:] = beamwidth_receive_major_data[:, i]
            beamwidth_receive_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_receive_major.units = "arc_degree"
            beamwidth_receive_major.valid_range = [0.0, 360.0]

            # Create beamwidth_receive_minor variable
            beamwidth_receive_minor = grp.createVariable(
                "beamwidth_receive_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_receive_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_receive_minor.units = "arc_degree"
            beamwidth_receive_minor.valid_range = [0.0, 360.0]

            beamwidth_transmit_major = grp.createVariable(
                "beamwidth_transmit_major", np.float32, ("ping_time", "beam")
            )
            # Create beamwidth_transmit_major variable
            beamwidth_transmit_major[:] = beamwidth_receive_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_transmit_major.units = "arc_degree"
            beamwidth_transmit_major.valid_range = [0.0, 360.0]

            # Create beamwidth_transmit_minor variable
            beamwidth_transmit_minor = grp.createVariable(
                "beamwidth_transmit_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_transmit_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_transmit_minor.units = "arc_degree"
            beamwidth_transmit_minor.valid_range = [0.0, 360.0]

            # Create blanking_interval variable
            blanking_interval = grp.createVariable(
                "blanking_interval", float, ("ping_time", "beam")
            )
            blanking_interval[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            blanking_interval.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )
            blanking_interval.units = "s"
            blanking_interval.valid_min = 0.0

            # Create calibrated_frequency variable
            calibrated_frequency = grp.createVariable(
                "calibrated_frequency", np.float64, ()
            )
            calibrated_frequency[:] = echodata["Sonar/Beam_group1"][
                "frequency_nominal"
            ].values[i]
            calibrated_frequency.setncattr("long_name", "Calibration gain frequencies")
            calibrated_frequency.units = "Hz"
            calibrated_frequency.valid_min = 0.0

            # Create echoangle_major variable (talk to joe about this)
            echoangle_major = grp.createVariable(
                "echoangle_major", angle_t, ("ping_time", "beam")
            )
            echoangle_major[:] = echoangle_major_data[:, i]
            echoangle_major.setncattr(
                "long_name", "Echo arrival angle in the major beam coordinate"
            )
            echoangle_major.units = "arc_degree"
            echoangle_major.valid_range = [-180.0, 180.0]

            # Create echoangle_minor variable
            echoangle_minor = grp.createVariable(
                "echoangle_minor", angle_t, ("ping_time", "beam")
            )
            echoangle_minor[:] = echoangle_minor_data[:, i]
            echoangle_minor.setncattr(
                "long_name", "Echo arrival angle in the minor beam coordinate"
            )
            echoangle_minor.units = "arc_degree"
            echoangle_minor.valid_range = [-180.0, 180.0]

            # Create echoangle_major sensitivity variable
            echoangle_major_sensitivity = grp.createVariable(
                "echoangle_major_sensitivityr", np.float64, ()
            )
            echoangle_major_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_athwartship"
            ].values[i]
            echoangle_major_sensitivity.setncattr(
                "long_name", "Major angle scaling factor"
            )
            echoangle_major_sensitivity.units = "1"
            echoangle_major_sensitivity.valid_min = 0.0

            # Create echoangle_minor sensitivity variable
            echoangle_minor_sensitivity = grp.createVariable(
                "echoangle_minor_sensitivity", np.float64, ()
            )
            echoangle_minor_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_alongship"
            ].values[i]
            echoangle_minor_sensitivity.setncattr(
                "long_name", "Minor angle scaling factor"
            )
            echoangle_minor_sensitivity.units = "1"
            echoangle_minor_sensitivity.valid_min = 0.0

            # Create equivalent_beam_angle variable (weird angle values)
            equivalent_beam_angle = grp.createVariable(
                "equivalent_beam_angle", np.float64, ("ping_time", "beam")
            )
            equivalent_beam_angle[:] = equivalent_beam_angle_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            equivalent_beam_angle.setncattr("long_name", "Equivalent beam angle")

            # Create frequency variable
            frequency = grp.createVariable("frequency", np.float64, ())
            frequency[:] = echodata["Sonar/Beam_group1"]["frequency_nominal"].values[i]
            frequency.setncattr("long_name", "Calibration gain frequencies")
            frequency.units = "Hz"
            frequency.valid_min = 0.0

            # Create non_quantitative_processing variable
            non_quantitative_processing = grp.createVariable(
                "non_quantitative_processing", int, ("ping_time")
            )
            non_quantitative_processing[:] = np.zeros(
                echodata["Sonar/Beam_group1"].sizes["ping_time"]
            )
            non_quantitative_processing.setncattr(
                "long_name",
                "Presence or not of non-quantitative processing applied to the backscattering data (sonar specific)",
            )

            # Create platoform_latitude variable
            platoform_latitude = grp.createVariable(
                "platoform_latitude", np.float64, ("ping_time")
            )
            platoform_latitude[:] = echodata["Platform"]["latitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platoform_latitude.setncattr(
                "long_name", "Heading of the platform at time of the ping"
            )
            platoform_latitude.units = "degrees_north"
            platoform_latitude.valid_range = [-180.0, 180.0]

            # Create platoform_longitude variable
            platoform_longitude = grp.createVariable(
                "platoform_longitude", np.float64, ("ping_time")
            )
            platoform_longitude[:] = echodata["Platform"]["longitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platoform_longitude.setncattr("long_name", "longitude")
            platoform_longitude.units = "degrees_east"
            platoform_longitude.valid_range = [-180.0, 180.0]

            # Create platoform_pitch variable
            platform_pitch = grp.createVariable(
                "platform_pitch", np.float64, ("ping_time")
            )
            platform_pitch[:] = echodata["Platform"]["pitch"].values
            platform_pitch.setncattr("long_name", "pitch_angle")
            platform_pitch.units = "arc_degree"
            platform_pitch.valid_range = [-90.0, 90.0]

            # Create platoform_roll variable
            platoform_roll = grp.createVariable(
                "platform_roll", np.float64, ("ping_time")
            )
            platoform_roll[:] = echodata["Platform"]["roll"].values
            platoform_roll.setncattr("long_name", "roll angle")
            platoform_roll.units = "arc_degree"

            # Create platoform_vertical_offset variable
            platoform_vertical_offset = grp.createVariable(
                "platoform_vertical_offset", np.float64, ("ping_time")
            )
            platoform_vertical_offset[:] = echodata["Platform"][
                "vertical_offset"
            ].values
            platoform_vertical_offset.setncattr(
                "long_name",
                "Platform vertical distance from reference point to the water line",
            )
            platoform_vertical_offset.units = "m"

            # Create rx_beam_rotation_phi variable
            rx_beam_rotation_phi = grp.createVariable(
                "rx_beam_rotation_phi", angle_t, ("ping_time", "beam")
            )
            rx_beam_rotation_phi[:] = rx_beam_rotation_phi_data[:, i]
            rx_beam_rotation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            rx_beam_rotation_phi.units = "arc_degree"
            rx_beam_rotation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            rx_beam_rotation_psi = grp.createVariable(
                "rx_beam_rotation_psi", np.float64, ("ping_time", "beam")
            )
            rx_beam_rotation_psi[:] = rx_beam_rotation_psi_data
            rx_beam_rotation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            rx_beam_rotation_psi.units = "arc_degree"
            rx_beam_rotation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            rx_beam_rotation_theta = grp.createVariable(
                "rx_beam_roation_theta", angle_t, ("ping_time", "beam")
            )
            rx_beam_rotation_theta[:] = rx_beam_rotation_theta_data[:, i]
            rx_beam_rotation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            rx_beam_rotation_theta.units = "arc_degree"
            rx_beam_rotation_theta.valid_range = [-90.0, 90.0]

            # Create sample_interval variable
            sample_interval = grp.createVariable(
                "sample_interval", np.float64, ("ping_time", "beam")
            )
            sample_interval[:] = (
                echodata["Sonar/Beam_group1"]["sample_interval"]
                .transpose()
                .values[:, i]
            )
            sample_interval.setncattr("long_name", "Equivalent beam angle")
            sample_interval.units = "s"
            sample_interval.valid_min = 0.0
            sample_interval.coordinates = (
                "ping_time platform_latitude platform_longitude"
            )

            # Create sample_time_offset variable
            sample_time_offset = grp.createVariable(
                "sample_time_offset", np.float64, ("ping_time", "beam")
            )
            sample_time_offset[:] = (
                echodata["Sonar/Beam_group1"]["sample_time_offset"]
                .transpose()
                .values[:, i]
            )
            sample_time_offset.setncattr(
                "long_name",
                "Time offset that is subtracted from the timestamp of each sample",
            )
            sample_time_offset.units = "s"

            # Create transmit_duration_nominal variable
            transmit_duration_nominal = grp.createVariable(
                "transmit_duration_nominal", np.float64, ("ping_time", "beam")
            )
            transmit_duration_nominal[:] = (
                echodata["Sonar/Beam_group1"]["transmit_duration_nominal"]
                .transpose()
                .values[:, i]
            )
            transmit_duration_nominal.setncattr(
                "long_name", "Nominal duration of transmitted pulse"
            )
            transmit_duration_nominal.units = "Hz"
            transmit_duration_nominal.valid_min = 0.0

            # Create transmit_frequency_start variable
            transmit_frequency_start = grp.createVariable(
                "transmit_frequency_start", np.float64, ("ping_time")
            )
            transmit_frequency_start[:] = echodata["Sonar/Beam_group1"][
                "transmit_frequency_start"
            ].values[i]
            transmit_frequency_start.setncattr(
                "long_name", "Start frequency in transmitted pulse"
            )
            transmit_frequency_start.units = "Hz"
            transmit_frequency_start.valid_min = 0.0

            # Create transmit_frequency_stop variable
            transmit_frequency_stop = grp.createVariable(
                "transmit_frequency_stop", np.float64, ("ping_time")
            )
            transmit_frequency_stop[:] = echodata["Sonar/Beam_group1"][
                "transmit_frequency_stop"
            ].values[i]
            transmit_frequency_stop.setncattr(
                "long_name", "Stop frequency in transmitted pulse"
            )
            transmit_frequency_stop.units = "Hz"
            transmit_frequency_stop.valid_min = 0.0

            # Create transmit_power variable
            transmit_power = grp.createVariable(
                "transmit_power", np.float64, ("ping_time", "beam")
            )
            transmit_power[:] = (
                echodata["Sonar/Beam_group1"]["transmit_power"].transpose().values[:, i]
            )
            transmit_power.setncattr("long_name", "Nominal transmit power")
            transmit_power.units = "W"
            transmit_power.valid_min = 0.0

            # Create transmit_type
            transmit_type = grp.createVariable("transmit_type", np.float64, ())
            transmit_type[:] = 0
            transmit_type.setncattr("long_name", "Type of transmitted pulse")

            # Create tx_beam_rotation_phi variable
            tx_beam_roation_phi = grp.createVariable(
                "tx_beam_roation_phi", angle_t, ("ping_time", "beam")
            )
            tx_beam_roation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            tx_beam_roation_phi.units = "arc_degree"
            tx_beam_roation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            tx_beam_roation_psi = grp.createVariable(
                "tx_beam_roation_psi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_psi[:] = rx_beam_rotation_psi_data
            tx_beam_roation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            tx_beam_roation_psi.units = "arc_degree"
            tx_beam_roation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            tx_beam_roation_theta = grp.createVariable(
                "tx_beam_roation_theta", angle_t, ("ping_time", "beam")
            )
            tx_beam_roation_theta[:] = rx_beam_rotation_theta_data[:, i]
            tx_beam_roation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            tx_beam_roation_theta.units = "arc_degree"
            tx_beam_roation_theta.valid_range = [-90.0, 90.0]
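
A minimal usage sketch with placeholder file paths. Because the writer opens the target file in append mode ("a"), the output file is assumed to already exist, for example as the skeleton of an ICES SONAR-netCDF4 file produced by another step; the import path follows the source location shown above.

# Illustrative usage sketch (paths are placeholders; the import path is an
# assumption based on "Source code in src\aalibrary\utils\ices.py").
import echopype as ep

from aalibrary.utils.ices import write_ek60_beamgroup_to_netcdf

ed = ep.open_raw("example_ek60.raw", sonar_model="EK60")  # hypothetical raw file
# Appends one Sonar/Beam_groupN group per channel to the existing file.
write_ek60_beamgroup_to_netcdf(ed, "example_ices.nc")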

write_ek80_beamgroup_to_netcdf(echodata, export_file)

Writes an echopype Beam_group dataset to Beam_groupX groups in a NetCDF file.

Args: echodata (echopype.EchoData): Echopype EchoData object containing the beam group data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def write_ek80_beamgroup_to_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """
    ragged_backscatter_r_data = ragged_data_type_ices(echodata, "backscatter_r")
    ragged_backscatter_i_data = ragged_data_type_ices(echodata, "backscatter_i")
    beamwidth_receive_major_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_athwartship"
    )
    beamwidth_receive_minor_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_alongship"
    )
    echoangle_major_data = correct_dimensions_ices(echodata, "angle_offset_athwartship")
    echoangle_minor_data = correct_dimensions_ices(echodata, "angle_offset_alongship")
    equivalent_beam_angle_data = correct_dimensions_ices(
        echodata, "equivalent_beam_angle"
    )
    rx_beam_rotation_phi_data = (
        correct_dimensions_ices(echodata, "angle_offset_athwartship") * -1
    )
    rx_beam_rotation_psi_data = np.zeros(
        (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
    )
    rx_beam_rotation_theta_data = correct_dimensions_ices(
        echodata, "angle_offset_alongship"
    )

    for i in range(echodata["Sonar/Beam_group1"].sizes["channel"]):

        with netCDF4.Dataset(export_file, "a", format="netcdf4") as ncfile:
            grp = ncfile.createGroup(f"Sonar/Beam_group{i+1}")
            grp.setncattr("beam_mode", echodata["Sonar/Beam_group1"].attrs["beam_mode"])
            grp.setncattr(
                "conversion_equation_type",
                echodata["Sonar/Beam_group1"].attrs["conversion_equation_t"],
            )
            grp.setncattr(
                "long_name", echodata["Sonar/Beam_group1"].coords["channel"].values[i]
            )

            # Create the VLEN type for 32-bit floats
            sample_t = grp.createVLType(np.float32, "sample_t")

            # Create ping_time dimension and ping_time coordinate variable
            grp.createDimension("ping_time", None)

            ping_time_var = grp.createVariable("ping_time", np.int64, ("ping_time",))
            ping_time_var.units = "nanoseconds since 1970-01-01 00:00:00Z"
            ping_time_var.standard_name = "time"
            ping_time_var.long_name = "Time-stamp of each ping"
            ping_time_var.axis = "T"
            ping_time_var.calendar = "gregorian"
            ping_time_var[:] = echodata["Sonar/Beam_group1"].coords[
                "ping_time"
            ].values - np.datetime64("1970-01-01T00:00:00Z")

            # Create beam dimension and coordinate variable
            grp.createDimension("beam", 1)

            beam_var = grp.createVariable("beam", "S1", ("beam",))
            beam_var.long_name = "Beam name"
            beam_var[:] = echodata["Sonar/Beam_group1"].coords["channel"].values[i]

            # Create beam dimension and coordinate variable
            grp.createDimension("sub_beam", 4)

            sub_beam_var = grp.createVariable("sub_beam", np.int64, ("sub_beam",))
            sub_beam_var.long_name = "Beam quadrant number"
            sub_beam_var[:] = echodata["Sonar/Beam_group1"].coords["beam"].values

            # Create backscatter_r variable
            backscatter_r = grp.createVariable(
                "backscatter_r",
                sample_t,
                ("ping_time", "beam", "sub_beam"),
            )
            backscatter_r[:] = ragged_backscatter_r_data[:, i, :]
            backscatter_r.setncattr(
                "long_name", "Raw backscatter measurements (real part)"
            )
            backscatter_r.units = "dB"

            # Create backscatter_i variable
            backscatter_i = grp.createVariable(
                "backscatter_i", sample_t, ("ping_time", "beam", "sub_beam")
            )
            backscatter_i[:] = ragged_backscatter_i_data[:, i, :].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"],
                1,
                echodata["Sonar/Beam_group1"].sizes["beam"],
            )
            backscatter_i.setncattr(
                "long_name", "Raw backscatter measurements (imaginary part)"
            )
            backscatter_i.units = "dB"

            # Create beam_stabilisation variable
            beam_stablisation = grp.createVariable(
                "beam_stablisation", int, ("ping_time", "beam")
            )
            beam_stablisation[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            beam_stablisation.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )

            # Create beam_type variable
            beam_type = grp.createVariable("beam_type", int, ())
            beam_type[:] = echodata["Sonar/Beam_group1"]["beam_type"].values[i]
            beam_type.setncattr("long_name", "type of transducer (0-single, 1-split)")

            # Create beamwidth_receive_major variable
            beamwidth_receive_major = grp.createVariable(
                "beamwidth_receive_major", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_major[:] = beamwidth_receive_major_data[:, i]
            beamwidth_receive_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_receive_major.units = "arc_degree"
            beamwidth_receive_major.valid_range = [0.0, 360.0]

            # stopped here
            # Create beamwidth_receive_minor variable
            beamwidth_receive_minor = grp.createVariable(
                "beamwidth_receive_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_receive_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_receive_minor.units = "arc_degree"
            beamwidth_receive_minor.valid_range = [0.0, 360.0]

            beamwidth_transmit_major = grp.createVariable(
                "beamwidth_transmit_major", np.float32, ("ping_time", "beam")
            )
            # Create beamwidth_transmit_major variable
            beamwidth_transmit_major[:] = beamwidth_receive_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_transmit_major.units = "arc_degree"
            beamwidth_transmit_major.valid_range = [0.0, 360.0]

            # Create beamwidth_transmit_minor variable
            beamwidth_transmit_minor = grp.createVariable(
                "beamwidth_transmit_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_transmit_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_transmit_minor.units = "arc_degree"
            beamwidth_transmit_minor.valid_range = [0.0, 360.0]

            # Create blanking_interval variable
            blanking_interval = grp.createVariable(
                "blanking_interval", np.float32, ("ping_time", "beam")
            )
            blanking_interval[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            blanking_interval.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )
            blanking_interval.units = "s"
            blanking_interval.valid_min = 0.0

            # Create calibrated_frequency variable
            calibrated_frequency = grp.createVariable(
                "calibrated_frequency", np.float64, ()
            )
            calibrated_frequency[:] = echodata["Sonar/Beam_group1"][
                "frequency_nominal"
            ].values[i]
            calibrated_frequency.setncattr("long_name", "Calibration gain frequencies")
            calibrated_frequency.units = "Hz"
            calibrated_frequency.valid_min = 0.0

            # Create echoangle_major variable (talk to joe about this)
            echoangle_major = grp.createVariable(
                "echoangle_major", np.float32, ("ping_time", "beam")
            )
            echoangle_major[:] = echoangle_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            echoangle_major.setncattr(
                "long_name", "Echo arrival angle in the major beam coordinate"
            )
            echoangle_major.units = "arc_degree"
            echoangle_major.valid_range = [-180.0, 180.0]

            # Create echoangle_minor variable
            echoangle_minor = grp.createVariable(
                "echoangle_minor", np.float32, ("ping_time", "beam")
            )
            echoangle_minor[:] = echoangle_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            echoangle_minor.setncattr(
                "long_name", "Echo arrival angle in the minor beam coordinate"
            )
            echoangle_minor.units = "arc_degree"
            echoangle_minor.valid_range = [-180.0, 180.0]

            # Create echoangle_major sensitivity variable
            echoangle_major_sensitivity = grp.createVariable(
                "echoangle_major_sensitivityr", np.float64, ()
            )
            echoangle_major_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_athwartship"
            ].values[i]
            echoangle_major_sensitivity.setncattr(
                "long_name", "Major angle scaling factor"
            )
            echoangle_major_sensitivity.units = "1"
            echoangle_major_sensitivity.valid_min = 0.0

            # Create echoangle_minor sensitivity variable
            echoangle_minor_sensitivity = grp.createVariable(
                "echoangle_minor_sensitivity", np.float64, ()
            )
            echoangle_minor_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_alongship"
            ].values[i]
            echoangle_minor_sensitivity.setncattr(
                "long_name", "Minor angle scaling factor"
            )
            echoangle_minor_sensitivity.units = "1"
            echoangle_minor_sensitivity.valid_min = 0.0

            # Create equivalent_beam_angle variable (weird angle values)
            equivalent_beam_angle = grp.createVariable(
                "equivalent_beam_angle", np.float32, ("ping_time", "beam")
            )
            equivalent_beam_angle[:] = equivalent_beam_angle_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            equivalent_beam_angle.setncattr("long_name", "Equivalent beam angle")

            # Create frequency variable
            frequency = grp.createVariable("frequency", np.float64, ())
            frequency[:] = echodata["Sonar/Beam_group1"]["frequency_nominal"].values[i]
            frequency.setncattr("long_name", "Calibration gain frequencies")
            frequency.units = "Hz"
            frequency.valid_min = 0.0

            # Create non_quantitative_processing variable
            non_quantitative_processing = grp.createVariable(
                "non_quantitative_processing", int, ("ping_time")
            )
            non_quantitative_processing[:] = np.zeros(
                echodata["Sonar/Beam_group1"].sizes["ping_time"]
            )
            non_quantitative_processing.setncattr(
                "long_name",
                "Presence or not of non-quantitative processing applied to the backscattering data (sonar specific)",
            )

            # Create platform_heading variable
            platform_heading = grp.createVariable(
                "platform_heading", np.float32, ("ping_time")
            )
            platform_heading[:] = echodata["Platform"]["heading"].values
            platform_heading.setncattr("long_name", "Platform heading(true)")
            platform_heading.units = "degrees_north"
            platform_heading.valid_range = [0, 360.0]

            # Create platform_latitude variable
            platform_latitude = grp.createVariable(
                "platform_latitude", np.float32, ("ping_time")
            )
            platform_latitude[:] = echodata["Platform"]["latitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platform_latitude.setncattr(
                "long_name", "Heading of the platform at time of the ping"
            )
            platform_latitude.units = "degrees_north"
            platform_latitude.valid_range = [-180.0, 180.0]

            # Create platform_longitude variable
            platform_longitude = grp.createVariable(
                "platform_longitude", np.float64, ("ping_time")
            )
            platform_longitude[:] = echodata["Platform"]["longitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platform_longitude.setncattr("long_name", "longitude")
            platform_longitude.units = "degrees_east"
            platform_longitude.valid_range = [-180.0, 180.0]

            # Create platform_pitch variable
            platform_pitch = grp.createVariable(
                "platform_pitch", np.float64, ("ping_time")
            )
            platform_pitch[:] = echodata["Platform"]["pitch"].values
            platform_pitch.setncattr("long_name", "pitch_angle")
            platform_pitch.units = "arc_degree"
            platform_pitch.valid_range = [-90.0, 90.0]

            # Create platform_roll variable
            platform_roll = grp.createVariable(
                "platform_roll", np.float64, ("ping_time")
            )
            platform_roll[:] = echodata["Platform"]["roll"].values
            platform_roll.setncattr("long_name", "roll angle")
            platform_roll.units = "arc_degree"

            # Create platform_vertical_offset variable
            platform_vertical_offset = grp.createVariable(
                "platform_vertical_offset", np.float64, ("ping_time")
            )
            platform_vertical_offset[:] = echodata["Platform"]["vertical_offset"].values
            platform_vertical_offset.setncattr(
                "long_name",
                "Platform vertical distance from reference point to the water line",
            )
            platform_vertical_offset.units = "m"

            # Create rx_beam_rotation_phi variable
            rx_beam_rotation_phi = grp.createVariable(
                "rx_beam_rotation_phi", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            rx_beam_rotation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            rx_beam_rotation_phi.units = "arc_degree"
            rx_beam_rotation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            rx_beam_rotation_psi = grp.createVariable(
                "rx_beam_rotation_psi", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_psi[:] = rx_beam_rotation_psi_data
            rx_beam_rotation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            rx_beam_rotation_psi.units = "arc_degree"
            rx_beam_rotation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            rx_beam_rotation_theta = grp.createVariable(
                "rx_beam_roation_theta", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_theta[:] = rx_beam_rotation_theta_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            rx_beam_rotation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            rx_beam_rotation_theta.units = "arc_degree"
            rx_beam_rotation_theta.valid_range = [-90.0, 90.0]

            # Create sample_interval variable
            sample_interval = grp.createVariable(
                "sample_interval", np.float64, ("ping_time", "beam")
            )
            sample_interval[:] = (
                echodata["Sonar/Beam_group1"]["sample_interval"]
                .transpose()
                .values[:, i]
            )
            sample_interval.setncattr("long_name", "Equivalent beam angle")
            sample_interval.units = "s"
            sample_interval.valid_min = 0.0
            sample_interval.coordinates = (
                "ping_time platform_latitude platform_longitude"
            )

            # Create sample_time_offset variable
            sample_time_offset = grp.createVariable(
                "sample_time_offset", np.float32, ("ping_time", "beam")
            )
            sample_time_offset[:] = (
                echodata["Sonar/Beam_group1"]["sample_time_offset"]
                .transpose()
                .values[:, i]
            )
            sample_time_offset.setncattr(
                "long_name",
                "Time offset that is subtracted from the timestamp of each sample",
            )
            sample_time_offset.units = "s"

            # Create transmit_duration_nominal variable
            transmit_duration_nominal = grp.createVariable(
                "transmit_duration_nominal", np.float32, ("ping_time", "beam")
            )
            transmit_duration_nominal[:] = (
                echodata["Sonar/Beam_group1"]["transmit_duration_nominal"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_duration_nominal.setncattr(
                "long_name", "Nominal duration of transmitted pulse"
            )
            transmit_duration_nominal.units = "Hz"
            transmit_duration_nominal.valid_min = 0.0

            # Create transmit_frequency_start variable
            transmit_frequency_start = grp.createVariable(
                "transmit_frequency_start", np.float32, ("ping_time", "beam")
            )
            transmit_frequency_start[:] = (
                echodata["Sonar/Beam_group1"]["transmit_frequency_start"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_frequency_start.setncattr(
                "long_name", "Start frequency in transmitted pulse"
            )
            transmit_frequency_start.units = "Hz"
            transmit_frequency_start.valid_min = 0.0

            # Create transmit_frequency_stop variable
            transmit_frequency_stop = grp.createVariable(
                "transmit_frequency_stop", np.float32, ("ping_time", "beam")
            )
            transmit_frequency_stop[:] = (
                echodata["Sonar/Beam_group1"]["transmit_frequency_stop"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_frequency_stop.setncattr(
                "long_name", "Stop frequency in transmitted pulse"
            )
            transmit_frequency_stop.units = "Hz"
            transmit_frequency_stop.valid_min = 0.0

            # Create transmit_power variable
            transmit_power = grp.createVariable(
                "transmit_power", np.float32, ("ping_time", "beam")
            )
            transmit_power[:] = (
                echodata["Sonar/Beam_group1"]["transmit_power"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_power.setncattr("long_name", "Nominal transmit power")
            transmit_power.units = "W"
            transmit_power.valid_min = 0.0

            # Create transmit_type
            transmit_type = grp.createVariable(
                "transmit_type", np.float32, ("ping_time", "beam")
            )
            transmit_type[:] = (
                echodata["Sonar/Beam_group1"]["transmit_type"]
                .where(echodata["Sonar/Beam_group1"]["transmit_type"] != "CW", 0)
                .where(echodata["Sonar/Beam_group1"]["transmit_type"] != "LFM", 1)
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_type.setncattr("long_name", "Type of transmitted pulse")

            # Create tx_beam_rotation_phi variable
            tx_beam_roation_phi = grp.createVariable(
                "tx_beam_roation_phi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            tx_beam_roation_phi.units = "arc_degree"
            tx_beam_roation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            tx_beam_roation_psi = grp.createVariable(
                "tx_beam_roation_psi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_psi[:] = rx_beam_rotation_psi_data
            tx_beam_roation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            tx_beam_roation_psi.units = "arc_degree"
            tx_beam_roation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            tx_beam_roation_theta = grp.createVariable(
                "tx_beam_roation_theta", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_theta[:] = rx_beam_rotation_theta_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            tx_beam_roation_theta.units = "arc_degree"
            tx_beam_roation_theta.valid_range = [-90.0, 90.0]
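
Unlike the EK60 writer, each EK80 Beam_groupX group carries a sub_beam dimension (the four transducer quadrants) plus real and imaginary backscatter. A minimal sketch, with a placeholder path, of reading the ragged backscatter back out of a file produced by this writer:

# Illustrative read-back sketch (path is a placeholder): each element of the
# VLEN backscatter_r variable is a variable-length float32 vector, one per
# sub_beam of the selected ping.
import netCDF4

with netCDF4.Dataset("example_ices.nc", "r") as nc:
    grp = nc["Sonar/Beam_group1"]
    first_ping = grp["backscatter_r"][0, 0, :]  # one vector per sub_beam
    print([segment.shape for segment in first_ping])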

nc_reader

This file is used to get header information out of a NetCDF file. The code reads a .nc file and returns a dict with all of the attributes gathered.

Functions:

Name Description
get_netcdf_header

Reads a NetCDF file and returns its header as a dictionary.

get_netcdf_header(file_path)

Reads a NetCDF file and returns its header as a dictionary.

Parameters:

Name Type Description Default
file_path str

Path to the NetCDF file.

required

Returns:

Name Type Description
dict dict

Dictionary containing global attributes, dimensions, and

dict

variables.

Source code in src\aalibrary\utils\nc_reader.py
def get_netcdf_header(file_path: str) -> dict:
    """Reads a NetCDF file and returns its header as a dictionary.

    Args:
        file_path (str): Path to the NetCDF file.

    Returns:
        dict: Dictionary containing global attributes, dimensions, and
        variables.
    """
    header_info = {}

    with Dataset(file_path, "r") as nc_file:
        # Extract global attributes
        header_info["global_attributes"] = {
            attr: getattr(nc_file, attr) for attr in nc_file.ncattrs()
        }

        # Extract dimensions
        header_info["dimensions"] = {
            dim: len(nc_file.dimensions[dim]) for dim in nc_file.dimensions
        }

        # Extract variable metadata
        header_info["variables"] = {
            var: {
                "dimensions": nc_file.variables[var].dimensions,
                "shape": nc_file.variables[var].shape,
                "dtype": str(nc_file.variables[var].dtype),
                "attributes": {
                    attr: getattr(nc_file.variables[var], attr)
                    for attr in nc_file.variables[var].ncattrs()
                },
            }
            for var in nc_file.variables
        }

    return header_info
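
A minimal usage sketch (the file path is a placeholder; the import path follows the module location shown above):

from aalibrary.utils.nc_reader import get_netcdf_header

header = get_netcdf_header("example.nc")  # hypothetical NetCDF file
print(header["global_attributes"])
print(header["dimensions"])
print(sorted(header["variables"].keys()))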

ncei_cache_daily_script

Script to get all objects in the NCEI S3 bucket and cache them to BigQuery. Ideally, it would run every time a file is updated; instead, it is set to run daily via a cron job.

Cron job command: 0 1 * * * /usr/bin/python3 /path/to/aalibrary/src/aalibrary/utils/test.py

ncei_utils

This file contains code pertaining to auxiliary functions related to parsing through NCEI's s3 bucket.

Functions:

Name Description
check_if_tugboat_metadata_json_exists_in_survey

Checks whether a Tugboat metadata JSON file exists within a survey.

download_single_file_from_aws

Safely downloads a file from the AWS storage bucket, aka the NCEI

download_specific_folder_from_ncei

Downloads a specific folder and all of its contents from NCEI to a local

get_all_echosounders_in_a_survey

Gets all of the echosounders in a particular survey from NCEI.

get_all_echosounders_that_exist_in_ncei

Gets a list of all possible echosounders from NCEI.

get_all_file_names_from_survey

Gets all of the file names from a particular NCEI survey.

get_all_file_names_in_a_surveys_echosounder_folder

Gets all of the file names from a particular NCEI survey's echosounder

get_all_metadata_files_in_survey

Gets all of the metadata file names from a particular NCEI survey.

get_all_raw_file_names_from_survey

Gets all of the file names from a particular NCEI survey.

get_all_ship_names_in_ncei

Gets all of the ship names from NCEI. This is based on all of the

get_all_survey_names_from_a_ship

Gets a list of all of the survey names that exist under a ship name.

get_all_surveys_in_ncei

Gets a list of all of the possible survey names from NCEI.

get_checksum_sha256_from_s3

Gets the SHA-256 checksum of the s3 object.

get_closest_ncei_formatted_ship_name

Gets the closest NCEI formatted ship name to the given ship name.

get_echosounder_from_raw_file

Gets the echosounder used for a particular raw file.

get_file_size_from_s3

Gets the file size of an object in s3.

get_folder_size_from_s3

Gets the folder size in bytes from S3.

get_random_raw_file_from_ncei

Creates a test raw file for NCEI. This is used for testing purposes

search_ncei_file_objects_for_string

Searches NCEI for a file type's object keys that contain a particular

search_ncei_objects_for_string

Searches NCEI for object keys that contain a particular string. This

check_if_tugboat_metadata_json_exists_in_survey(ship_name='', survey_name='', s3_bucket=None)

Checks whether a Tugboat metadata JSON file exists within a survey. Returns the file's object key or None if it does not exist.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_bucket resource

The bucket resource object. Defaults to None.

None

Returns: Union[str, None]: Returns the file's object key string or None if it does not exist.

Source code in src\aalibrary\utils\ncei_utils.py
def check_if_tugboat_metadata_json_exists_in_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_bucket: boto3.resource = None,
) -> Union[str, None]:
    """Checks whether a Tugboat metadata JSON file exists within a survey.
    Returns the file's object key or None if it does not exist.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.
    Returns:
        Union[str, None]: Returns the file's object key string or None if it
            does not exist.
    """

    # Find all metadata files within the metadata/ folder in NCEI
    all_metadata_obj_keys = list_all_objects_in_s3_bucket_location(
        prefix=f"data/raw/{ship_name}/{survey_name}/metadata",
        s3_resource=s3_bucket,
    )

    for obj_key, file_name in all_metadata_obj_keys:
        # Handle for main metadata file for upload to BigQuery.
        if file_name.endswith("metadata.json"):
            return obj_key

    return None
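
A minimal usage sketch. The bucket object comes from create_s3_objs in cloud_utils (which, as used elsewhere in this module, returns the client, resource, and bucket); the survey name below is a placeholder.

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import check_if_tugboat_metadata_json_exists_in_survey

_, _, s3_bucket = create_s3_objs()
obj_key = check_if_tugboat_metadata_json_exists_in_survey(
    ship_name="Reuben_Lasker",  # must match the NCEI spelling exactly
    survey_name="RL2107",       # hypothetical survey name
    s3_bucket=s3_bucket,
)
print(obj_key)  # object key of the metadata.json, or None if it does not exist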

download_single_file_from_aws(file_url='', download_location='')

Safely downloads a file from the AWS storage bucket, aka the NCEI repository.

Parameters:

Name Type Description Default
file_url str

The file url. Defaults to "".

''
download_location str

The local download location for the file. Defaults to "".

''
Source code in src\aalibrary\utils\ncei_utils.py
def download_single_file_from_aws(
    file_url: str = "",
    download_location: str = "",
):
    """Safely downloads a file from AWS storage bucket, aka the NCEI
    repository.

    Args:
        file_url (str, optional): The file url. Defaults to "".
        download_location (str, optional): The local download location for the
            file. Defaults to "".
    """

    try:
        _, s3_resource, s3_bucket = create_s3_objs()
    except Exception as e:
        logging.error("CANNOT ESTABLISH CONNECTION TO S3 BUCKET..\n{%s}", e)
        raise

    # We replace the beginning of common file paths
    file_url = get_object_key_for_s3(file_url=file_url)
    file_name = get_file_name_from_url(file_url)

    # Check if the file exists in s3
    file_exists = check_if_file_exists_in_s3(
        object_key=file_url,
        s3_resource=s3_resource,
        s3_bucket_name=s3_bucket.name,
    )

    if file_exists:
        # Finally download the file.
        try:
            logging.info("DOWNLOADING `%s`...", file_name)
            s3_bucket.download_file(file_url, download_location)
            logging.info(
                "DOWNLOADED `%s` TO `%s`", file_name, download_location
            )
        except Exception as e:
            logging.error(
                "ERROR DOWNLOADING FILE `%s` DUE TO\n%s", file_name, e
            )
            raise
    else:
        logging.error(
            "FILE %s DOES NOT EXIST IN NCEI S3 BUCKET. SKIPPING...", file_name
        )

download_specific_folder_from_ncei(folder_prefix='', download_directory='', debug=False)

Downloads a specific folder and all of its contents from NCEI to a local directory.

Parameters:

Name Type Description Default
folder_prefix str

The folder's path in the s3 bucket. Ex. 'data/raw/Reuben_Lasker/' Defaults to "".

''
download_directory str

The directory you want to download the folder and all of its contents to. Defaults to "".

''
debug bool

Whether or not to print debug information. Defaults to False.

False
Source code in src\aalibrary\utils\ncei_utils.py
def download_specific_folder_from_ncei(
    folder_prefix: str = "", download_directory: str = "", debug: bool = False
):
    """Downloads a specific folder and all of its contents from NCEI to a local
    directory.

    Args:
        folder_prefix (str, optional): The folder's path in the s3 bucket.
            Ex. 'data/raw/Reuben_Lasker/'
            Defaults to "".
        download_directory (str, optional): The directory you want to download
            the folder and all of its contents to. Defaults to "".
        debug (bool, optional): Whether or not to print debug information.
            Defaults to False.
    """

    if not folder_prefix.endswith("/"):
        folder_prefix += "/"

    assert (download_directory is not None) and (
        download_directory != ""
    ), "You must provide a download_directory to download the folder to."

    if debug:
        logging.debug("FORMATTED DOWNLOAD DIRECTORY: %s", download_directory)

    # Get all s3 objects for the survey
    print(f"GETTING ALL S3 OBJECTS FOR FOLDER `{folder_prefix}`...")
    _, s3_resource, _ = create_s3_objs()
    s3_objects = list_all_objects_in_s3_bucket_location(
        prefix=folder_prefix,
        s3_resource=s3_resource,
        return_full_paths=True,
    )
    print(f"FOUND {len(s3_objects)} FILES.")

    subdirs = set()
    # Get the subfolders from object keys
    for s3_object in s3_objects:
        # Skip folders
        if s3_object.endswith("/"):
            continue
        # Get the subfolder structure from the object key
        subfolder_key = os.sep.join(
            s3_object.replace("data/raw/", "").split("/")[:-1]
        )
        subdirs.add(subfolder_key)
    for subdir in subdirs:
        os.makedirs(os.sep.join([download_directory, subdir]), exist_ok=True)

    # Create the directory if it doesn't exist.
    if not os.path.isdir(download_directory):
        print(f"CREATING download_directory `{download_directory}`")
        os.makedirs(download_directory, exist_ok=True)
    # normalize the path
    download_directory = os.path.normpath(download_directory)
    print("CREATED DOWNLOAD SUBDIRECTORIES.")

    for idx, object_key in enumerate(tqdm(s3_objects, desc="Downloading")):
        file_name = object_key.split("/")[-1]
        local_object_path = object_key.replace("data/raw/", "")
        download_location = os.path.normpath(
            os.sep.join([download_directory, local_object_path])
        )
        download_single_file_from_aws(
            file_url=object_key, download_location=download_location
        )
    print(f"DOWNLOAD COMPLETE {os.path.abspath(download_directory)}.")

get_all_echosounders_in_a_survey(ship_name='', survey_name='', s3_client=None, return_full_paths=False)

Gets all of the echosounders in a particular survey from NCEI.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the echosounder name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_echosounders_in_a_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the echosounders in a particular survey from NCEI.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the echosounder name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/"
    all_survey_folder_names = get_subdirectories_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_client=s3_client,
        return_full_paths=return_full_paths,
        bucket_name="noaa-wcsd-pds",
    )
    # Get echosounder folders by ignoring the other metadata folders
    all_echosounders = []
    for folder_name in all_survey_folder_names:
        if (
            ("calibration" not in folder_name.lower())
            and ("metadata" not in folder_name.lower())
            and ("json" not in folder_name.lower())
            and ("doc" not in folder_name.lower())
        ):
            all_echosounders.append(folder_name)

    return all_echosounders
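
A minimal usage sketch (the survey name is a placeholder):

from aalibrary.utils.ncei_utils import get_all_echosounders_in_a_survey

echosounders = get_all_echosounders_in_a_survey(
    ship_name="Reuben_Lasker",  # must match the NCEI folder spelling
    survey_name="RL2107",       # placeholder survey name
)
print(echosounders)             # list of echosounder folder names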

get_all_echosounders_that_exist_in_ncei(s3_client=None)

Gets a list of all possible echosounders from NCEI.

Parameters:

Name Type Description Default
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a unique echosounder name found across all NCEI surveys.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_echosounders_that_exist_in_ncei(
    s3_client: boto3.client = None,
) -> List[str]:
    """Gets a list of all possible echosounders from NCEI.

    Args:
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.

    Returns:
        List[str]: A list of strings, each being a unique echosounder name
            found across all NCEI surveys.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # First we get all of the prefixes for each survey to exist in NCEI.
    all_survey_prefixes = get_all_surveys_in_ncei(
        s3_client=s3_client, return_full_paths=True
    )
    all_echosounders = set()
    for survey_prefix in tqdm(
        all_survey_prefixes, desc="Getting Echosounders"
    ):
        # Remove trailing `/`
        survey_prefix = survey_prefix.strip("/")
        survey_name = survey_prefix.split("/")[-1]
        ship_name = survey_prefix.split("/")[-2]
        survey_echosounders = get_all_echosounders_in_a_survey(
            ship_name=ship_name,
            survey_name=survey_name,
            s3_client=s3_client,
            return_full_paths=False,
        )
        all_echosounders.update(survey_echosounders)

    return list(all_echosounders)

get_all_file_names_from_survey(ship_name='', survey_name='', s3_resource=None, return_full_paths=False)

Gets all of the file names from a particular NCEI survey.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a file name. Whether these are full paths or just file names is specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_file_names_from_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being a file name. Whether
            these are full paths or just file names is specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_files
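
Example usage (illustrative; the survey name is a placeholder):

from aalibrary.utils.ncei_utils import get_all_file_names_from_survey

all_files = get_all_file_names_from_survey(
    ship_name="Reuben_Lasker",
    survey_name="RL2107",     # placeholder survey name
    return_full_paths=True,   # return full object keys instead of bare names
)
print(len(all_files))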

get_all_file_names_in_a_surveys_echosounder_folder(ship_name='', survey_name='', echosounder='', s3_resource=None, return_full_paths=False)

Gets all of the file names from a particular NCEI survey's echosounder folder.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
echosounder str

The echosounder used. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the file name. Whether these are full paths or just file names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_file_names_in_a_surveys_echosounder_folder(
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey's echosounder
    folder.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        echosounder (str, optional): The echosounder used. Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the file name. Whether
            these are full paths or just file names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/{echosounder}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_files

get_all_metadata_files_in_survey(ship_name='', survey_name='', s3_resource=None, return_full_paths=False)

Gets all of the metadata file names from a particular NCEI survey.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the metadata file name. Whether these are full paths or just folder names are specified by the return_full_paths parameter. Returns empty list '[]' if no metadata files are present.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_metadata_files_in_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the metadata file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the metadata file name.
            Whether these are full paths or just folder names are specified by
            the `return_full_paths` parameter. Returns empty list '[]' if no
            metadata files are present.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/metadata/"
    all_metadata_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_metadata_files

get_all_raw_file_names_from_survey(ship_name='', survey_name='', echosounder='', s3_resource=None, return_full_paths=False)

Gets all of the raw (.raw) file names from a particular NCEI survey's echosounder folder.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
echosounder str

The echosounder used. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the raw file name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_raw_file_names_from_survey(
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        echosounder (str, optional): The echosounder used. Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the raw file name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/{echosounder}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    all_files = [file for file in all_files if file.endswith(".raw")]
    return all_files
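
Example usage (illustrative; the survey name and echosounder folder are placeholders):

from aalibrary.utils.ncei_utils import get_all_raw_file_names_from_survey

raw_files = get_all_raw_file_names_from_survey(
    ship_name="Reuben_Lasker",
    survey_name="RL2107",   # placeholder survey name
    echosounder="EK80",     # placeholder echosounder folder
)
print(len(raw_files))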

get_all_ship_names_in_ncei(normalize=False, s3_client=None, return_full_paths=False)

Gets all of the ship names from NCEI. This is based on all of the folders listed under the data/raw/ prefix.

Parameters:

Name Type Description Default
normalize bool

Whether or not to normalize the ship_name attribute to how GCP stores it. Defaults to False.

False
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
Source code in src\aalibrary\utils\ncei_utils.py
def get_all_ship_names_in_ncei(
    normalize: bool = False,
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
):
    """Gets all of the ship names from NCEI. This is based on all of the
    folders listed under the `data/raw/` prefix.

    Args:
        normalize (bool, optional): Whether or not to normalize the ship_name
            attribute to how GCP stores it. Defaults to False.
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # Get the initial subdirs
    prefix = "data/raw/"
    subdirs = get_subdirectories_in_s3_bucket_location(
        prefix=prefix, s3_client=s3_client, return_full_paths=return_full_paths
    )
    if normalize:
        subdirs = [normalize_ship_name(ship_name=subdir) for subdir in subdirs]
    return subdirs
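
Example usage (illustrative):

from aalibrary.utils.ncei_utils import get_all_ship_names_in_ncei

ship_names = get_all_ship_names_in_ncei(normalize=False)
print(len(ship_names))
print(sorted(ship_names)[:5])   # first few NCEI ship folder names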

get_all_survey_names_from_a_ship(ship_name='', s3_client=None, return_full_paths=False)

Gets a list of all of the survey names that exist under a ship name.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns: List[str]: A list of strings, each being the survey name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_survey_names_from_a_ship(
    ship_name: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets a list of all of the survey names that exist under a ship name.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    Returns:
        List[str]: A list of strings, each being the survey name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """
    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # Make sure the ship name is valid
    all_ship_names = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=False
    )
    if ship_name not in all_ship_names:
        close_matches = get_close_matches(
            ship_name, all_ship_names, n=3, cutoff=0.6
        )
    assert ship_name in all_ship_names, (
        f"The ship name provided `{ship_name}` "
        "needs to be spelled exactly like in NCEI.\n"
        "Use the `get_all_ship_names_in_ncei` function to see all possible "
        "NCEI ship names.\n"
        f"Did you mean one of these possible ship names?\n{close_matches}"
    )

    ship_prefix = f"data/raw/{ship_name}/"
    all_surveys = set()
    # Get a list of all of this ship's survey names
    all_ship_survey_names = get_subdirectories_in_s3_bucket_location(
        prefix=ship_prefix,
        s3_client=s3_client,
        return_full_paths=return_full_paths,
        bucket_name="noaa-wcsd-pds",
    )
    all_surveys.update(all_ship_survey_names)
    return list(all_surveys)
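
Example usage (illustrative; the ship name must match the NCEI spelling exactly, otherwise the assertion above suggests close matches):

from aalibrary.utils.ncei_utils import get_all_survey_names_from_a_ship

surveys = get_all_survey_names_from_a_ship(ship_name="Reuben_Lasker")
print(sorted(surveys))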

get_all_surveys_in_ncei(s3_client=None, return_full_paths=False)

Gets a list of all of the possible survey names from NCEI.

Parameters:

Name Type Description Default
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns: List[str]: A list of strings, each being the survey name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_surveys_in_ncei(
    s3_client: boto3.client = None, return_full_paths: bool = False
) -> List[str]:
    """Gets a list of all of the possible survey names from NCEI.

    Args:
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    Returns:
        List[str]: A list of strings, each being the survey name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # First we get all of the prefixes for each ship.
    all_ship_prefixes = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=True
    )
    all_surveys = set()
    for ship_prefix in tqdm(all_ship_prefixes, desc="Getting Surveys"):
        # Get a list of all of this ship's survey names
        all_ship_survey_names = get_subdirectories_in_s3_bucket_location(
            prefix=ship_prefix,
            s3_client=s3_client,
            return_full_paths=return_full_paths,
            bucket_name="noaa-wcsd-pds",
        )
        all_surveys.update(all_ship_survey_names)
    return list(all_surveys)

get_checksum_sha256_from_s3(object_key, s3_resource)

Gets the SHA-256 checksum of the s3 object.

Source code in src\aalibrary\utils\ncei_utils.py
def get_checksum_sha256_from_s3(object_key, s3_resource):
    """Gets the SHA-256 checksum of the s3 object."""
    obj = s3_resource.Object("noaa-wcsd-pds", object_key)
    checksum = obj.checksum_sha256
    return checksum
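
Example usage (illustrative; the object key is a placeholder, and the returned value may be None if no SHA-256 checksum is stored for that object):

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import get_checksum_sha256_from_s3

_, s3_resource, _ = create_s3_objs()
checksum = get_checksum_sha256_from_s3(
    object_key="data/raw/Reuben_Lasker/RL2107/EK80/example.raw",  # placeholder
    s3_resource=s3_resource,
)
print(checksum)  # may be None if the object has no stored SHA-256 checksum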

get_closest_ncei_formatted_ship_name(ship_name='', s3_client=None)

Gets the closest NCEI formatted ship name to the given ship name. NOTE: Only use if the data_source=="NCEI".

Parameters:

Name Type Description Default
ship_name str

The ship name to search the closest match for. Defaults to "".

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None

Returns:

Type Description
Union[str, None]

Union[str, None]: The NCEI formatted ship name or None, if none matched.

Source code in src\aalibrary\utils\ncei_utils.py
def get_closest_ncei_formatted_ship_name(
    ship_name: str = "",
    s3_client: boto3.client = None,
) -> Union[str, None]:
    """Gets the closest NCEI formatted ship name to the given ship name.
    NOTE: Only use if the `data_source`=="NCEI".

    Args:
        ship_name (str, optional): The ship name to search the closest match
            for.
            Defaults to "".
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.

    Returns:
        Union[str, None]: The NCEI formatted ship name or None, if none
            matched.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    all_ship_names = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=False
    )
    close_matches = get_close_matches(
        ship_name, all_ship_names, n=3, cutoff=0.85
    )
    if len(close_matches) >= 1:
        return close_matches[0]
    else:
        return None
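
Example usage (illustrative; the input spelling differs slightly from the NCEI folder name):

from aalibrary.utils.ncei_utils import get_closest_ncei_formatted_ship_name

ncei_name = get_closest_ncei_formatted_ship_name(ship_name="Reuben Lasker")
print(ncei_name)  # closest NCEI spelling (likely "Reuben_Lasker"), or None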

get_echosounder_from_raw_file(file_name='', ship_name='', survey_name='', echosounders=None, s3_client=None, s3_resource=None, s3_bucket=None)

Gets the echosounder used for a particular raw file.

Source code in src\aalibrary\utils\ncei_utils.py
def get_echosounder_from_raw_file(
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounders: List[str] = None,
    s3_client: boto3.client = None,
    s3_resource: boto3.resource = None,
    s3_bucket: boto3.resource = None,
):
    """Gets the echosounder used for a particular raw file."""

    if (s3_client is None) or (s3_resource is None) or (s3_bucket is None):
        s3_client, s3_resource, s3_bucket = create_s3_objs()

    if echosounders is None:
        echosounders = get_all_echosounders_in_a_survey(
            ship_name=ship_name,
            survey_name=survey_name,
            s3_client=s3_client,
            return_full_paths=False,
        )

    for echosounder in echosounders:
        raw_file_location = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        raw_file_exists = check_if_file_exists_in_s3(
            object_key=raw_file_location,
            s3_resource=s3_resource,
            s3_bucket_name=s3_bucket.name,
        )
        if raw_file_exists:
            return echosounder

    raise ValueError("An echosounder could not be found for this raw file.")
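
Example usage (illustrative; the raw file name and survey name are placeholders):

from aalibrary.utils.ncei_utils import get_echosounder_from_raw_file

echosounder = get_echosounder_from_raw_file(
    file_name="example.raw",   # placeholder raw file name
    ship_name="Reuben_Lasker",
    survey_name="RL2107",      # placeholder survey name
)
print(echosounder)             # the echosounder folder containing the file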

get_file_size_from_s3(object_key, s3_resource)

Gets the file size of an object in s3.

Source code in src\aalibrary\utils\ncei_utils.py
def get_file_size_from_s3(object_key, s3_resource):
    """Gets the file size of an object in s3."""
    obj = s3_resource.Object("noaa-wcsd-pds", object_key)
    file_size = obj.content_length
    return file_size

get_folder_size_from_s3(folder_prefix, s3_resource)

Gets the folder size in bytes from S3.

Parameters:

Name Type Description Default
folder_prefix str

The object key prefix of the folder in S3.

required
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

required

Returns:

Name Type Description
int int

The total size of the folder in bytes.

Source code in src\aalibrary\utils\ncei_utils.py
def get_folder_size_from_s3(
    folder_prefix: str, s3_resource: boto3.resource
) -> int:
    """Gets the folder size in bytes from S3.

    Args:
        folder_prefix (str): The object key prefix of the folder in S3.
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.

    Returns:
        int: The total size of the folder in bytes.
    """
    if s3_resource is None:
        _, s3_resource, _ = create_s3_objs()

    # Initialize total size
    total_size = 0

    # Get all objects' keys in the folder
    all_files_object_keys = list_all_objects_in_s3_bucket_location(
        prefix=folder_prefix,
        s3_resource=s3_resource,
        return_full_paths=True,
    )

    for file_object_key in tqdm(
        all_files_object_keys, desc="Calculating Folder Size"
    ):
        total_size += get_file_size_from_s3(
            object_key=file_object_key, s3_resource=s3_resource
        )

    return total_size
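
Example usage (illustrative; the folder prefix is a placeholder survey path):

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import get_folder_size_from_s3

_, s3_resource, _ = create_s3_objs()
size_bytes = get_folder_size_from_s3(
    folder_prefix="data/raw/Reuben_Lasker/RL2107/",  # placeholder prefix
    s3_resource=s3_resource,
)
print(f"{size_bytes / 1e9:.2f} GB")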

get_random_raw_file_from_ncei()

Selects a random raw file from NCEI and returns the parameters that identify it. This is used for testing purposes only. Retries automatically if an error occurs.

Returns:

Type Description
List[str]

List[str]: A list object with strings denoting each parameter required for creating a raw file object. Ex. [ random_ship_name, random_survey_name, random_echosounder, random_raw_file, ]

Source code in src\aalibrary\utils\ncei_utils.py
def get_random_raw_file_from_ncei() -> List[str]:
    """Creates a test raw file for NCEI. This is used for testing purposes
    only. Retries automatically if an error occurs.

    Returns:
        List[str]: A list object with strings denoting each parameter required
            for creating a raw file object.
            Ex. [
                random_ship_name,
                random_survey_name,
                random_echosounder,
                random_raw_file,
            ]
    """

    try:
        # Get all of the ship names
        all_ship_names = get_all_ship_names_in_ncei(
            normalize=False, return_full_paths=False
        )
        random_ship_name = all_ship_names[randint(0, len(all_ship_names) - 1)]
        # Get all of the surveys for this ship
        all_surveys_for_this_ship = get_all_survey_names_from_a_ship(
            ship_name=random_ship_name, return_full_paths=False
        )
        random_survey_name = all_surveys_for_this_ship[
            randint(0, len(all_surveys_for_this_ship) - 1)
        ]
        # Get all of the echosounders in this survey
        all_echosounders_for_this_survey = get_all_echosounders_in_a_survey(
            ship_name=random_ship_name,
            survey_name=random_survey_name,
            return_full_paths=False,
        )
        random_echosounder = all_echosounders_for_this_survey[
            randint(0, len(all_echosounders_for_this_survey) - 1)
        ]
        # Get all of the raw files in this echosounder
        all_raw_files_in_echosounder = get_all_raw_file_names_from_survey(
            ship_name=random_ship_name,
            survey_name=random_survey_name,
            echosounder=random_echosounder,
            return_full_paths=False,
        )
        random_raw_file = all_raw_files_in_echosounder[
            randint(0, len(all_raw_files_in_echosounder) - 1)
        ]

        return [
            random_ship_name,
            random_survey_name,
            random_echosounder,
            random_raw_file,
        ]
    except Exception:
        return get_random_raw_file_from_ncei()
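
Example usage (illustrative; each call returns a different random combination):

from aalibrary.utils.ncei_utils import get_random_raw_file_from_ncei

ship_name, survey_name, echosounder, raw_file = get_random_raw_file_from_ncei()
print(ship_name, survey_name, echosounder, raw_file)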

search_ncei_file_objects_for_string(search_param='', file_extension='.raw')

Searches NCEI for a file type's object keys that contain a particular string. This string can be anything, such as an echosounder name, ship name, survey name, or even a partial file name. The file type can be specified by the file_extension parameter. NOTE: This function takes a long time to run, as it has to search through ALL of NCEI's objects.

Parameters:

Name Type Description Default
search_param str

The string to search for. Defaults to "".

''
file_extension str

The file extension to filter results by. Defaults to ".raw".

'.raw'

Returns:

Type Description
List[str]

List[str]: A list of strings, each being an object key that contains the search parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def search_ncei_file_objects_for_string(
    search_param: str = "", file_extension: str = ".raw"
) -> List[str]:
    """Searches NCEI for a file type's object keys that contain a particular
    string. This string can be anything, such as an echosounder name,
    ship name, survey name, or even a partial file name. The file type can be
    specified by the file_extension parameter.
    NOTE: This function takes a long time to run, as it has to search through
    ALL of NCEI's objects.

    Args:
        search_param (str, optional): The string to search for. Defaults to "".
        file_extension (str, optional): The file extension to filter results
            by. Defaults to ".raw".

    Returns:
        List[str]: A list of strings, each being an object key that contains
            the search parameter.
    """

    s3_client, _, _ = create_s3_objs()
    paginator = s3_client.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket="noaa-wcsd-pds")
    matching_object_keys = []
    objects = page_iterator.search(
        f"Contents[?contains(Key, `{search_param}`)"
        f" && ends_with(Key, `{file_extension}`)][]"
    )
    for item in objects:
        print(item["Key"])
        matching_object_keys.append(item["Key"])
    return matching_object_keys
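
Example usage (illustrative; expect this to take a long time, since every object key in the NCEI bucket is scanned):

from aalibrary.utils.ncei_utils import search_ncei_file_objects_for_string

matching_keys = search_ncei_file_objects_for_string(
    search_param="Reuben_Lasker",  # any substring of an object key
    file_extension=".raw",
)
print(len(matching_keys))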

search_ncei_objects_for_string(search_param='')

Searches NCEI for object keys that contain a particular string. This string can be anything, such as an echosounder name, ship name, survey name, or even a partial file name. NOTE: This function takes a long time to run, as it has to search through ALL of NCEI's objects. NOTE: Use a folder name as the search_param to get all object keys that contain that folder name. (e.g. '/EK80/')

Parameters:

Name Type Description Default
search_param str

The string to search for. Defaults to "".

''

Returns:

Type Description
List[str]

List[str]: A list of strings, each being an object key that contains the search parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def search_ncei_objects_for_string(search_param: str = "") -> List[str]:
    """Searches NCEI for object keys that contain a particular string. This
    string can be anything, such as an echosounder name, ship name,
    survey name, or even a partial file name.
    NOTE: This function takes a long time to run, as it has to search through
    ALL of NCEI's objects.
    NOTE: Use a folder name as the search_param to get all object keys that
    contain that folder name. (e.g. '/EK80/')

    Args:
        search_param (str, optional): The string to search for. Defaults to "".

    Returns:
        List[str]: A list of strings, each being an object key that contains
            the search parameter.
    """

    s3_client, _, _ = create_s3_objs()
    paginator = s3_client.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket="noaa-wcsd-pds")
    matching_object_keys = []
    # Vpcs[?contains(`["vpc-blabla1", "vpc-blabla2"]`, VpcId)].OtherKey
    # objects = page_iterator.search(f"
    # Contents[?contains(Key, `{search_param}`) && ends_with(Key, `.raw`)][]")
    objects = page_iterator.search(
        f"Contents[?contains(Key, `{search_param}`)][]"
    )
    # objects = page_iterator.search("Contents[?ends_with(Key, `.csv`)][]")
    for item in objects:
        matching_object_keys.append(item["Key"])
    return matching_object_keys

sonar_checker

Modules:

Name Description
ek_date_conversion

Code originally developed for pyEcholab

ek_raw_io

Code originally developed for pyEcholab

ek_raw_parsers

Code originally developed for pyEcholab

log
misc
sonar_checker

ek_date_conversion

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

Contains functions to convert date information.

TODO: merge necessary function into ek60.py or group everything into a class
TODO: fix docstring

Functions:

Name Description
nt_to_unix

:param nt_timestamp_tuple: Tuple of two longs representing the NT date

unix_to_nt

Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders

datetime_to_unix(datetime_obj)

:param datetime_obj: datetime object to convert
:type datetime_obj: :class:`datetime.datetime`

:param tz: Timezone to use for converted time -- if None, uses timezone information contained within datetime_obj
:type tz: :class:`datetime.tzinfo`

>>> from pytz import utc
>>> from datetime import datetime
>>> epoch = datetime(1970, 1, 1, tzinfo=utc)
>>> assert datetime_to_unix(epoch) == 0

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def datetime_to_unix(datetime_obj):
    """
    :param datetime_obj: datetime object to convert
    :type datetime_obj: :class:`datetime.datetime`

    :param tz: Timezone to use for converted time -- if None, uses timezone
                information contained within datetime_obj
    :type tz: :class:datetime.tzinfo

    >>> from pytz import utc
    >>> from datetime import datetime
    >>> epoch = datetime(1970, 1, 1, tzinfo=utc)
    >>> assert datetime_to_unix(epoch) == 0
    """

    timestamp = (datetime_obj - UTC_UNIX_EPOCH).total_seconds()

    return timestamp

nt_to_unix(nt_timestamp_tuple, return_datetime=True)

:param nt_timestamp_tuple: Tuple of two longs representing the NT date
:type nt_timestamp_tuple: (long, long)

:param return_datetime: Return a datetime object instead of float
:type return_datetime: bool

Returns a datetime.datetime object w/ UTC timezone calculated from the nt time tuple

lowDateTime, highDateTime = nt_timestamp_tuple

The timestamp is a 64bit count of 100ns intervals since the NT epoch broken into two 32bit longs, least significant first:

>>> dt = nt_to_unix((19496896L, 30196149L))
>>> match_dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
>>> assert abs(dt - match_dt) <= dt.resolution

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def nt_to_unix(nt_timestamp_tuple, return_datetime=True):
    """
    :param nt_timestamp_tuple: Tuple of two longs representing the NT date
    :type nt_timestamp_tuple: (long, long)

    :param return_datetime:  Return a datetime object instead of float
    :type return_datetime: bool


    Returns a datetime.datetime object w/ UTC timezone
    calculated from the nt time tuple

    lowDateTime, highDateTime = nt_timestamp_tuple

    The timestamp is a 64bit count of 100ns intervals since the NT epoch
    broken into two 32bit longs, least significant first:

    >>> dt = nt_to_unix((19496896L, 30196149L))
    >>> match_dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
    >>> assert abs(dt - match_dt) <= dt.resolution
    """

    lowDateTime, highDateTime = nt_timestamp_tuple
    sec_past_nt_epoch = ((highDateTime << 32) + lowDateTime) * 1.0e-7

    if return_datetime:
        return UTC_NT_EPOCH + datetime.timedelta(seconds=sec_past_nt_epoch)

    else:
        sec_past_unix_epoch = sec_past_nt_epoch - EPOCH_DELTA_SECONDS
        return sec_past_unix_epoch

unix_to_datetime(unix_timestamp)

:param unix_timestamp: Number of seconds since unix epoch (1/1/1970)
:type unix_timestamp: float

:param tz: timezone to use for conversion (default None = UTC)
:type tz: None or tzinfo object (see datetime docs)

:returns: datetime object
:raises: ValueError if unix_timestamp is not of type float or datetime

Returns a datetime object from a unix timestamp. Simple wrapper for :func:`datetime.datetime.fromtimestamp`

>>> from pytz import utc
>>> from datetime import datetime
>>> epoch = unix_to_datetime(0.0, tz=utc)
>>> assert epoch == datetime(1970, 1, 1, tzinfo=utc)

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def unix_to_datetime(unix_timestamp):
    """
    :param unix_timestamp: Number of seconds since unix epoch (1/1/1970)
    :type unix_timestamp: float

    :param tz: timezone to use for conversion (default None = UTC)
    :type tz: None or tzinfo object (see datetime docs)

    :returns: datetime object
    :raises: ValueError if unix_timestamp is not of type float or datetime

    Returns a datetime object from a unix timestamp.  Simple wrapper for
    :func:`datetime.datetime.fromtimestamp`

    >>> from pytz import utc
    >>> from datetime import datetime
    >>> epoch = unix_to_datetime(0.0, tz=utc)
    >>> assert epoch == datetime(1970, 1, 1, tzinfo=utc)
    """

    if isinstance(unix_timestamp, datetime.datetime):
        if unix_timestamp.tzinfo is None:
            unix_datetime = pytz_utc.localize(unix_timestamp)

        elif unix_timestamp.tzinfo == pytz_utc:
            unix_datetime = unix_timestamp

        else:
            unix_datetime = pytz_utc.normalize(unix_timestamp.astimezone(pytz_utc))

    elif isinstance(unix_timestamp, float):
        unix_datetime = pytz_utc.localize(datetime.datetime.fromtimestamp(unix_timestamp))

    else:
        errstr = "Looking for a timestamp of type datetime.datetime or # of sec past unix epoch.\n"
        errstr += "Supplied timestamp '%s' of type %s." % (
            str(unix_timestamp),
            type(unix_timestamp),
        )
        raise ValueError(errstr)

    return unix_datetime

unix_to_nt(unix_timestamp)

Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders

Simple conversion:

>>> dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
>>> assert (19496896L, 30196149L) == unix_to_nt(dt)

Converting back and forth between the two standards:

>>> orig_dt = datetime.datetime.now(tz=pytz_utc)
>>> nt_tuple = unix_to_nt(orig_dt)

Converting back may not yield the exact original date, but will be within the datetime's precision:

>>> back_to_dt = nt_to_unix(nt_tuple)
>>> d_mu_seconds = abs(orig_dt - back_to_dt).microseconds
>>> mu_sec_resolution = orig_dt.resolution.microseconds
>>> assert d_mu_seconds <= mu_sec_resolution

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def unix_to_nt(unix_timestamp):
    """
    Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders


    #Simple conversion
    >>> dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
    >>> assert (19496896L, 30196149L) == unix_to_nt(dt)

    #Converting back and forth between the two standards:
    >>> orig_dt = datetime.datetime.now(tz=pytz_utc)
    >>> nt_tuple = unix_to_nt(orig_dt)

    #converting back may not yield the exact original date,
    #but will be within the datetime's precision
    >>> back_to_dt = nt_to_unix(nt_tuple)
    >>> d_mu_seconds = abs(orig_dt - back_to_dt).microseconds
    >>> mu_sec_resolution = orig_dt.resolution.microseconds
    >>> assert d_mu_seconds <= mu_sec_resolution
    """

    if isinstance(unix_timestamp, datetime.datetime):
        if unix_timestamp.tzinfo is None:
            unix_datetime = pytz_utc.localize(unix_timestamp)

        elif unix_timestamp.tzinfo == pytz_utc:
            unix_datetime = unix_timestamp

        else:
            unix_datetime = pytz_utc.normalize(unix_timestamp.astimezone(pytz_utc))

    else:
        unix_datetime = unix_to_datetime(unix_timestamp)

    sec_past_nt_epoch = (unix_datetime - UTC_NT_EPOCH).total_seconds()

    onehundred_ns_intervals = int(sec_past_nt_epoch * 1e7)
    lowDateTime = onehundred_ns_intervals & 0xFFFFFFFF
    highDateTime = onehundred_ns_intervals >> 32

    return lowDateTime, highDateTime
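
An illustrative round-trip sketch (assuming the module imports as aalibrary.utils.sonar_checker.ek_date_conversion, matching the source path above; because the conversion passes through floating-point seconds, a few microseconds of drift are possible, so the tolerance here is deliberately loose):

import datetime
from pytz import utc
from aalibrary.utils.sonar_checker.ek_date_conversion import nt_to_unix, unix_to_nt

now = datetime.datetime.now(tz=utc)
low, high = unix_to_nt(now)     # 2-element NT tuple (low, high)
back = nt_to_unix((low, high))  # back to a UTC datetime
assert abs(now - back) < datetime.timedelta(milliseconds=1)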

ek_raw_io

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

Contains low-level functions called by ./ek_raw_parsers.py

Classes:

Name Description
RawSimradFile

A low-level extension of the built in python file object allowing the reading/writing

RawSimradFile

Bases: BufferedReader

A low-level extension of the built-in Python file object allowing the reading/writing of SIMRAD RAW files on a datagram-by-datagram basis (instead of at the byte level).

Calls to the read method return parsed datagrams as dicts.

Methods:

Name Description
__next__

Returns the next datagram (synonymous with self.read(1))

iter_dgrams

Iterates through the file, repeatedly calling self.next() until

peek

Returns the header of the next datagram in the file. The file position is

prev

Returns the previous datagram 'behind' the current file pointer position

read

:param k: Number of datagrams to read

readall

Reads the entire file from the beginning and returns a list of datagrams.

readline

aliased to self.next()

readlines

aliased to self.read(-1)

seek

Performs the familiar 'seek' operation using datagram offsets

skip

Skips forward to the next datagram without reading the contents of the current one

skip_back

Skips backwards to the previous datagram without reading its contents

tell

Returns the current file pointer offset by datagram number

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
class RawSimradFile(BufferedReader):
    """
    A low-level extension of the built-in Python file object allowing the
    reading/writing of SIMRAD RAW files on a datagram-by-datagram basis
    (instead of at the byte level).

    Calls to the read method return parsed datagrams as dicts.
    """

    #: Dict object with datagram header/python class key/value pairs
    DGRAM_TYPE_KEY = {
        "RAW": parsers.SimradRawParser(),
        "CON": parsers.SimradConfigParser(),
        "TAG": parsers.SimradAnnotationParser(),
        "NME": parsers.SimradNMEAParser(),
        "BOT": parsers.SimradBottomParser(),
        "DEP": parsers.SimradDepthParser(),
        "XML": parsers.SimradXMLParser(),
        "IDX": parsers.SimradIDXParser(),
        "FIL": parsers.SimradFILParser(),
        "MRU": parsers.SimradMRUParser(),
    }

    def __init__(
        self,
        name,
        mode="rb",
        closefd=True,
        return_raw=False,
        buffer_size=1024 * 1024,
        storage_options={},
    ):
        #  9-28-18 RHT: Changed RawSimradFile to implement BufferedReader instead of
        #  io.FileIO to increase performance.

        #  create a raw file object for the buffered reader
        fmap = fsspec.get_mapper(name, **storage_options)
        if isinstance(fmap.fs, LocalFileSystem):
            fio = FileIO(name, mode=mode, closefd=closefd)
        else:
            fio = fmap.fs.open(fmap.root)

        #  initialize the superclass
        super().__init__(fio, buffer_size=buffer_size)
        self._current_dgram_offset = 0
        self._total_dgram_count = None
        self._return_raw = return_raw

    def _seek_bytes(self, bytes_, whence=0):
        """
        :param bytes_: byte offset
        :type bytes_: int

        :param whence:

        Seeks a file by bytes instead of datagrams.
        """

        super().seek(bytes_, whence)

    def _tell_bytes(self):
        """
        Returns the file pointer position in bytes.
        """

        return super().tell()

    def _read_dgram_size(self):
        """
        Attempts to read the size of the next datagram in the file.
        """

        buf = self._read_bytes(4)
        if len(buf) != 4:
            self._seek_bytes(-len(buf), SEEK_CUR)
            raise DatagramReadError(
                "Short read while getting dgram size",
                (4, len(buf)),
                file_pos=(self._tell_bytes(), self.tell()),
            )
        else:
            return struct.unpack("=l", buf)[0]  # This return value is an int object.

    def _bytes_remaining(self):
        old_pos = self._tell_bytes()
        self._seek_bytes(0, SEEK_END)
        end_pos = self._tell_bytes()
        offset = end_pos - old_pos
        self._seek_bytes(old_pos, SEEK_SET)

        return offset

    def _read_timestamp(self):
        """
        Attempts to read the datagram timestamp.
        """

        buf = self._read_bytes(8)
        if len(buf) != 8:
            self._seek_bytes(-len(buf), SEEK_CUR)
            raise DatagramReadError(
                "Short read while getting timestamp",
                (8, len(buf)),
                file_pos=(self._tell_bytes(), self.tell()),
            )

        else:
            lowDateField, highDateField = struct.unpack("=2L", buf)
            #  11/26/19 - RHT - modified to return the raw bytes
            return lowDateField, highDateField, buf

    def _read_dgram_header(self):
        """
        :returns: dgram_size, dgram_type, (low_date, high_date)

        Attempts to read the datagram header consisting of:

            long        dgram_size
            char[4]     type
            long        lowDateField
            long        highDateField
        """

        try:
            dgram_size = self._read_dgram_size()
        except Exception:
            if self.at_eof():
                raise SimradEOF()
            else:
                raise

        #  get the datagram type
        buf = self._read_bytes(4)

        if len(buf) != 4:
            if self.at_eof():
                raise SimradEOF()
            else:
                self._seek_bytes(-len(buf), SEEK_CUR)
                raise DatagramReadError(
                    "Short read while getting dgram type",
                    (4, len(buf)),
                    file_pos=(self._tell_bytes(), self.tell()),
                )
        else:
            dgram_type = buf
        dgram_type = dgram_type.decode("latin_1")

        #  11/26/19 - RHT
        #  As part of the rewrite of read to remove the reverse seeking,
        #  store the raw header bytes so we can prepend them to the raw
        #  data bytes and pass it all to the parser.
        raw_bytes = buf

        #  read the timestamp - this method was also modified to return
        #  the raw bytes
        lowDateField, highDateField, buf = self._read_timestamp()

        #  add the timestamp bytes to the raw_bytes string
        raw_bytes += buf

        return dict(
            size=dgram_size,
            type=dgram_type,
            low_date=lowDateField,
            high_date=highDateField,
            raw_bytes=raw_bytes,
        )

    def _read_bytes(self, k):
        """
        Reads raw bytes from the file
        """

        return super().read(k)

    def _read_next_dgram(self):
        """
        Attempts to read the next datagram from the file.

        Returns the datagram as a raw string
        """

        #  11/26/19 - RHT - Modified this method so it doesn't "peek"
        #  at the next datagram before reading which was inefficient.
        #  To minimize changes to the code, methods to read the header
        #  and timestamp were modified to return the raw bytes which
        #  allows us to pass them onto the parser without having to
        #  rewind and read again as was previously done.

        #  store our current location in the file
        old_file_pos = self._tell_bytes()

        #  try to read the header of the next datagram
        try:
            header = self._read_dgram_header()
        except DatagramReadError as e:
            e.message = "Short read while getting raw file datagram header"
            raise e

        #  check for invalid time data
        if (header["low_date"], header["high_date"]) == (0, 0):
            logger.warning(
                "Skipping %s datagram w/ timestamp of (0, 0) at %sL:%d",
                header["type"],
                str(self._tell_bytes()),
                self.tell(),
            )
            self.skip()
            return self._read_next_dgram()

        #  basic sanity check on size
        if header["size"] < 16:
            #  size can't be smaller than the header size
            logger.warning(
                "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
                header["size"],
                header["type"],
                str((header["low_date"], header["high_date"])),
            )

            #  see if we can find the next datagram
            self._find_next_datagram()

            #  and then return that
            return self._read_next_dgram()

        #  get the raw bytes from the header
        raw_dgram = header["raw_bytes"]

        #  and append the rest of the datagram - we subtract 12
        #  since we have already read 12 bytes: 4 for type and
        #  8 for time.
        raw_dgram += self._read_bytes(header["size"] - 12)

        #  determine the size of the payload in bytes
        bytes_read = len(raw_dgram)

        #  and make sure it checks out
        if bytes_read < header["size"]:
            logger.warning(
                "Datagram %d (@%d) shorter than expected length:  %d < %d",
                self.tell(),
                old_file_pos,
                bytes_read,
                header["size"],
            )
            self._find_next_datagram()
            return self._read_next_dgram()

        #  now read the trailing size value
        try:
            dgram_size_check = self._read_dgram_size()
        except DatagramReadError as e:
            self._seek_bytes(old_file_pos, SEEK_SET)
            e.message = "Short read while getting trailing raw file datagram size for check"
            raise e

        #  make sure they match
        if header["size"] != dgram_size_check:
            # self._seek_bytes(old_file_pos, SEEK_SET)
            logger.warning(
                "Datagram failed size check:  %d != %d @ (%d, %d)",
                header["size"],
                dgram_size_check,
                self._tell_bytes(),
                self.tell(),
            )
            logger.warning("Skipping to next datagram...")
            self._find_next_datagram()

            return self._read_next_dgram()

        #  add the header (16 bytes) and repeated size (4 bytes) to the payload
        #  bytes to get the total bytes read for this datagram.
        bytes_read = bytes_read + 20

        if self._return_raw:
            self._current_dgram_offset += 1
            return raw_dgram
        else:
            nice_dgram = self._convert_raw_datagram(raw_dgram, bytes_read)
            self._current_dgram_offset += 1
            return nice_dgram

    def _convert_raw_datagram(self, raw_datagram_string, bytes_read):
        """
        :param raw_datagram_string: bytestring containing datagram (first 4
            bytes indicate datagram type, such as 'RAW0')
        :type raw_datagram_string: str

        :param bytes_read: integer specifying the datagram size, including header
            in bytes,
        :type bytes_read: int

        Returns a formatted datagram object using the data in raw_datagram_string
        """

        #  11/26/19 - RHT - Modified this method to pass through the number of
        #  bytes read so we can bubble that up to the user.

        dgram_type = raw_datagram_string[:3].decode()
        try:
            parser = self.DGRAM_TYPE_KEY[dgram_type]
        except KeyError:
            # raise KeyError('Unknown datagram type %s,
            # valid types: %s' % (str(dgram_type),
            # str(self.DGRAM_TYPE_KEY.keys())))
            return raw_datagram_string

        nice_dgram = parser.from_string(raw_datagram_string, bytes_read)
        return nice_dgram

    def _set_total_dgram_count(self):
        """
        Skips quickly through the file counting datagrams and stores the
        resulting number in self._total_dgram_count

        :raises: ValueError if self._total_dgram_count is not None (it has been set before)
        """
        if self._total_dgram_count is not None:
            raise ValueError(
                "self._total_dgram_count has already been set.  Call .reset() first if you really want to recount"  # noqa
            )

        # Save current position for later
        old_file_pos = self._tell_bytes()
        old_dgram_offset = self.tell()

        self._current_dgram_offset = 0
        self._seek_bytes(0, SEEK_SET)

        while True:
            try:
                self.skip()
            except (DatagramReadError, SimradEOF):
                self._total_dgram_count = self.tell()
                break

        # Return to where we started
        self._seek_bytes(old_file_pos, SEEK_SET)
        self._current_dgram_offset = old_dgram_offset

    def at_eof(self):
        old_pos = self._tell_bytes()
        self._seek_bytes(0, SEEK_END)
        eof_pos = self._tell_bytes()

        # Check to see if we're at the end of file and raise EOF
        if old_pos == eof_pos:
            return True

        # Otherwise, go back to where we were and re-raise the original
        # exception
        else:
            offset = old_pos - eof_pos
            self._seek_bytes(offset, SEEK_END)
            return False

    def read(self, k):
        """
        :param k: Number of datagrams to read
        :type k: int

        Reads the next k datagrams.  A list of datagrams is returned if k > 1.  If
        k < 0 the call is equivalent to readall(), which rewinds to the beginning
        of the file and returns every datagram.
        """

        if k == 1:
            try:
                return self._read_next_dgram()
            except Exception:
                if self.at_eof():
                    raise SimradEOF()
                else:
                    raise

        elif k > 0:
            dgram_list = []

            for m in range(k):
                try:
                    dgram = self._read_next_dgram()
                    dgram_list.append(dgram)

                except Exception:
                    break

            return dgram_list

        elif k < 0:
            return self.readall()

    def readall(self):
        """
        Reads the entire file from the beginning and returns a list of datagrams.
        """

        self.seek(0, SEEK_SET)
        dgram_list = []

        for raw_dgram in self.iter_dgrams():
            dgram_list.append(raw_dgram)

        return dgram_list

    def _find_next_datagram(self):
        old_file_pos = self._tell_bytes()
        logger.warning("Attempting to find next valid datagram...")

        try:
            while self.peek()["type"][:3] not in list(self.DGRAM_TYPE_KEY.keys()):
                self._seek_bytes(1, 1)
        except DatagramReadError:
            logger.warning("No next datagram found. Ending reading of file.")
            raise SimradEOF()
        else:
            logger.warning("Found next datagram:  %s", self.peek())
            logger.warning("Skipped ahead %d bytes", self._tell_bytes() - old_file_pos)

    def tell(self):
        """
        Returns the current file pointer offset by datagram number
        """
        return self._current_dgram_offset

    def peek(self):
        """
        Returns the header of the next datagram in the file.  The file position is
        reset back to the original location afterwards.

        :returns: [dgram_size, dgram_type, (low_date, high_date)]
        """

        dgram_header = self._read_dgram_header()
        if dgram_header["type"].startswith("RAW0"):
            dgram_header["channel"] = struct.unpack("h", self._read_bytes(2))[0]
            self._seek_bytes(-18, SEEK_CUR)
        elif dgram_header["type"].startswith("RAW3"):
            chan_id = struct.unpack("128s", self._read_bytes(128))[0]
            dgram_header["channel_id"] = chan_id.strip(b"\x00").decode("latin_1")
            self._seek_bytes(-(16 + 128), SEEK_CUR)
        else:
            self._seek_bytes(-16, SEEK_CUR)

        return dgram_header

    def __next__(self):
        """
        Returns the next datagram (synonymous with self.read(1))
        """

        return self.read(1)

    def prev(self):
        """
        Returns the previous datagram 'behind' the current file pointer position
        """

        self.skip_back()
        raw_dgram = self.read(1)
        self.skip_back()
        return raw_dgram

    def skip(self):
        """
        Skips forward to the next datagram without reading the contents of the current one
        """

        # dgram_size, dgram_type, (low_date, high_date) = self.peek()[:3]

        header = self.peek()

        if header["size"] < 16:
            logger.warning(
                "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
                header["size"],
                header["type"],
                str((header["low_date"], header["high_date"])),
            )

            self._find_next_datagram()

        else:
            self._seek_bytes(header["size"] + 4, SEEK_CUR)
            dgram_size_check = self._read_dgram_size()

            if header["size"] != dgram_size_check:
                logger.warning(
                    "Datagram failed size check:  %d != %d @ (%d, %d)",
                    header["size"],
                    dgram_size_check,
                    self._tell_bytes(),
                    self.tell(),
                )
                logger.warning("Skipping to next datagram... (in skip)")

                self._find_next_datagram()

        self._current_dgram_offset += 1

    def skip_back(self):
        """
        Skips backwards to the previous datagram without reading its contents
        """

        old_file_pos = self._tell_bytes()

        try:
            self._seek_bytes(-4, SEEK_CUR)
        except IOError:
            raise

        dgram_size_check = self._read_dgram_size()

        # Seek to the beginning of the datagram and read as normal
        try:
            self._seek_bytes(-(8 + dgram_size_check), SEEK_CUR)
        except IOError:
            raise DatagramSizeError

        try:
            dgram_size = self._read_dgram_size()

        except DatagramSizeError:
            logger.info("Error reading the datagram")
            self._seek_bytes(old_file_pos, SEEK_SET)
            raise

        if dgram_size_check != dgram_size:
            self._seek_bytes(old_file_pos, SEEK_SET)
            raise DatagramSizeError
        else:
            self._seek_bytes(-4, SEEK_CUR)

        self._current_dgram_offset -= 1

    def iter_dgrams(self):
        """
        Iterates through the file, repeatedly calling self.next() until
        the end of file is reached
        """

        while True:
            # new_dgram = self.next()
            # yield new_dgram

            try:
                new_dgram = next(self)
            except Exception:
                logger.debug("Caught EOF?")
                #  end the generator cleanly (PEP 479: raising StopIteration
                #  inside a generator becomes a RuntimeError)
                return

            yield new_dgram

    # Unsupported members
    def readline(self):
        """
        aliased to self.next()
        """
        return next(self)

    def readlines(self):
        """
        aliased to self.read(-1)
        """
        return self.read(-1)

    def seek(self, offset, whence):
        """
        Performs the familiar 'seek' operation using datagram offsets
        instead of raw bytes.
        """

        if whence == SEEK_SET:
            if offset < 0:
                raise ValueError("Cannot seek backwards from beginning of file")
            else:
                self._seek_bytes(0, SEEK_SET)
                self._current_dgram_offset = 0
        elif whence == SEEK_END:
            if offset > 0:
                raise ValueError("Use negative offsets when seeking backward from end of file")

            # Do we need to generate the total number of datagrams w/in the file?
            try:
                self._set_total_dgram_count()
                # Throws a value error if _total_dgram_count has already been set.  We can ignore it
            except ValueError:
                pass

            self._seek_bytes(0, SEEK_END)
            self._current_dgram_offset = self._total_dgram_count

        elif whence == SEEK_CUR:
            pass
        else:
            raise ValueError(
                "Illegal value for 'whence' (%s), use 0 (beginning), 1 (current), or 2 (end)"
                % (str(whence))
            )

        if offset > 0:
            for k in range(offset):
                self.skip()
        elif offset < 0:
            for k in range(-offset):
                self.skip_back()

    def reset(self):
        self._current_dgram_offset = 0
        self._total_dgram_count = None
        self._seek_bytes(0, SEEK_SET)
__next__()

Returns the next datagram (synonymous with self.read(1))

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def __next__(self):
    """
    Returns the next datagram (synonymous with self.read(1))
    """

    return self.read(1)
iter_dgrams()

Iterates through the file, repeatedly calling self.next() until the end of file is reached

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def iter_dgrams(self):
    """
    Iterates through the file, repeatedly calling self.next() until
    the end of file is reached
    """

    while True:
        # new_dgram = self.next()
        # yield new_dgram

        try:
            new_dgram = next(self)
        except Exception:
            logger.debug("Caught EOF?")
            raise StopIteration

        yield new_dgram
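
A minimal usage sketch, not part of the module: it assumes the reader class exported by ek_raw_io is named RawSimradFile (as in the pyEcholab code this module derives from) and uses a placeholder file path. iter_dgrams() yields one datagram at a time until the end of file.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
for dgram in fid.iter_dgrams():
    #  parsed datagrams are dicts; unknown types come back as raw bytes
    if isinstance(dgram, dict):
        print(dgram["type"], dgram["timestamp"])
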
peek()

Returns the header of the next datagram in the file. The file position is reset back to the original location afterwards.

:returns: [dgram_size, dgram_type, (low_date, high_date)]

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def peek(self):
    """
    Returns the header of the next datagram in the file.  The file position is
    reset back to the original location afterwards.

    :returns: [dgram_size, dgram_type, (low_date, high_date)]
    """

    dgram_header = self._read_dgram_header()
    if dgram_header["type"].startswith("RAW0"):
        dgram_header["channel"] = struct.unpack("h", self._read_bytes(2))[0]
        self._seek_bytes(-18, SEEK_CUR)
    elif dgram_header["type"].startswith("RAW3"):
        chan_id = struct.unpack("128s", self._read_bytes(128))[0]
        dgram_header["channel_id"] = chan_id.strip(b"\x00").decode("latin_1")
        self._seek_bytes(-(16 + 128), SEEK_CUR)
    else:
        self._seek_bytes(-16, SEEK_CUR)

    return dgram_header
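
A short sketch of peeking without consuming a datagram (assuming the RawSimradFile reader class and a placeholder path). The returned header dict always carries 'size', 'type', 'low_date' and 'high_date'; RAW0 headers additionally carry 'channel' and RAW3 headers 'channel_id'.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
header = fid.peek()                # does not advance the datagram offset
print(header["type"], header["size"])
if header["type"].startswith("RAW0"):
    print("next ping is on channel", header["channel"])
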
prev()

Returns the previous datagram 'behind' the current file pointer position

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def prev(self):
    """
    Returns the previous datagram 'behind' the current file pointer position
    """

    self.skip_back()
    raw_dgram = self.read(1)
    self.skip_back()
    return raw_dgram
read(k)

:param k: Number of datagrams to read
:type k: int

Reads the next k datagrams. A list of datagrams is returned if k > 1. If k < 0 the call is equivalent to readall(), which rewinds to the beginning of the file and returns every datagram.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def read(self, k):
    """
    :param k: Number of datagrams to read
    :type k: int

    Reads the next k datagrams.  A list of datagrams is returned if k > 1.  If
    k < 0 the call is equivalent to readall(), which rewinds to the beginning
    of the file and returns every datagram.
    """

    if k == 1:
        try:
            return self._read_next_dgram()
        except Exception:
            if self.at_eof():
                raise SimradEOF()
            else:
                raise

    elif k > 0:
        dgram_list = []

        for m in range(k):
            try:
                dgram = self._read_next_dgram()
                dgram_list.append(dgram)

            except Exception:
                break

        return dgram_list

    elif k < 0:
        return self.readall()
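
A brief sketch of the read modes (assuming the RawSimradFile reader class and a placeholder path): read(1) returns a single datagram, read(k) a list, and readall() every datagram from the start of the file.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")   # placeholder path
first = fid.read(1)                 # a single datagram (dict, or bytes if the type is unknown)
batch = fid.read(25)                # a list of up to 25 datagrams (stops early at EOF)
everything = fid.readall()          # rewinds and returns every datagram in the file
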
readall()

Reads the entire file from the beginning and returns a list of datagrams.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readall(self):
    """
    Reads the entire file from the beginning and returns a list of datagrams.
    """

    self.seek(0, SEEK_SET)
    dgram_list = []

    for raw_dgram in self.iter_dgrams():
        dgram_list.append(raw_dgram)

    return dgram_list
readline()

aliased to self.next()

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readline(self):
    """
    aliased to self.next()
    """
    return next(self)
readlines()

aliased to self.read(-1)

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readlines(self):
    """
    aliased to self.read(-1)
    """
    return self.read(-1)
seek(offset, whence)

Performs the familiar 'seek' operation using datagram offsets instead of raw bytes.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def seek(self, offset, whence):
    """
    Performs the familiar 'seek' operation using datagram offsets
    instead of raw bytes.
    """

    if whence == SEEK_SET:
        if offset < 0:
            raise ValueError("Cannot seek backwards from beginning of file")
        else:
            self._seek_bytes(0, SEEK_SET)
            self._current_dgram_offset = 0
    elif whence == SEEK_END:
        if offset > 0:
            raise ValueError("Use negative offsets when seeking backward from end of file")

        # Do we need to generate the total number of datagrams w/in the file?
        try:
            self._set_total_dgram_count()
            # Throws a value error if _total_dgram_count has already been set.  We can ignore it
        except ValueError:
            pass

        self._seek_bytes(0, SEEK_END)
        self._current_dgram_offset = self._total_dgram_count

    elif whence == SEEK_CUR:
        pass
    else:
        raise ValueError(
            "Illegal value for 'whence' (%s), use 0 (beginning), 1 (current), or 2 (end)"
            % (str(whence))
        )

    if offset > 0:
        for k in range(offset):
            self.skip()
    elif offset < 0:
        for k in range(-offset):
            self.skip_back()
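
A sketch of seeking by datagram offsets rather than bytes (assuming the RawSimradFile reader class and a placeholder path); the whence constants are the standard ones from the io module.

from io import SEEK_CUR, SEEK_END, SEEK_SET

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
fid.seek(100, SEEK_SET)            # position at datagram offset 100
print(fid.tell())                  # -> 100 (a datagram count, not a byte offset)
fid.seek(5, SEEK_CUR)              # skip forward five datagrams
fid.seek(-1, SEEK_END)             # position at the start of the last datagram
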
skip()

Skips forward to the next datagram without reading the contents of the current one

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def skip(self):
    """
    Skips forward to the next datagram without reading the contents of the current one
    """

    # dgram_size, dgram_type, (low_date, high_date) = self.peek()[:3]

    header = self.peek()

    if header["size"] < 16:
        logger.warning(
            "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
            header["size"],
            header["type"],
            str((header["low_date"], header["high_date"])),
        )

        self._find_next_datagram()

    else:
        self._seek_bytes(header["size"] + 4, SEEK_CUR)
        dgram_size_check = self._read_dgram_size()

        if header["size"] != dgram_size_check:
            logger.warning(
                "Datagram failed size check:  %d != %d @ (%d, %d)",
                header["size"],
                dgram_size_check,
                self._tell_bytes(),
                self.tell(),
            )
            logger.warning("Skipping to next datagram... (in skip)")

            self._find_next_datagram()

    self._current_dgram_offset += 1
skip_back()

Skips backwards to the previous datagram without reading its contents

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def skip_back(self):
    """
    Skips backwards to the previous datagram without reading its contents
    """

    old_file_pos = self._tell_bytes()

    try:
        self._seek_bytes(-4, SEEK_CUR)
    except IOError:
        raise

    dgram_size_check = self._read_dgram_size()

    # Seek to the beginning of the datagram and read as normal
    try:
        self._seek_bytes(-(8 + dgram_size_check), SEEK_CUR)
    except IOError:
        raise DatagramSizeError

    try:
        dgram_size = self._read_dgram_size()

    except DatagramSizeError:
        logger.info("Error reading the datagram")
        self._seek_bytes(old_file_pos, SEEK_SET)
        raise

    if dgram_size_check != dgram_size:
        self._seek_bytes(old_file_pos, SEEK_SET)
        raise DatagramSizeError
    else:
        self._seek_bytes(-4, SEEK_CUR)

    self._current_dgram_offset -= 1
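
A sketch combining skip() and skip_back() (assuming the RawSimradFile reader class and a placeholder path): skip() hops over the next datagram using only its header, and skip_back() rewinds one datagram so it can be read again.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")      # placeholder path
if fid.peek()["type"].startswith("NME"):
    fid.skip()                         # hop over an NMEA datagram without parsing it
dgram = fid.read(1)                    # read (and consume) the next datagram
fid.skip_back()                        # step back so the same datagram can be re-read
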
tell()

Returns the current file pointer offset by datagram number

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def tell(self):
    """
    Returns the current file pointer offset by datagram number
    """
    return self._current_dgram_offset

ek_raw_parsers

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

The code has been modified to handle split-beam data and channel-transducer structure from different EK80 setups.

Classes:

Name Description
SimradAnnotationParser

ER60 Annotation datagram contains the following keys:

SimradBottomParser

Bottom Detection datagram contains the following keys:

SimradConfigParser

Simrad Configuration Datagram parser operates on dictionaries with the following keys:

SimradDepthParser

ER60 Depth Detection datagram (from .bot files) contain the following keys:

SimradNMEAParser

ER60 NMEA datagram contains the following keys:

SimradRawParser

Sample Data Datagram parser operates on dictionaries with the following keys:
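
A minimal routing sketch, not part of the module: it mirrors the way ek_raw_io hands each raw datagram to a parser via from_string(raw_bytes, bytes_read). The type-to-parser mapping and the parse_datagram helper shown here are illustrative.

from aalibrary.utils.sonar_checker.ek_raw_parsers import (
    SimradAnnotationParser,
    SimradNMEAParser,
    SimradRawParser,
)

#  illustrative mapping from the 3-character datagram type to a parser
PARSERS = {
    "TAG": SimradAnnotationParser(),
    "NME": SimradNMEAParser(),
    "RAW": SimradRawParser(),
}

def parse_datagram(raw_dgram):
    """Return a parsed dict for known types, or the raw bytes unchanged."""
    dgram_type = raw_dgram[:3].decode()   # e.g. 'RAW' from b'RAW3...'
    parser = PARSERS.get(dgram_type)
    if parser is None:
        return raw_dgram
    #  bytes_read mirrors the value ek_raw_io passes along (len(raw_dgram) + 20)
    return parser.from_string(raw_dgram, len(raw_dgram) + 20)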

SimradAnnotationParser

Bases: _SimradDatagramParser

ER60 Annotation datagram contains the following keys:

type:         string == 'TAG0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:     datetime.datetime object of NT date, assumed to be UTC

text:         Annotation

The following methods are defined:

from_string(str):    parse a raw ER60 Annotation datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradAnnotationParser(_SimradDatagramParser):
    """
    ER60 Annotation datagram contains the following keys:


        type:         string == 'TAG0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:     datetime.datetime object of NT date, assumed to be UTC

        text:         Annotation

    The following methods are defined:

        from_string(str):    parse a raw ER60 Annotation datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")]}

        _SimradDatagramParser.__init__(self, "TAG", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        #        if version == 0:
        #            data['text'] = raw_string[self.header_size(version):].strip('\x00')
        #            if isinstance(data['text'], bytes):
        #                data['text'] = data['text'].decode()

        if version == 0:
            if sys.version_info.major > 2:
                data["text"] = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                data["text"] = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["text"][-1] != "\x00":
                tmp_string = data["text"] + "\x00"
            else:
                tmp_string = data["text"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))
            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)

SimradBottomParser

Bases: _SimradDatagramParser

Bottom Detection datagram contains the following keys:

type:         string == 'BOT0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date converted to UTC
transceiver_count:  long uint with number of transceivers
depth:        [float], one value for each active channel

The following methods are defined:

from_string(str):    parse a raw ER60 Bottom datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradBottomParser(_SimradDatagramParser):
    """
    Bottom Detection datagram contains the following keys:

        type:         string == 'BOT0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date converted to UTC
        transceiver_count:  long uint with number of transceivers
        depth:        [float], one value for each active channel

    The following methods are defined:

        from_string(str):    parse a raw ER60 Bottom datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("transceiver_count", "L"),
            ]
        }
        _SimradDatagramParser.__init__(self, "BOT", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            depth_fmt = "=%dd" % (data["transceiver_count"],)
            depth_size = struct.calcsize(depth_fmt)
            buf_indx = self.header_size(version)
            data["depth"] = np.fromiter(
                struct.unpack(depth_fmt, raw_string[buf_indx : buf_indx + depth_size]),  # noqa
                "float",
            )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            if len(data["depth"]) != data["transceiver_count"]:
                logger.warning(
                    "# of depth values %d does not match transceiver count %d",
                    len(data["depth"]),
                    data["transceiver_count"],
                )

                data["transceiver_count"] = len(data["depth"])

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%dd" % (data["transceiver_count"])
            datagram_contents.extend(data["depth"])

        return struct.pack(datagram_fmt, *datagram_contents)
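
A short sketch of what a parsed BOT0 datagram looks like in practice (assuming the RawSimradFile reader class from ek_raw_io and a placeholder .bot path): the depth field is a NumPy array with one value per active channel.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

for dgram in RawSimradFile("survey.bot").iter_dgrams():  # placeholder path
    if isinstance(dgram, dict) and dgram["type"].startswith("BOT"):
        #  one depth value per transceiver, as described above
        assert len(dgram["depth"]) == dgram["transceiver_count"]
        print(dgram["timestamp"], dgram["depth"])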

SimradConfigParser

Bases: _SimradDatagramParser

Simrad Configuration Datagram parser operates on dictionaries with the following keys:

type:         string == 'CON0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC

survey_name                     [str]
transect_name                   [str]
sounder_name                    [str]
version                         [str]
spare0                          [str]
transceiver_count               [long]
transceivers                    [list] List of dicts representing Transducer Configs:

ME70 Data contains the following additional values (data contained w/in first 14
    bytes of the spare0 field)

multiplexing                    [short]  Always 0
time_bias                       [long] difference between UTC and local time in min.
sound_velocity_avg              [float] [m/s]
sound_velocity_transducer       [float] [m/s]
beam_config                     [str] Raw XML string containing beam config. info

Transducer Config Keys (ER60/ES60/ES70 sounders):

channel_id                      [str]   channel ident string
beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
frequency                       [float] channel frequency
equivalent_beam_angle           [float] dB
beamwidth_alongship             [float]
beamwidth_athwartship           [float]
angle_sensitivity_alongship     [float]
angle_sensitivity_athwartship   [float]
angle_offset_alongship          [float]
angle_offset_athwartship        [float]
pos_x                           [float]
pos_y                           [float]
pos_z                           [float]
dir_x                           [float]
dir_y                           [float]
dir_z                           [float]
pulse_length_table              [float[5]]
spare1                          [str]
gain_table                      [float[5]]
spare2                          [str]
sa_correction_table             [float[5]]
spare3                          [str]
gpt_software_version            [str]
spare4                          [str]

Transducer Config Keys (ME70 sounders):

channel_id                      [str]   channel ident string
beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
reserved1                       [float] channel frequency
equivalent_beam_angle           [float] dB
beamwidth_alongship             [float]
beamwidth_athwartship           [float]
angle_sensitivity_alongship     [float]
angle_sensitivity_athwartship   [float]
angle_offset_alongship          [float]
angle_offset_athwartship        [float]
pos_x                           [float]
pos_y                           [float]
pos_z                           [float]
beam_steering_angle_alongship   [float]
beam_steering_angle_athwartship [float]
beam_steering_angle_unused      [float]
pulse_length                    [float]
reserved2                       [float]
spare1                          [str]
gain                            [float]
reserved3                       [float]
spare2                          [str]
sa_correction                   [float]
reserved4                       [float]
spare3                          [str]
gpt_software_version            [str]
spare4                          [str]

from_string(str): parse a raw config datagram (with leading/trailing datagram size stripped)

to_string(dict): Returns raw string (including leading/trailing size fields) ready for writing to disk

Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradConfigParser(_SimradDatagramParser):
    """
    Simrad Configuration Datagram parser operates on dictionaries with the following keys:

        type:         string == 'CON0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC

        survey_name                     [str]
        transect_name                   [str]
        sounder_name                    [str]
        version                         [str]
        spare0                          [str]
        transceiver_count               [long]
        transceivers                    [list] List of dicts representing Transducer Configs:

        ME70 Data contains the following additional values (data contained w/in first 14
            bytes of the spare0 field)

        multiplexing                    [short]  Always 0
        time_bias                       [long] difference between UTC and local time in min.
        sound_velocity_avg              [float] [m/s]
        sound_velocity_transducer       [float] [m/s]
        beam_config                     [str] Raw XML string containing beam config. info


    Transducer Config Keys (ER60/ES60/ES70 sounders):
        channel_id                      [str]   channel ident string
        beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
        frequency                       [float] channel frequency
        equivalent_beam_angle           [float] dB
        beamwidth_alongship             [float]
        beamwidth_athwartship           [float]
        angle_sensitivity_alongship     [float]
        angle_sensitivity_athwartship   [float]
        angle_offset_alongship          [float]
        angle_offset_athwartship        [float]
        pos_x                           [float]
        pos_y                           [float]
        pos_z                           [float]
        dir_x                           [float]
        dir_y                           [float]
        dir_z                           [float]
        pulse_length_table              [float[5]]
        spare1                          [str]
        gain_table                      [float[5]]
        spare2                          [str]
        sa_correction_table             [float[5]]
        spare3                          [str]
        gpt_software_version            [str]
        spare4                          [str]

    Transducer Config Keys (ME70 sounders):
        channel_id                      [str]   channel ident string
        beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
        reserved1                       [float] channel frequency
        equivalent_beam_angle           [float] dB
        beamwidth_alongship             [float]
        beamwidth_athwartship           [float]
        angle_sensitivity_alongship     [float]
        angle_sensitivity_athwartship   [float]
        angle_offset_alongship          [float]
        angle_offset_athwartship        [float]
        pos_x                           [float]
        pos_y                           [float]
        pos_z                           [float]
        beam_steering_angle_alongship   [float]
        beam_steering_angle_athwartship [float]
        beam_steering_angle_unused      [float]
        pulse_length                    [float]
        reserved2                       [float]
        spare1                          [str]
        gain                            [float]
        reserved3                       [float]
        spare2                          [str]
        sa_correction                   [float]
        reserved4                       [float]
        spare3                          [str]
        gpt_software_version            [str]
        spare4                          [str]

    from_string(str):   parse a raw config datagram
                        (with leading/trailing datagram size stripped)

    to_string(dict):    Returns raw string (including leading/trailing size fields)
                        ready for writing to disk
    """

    COMMON_KEYS = [
        ("channel_id", "128s"),
        ("beam_type", "l"),
        ("frequency", "f"),
        ("gain", "f"),
        ("equivalent_beam_angle", "f"),
        ("beamwidth_alongship", "f"),
        ("beamwidth_athwartship", "f"),
        ("angle_sensitivity_alongship", "f"),
        ("angle_sensitivity_athwartship", "f"),
        ("angle_offset_alongship", "f"),
        ("angle_offset_athwartship", "f"),
        ("pos_x", "f"),
        ("pos_y", "f"),
        ("pos_z", "f"),
        ("dir_x", "f"),
        ("dir_y", "f"),
        ("dir_z", "f"),
        ("pulse_length_table", "5f"),
        ("spare1", "8s"),
        ("gain_table", "5f"),
        ("spare2", "8s"),
        ("sa_correction_table", "5f"),
        ("spare3", "8s"),
        ("gpt_software_version", "16s"),
        ("spare4", "28s"),
    ]

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("survey_name", "128s"),
                ("transect_name", "128s"),
                ("sounder_name", "128s"),
                ("version", "30s"),
                ("spare0", "98s"),
                ("transceiver_count", "l"),
            ],
            1: [("type", "4s"), ("low_date", "L"), ("high_date", "L")],
        }

        _SimradDatagramParser.__init__(self, "CON", headers)

        self._transducer_headers = {
            "ER60": self.COMMON_KEYS,
            "ES60": self.COMMON_KEYS,
            "ES70": self.COMMON_KEYS,
            "MBES": [
                ("channel_id", "128s"),
                ("beam_type", "l"),
                ("frequency", "f"),
                ("reserved1", "f"),
                ("equivalent_beam_angle", "f"),
                ("beamwidth_alongship", "f"),
                ("beamwidth_athwartship", "f"),
                ("angle_sensitivity_alongship", "f"),
                ("angle_sensitivity_athwartship", "f"),
                ("angle_offset_alongship", "f"),
                ("angle_offset_athwartship", "f"),
                ("pos_x", "f"),
                ("pos_y", "f"),
                ("pos_z", "f"),
                ("beam_steering_angle_alongship", "f"),
                ("beam_steering_angle_athwartship", "f"),
                ("beam_steering_angle_unused", "f"),
                ("pulse_length", "f"),
                ("reserved2", "f"),
                ("spare1", "20s"),
                ("gain", "f"),
                ("reserved3", "f"),
                ("spare2", "20s"),
                ("sa_correction", "f"),
                ("reserved4", "f"),
                ("spare3", "20s"),
                ("gpt_software_version", "16s"),
                ("spare4", "28s"),
            ],
        }

    def _unpack_contents(self, raw_string, bytes_read, version):
        data = {}
        round6 = lambda x: round(x, ndigits=6)  # noqa
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]

            #  handle Python 3 strings
            if (sys.version_info.major > 2) and isinstance(data[field], bytes):
                data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            data["transceivers"] = {}

            for field in ["transect_name", "version", "survey_name", "sounder_name"]:
                data[field] = data[field].strip("\x00")

            sounder_name = data["sounder_name"]
            if sounder_name == "MBES":
                _me70_extra_values = struct.unpack("=hLff", data["spare0"][:14])
                data["multiplexing"] = _me70_extra_values[0]
                data["time_bias"] = _me70_extra_values[1]
                data["sound_velocity_avg"] = _me70_extra_values[2]
                data["sound_velocity_transducer"] = _me70_extra_values[3]
                data["spare0"] = data["spare0"][:14] + data["spare0"][14:].strip("\x00")

            else:
                data["spare0"] = data["spare0"].strip("\x00")

            buf_indx = self.header_size(version)

            try:
                transducer_header = self._transducer_headers[sounder_name]
                _sounder_name_used = sounder_name
            except KeyError:
                logger.warning(
                    "Unknown sounder_name:  %s, (no one of %s)",
                    sounder_name,
                    list(self._transducer_headers.keys()),
                )
                logger.warning("Will use ER60 transducer config fields as default")

                transducer_header = self._transducer_headers["ER60"]
                _sounder_name_used = "ER60"

            txcvr_header_fields = [x[0] for x in transducer_header]
            txcvr_header_fmt = "=" + "".join([x[1] for x in transducer_header])
            txcvr_header_size = struct.calcsize(txcvr_header_fmt)

            for txcvr_indx in range(1, data["transceiver_count"] + 1):
                txcvr_header_values_encoded = struct.unpack(
                    txcvr_header_fmt,
                    raw_string[buf_indx : buf_indx + txcvr_header_size],  # noqa
                )
                txcvr_header_values = list(txcvr_header_values_encoded)
                for tx_idx, tx_val in enumerate(txcvr_header_values_encoded):
                    if isinstance(tx_val, bytes):
                        txcvr_header_values[tx_idx] = tx_val.decode("latin_1")

                txcvr = data["transceivers"].setdefault(txcvr_indx, {})

                if _sounder_name_used in ["ER60", "ES60", "ES70"]:
                    for txcvr_field_indx, field in enumerate(txcvr_header_fields[:17]):
                        txcvr[field] = txcvr_header_values[txcvr_field_indx]

                    txcvr["pulse_length_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[17:22])), "float"
                    )
                    txcvr["spare1"] = txcvr_header_values[22]
                    txcvr["gain_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[23:28])), "float"
                    )
                    txcvr["spare2"] = txcvr_header_values[28]
                    txcvr["sa_correction_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[29:34])), "float"
                    )
                    txcvr["spare3"] = txcvr_header_values[34]
                    txcvr["gpt_software_version"] = txcvr_header_values[35]
                    txcvr["spare4"] = txcvr_header_values[36]

                elif _sounder_name_used == "MBES":
                    for txcvr_field_indx, field in enumerate(txcvr_header_fields):
                        txcvr[field] = txcvr_header_values[txcvr_field_indx]

                else:
                    raise RuntimeError(
                        "Unknown _sounder_name_used (Should not happen, this is a bug!)"
                    )

                txcvr["channel_id"] = txcvr["channel_id"].strip("\x00")
                txcvr["spare1"] = txcvr["spare1"].strip("\x00")
                txcvr["spare2"] = txcvr["spare2"].strip("\x00")
                txcvr["spare3"] = txcvr["spare3"].strip("\x00")
                txcvr["spare4"] = txcvr["spare4"].strip("\x00")
                txcvr["gpt_software_version"] = txcvr["gpt_software_version"].strip("\x00")

                buf_indx += txcvr_header_size

        elif version == 1:
            # CON1 only has a single data field:  beam_config, holding an xml string
            data["beam_config"] = raw_string[self.header_size(version) :].strip("\x00")

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            if data["transceiver_count"] != len(data["transceivers"]):
                logger.warning("Mismatch between 'transceiver_count' and actual # of transceivers")
                data["transceiver_count"] = len(data["transceivers"])

            sounder_name = data["sounder_name"]
            if sounder_name == "MBES":
                _packed_me70_values = struct.pack(
                    "=hLff",
                    data["multiplexing"],
                    data["time_bias"],
                    data["sound_velocity_avg"],
                    data["sound_velocity_transducer"],
                )
                data["spare0"] = _packed_me70_values + data["spare0"][14:]

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            try:
                transducer_header = self._transducer_headers[sounder_name]
                _sounder_name_used = sounder_name
            except KeyError:
                logger.warning(
                    "Unknown sounder_name:  %s, (no one of %s)",
                    sounder_name,
                    list(self._transducer_headers.keys()),
                )
                logger.warning("Will use ER60 transducer config fields as default")

                transducer_header = self._transducer_headers["ER60"]
                _sounder_name_used = "ER60"

            txcvr_header_fields = [x[0] for x in transducer_header]
            txcvr_header_fmt = "=" + "".join([x[1] for x in transducer_header])
            txcvr_header_size = struct.calcsize(txcvr_header_fmt)  # noqa

            for txcvr_indx, txcvr in list(data["transceivers"].items()):
                txcvr_contents = []

                if _sounder_name_used in ["ER60", "ES60", "ES70"]:
                    for field in txcvr_header_fields[:17]:
                        txcvr_contents.append(txcvr[field])

                    txcvr_contents.extend(txcvr["pulse_length_table"])
                    txcvr_contents.append(txcvr["spare1"])

                    txcvr_contents.extend(txcvr["gain_table"])
                    txcvr_contents.append(txcvr["spare2"])

                    txcvr_contents.extend(txcvr["sa_correction_table"])
                    txcvr_contents.append(txcvr["spare3"])

                    txcvr_contents.extend([txcvr["gpt_software_version"], txcvr["spare4"]])

                    txcvr_contents_str = struct.pack(txcvr_header_fmt, *txcvr_contents)

                elif _sounder_name_used == "MBES":
                    for field in txcvr_header_fields:
                        txcvr_contents.append(txcvr[field])

                    txcvr_contents_str = struct.pack(txcvr_header_fmt, *txcvr_contents)

                else:
                    raise RuntimeError(
                        "Unknown _sounder_name_used (Should not happen, this is a bug!)"
                    )

                datagram_fmt += "%ds" % (len(txcvr_contents_str))
                datagram_contents.append(txcvr_contents_str)

        elif version == 1:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%ds" % (len(data["beam_config"]))
            datagram_contents.append(data["beam_config"])

        return struct.pack(datagram_fmt, *datagram_contents)
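
A short sketch of reading the CON0 configuration and walking the per-transceiver configs (assuming the RawSimradFile reader class and a placeholder path); transceivers is a dict keyed 1..transceiver_count.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")   # placeholder path
config = fid.read(1)                # the CON0 datagram is typically first in ER60/ES60 files
print(config["survey_name"], config["sounder_name"])
for idx, txcvr in config["transceivers"].items():
    print(idx, txcvr["channel_id"], txcvr["frequency"])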

SimradDepthParser

Bases: _SimradDatagramParser

ER60 Depth Detection datagram (from .bot files) contain the following keys:

type:         string == 'DEP0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
transceiver_count:  [long uint] with number of transceivers

depth:        [float], one value for each active channel
reflectivity: [float], one value for each active channel
unused:       [float], unused value for each active channel

The following methods are defined:

from_string(str):    parse a raw ER60 Depth datagram
                     (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradDepthParser(_SimradDatagramParser):
    """
    ER60 Depth Detection datagram (from .bot files) contain the following keys:

        type:         string == 'DEP0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        transceiver_count:  [long uint] with number of transceivers

        depth:        [float], one value for each active channel
        reflectivity: [float], one value for each active channel
        unused:       [float], unused value for each active channel

    The following methods are defined:

        from_string(str):    parse a raw ER60 Depth datagram
                             (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk

    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("transceiver_count", "L"),
            ]
        }
        _SimradDatagramParser.__init__(self, "DEP", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            data_fmt = "=3f"
            data_size = struct.calcsize(data_fmt)

            data["depth"] = np.zeros((data["transceiver_count"],))
            data["reflectivity"] = np.zeros((data["transceiver_count"],))
            data["unused"] = np.zeros((data["transceiver_count"],))

            buf_indx = self.header_size(version)
            for indx in range(data["transceiver_count"]):
                d, r, u = struct.unpack(
                    data_fmt, raw_string[buf_indx : buf_indx + data_size]  # noqa
                )
                data["depth"][indx] = d
                data["reflectivity"][indx] = r
                data["unused"][indx] = u

                buf_indx += data_size

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            lengths = [
                len(data["depth"]),
                len(data["reflectivity"]),
                len(data["unused"]),
                data["transceiver_count"],
            ]

            if len(set(lengths)) != 1:
                min_indx = min(lengths)
                logger.warning("Data lengths mismatched:  d:%d, r:%d, u:%d, t:%d", *lengths)
                logger.warning("  Using minimum value:  %d", min_indx)
                data["transceiver_count"] = min_indx

            else:
                min_indx = data["transceiver_count"]

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%df" % (3 * data["transceiver_count"])

            for indx in range(data["transceiver_count"]):
                datagram_contents.extend(
                    [
                        data["depth"][indx],
                        data["reflectivity"][indx],
                        data["unused"][indx],
                    ]
                )

        return struct.pack(datagram_fmt, *datagram_contents)

SimradFILParser

Bases: _SimradDatagramParser

EK80 FIL datagram contains the following keys:

type:               string == 'FIL1'
low_date:           long uint representing LSBytes of 64bit NT date
high_date:          long uint representing MSBytes of 64bit NT date
timestamp:          datetime.datetime object of NT date, assumed to be UTC
stage:              int
channel_id:         string
n_coefficients:     int
decimation_factor:  int
coefficients:       np.complex64

The following methods are defined:

from_string(str):    parse a raw EK80 FIL datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                    (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradFILParser(_SimradDatagramParser):
    """
    EK80 FIL datagram contains the following keys:


        type:               string == 'FIL1'
        low_date:           long uint representing LSBytes of 64bit NT date
        high_date:          long uint representing MSBytes of 64bit NT date
        timestamp:          datetime.datetime object of NT date, assumed to be UTC
        stage:              int
        channel_id:         string
        n_coefficients:     int
        decimation_factor:  int
        coefficients:       np.complex64

    The following methods are defined:

        from_string(str):    parse a raw EK80 FIL datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                            (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {
            1: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("stage", "h"),
                ("spare", "2s"),
                ("channel_id", "128s"),
                ("n_coefficients", "h"),
                ("decimation_factor", "h"),
            ]
        }

        _SimradDatagramParser.__init__(self, "FIL", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        data = {}
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]

            #  handle Python 3 strings
            if (sys.version_info.major > 2) and isinstance(data[field], bytes):
                data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 1:
            #  clean up the channel ID
            data["channel_id"] = data["channel_id"].strip("\x00")

            #  unpack the coefficients
            indx = self.header_size(version)
            block_size = data["n_coefficients"] * 8
            data["coefficients"] = np.frombuffer(
                raw_string[indx : indx + block_size], dtype="complex64"  # noqa
            )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            pass

        elif version == 1:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            #  pack the filter coefficients (complex64 -> raw bytes)
            coeff_bytes = np.ascontiguousarray(data["coefficients"], dtype="complex64").tobytes()
            datagram_fmt += "%ds" % (len(coeff_bytes))
            datagram_contents.append(coeff_bytes)

        return struct.pack(datagram_fmt, *datagram_contents)
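
For orientation, the coefficients block that follows the FIL1 header is simply a run of interleaved float32 real/imaginary pairs, which is why _unpack_contents can read it with a single np.frombuffer call. A minimal, self-contained sketch of that step (the byte values below are made up for illustration):

import struct

import numpy as np

# Three complex coefficients stored as interleaved float32 (real, imag) pairs,
# i.e. n_coefficients * 8 bytes, exactly as in the FIL1 datagram body.
coeff_bytes = struct.pack("<6f", 1.0, 0.0, 0.5, -0.5, 0.25, 0.75)
coefficients = np.frombuffer(coeff_bytes, dtype="complex64")
print(coefficients)  # [1.+0.j  0.5-0.5j  0.25+0.75j]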

SimradIDXParser

Bases: _SimradDatagramParser

ER60/EK80 IDX datagram contains the following keys:

type:         string == 'IDX0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
ping_number:  int
distance :    float
latitude:     float
longitude:    float
file_offset:  int

The following methods are defined:

from_string(str):   Parse a raw ER60/EK80 IDX datagram
                    (with leading/trailing datagram size stripped)

to_string():    Returns the datagram as a raw string (including leading/trailing size
                fields) ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradIDXParser(_SimradDatagramParser):
    """
    ER60/EK80 IDX datagram contains the following keys:


        type:         string == 'IDX0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        ping_number:  int
        distance :    float
        latitude:     float
        longitude:    float
        file_offset:  int

    The following methods are defined:

        from_string(str):   Parse a raw ER60/EK80 IDX datagram
                            (with leading/trailing datagram size stripped)

        to_string():    Returns the datagram as a raw string (including leading/trailing size
                        fields) ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                # ('dummy', 'L'),   # There are 4 extra bytes in this datagram
                ("ping_number", "L"),
                ("distance", "d"),
                ("latitude", "d"),
                ("longitude", "d"),
                ("file_offset", "L"),
            ]
        }

        _SimradDatagramParser.__init__(self, "IDX", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Unpacks the data in raw_string into dictionary containing IDX data

        :param raw_string:
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                #  first try to decode as utf-8 but fall back to latin_1 if that fails
                try:
                    data[field] = data[field].decode("utf-8")
                except:
                    data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["timestamp"] = data["timestamp"].replace(tzinfo=None)
        data["bytes_read"] = bytes_read

        return data

    def _pack_contents(self, data, version):

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:

            for field in self.header_fields(version):
                if isinstance(data[field], str):
                    data[field] = data[field].encode("latin_1")
                datagram_contents.append(data[field])

        return struct.pack(datagram_fmt, *datagram_contents)
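
The low_date/high_date pair carried by every one of these datagrams is the two 32-bit halves of a Windows NT (FILETIME) timestamp: 100-nanosecond ticks since 1601-01-01 UTC. The library converts it with its nt_to_unix helper; the sketch below shows the equivalent arithmetic and is illustrative only, as the actual helper may differ in details.

import datetime


def nt_words_to_datetime(low_date: int, high_date: int) -> datetime.datetime:
    """Combine the two 32-bit NT date words into a UTC datetime (sketch only)."""
    nt_ticks = (high_date << 32) | low_date  # 100 ns ticks since 1601-01-01 UTC
    epoch = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(microseconds=nt_ticks // 10)


# Round trip: encode a known datetime as NT ticks, split it into words, convert back.
delta = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc) - datetime.datetime(
    1601, 1, 1, tzinfo=datetime.timezone.utc
)
ticks = int(delta.total_seconds()) * 10_000_000  # whole seconds -> 100 ns ticks
print(nt_words_to_datetime(ticks & 0xFFFFFFFF, ticks >> 32))  # 2021-01-01 00:00:00+00:00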

SimradMRUParser

Bases: _SimradDatagramParser

EK80 MRU datagram contains the following keys:

type:         string == 'MRU0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
heave:        float
roll :        float
pitch:        float
heading:      float

Version 1 contains (from https://www3.mbari.org/products/mbsystem/formatdoc/KongsbergKmall/EMdgmFormat_RevH/html/kmBinary.html): # noqa

Status word           See 1)  uint32  4U
Latitude              deg     double  8F
Longitude             deg     double  8F
Ellipsoid height      m       float   4F
Roll                  deg     float   4F
Pitch                 deg     float   4F
Heading               deg     float   4F
Heave                 m       float   4F
Roll rate             deg/s   float   4F
Pitch rate            deg/s   float   4F
Yaw rate              deg/s   float   4F
North velocity        m/s     float   4F
East velocity         m/s     float   4F
Down velocity         m/s     float   4F
Latitude error        m       float   4F
Longitude error       m       float   4F
Height error          m       float   4F
Roll error            deg     float   4F
Pitch error           deg     float   4F
Heading error         deg     float   4F
Heave error           m       float   4F
North acceleration    m/s2    float   4F
East acceleration     m/s2    float   4F
Down acceleration     m/s2    float   4F
Delayed heave:        -       -       -
UTC seconds           s       uint32  4U
UTC nanoseconds       ns      uint32  4U
Delayed heave         m       float   4F

The following methods are defined:

from_string(str):   parse a raw EK80 MRU datagram
                    (with leading/trailing datagram size stripped)

to_string():        Returns the datagram as a raw string (including
                    leading/trailing size fields) ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradMRUParser(_SimradDatagramParser):
    """
    EK80 MRU datagram contains the following keys:


        type:         string == 'MRU0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        heave:        float
        roll :        float
        pitch:        float
        heading:      float

    Version 1 contains (from https://www3.mbari.org/products/mbsystem/formatdoc/KongsbergKmall/EMdgmFormat_RevH/html/kmBinary.html): # noqa

    Status word See 1)  uint32  4U
    Latitude    deg double  8F
    Longitude   deg double  8F
    Ellipsoid height    m   float   4F
    Roll    deg float   4F
    Pitch   deg float   4F
    Heading deg float   4F
    Heave   m   float   4F
    Roll rate   deg/s   float   4F
    Pitch rate  deg/s   float   4F
    Yaw rate    deg/s   float   4F
    North velocity  m/s float   4F
    East velocity   m/s float   4F
    Down velocity   m/s float   4F
    Latitude error  m   float   4F
    Longitude error m   float   4F
    Height error    m   float   4F
    Roll error  deg float   4F
    Pitch error deg float   4F
    Heading error   deg float   4F
    Heave error m   float   4F
    North acceleration  m/s2    float   4F
    East acceleration   m/s2    float   4F
    Down acceleration   m/s2    float   4F
    Delayed heave:  -   -   -
    UTC seconds s   uint32  4U
    UTC nanoseconds ns  uint32  4U
    Delayed heave   m   float   4F

    The following methods are defined:

        from_string(str):   parse a raw EK80 MRU datagram
                            (with leading/trailing datagram size stripped)

        to_string():        Returns the datagram as a raw string (including
                            leading/trailing size fields) ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("heave", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("heading", "f"),
            ],
            1: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("start_id", "4s"),  # KMB#
                ("status_word", "L"),
                ("dummy", "12s"),
                ("latitude", "d"),
                ("longitude", "d"),
                ("ellipsoid_height", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("heading", "f"),
                ("heave", "f"),
                ("roll_rate", "f"),
                ("pitch_rate", "f"),
                ("yaw_rate", "f"),
                ("velocity_north", "f"),
                ("velocity_east", "f"),
                ("velocity_down", "f"),
                ("latitude_error", "f"),
                ("longitude_error", "f"),
                ("height_error", "f"),
                ("roll_error", "f"),
                ("pitch_error", "f"),
                ("heading_error", "f"),
                ("heave_error", "f"),
                ("accel_north", "f"),
                ("accel_east", "f"),
                ("accel_down", "f"),
                ("heave_delay_secs", "L"),
                ("heave_delay_usecs", "L"),
                ("heave_delay_m", "f"),
            ],
        }

        _SimradDatagramParser.__init__(self, "MRU", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Unpacks the data in raw_string into dictionary containing MRU data

        :param raw_string:
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                #  first try to decode as utf-8 but fall back to latin_1 if that fails
                try:
                    data[field] = data[field].decode("utf-8")
                except:
                    data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["timestamp"] = data["timestamp"].replace(tzinfo=None)
        data["bytes_read"] = bytes_read

        return data

    def _pack_contents(self, data, version):

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:

            for field in self.header_fields(version):
                if isinstance(data[field], str):
                    data[field] = data[field].encode("latin_1")
                datagram_contents.append(data[field])

        return struct.pack(datagram_fmt, *datagram_contents)

SimradNMEAParser

Bases: _SimradDatagramParser

ER60 NMEA datagram contains the following keys:

type:         string == 'NME0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:     datetime.datetime object of NT date, assumed to be UTC

nmea_string:  full (original) NMEA string

The following methods are defined:

from_string(str):    parse a raw ER60 NMEA datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradNMEAParser(_SimradDatagramParser):
    """
    ER60 NMEA datagram contains the following keys:


        type:         string == 'NME0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:     datetime.datetime object of NT date, assumed to be UTC

        nmea_string:  full (original) NMEA string

    The following methods are defined:

        from_string(str):    parse a raw ER60 NMEA datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    nmea_head_re = re.compile(r"\$[A-Za-z]{5},")  # noqa

    def __init__(self):
        headers = {
            0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")],
            1: [("type", "4s"), ("low_date", "L"), ("high_date", "L"), ("port", "32s")],
        }

        _SimradDatagramParser.__init__(self, "NME", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Parses the NMEA string provided in raw_string

        :param raw_string:  Raw NMEA string (i.e. '$GPZDA,160012.71,11,03,2004,-1,00*7D')
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        # Remove trailing \x00 from the PORT field for NME1, rest of the datagram identical to NME0
        if version == 1:
            data["port"] = data["port"].strip("\x00")

        if version == 0 or version == 1:
            if sys.version_info.major > 2:
                data["nmea_string"] = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                data["nmea_string"] = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

            if self.nmea_head_re.match(data["nmea_string"][:7]) is not None:
                data["nmea_talker"] = data["nmea_string"][1:3]
                data["nmea_type"] = data["nmea_string"][3:6]
            else:
                data["nmea_talker"] = ""
                data["nmea_type"] = "UNKNOWN"

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["nmea_string"][-1] != "\x00":
                tmp_string = data["nmea_string"] + "\x00"
            else:
                tmp_string = data["nmea_string"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))

            # Convert to python string if needed
            if isinstance(tmp_string, str):
                tmp_string = tmp_string.encode("ascii", errors="replace")

            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)
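
The talker/type split at the end of _unpack_contents is a simple slice once the header regex matches; a standalone example using the same pattern as nmea_head_re:

import re

nmea_head_re = re.compile(r"\$[A-Za-z]{5},")

nmea_string = "$GPZDA,160012.71,11,03,2004,-1,00*7D"
if nmea_head_re.match(nmea_string[:7]) is not None:
    nmea_talker = nmea_string[1:3]  # "GP"
    nmea_type = nmea_string[3:6]    # "ZDA"
    print(nmea_talker, nmea_type)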

SimradRawParser

Bases: _SimradDatagramParser

Sample Data Datagram parser operates on dictionaries with the following keys:

type:         string == 'RAW0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC

channel                         [short] Channel number
mode                            [short] 1 = Power only, 2 = Angle only 3 = Power & Angle
transducer_depth                [float]
frequency                       [float]
transmit_power                  [float]
pulse_length                    [float]
bandwidth                       [float]
sample_interval                 [float]
sound_velocity                  [float]
absorption_coefficient          [float]
heave                           [float]
roll                            [float]
pitch                           [float]
temperature                     [float]
heading                         [float]
transmit_mode                   [short] 0 = Active, 1 = Passive, 2 = Test, -1 = Unknown
spare0                          [str]
offset                          [long]
count                           [long]

power                           [numpy array] Unconverted power values (if present)
angle                           [numpy array] Unconverted angle values (if present)

from_string(str): parse a raw sample datagram (with leading/trailing datagram size stripped)

to_string(dict): Returns raw string (including leading/trailing size fields) ready for writing to disk

Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradRawParser(_SimradDatagramParser):
    """
    Sample Data Datagram parser operates on dictionaries with the following keys:

        type:         string == 'RAW0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC

        channel                         [short] Channel number
        mode                            [short] 1 = Power only, 2 = Angle only 3 = Power & Angle
        transducer_depth                [float]
        frequency                       [float]
        transmit_power                  [float]
        pulse_length                    [float]
        bandwidth                       [float]
        sample_interval                 [float]
        sound_velocity                  [float]
        absorption_coefficient          [float]
        heave                           [float]
        roll                            [float]
        pitch                           [float]
        temperature                     [float]
        heading                         [float]
        transmit_mode                   [short] 0 = Active, 1 = Passive, 2 = Test, -1 = Unknown
        spare0                          [str]
        offset                          [long]
        count                           [long]

        power                           [numpy array] Unconverted power values (if present)
        angle                           [numpy array] Unconverted angle values (if present)

    from_string(str):   parse a raw sample datagram
                        (with leading/trailing datagram size stripped)

    to_string(dict):    Returns raw string (including leading/trailing size fields)
                        ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel", "h"),
                ("mode", "h"),
                ("transducer_depth", "f"),
                ("frequency", "f"),
                ("transmit_power", "f"),
                ("pulse_length", "f"),
                ("bandwidth", "f"),
                ("sample_interval", "f"),
                ("sound_velocity", "f"),
                ("absorption_coefficient", "f"),
                ("heave", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("temperature", "f"),
                ("heading", "f"),
                ("transmit_mode", "h"),
                ("spare0", "6s"),
                ("offset", "l"),
                ("count", "l"),
            ],
            3: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel_id", "128s"),
                ("data_type", "h"),
                ("spare", "2s"),
                ("offset", "l"),
                ("count", "l"),
            ],
            4: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel_id", "128s"),
                ("data_type", "h"),
                ("spare", "2s"),
                ("offset", "l"),
                ("count", "l"),
            ],
        }
        _SimradDatagramParser.__init__(self, "RAW", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode(encoding="unicode_escape")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            if data["count"] > 0:
                block_size = data["count"] * 2
                indx = self.header_size(version)

                if int(data["mode"]) & 0x1:
                    data["power"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int16"  # noqa
                    )
                    indx += block_size
                else:
                    data["power"] = None

                if int(data["mode"]) & 0x2:
                    data["angle"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int8"  # noqa
                    )
                    data["angle"] = data["angle"].reshape((-1, 2))
                else:
                    data["angle"] = None

            else:
                data["power"] = np.empty((0,), dtype="int16")
                data["angle"] = np.empty((0, 2), dtype="int8")

        # RAW3 and RAW4 have the same format, only Datatype Bit 0-1 not used in RAW4
        elif version == 3 or version == 4:
            # result = 1j*Data[...,1]; result += Data[...,0]

            #  clean up the channel ID
            data["channel_id"] = data["channel_id"].strip("\x00")

            if data["count"] > 0:
                #  set the initial block size and indx value.
                block_size = data["count"] * 2
                indx = self.header_size(version)

                if data["data_type"] & 0b1:
                    data["power"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int16"  # noqa
                    )
                    indx += block_size
                else:
                    data["power"] = None

                if data["data_type"] & 0b10:
                    data["angle"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int8"  # noqa
                    )
                    data["angle"] = data["angle"].reshape((-1, 2))
                    indx += block_size
                else:
                    data["angle"] = None

                #  determine the complex sample data type - this is contained in bits 2 and 3
                #  of the datatype <short> value. I'm assuming the types are exclusive...
                data["complex_dtype"] = np.float16
                type_bytes = 2
                if data["data_type"] & 0b1000:
                    data["complex_dtype"] = np.float32
                    type_bytes = 8

                #  determine the number of complex samples
                data["n_complex"] = data["data_type"] >> 8

                #  unpack the complex samples
                if data["n_complex"] > 0:
                    #  determine the block size
                    block_size = data["count"] * data["n_complex"] * type_bytes

                    data["complex"] = np.frombuffer(
                        raw_string[indx : indx + block_size],  # noqa
                        dtype=data["complex_dtype"],
                    )
                    data["complex"].dtype = np.complex64
                    if version == 3:
                        data["complex"] = data["complex"].reshape((-1, data["n_complex"]))
                else:
                    data["complex"] = None

            else:
                data["power"] = np.empty((0,), dtype="int16")
                data["angle"] = np.empty((0,), dtype="int8")
                data["complex"] = np.empty((0,), dtype="complex64")
                data["n_complex"] = 0

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)

        datagram_contents = []

        if version == 0:
            if data["count"] > 0:
                if (int(data["mode"]) & 0x1) and (len(data.get("power", [])) != data["count"]):
                    logger.warning(
                        "Data 'count' = %d, but contains %d power samples.  Ignoring power."
                    )
                    data["mode"] &= ~(1 << 0)

                if (int(data["mode"]) & 0x2) and (len(data.get("angle", [])) != data["count"]):
                    logger.warning(
                        "Data 'count' = %d, but contains %d angle samples.  Ignoring angle."
                    )
                    data["mode"] &= ~(1 << 1)

                if data["mode"] == 0:
                    logger.warning(
                        "Data 'count' = %d, but mode == 0.  Setting count to 0",
                        data["count"],
                    )
                    data["count"] = 0

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["count"] > 0:
                if int(data["mode"]) & 0x1:
                    datagram_fmt += "%dh" % (data["count"])
                    datagram_contents.extend(data["power"])

                if int(data["mode"]) & 0x2:
                    datagram_fmt += "%dH" % (data["count"])
                    datagram_contents.extend(data["angle"])

        return struct.pack(datagram_fmt, *datagram_contents)
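
For RAW3/RAW4 datagrams the data_type word is a small bit field: bit 0 flags power samples, bit 1 flags angle samples, bit 3 flags float32 (rather than float16) complex components, and the upper byte (data_type >> 8) gives the number of complex samples per range cell. A standalone sketch of that decoding, with an illustrative value:

data_type = (4 << 8) | 0b1000  # illustrative: 4 complex samples, float32 components

has_power = bool(data_type & 0b1)              # False
has_angle = bool(data_type & 0b10)             # False
complex_is_float32 = bool(data_type & 0b1000)  # True
n_complex = data_type >> 8                     # 4
print(has_power, has_angle, complex_is_float32, n_complex)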

SimradXMLParser

Bases: _SimradDatagramParser

EK80 XML datagram contains the following keys:

type:         string == 'XML0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
subtype:      string representing Simrad XML datagram type:
              configuration, environment, or parameter

[subtype]:    dict containing the data specific to the XML subtype.

The following methods are defined:

from_string(str):    parse a raw EK80 XML datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradXMLParser(_SimradDatagramParser):
    """
    EK80 XML datagram contains the following keys:


        type:         string == 'XML0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        subtype:      string representing Simrad XML datagram type:
                      configuration, environment, or parameter

        [subtype]:    dict containing the data specific to the XML subtype.

    The following methods are defined:

        from_string(str):    parse a raw EK80 XML datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    #  define the XML parsing options - here we define dictionaries for various xml datagram
    #  types. When parsing that xml datagram, these dictionaries are used to inform the parser about
    #  type conversion, name wrangling, and delimiter. If a field is missing, the parser
    #  assumes no conversion: type will be string, default mangling, and that there is only 1
    #  element.
    #
    #  the dicts are in the form:
    #       'XMLParamName':[converted type,'fieldname', 'parse char']
    #
    #  For example: 'PulseDurationFM':[float,'pulse_duration_fm',';']
    #
    #  will result in a return dictionary field named 'pulse_duration_fm' that contains a list
    #  of float values parsed from a string that uses ';' to separate values. Empty strings
    #  for fieldname and/or parse char result in the default action for those parsing steps.

    channel_parsing_options = {
        "MaxTxPowerTransceiver": [int, "", ""],
        "PulseDuration": [float, "", ";"],
        "PulseDurationFM": [float, "pulse_duration_fm", ";"],
        "SampleInterval": [float, "", ";"],
        "ChannelID": [str, "channel_id", ""],
        "HWChannelConfiguration": [str, "hw_channel_configuration", ""],
    }

    transceiver_parsing_options = {
        "TransceiverNumber": [int, "", ""],
        "Version": [str, "transceiver_version", ""],
        "IPAddress": [str, "ip_address", ""],
        "Impedance": [int, "", ""],
    }

    transducer_parsing_options = {
        "SerialNumber": [str, "transducer_serial_number", ""],
        "Frequency": [float, "transducer_frequency", ""],
        "FrequencyMinimum": [float, "transducer_frequency_minimum", ""],
        "FrequencyMaximum": [float, "transducer_frequency_maximum", ""],
        "BeamType": [int, "transducer_beam_type", ""],
        "Gain": [float, "", ";"],
        "SaCorrection": [float, "", ";"],
        "MaxTxPowerTransducer": [float, "", ""],
        "EquivalentBeamAngle": [float, "", ""],
        "BeamWidthAlongship": [float, "", ""],
        "BeamWidthAthwartship": [float, "", ""],
        "AngleSensitivityAlongship": [float, "", ""],
        "AngleSensitivityAthwartship": [float, "", ""],
        "AngleOffsetAlongship": [float, "", ""],
        "AngleOffsetAthwartship": [float, "", ""],
        "DirectivityDropAt2XBeamWidth": [
            float,
            "directivity_drop_at_2x_beam_width",
            "",
        ],
        "TransducerOffsetX": [float, "", ""],
        "TransducerOffsetY": [float, "", ""],
        "TransducerOffsetZ": [float, "", ""],
        "TransducerAlphaX": [float, "", ""],
        "TransducerAlphaY": [float, "", ""],
        "TransducerAlphaZ": [float, "", ""],
    }

    header_parsing_options = {"Version": [str, "application_version", ""]}

    envxdcr_parsing_options = {"SoundSpeed": [float, "transducer_sound_speed", ""]}

    environment_parsing_options = {
        "Depth": [float, "", ""],
        "Acidity": [float, "", ""],
        "Salinity": [float, "", ""],
        "SoundSpeed": [float, "", ""],
        "Temperature": [float, "", ""],
        "Latitude": [float, "", ""],
        "SoundVelocityProfile": [float, "", ";"],
        "DropKeelOffset": [float, "", ""],
        "DropKeelOffsetIsManual": [int, "", ""],
        "WaterLevelDraft": [float, "", ""],
        "WaterLevelDraftIsManual": [int, "", ""],
    }

    parameter_parsing_options = {
        "ChannelID": [str, "channel_id", ""],
        "ChannelMode": [int, "", ""],
        "PulseForm": [int, "", ""],
        "Frequency": [float, "", ""],
        "PulseDuration": [float, "", ""],
        "SampleInterval": [float, "", ""],
        "TransmitPower": [float, "", ""],
        "Slope": [float, "", ""],
    }

    def __init__(self):
        headers = {0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")]}
        _SimradDatagramParser.__init__(self, "XML", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Parses the NMEA string provided in raw_string

        :param raw_string:  Raw NMEA string (i.e. '$GPZDA,160012.71,11,03,2004,-1,00*7D')
        :type raw_string: str

        :returns: None
        """

        def dict_to_dict(xml_dict, data_dict, parse_opts):
            """
            dict_to_dict appends the ETree xml value dicts to a provided dictionary
            and along the way converts the key name to conform to the project's
            naming convention and optionally parses and or converts values as
            specified in the parse_opts dictionary.
            """

            for k in xml_dict:
                #  check if we're parsing this key/value
                if k in parse_opts:
                    #  try to parse the string
                    if parse_opts[k][2]:
                        try:
                            data = xml_dict[k].split(parse_opts[k][2])
                        except:
                            #  bad or empty parse character(s) provided
                            data = xml_dict[k]
                    else:
                        #  no parse char provided - nothing to parse
                        data = xml_dict[k]

                    #  try to convert to specified type
                    if isinstance(data, list):
                        for i in range(len(data)):
                            try:
                                data[i] = parse_opts[k][0](data[i])
                            except:
                                pass
                    else:
                        data = parse_opts[k][0](data)

                    #  and add the value to the provided dict
                    if parse_opts[k][1]:
                        #  add using the specified key name
                        data_dict[parse_opts[k][1]] = data
                    else:
                        #  add using the default key name wrangling
                        data_dict[camelcase2snakecase(k)] = data
                else:
                    #  nothing to do with the value string
                    data = xml_dict[k]

                    #  add the parameter to the provided dictionary
                    data_dict[camelcase2snakecase(k)] = data

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            if sys.version_info.major > 2:
                xml_string = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                xml_string = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

            #  get the ElementTree element
            root = ET.fromstring(xml_string)

            #  get the XML message type
            data["subtype"] = root.tag.lower()

            #  create the dictionary that contains the message data
            data[data["subtype"]] = {}

            #  parse it
            if data["subtype"] == "configuration":
                #  parse the Transceiver section
                for tcvr in root.iter("Transceiver"):
                    #  parse the Transceiver section
                    tcvr_xml = tcvr.attrib

                    #  parse the Channel section -- this works with multiple channels
                    #  under 1 transceiver
                    for tcvr_ch in tcvr.iter("Channel"):
                        tcvr_ch_xml = tcvr_ch.attrib
                        channel_id = tcvr_ch_xml["ChannelID"]

                        #  create the configuration dict for this channel
                        data["configuration"][channel_id] = {}

                        #  add the transceiver data to the config dict (this is
                        #  replicated for all channels)
                        dict_to_dict(
                            tcvr_xml,
                            data["configuration"][channel_id],
                            self.transceiver_parsing_options,
                        )

                        #  add the general channel data to the config dict
                        dict_to_dict(
                            tcvr_ch_xml,
                            data["configuration"][channel_id],
                            self.channel_parsing_options,
                        )

                        #  check if there are >1 transducer under a single transceiver channel
                        if len(list(tcvr_ch)) > 1:
                            ValueError("Found >1 transducer under a single transceiver channel!")
                        else:  # should only have 1 transducer
                            tcvr_ch_xducer = tcvr_ch.find(
                                "Transducer"
                            )  # get Element of this xducer
                            f_par = tcvr_ch_xducer.findall("FrequencyPar")
                            # Save calibration parameters
                            if f_par:
                                cal_par = {
                                    "frequency": np.array(
                                        [int(f.attrib["Frequency"]) for f in f_par]
                                    ),
                                    "gain": np.array([float(f.attrib["Gain"]) for f in f_par]),
                                    "impedance": np.array(
                                        [float(f.attrib["Impedance"]) for f in f_par]
                                    ),
                                    "phase": np.array([float(f.attrib["Phase"]) for f in f_par]),
                                    "beamwidth_alongship": np.array(
                                        [float(f.attrib["BeamWidthAlongship"]) for f in f_par]
                                    ),
                                    "beamwidth_athwartship": np.array(
                                        [float(f.attrib["BeamWidthAthwartship"]) for f in f_par]
                                    ),
                                    "angle_offset_alongship": np.array(
                                        [float(f.attrib["AngleOffsetAlongship"]) for f in f_par]
                                    ),
                                    "angle_offset_athwartship": np.array(
                                        [float(f.attrib["AngleOffsetAthwartship"]) for f in f_par]
                                    ),
                                }
                                data["configuration"][channel_id]["calibration"] = cal_par
                            #  add the transducer data to the config dict
                            dict_to_dict(
                                tcvr_ch_xducer.attrib,
                                data["configuration"][channel_id],
                                self.transducer_parsing_options,
                            )

                        # get unique transceiver channel number stored in channel_id
                        tcvr_ch_num = TCVR_CH_NUM_MATCHER.search(channel_id)[0]

                        # parse the Transducers section from the root
                        # TODO Remove Transducers if doesn't exist
                        xducer = root.find("Transducers")
                        if xducer is not None:
                            # build an occurrence lookup table of transducer names
                            xducer_name_list = []
                            for xducer_ch in xducer.iter("Transducer"):
                                xducer_name_list.append(xducer_ch.attrib["TransducerName"])

                            # find matching transducer for this channel_id
                            match_found = False
                            for xducer_ch in xducer.iter("Transducer"):
                                if not match_found:
                                    xducer_ch_xml = xducer_ch.attrib
                                    match_name = (
                                        xducer_ch.attrib["TransducerName"]
                                        == tcvr_ch_xducer.attrib["TransducerName"]
                                    )
                                    if xducer_ch.attrib["TransducerSerialNumber"] == "":
                                        match_sn = False
                                    else:
                                        match_sn = (
                                            xducer_ch.attrib["TransducerSerialNumber"]
                                            == tcvr_ch_xducer.attrib["SerialNumber"]
                                        )
                                    match_tcvr = (
                                        tcvr_ch_num in xducer_ch.attrib["TransducerCustomName"]
                                    )

                                    # if find match add the transducer mounting details
                                    if (
                                        Counter(xducer_name_list)[
                                            xducer_ch.attrib["TransducerName"]
                                        ]
                                        > 1
                                    ):
                                        # if more than one transducer has the same name
                                        # only check sn and transceiver unique number
                                        match_found = match_sn or match_tcvr
                                    else:
                                        match_found = match_name or match_sn or match_tcvr

                                    # add transducer mounting details
                                    if match_found:
                                        dict_to_dict(
                                            xducer_ch_xml,
                                            data["configuration"][channel_id],
                                            self.transducer_parsing_options,
                                        )

                        #  add the header data to the config dict
                        h = root.find("Header")
                        dict_to_dict(
                            h.attrib,
                            data["configuration"][channel_id],
                            self.header_parsing_options,
                        )

            elif data["subtype"] == "parameter":
                #  parse the parameter XML datagram
                for h in root.iter("Channel"):
                    parm_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(parm_xml, data["parameter"], self.parameter_parsing_options)

            elif data["subtype"] == "environment":
                #  parse the environment XML datagram
                for h in root.iter("Environment"):
                    env_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(env_xml, data["environment"], self.environment_parsing_options)

                for h in root.iter("Transducer"):
                    transducer_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(
                        transducer_xml,
                        data["environment"],
                        self.envxdcr_parsing_options,
                    )

        data["xml"] = xml_string
        return data

    def _pack_contents(self, data, version):
        def to_CamelCase(xml_param):
            """
            Convert a name from the project's convention to CamelCase for
            writing back to XML in Kongsberg's convention.
            """
            idx = list(reversed([i for i, c in enumerate(xml_param) if c.isupper()]))
            param_len = len(xml_param)
            for i in idx:
                #  check if we should insert an underscore
                if i > 0 and i < param_len - 1:
                    xml_param = xml_param[:i] + "_" + xml_param[i:]
            xml_param = xml_param.lower()

            return xml_param

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["nmea_string"][-1] != "\x00":
                tmp_string = data["nmea_string"] + "\x00"
            else:
                tmp_string = data["nmea_string"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))

            # Convert to python string if needed
            if isinstance(tmp_string, str):
                tmp_string = tmp_string.encode("ascii", errors="replace")

            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)
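
The *_parsing_options dictionaries follow the convention described in the class comments: each entry maps an XML attribute name to [converted type, output field name, parse character]. Applied by hand to one entry (the attribute value below is made up for illustration), the convention looks like this:

# e.g. channel_parsing_options["PulseDuration"] == [float, "", ";"]
conv_type, field_name, parse_char = float, "", ";"

attribute_value = "0.000256;0.000512;0.001024"  # illustrative XML attribute text
values = [conv_type(v) for v in attribute_value.split(parse_char)]
print(values)  # [0.000256, 0.000512, 0.001024]
# An empty field name means the output key is derived via camelcase2snakecase
# ("PulseDuration" -> "pulse_duration").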

log

Functions:

Name Description
verbose

Set the verbosity for echopype print outs.

verbose(logfile=None, override=False)

Set the verbosity for echopype print outs. If called it will output logs to terminal by default.

Parameters

logfile : str, optional
    Optional string path to the desired log file.
override : bool
    Boolean flag to override verbosity, which turns off verbosity if the value is False.
    Default is False.

Returns

None

Source code in src\aalibrary\utils\sonar_checker\log.py
def verbose(logfile: Optional[str] = None, override: bool = False) -> None:
    """Set the verbosity for echopype print outs.
    If called it will output logs to terminal by default.

    Parameters
    ----------
    logfile : str, optional
        Optional string path to the desired log file.
    override: bool
        Boolean flag to override verbosity,
        which turns off verbosity if the value is `False`.
        Default is `False`.

    Returns
    -------
    None
    """
    if not isinstance(override, bool):
        raise ValueError("override argument must be a boolean!")
    package_name = __name__.split(".")[0]  # Get the package name
    loggers = _get_all_loggers()
    verbose = True if override is False else False
    _set_verbose(verbose)
    for logger in loggers:
        if package_name in logger.name:
            handlers = [h.name for h in logger.handlers]
            if logfile is None:
                if LOGFILE_HANDLE_NAME in handlers:
                    # Remove log file handler if it exists
                    handler = next(filter(lambda h: h.name == LOGFILE_HANDLE_NAME, logger.handlers))
                    logger.removeHandler(handler)
            elif LOGFILE_HANDLE_NAME not in handlers:
                # Only add the logfile handler if it doesn't exist
                _set_logfile(logger, logfile)

            if isinstance(logfile, str):
                # Prevents multiple handler from propagating messages
                # this way there are no duplicate line in logfile
                logger.propagate = False
            else:
                logger.propagate = True

misc

Functions:

Name Description
camelcase2snakecase

Convert string from CamelCase to snake_case

depth_from_pressure

Convert pressure to depth using UNESCO 1983 algorithm.

camelcase2snakecase(camel_case_str)

Convert string from CamelCase to snake_case e.g. CamelCase becomes camel_case.

Source code in src\aalibrary\utils\sonar_checker\misc.py
def camelcase2snakecase(camel_case_str):
    """
    Convert string from CamelCase to snake_case
    e.g. CamelCase becomes camel_case.
    """
    idx = list(reversed([i for i, c in enumerate(camel_case_str) if c.isupper()]))
    param_len = len(camel_case_str)
    for i in idx:
        #  check if we should insert an underscore
        if i > 0 and i < param_len:
            camel_case_str = camel_case_str[:i] + "_" + camel_case_str[i:]

    return camel_case_str.lower()
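
For example (import path assumed from the source location above):

from aalibrary.utils.sonar_checker.misc import camelcase2snakecase

print(camelcase2snakecase("CamelCase"))       # camel_case
print(camelcase2snakecase("DropKeelOffset"))  # drop_keel_offset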

depth_from_pressure(pressure, latitude=30.0, atm_pres_surf=0.0)

Convert pressure to depth using UNESCO 1983 algorithm.

UNESCO. 1983. Algorithms for computation of fundamental properties of seawater (Pressure to Depth conversion, pages 25-27). Prepared by Fofonoff, N.P. and Millard, R.C. UNESCO technical papers in marine science, 44. http://unesdoc.unesco.org/images/0005/000598/059832eb.pdf

Parameters

pressure : Union[float, FloatSequence]
    Pressure in dbar.
latitude : Union[float, FloatSequence], default=30.0
    Latitude in decimal degrees.
atm_pres_surf : Union[float, FloatSequence], default=0.0
    Atmospheric pressure at the surface in dbar. Use the default 0.0 value if pressure is
    corrected to be 0 at the surface. Otherwise, enter a correction for pressure due to air,
    sea ice and any other medium that may be present.

Returns

depth : NDArray[float]
    Depth in meters.

Source code in src\aalibrary\utils\sonar_checker\misc.py
def depth_from_pressure(
    pressure: Union[float, FloatSequence],
    latitude: Optional[Union[float, FloatSequence]] = 30.0,
    atm_pres_surf: Optional[Union[float, FloatSequence]] = 0.0,
) -> NDArray[float]:
    """
    Convert pressure to depth using UNESCO 1983 algorithm.

    UNESCO. 1983. Algorithms for computation of fundamental properties of seawater (Pressure to
    Depth conversion, pages 25-27). Prepared by Fofonoff, N.P. and Millard, R.C. UNESCO technical
    papers in marine science, 44. http://unesdoc.unesco.org/images/0005/000598/059832eb.pdf

    Parameters
    ----------
    pressure : Union[float, FloatSequence]
        Pressure in dbar
    latitude : Union[float, FloatSequence], default=30.0
        Latitude in decimal degrees.
    atm_pres_surf : Union[float, FloatSequence], default=0.0
        Atmospheric pressure at the surface in dbar.
        Use the default 0.0 value if pressure is corrected to be 0 at the surface.
        Otherwise, enter a correction for pressure due to air, sea ice and any other
        medium that may be present

    Returns
    -------
    depth : NDArray[float]
        Depth in meters
    """

    def _as_nparray_check(v, check_vs_pressure=False):
        """
        Convert to np.array if not already a np.array.
        Ensure latitude and atm_pres_surf are of the same size and shape as
        pressure if they are not scalar.
        """
        v_array = np.array(v) if not isinstance(v, np.ndarray) else v
        if check_vs_pressure:
            if v_array.size != 1:
                if v_array.size != pressure.size or v_array.shape != pressure.shape:
                    raise ValueError("Sequence shape or size does not match pressure")
        return v_array

    pressure = _as_nparray_check(pressure)
    latitude = _as_nparray_check(latitude, check_vs_pressure=True)
    atm_pres_surf = _as_nparray_check(atm_pres_surf, check_vs_pressure=True)

    # Constants
    g = 9.780318
    c1 = 9.72659
    c2 = -2.2512e-5
    c3 = 2.279e-10
    c4 = -1.82e-15
    k1 = 5.2788e-3
    k2 = 2.36e-5
    k3 = 1.092e-6

    # Calculate depth
    pressure = pressure - atm_pres_surf
    depth_w_g = c1 * pressure + c2 * pressure**2 + c3 * pressure**3 + c4 * pressure**4
    x = np.sin(np.deg2rad(latitude))
    gravity = g * (1.0 + k1 * x**2 + k2 * x**4) + k3 * pressure
    depth = depth_w_g / gravity
    return depth
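
A quick usage sketch (import path assumed from the source location above); at the default 30° latitude, 100 dbar works out to roughly 99.3 m:

import numpy as np

from aalibrary.utils.sonar_checker.misc import depth_from_pressure

print(depth_from_pressure(100.0))  # ~99.3 (metres)

# Sequences work too, provided latitude/atm_pres_surf are scalar or match the shape.
print(depth_from_pressure(np.array([10.0, 100.0, 1000.0]), latitude=60.0))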

sonar_checker

Functions:

Name Description
is_AD2CP

Check if the provided file has a .ad2cp extension.

is_AZFP

Check if the specified XML file contains an <InstrumentType> element with string="AZFP".

is_AZFP6

Check if the provided file has a .azfp extension.

is_EK60

Check if a raw data file is from Simrad EK60 echosounder.

is_EK80

Check if a raw data file is from Simrad EK80 echosounder.

is_ER60

Check if a raw data file is from Simrad EK60 echosounder.

is_AD2CP(raw_file)

Check if the provided file has a .ad2cp extension.

Parameters: raw_file (str): The name of the file to check.

Returns: bool: True if the file has a .ad2cp extension, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AD2CP(raw_file):
    """
    Check if the provided file has a .ad2cp extension.

    Parameters:
    raw_file (str): The name of the file to check.

    Returns:
    bool: True if the file has a .ad2cp extension, False otherwise.
    """

    # Check if the input is a string
    if not isinstance(raw_file, str):
        return False  # Return False if the input is not a string

    # Use the str.lower() method to check for the .ad2cp extension
    has_ad2cp_extension = raw_file.lower().endswith(".ad2cp")

    # Return the result of the check
    return has_ad2cp_extension

is_AZFP(raw_file)

Check if the specified XML file contains an <InstrumentType> element with string="AZFP".

Parameters: raw_file (str): The base name of the XML file (with or without extension).

Returns: bool: True if an <InstrumentType> element with string="AZFP" is found, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AZFP(raw_file):
    """
    Check if the specified XML file contains an <InstrumentType> with string="AZFP".

    Parameters:
    raw_file (str): The base name of the XML file (with or without extension).

    Returns:
    bool: True if <InstrumentType> with string="AZFP" is found, False otherwise.
    """

    # Check if the filename ends with .xml or .XML, and strip the extension if it
    # does (str.removesuffix removes the suffix itself; str.rstrip would strip
    # individual characters)
    base_filename = raw_file.removesuffix(".xml").removesuffix(".XML")

    # Create a list of possible filenames with both extensions
    possible_files = [f"{base_filename}.xml", f"{base_filename}.XML"]

    for full_filename in possible_files:
        if os.path.isfile(full_filename):
            try:
                # Parse the XML file
                tree = ET.parse(full_filename)
                root = tree.getroot()

                # Check for <InstrumentType> elements
                for instrument in root.findall(".//InstrumentType"):
                    if instrument.get("string") == "AZFP":
                        return True
            except ET.ParseError:
                print(f"Error parsing the XML file: {full_filename}.")

    return False

is_AZFP6(raw_file)

Check if the provided file has a .azfp extension.

Parameters: raw_file (str): The name of the file to check.

Returns: bool: True if the file has a .azfp extension, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AZFP6(raw_file):
    """
    Check if the provided file has a .azfp extension.

    Parameters:
    raw_file (str): The name of the file to check.

    Returns:
    bool: True if the file has a .azfp extension, False otherwise.
    """

    # Check if the input is a string
    if not isinstance(raw_file, str):
        return False  # Return False if the input is not a string

    # Use the str.lower() method to check for the .azfp extension
    has_azfp_extension = raw_file.lower().endswith(".azfp")

    # Return the result of the check
    return has_azfp_extension
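
As with is_AD2CP, this is a pure file-name check; a short sketch with hypothetical file names:

from aalibrary.utils.sonar_checker.sonar_checker import is_AZFP6

print(is_AZFP6("cast_042.azfp"))  # True
print(is_AZFP6("cast_042.raw"))   # False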

is_EK60(raw_file, storage_options)

Check if a raw data file is from Simrad EK60 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_EK60(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK60 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )

        try:
            # Return True if the sounder name matches EK60 (or its ER60 software variant)
            return config_datagram["sounder_name"] in {"ER60", "EK60"}
        except KeyError:
            return False

is_EK80(raw_file, storage_options)

Check if a raw data file is from Simrad EK80 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_EK80(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK80 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )

        # Return True if "configuration" exists in config_datagram
        return "configuration" in config_datagram

is_ER60(raw_file, storage_options)

Check if a raw data file is from Simrad EK60 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_ER60(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK60 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )
        # Return True if the sounder name matches ER60 (or EK60)
        try:
            return config_datagram["sounder_name"] in {"ER60", "EK60"}
        except KeyError:
            return False
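
The three Simrad checkers all read the first configuration datagram of the .raw file. Below is a hedged sketch of how they could be combined into a simple model detector; the helper name and the anonymous-S3 storage_options value are assumptions for illustration, not part of the library:

from aalibrary.utils.sonar_checker.sonar_checker import is_EK60, is_EK80

def detect_simrad_model(raw_file: str, storage_options: dict = None) -> str:
    """Hypothetical helper: classify a Simrad .raw file by probing its
    configuration datagram with each checker in turn."""
    storage_options = storage_options or {}
    if is_EK80(raw_file, storage_options):
        return "EK80"
    if is_EK60(raw_file, storage_options):
        return "EK60/ER60"
    return "unknown"

# Local-file example; for the public NCEI bucket one might instead pass an
# s3:// URL with storage_options={"anon": True}.
print(detect_simrad_model("2107RL_CW-D20210813-T220732.raw"))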

timings

"This script deals with the times associated with ingesting/preprocessing data from various sources. It works as follows: * A large file (usually 1 GB) is selected to repeatedly be downloaded and uploaded to a GCP bucket. * Download and upload times are recorded for each of these n iterations. * The average of these times are presented.

Functions:

Name Description
time_ingestion_and_upload_from_ncei

Used for timing the ingestion from the NCEI AWS S3 bucket.

time_ingestion_and_upload_from_ncei(n=10, ncei_file_url='https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw', ncei_bucket='noaa-wcsd-pds', download_location='./')

Used for timing the ingestion from the NCEI AWS S3 bucket.

Source code in src\aalibrary\utils\timings.py
def time_ingestion_and_upload_from_ncei(
    n: int = 10,
    ncei_file_url: str = (
        "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/"
        "Reuben_Lasker/RL2107/EK80/"
        "2107RL_CW-D20210813-T220732.raw"
    ),
    ncei_bucket: str = "noaa-wcsd-pds",
    download_location: str = "./",
):
    """Used for timing the ingestion from the NCEI AWS S3 bucket."""

    download_times = []
    upload_times = []
    file_name = helpers.get_file_name_from_url(ncei_file_url)

    for i in range(n):
        start_time = time.time()
        ncei_utils.download_single_file_from_aws(
            file_url=ncei_file_url,
            download_location=download_location,
        )
        time_elapsed = time.time() - start_time
        # Record the download time so the average can be computed later.
        download_times.append(time_elapsed)
        # Throughput estimate assumes the default ~1 GB (1000 MB) test file.
        print(
            (
                f"Downloading took {time_elapsed} seconds."
                f"\nThat's {1000/time_elapsed} MB/sec."
            )
        )
        print("Uploading file to cloud storage")
        start_time = time.time()
        cloud_utils.upload_file_to_gcp_bucket(
            bucket=None,
            blob_file_path="timing_test_raw_upload.raw",
            local_file_path=file_name,
        )
        time_elapsed = time.time() - start_time
        # Record the upload time so the average can be computed later.
        upload_times.append(time_elapsed)
        # Throughput estimate assumes the default ~1 GB (1000 MB) test file.
        print(
            (
                f"Uploading took {time_elapsed} seconds."
                f"\nThat's {1000/time_elapsed} MB/sec."
            )
        )

    print(
        (
            "Average download time for this file:"
            f" {sum(download_times)/len(download_times)}"
        )
    )
    print(
        (
            "Average upload time for this file:"
            f" {sum(upload_times)/len(upload_times)}"
        )
    )
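
A short usage sketch, assuming the module is importable as aalibrary.utils.timings and that valid GCP credentials are configured for the upload step; timing results are printed rather than returned:

from aalibrary.utils import timings

# Download the default ~1 GB NCEI test file twice and upload it to the
# configured GCP bucket each time, printing per-iteration and average timings.
timings.time_ingestion_and_upload_from_ncei(
    n=2,
    download_location="./",
)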