PAM Cloud Data Management Lifecycle

Document purpose: Summarize cloud storage class options, associated costs, and restrictions to help FMC PAM teams choose the best storage class and data management lifecycle for their buckets.

Definitions and cloud storage background

Google Storage Classes:

  • Standard: Expensive to store ($20 per TB/month), free to access, no minimum storage duration
  • Nearline: ~50% cheaper than Standard to store, $10 per TB to access, 30-day minimum storage duration
  • Coldline: ~75% cheaper than Standard to store, $20 per TB to access, 90-day minimum storage duration
  • Archive: ~85% cheaper than Standard to store, $50 per TB to access, 365-day minimum storage duration

Hot vs. cold storage: Storage options fall along a gradient. “Hot” data is cheap to access and use (Standard storage), while “cold” data is cheaper to store but more expensive to access (Nearline, Coldline, or Archive storage).

  • Storage cost examples: How much does it cost to store 700 TB without discounts?
    • Standard: ~$16,500 / month
    • Nearline: ~$8,500 / month
    • Coldline: ~$4,150 / month
    • Archive: ~$2,475 / month
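
The round per-TB figures from the storage class list can be turned into a quick estimator that also accounts for access fees. This is a rough sketch, not a pricing tool: the function name monthly_cost and the 70 TB access volume are assumptions for illustration, and the storage prices for the colder classes are derived from the "~% cheaper" figures. Because those figures are rounded, the results come out lower than the 700 TB estimates listed above (for example, $14,000 rather than ~$16,500 for Standard), which presumably reflect actual list prices.

```python
# Round figures from the storage class summary (USD per TB).
# Colder-class storage prices are derived from the "~% cheaper" values,
# so treat every number here as an approximation.
PRICES = {
    "Standard": {"store": 20.0, "access": 0.0},
    "Nearline": {"store": 10.0, "access": 10.0},  # ~50% cheaper to store
    "Coldline": {"store": 5.0, "access": 20.0},   # ~75% cheaper to store
    "Archive": {"store": 3.0, "access": 50.0},    # ~85% cheaper to store
}


def monthly_cost(storage_class, tb_stored, tb_accessed=0.0):
    """Estimate one month's cost: storage plus per-TB access fees."""
    price = PRICES[storage_class]
    return tb_stored * price["store"] + tb_accessed * price["access"]


if __name__ == "__main__":
    for name in PRICES:
        idle = monthly_cost(name, 700)
        # Hypothetical month in which 10% of the data (70 TB) is read once.
        busy = monthly_cost(name, 700, tb_accessed=70)
        print(f"{name:9s} no access: ${idle:,.0f}   70 TB read: ${busy:,.0f}")
```

With these round figures, reading 70 TB in a month makes Archive ($5,600) more expensive than Coldline ($4,900), so the expected access pattern matters as much as the storage price.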

Data management options:

  • Manual: The data admin of each bucket manually sets and maintains the storage class for each object/bucket.
  • Google Autoclass: Google manages the storage class behind the scenes, actively moving files into colder storage based on the time since they were last accessed. Either Nearline or Archive can be selected as the coldest allowable storage class. If a file in colder storage is accessed, Google moves the file back to Standard storage and only charges the higher access fee once. Google recommends Autoclass for data with variable, unpredictable, or unknown access patterns.
    • Files are uploaded to Standard; if not accessed for 30 days, they move to Nearline.
    • If not accessed for another 60 days, they move to Coldline.
    • If not accessed for another 275 days, they move to Archive.
  • Time-based: Data is programmed to move into colder storage according to an agreed-upon access timeline.
  • Logic-based: Storage rules are programmed in our console for more automated but user-defined management. A few possible examples:
    • All files with .sud extensions are moved to Archive storage 30 days after upload, since they are compressed SoundTrap files stored as a backup.
    • Data that is available in the NCEI archive or on the NODD is moved to Archive storage in GCP.
    • Data is moved to Archive storage once a set of user-defined analyses has been completed and their data are available in Makara.
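
The time-based and .sud examples above map fairly directly onto Cloud Storage Object Lifecycle Management rules, which use a SetStorageClass action with conditions such as age (days since object creation) and matchesSuffix. The sketch below builds such a configuration as JSON. The helper name set_class_rule and the 90-day Coldline timeline are assumptions for illustration, not an agreed policy. Rules that depend on external state (availability in NCEI or the NODD, analyses completed in Makara) cannot be expressed as lifecycle conditions and would need a separate script.

```python
import json


def set_class_rule(storage_class, age_days, suffixes=None):
    """Build one lifecycle rule that changes an object's storage class.

    age_days counts days since the object was created (uploaded).
    suffixes optionally limits the rule to matching object names.
    """
    condition = {"age": age_days}
    if suffixes:
        condition["matchesSuffix"] = list(suffixes)
    return {
        "action": {"type": "SetStorageClass", "storageClass": storage_class},
        "condition": condition,
    }


lifecycle_config = {
    "rule": [
        # Logic-based example: .sud backups go to Archive 30 days after upload.
        set_class_rule("ARCHIVE", 30, suffixes=[".sud"]),
        # Time-based example: hypothetical 90-day timeline for everything else.
        set_class_rule("COLDLINE", 90),
    ]
}

if __name__ == "__main__":
    # Write the JSON to a file and review it before applying to a bucket.
    print(json.dumps(lifecycle_config, indent=2))
```

A file produced this way could then be applied with something like `gcloud storage buckets update gs://BUCKET_NAME --lifecycle-file=lifecycle.json`, where BUCKET_NAME and the file name are placeholders.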

Current data management scheme

All buckets are set to Autoclass, which Google manages behind the scenes. In this setup, Google actively moves files into colder storage based on the time since they were last accessed. Currently, files only move down to Nearline storage. If a file in Nearline is accessed, Google moves the file back to Standard storage and only charges the higher access fee once.
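
To make the effect of the coldest allowable class concrete, the schedule from the Autoclass description (30 days, then another 60, then another 275) can be modeled in a few lines. This is only an illustrative model of the behavior described in this document, not Google's implementation, and it calls no Cloud API; the names expected_class and terminal_class are assumptions.

```python
# Cumulative days without access before each transition:
# 30 -> Nearline, 30 + 60 = 90 -> Coldline, 90 + 275 = 365 -> Archive.
TRANSITIONS = [(365, "Archive"), (90, "Coldline"), (30, "Nearline")]
ORDER = ["Standard", "Nearline", "Coldline", "Archive"]  # hot to cold


def expected_class(days_since_access, terminal_class="Nearline"):
    """Return the class a file would be expected to sit in under this model."""
    result = "Standard"
    for threshold, name in TRANSITIONS:
        if days_since_access >= threshold:
            result = name
            break
    # Never go colder than the coldest class allowed for the bucket.
    if ORDER.index(result) > ORDER.index(terminal_class):
        result = terminal_class
    return result


if __name__ == "__main__":
    for days in (10, 45, 120, 400):
        print(
            days,
            expected_class(days),                            # current setup
            expected_class(days, terminal_class="Archive"),  # alternative
        )
```

Under the current setup, a file untouched for 400 days stays in Nearline, whereas with Archive as the coldest class the same file would be expected to reach Archive.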

Using Autoclass removes the ability to set storage classes for specific objects (i.e., a bucket cannot be in Autoclass and also have all .sud files set to Archive).