Data Storage, Mining, Curation & Analytics

Both Climate Modeling and Earth System Modeling entail petabytes (~10¹⁵ bytes), if not exabytes (~10¹⁸ bytes), of observational and sensor-network data, as well as the vast amounts of output data generated by the simulations themselves.

Generally speaking, observational data is best stored locally, near where it was collected, simply because moving data has a cost, and the 'pride of ownership' factor helps preserve the quality and integrity of such data over the long term. Simulation output data, on the other hand, is best stored near the computing centres where the simulations are run.
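To put the cost of moving data in perspective, the short Python sketch below estimates how long it takes to push one petabyte across a single network link. The 10 Gbit/s link speed and the `efficiency` factor are illustrative assumptions rather than figures from any particular facility, and `transfer_time_seconds` is a hypothetical helper name.

```python
def transfer_time_seconds(num_bytes: float, link_bits_per_second: float,
                          efficiency: float = 1.0) -> float:
    """Idealized time to move num_bytes over a link of the given speed.

    efficiency (0 < efficiency <= 1) is an assumed factor standing in for
    protocol overhead and contention; 1.0 means the full line rate."""
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and 0 < efficiency <= 1")
    return num_bytes * 8 / (link_bits_per_second * efficiency)


PETABYTE = 10**15       # decimal petabyte, matching the ~10^15 figure above
TEN_GBIT = 10 * 10**9   # hypothetical dedicated 10 Gbit/s link

secs = transfer_time_seconds(PETABYTE, TEN_GBIT)
print(f"{secs:,.0f} s = {secs / 86400:.1f} days")  # prints: 800,000 s = 9.3 days
```

Even at full line rate with no overhead, one petabyte occupies a dedicated 10 Gbit/s link for more than nine days; an exabyte on the same link would take roughly 25 years, which is why data tends to stay where it is produced.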

When related data is spread across several separate facilities, it needs to be 'federated' (made reachable through a common access layer) and 'harmonized' (brought to consistent formats, units and conventions) before it can be used together. In every such case, the remotely stored data must be accessed over global networks.
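As a rough illustration of what federation and harmonization involve, the Python sketch below merges temperature records from two hypothetical facilities that use different field names and units. The `Record` schema, the adapter functions `from_site_a` and `from_site_b`, the `federate` helper and the sample rows are all invented for illustration; a real system would typically follow an established metadata convention and fetch rows from each site over the network rather than from in-memory lists.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    """Shared (harmonized) schema that every facility's rows are mapped onto."""
    station: str
    time: str      # ISO 8601 timestamp, assumed to be UTC at every site
    temp_k: float  # temperature in kelvin


# An adapter turns one facility-specific row into a harmonized Record.
Adapter = Callable[[dict], Record]


def from_site_a(row: dict) -> Record:
    # Hypothetical site A: field "t_celsius", in degrees Celsius.
    return Record(row["id"], row["timestamp"], row["t_celsius"] + 273.15)


def from_site_b(row: dict) -> Record:
    # Hypothetical site B: field "temperature", already in kelvin.
    return Record(row["station_code"], row["obs_time"], row["temperature"])


def federate(sources: Iterable[Tuple[Iterable[dict], Adapter]]) -> List[Record]:
    """Harmonize rows from every source and merge them into one sorted view.

    Here the rows are in-memory lists; in practice each iterable would be
    a query against a remote facility. If two sources report the same
    (station, time), the record from the earlier source is kept."""
    merged = {}
    for rows, adapt in sources:
        for row in rows:
            rec = adapt(row)
            merged.setdefault((rec.station, rec.time), rec)
    return sorted(merged.values(), key=lambda r: (r.station, r.time))


site_a_rows = [{"id": "S1", "timestamp": "2012-01-01T00:00Z", "t_celsius": 15.0}]
site_b_rows = [
    {"station_code": "S2", "obs_time": "2012-01-01T00:00Z", "temperature": 280.0},
    {"station_code": "S1", "obs_time": "2012-01-01T00:00Z", "temperature": 288.0},
]
for rec in federate([(site_a_rows, from_site_a), (site_b_rows, from_site_b)]):
    print(rec.station, rec.time, round(rec.temp_k, 2))
```

Keeping one small adapter per facility lets a new data source join the federation without changes to the others; the "first source wins" rule for duplicates is only one possible policy.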

This section covers many aspects of the data life cycle:

Software as a service for data scientists

EarthServer

To Know, but Not Understand: David Weinberger on Science and Big Data

From Data to Knowledge: machine-learning with real-time and streaming applications 

From Microprocessors to Nanostores: Rethinking Data-Centric Systems

Big data, big dreams

Data-Intensive System Evolution

What CIOs and CTOs need to know about Big Data and Data Intensive Computing

Storage at Exascale

Using In-Memory Data Grids for global data integration

Using an In-Memory Data Grid for near real-time data analysis    

Availability in Globally Distributed Storage Systems

High performance scalable unified storage

Codesign challenges for exascale systems: performance, power and reliability

Big Data, Big Demand: Navigating the Cloud Storage Landscape

SDSC Cloud Storage Services

Fujitsu Develops World's First Cloud Platform to Leverage Big Data

HP: Exascale Data Center

IBM big data VP surveys landscape

Understanding data intensive analysis on large-scale HPC compute systems 

Why Lustre Is Set to Excel in Exascale

The State of the Lustre Community

Xyratex announces acquisition of Oracle's Lustre assets   

As Supercomputers Approach Exascale, Experts Wrestle with Big Data

The New Era of Computing: An Interview with "Dr. Data"

Expert Panel: What’s Around the Bend for Big Data?

Tool Enables Scientists to Uncover Patterns in Vast Data Sets

MINE: Detecting novel associations in large data sets

MINE: Maximal Information-based Nonparametric Exploration

New Techniques Turbo-Charge Data Mining

The Evolving Art (and Business) of Data Curation

Fujitsu Lets Big Data Cloud Flag Fly

Supercomputer sails through world history

Big Data in Space: Martian Computational Archeology

Astronomers Leverage "Unprecedented" Data Set

Big data revolution in astrophysics

The CAP Theorem's growing impact   

DOE Focuses on Scientific Data Integration

Why science really needs big data

Multiparadigm Data Storage for Enterprise Applications

Optimize Storage Placement in Sensor Networks

Next Generation Team Science Platform

The Complexity of VMware storage management

IBM Design Wins the Storage Challenge at SC10

IBM Demos Record-Breaking Parallel File System Performance

Parallel File System OrangeFS Starts to Build a Following

IBM Announces HPC Storage Solution for Streaming Data

IBM Scientists Demonstrate Phase-Change Memory Breakthrough

Phase Change Memory-Based Moneta System Points to the Future of Computer Storage

Write speeds for phase-change memory reach record limits

Hybrid memory cube angles for exascale

Patent Granted for Super-Fast MRAM Data Storage

UK Researchers develop super-fast memory chip

Rice, UCLA slash energy needs for next-generation memory

Battery and memory device in one

DNA storage crams 700 terabytes of data into a single gram

Hadoop: Big Data, Big Analytics, Big Insights

Data storage in DNA becomes a reality