GeoSCIFramework: Scalable Real-time Streaming Analytics and Machine Learning for Geoscience and Hazards Research develops a real-time processing system that can handle a large mix of sensor observations and use machine learning to detect natural hazard events automatically, as the events are occurring. A four-organization collaboration (UNAVCO, University of Colorado, University of Oregon, and Rutgers University) develops a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research. This work will support rapid analysis and understanding of data associated with hazardous events (earthquakes, volcanic eruptions, tsunamis).
Earthquakes, tsunamis and volcanoes pose natural hazards on nearly unimaginable scales and compel geoscientists to find new ways to better understand the processes that cause them and to mitigate their effects on population and the built environment.
The sheer volume and complexity of the real-time sensor data streams involved, coupled with the need to model, analyze and assess hazards in a matter of only moments, make geophysical applications to hazards early warning a Big Data problem. This project will unite computer scientists and geoscientists to develop a data framework for generalized real-time streaming analytics and machine learning for geoscience and hazards research.
It focuses on the aggregation and integration of a large number of data streams into a coherent system that supports analysis of the data streams in real-time. The framework will offer machine-learning-based tools designed to detect signals of events, such as earthquakes and tsunamis, that might only be detectable when looking at a broad selection of observational inputs.
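The aggregation idea can be illustrated with a minimal Python sketch. It is not the project's pipeline: the `Sample` record, the sensor labels, the threshold, and the window length are all hypothetical, and the toy values are made up. The sketch interleaves separately time-ordered streams into one stream and reports the times at which at least two different sensor types exceeded a threshold within a short trailing window, which is one simple way a signal might become visible only when several inputs are viewed together.

```python
import heapq
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    t: float       # observation time in seconds
    sensor: str    # hypothetical sensor-type label, e.g. "gnss" or "seismic"
    value: float   # amplitude, assumed already scaled to a common unitless range


def merge_streams(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Interleave individually time-ordered streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda s: s.t)


def joint_exceedances(samples: Iterable[Sample], threshold: float,
                      window_s: float, min_sensor_types: int = 2) -> Iterator[float]:
    """Yield the time of each exceedance that completes a multi-sensor coincidence."""
    recent = deque()  # exceedances seen within the trailing window
    for s in samples:
        # Drop exceedances that have fallen out of the trailing window.
        while recent and s.t - recent[0].t > window_s:
            recent.popleft()
        if abs(s.value) < threshold:
            continue
        recent.append(s)
        # Report only when enough distinct sensor types agree.
        if len({r.sensor for r in recent}) >= min_sensor_types:
            yield s.t


# Toy, made-up data for illustration only.
gnss = [Sample(0.0, "gnss", 0.1), Sample(1.0, "gnss", 0.9), Sample(2.0, "gnss", 0.2)]
seismic = [Sample(0.5, "seismic", 0.2), Sample(1.5, "seismic", 1.2),
           Sample(10.0, "seismic", 1.5)]

events = list(joint_exceedances(merge_streams(gnss, seismic),
                                threshold=0.8, window_s=2.0))
print(events)  # [1.5]
```

In this toy run the lone seismic exceedance at 10.0 s is not reported, because no second sensor type exceeded the threshold within the window. A production system would replace the fixed threshold with trained models and the in-memory lists with a streaming platform.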
The architecture sets up a fast data pipeline by combining a group of open source components that make big data applications viable and easier to develop. Machine learning (ML) algorithms will be researched and applied to the tsunami and earthquake use cases. Integral to the project will be development, documentation and training using collaborative online resources such as GitLab and Jupyter Notebooks, and utilizing NSF XSEDE resources to make larger datasets and computational resources more widely available.
Our project focuses on use cases in the Cascadia subduction zone and Yellowstone, where the expertise of the science team coincides with the greatest concentration of EarthScope and OOI instruments. Data sources for the project draw primarily upon the 1500+ sensors from the EarthScope networks currently managed by UNAVCO and the Incorporated Research Institutions for Seismology (IRIS), as well as the Ocean Observatories Initiative (OOI) cabled array data managed by Rutgers University.
Use Case: Real-time Short-term Events. The fundamental open science question for earthquake and tsunami hazards is one of determinism: at what point within a minutes-long rupture process does a very large earthquake become distinguishable from a merely large one? The answer matters for the physics of earthquakes and their rupture behavior, and is indicative of the material properties, the state of stress, and the overall dynamics. For earthquake early warning (EEW) and tsunami early warning (TWW), it defines the minimum possible time at which an event and its resulting hazards could be characterized. The focus of this study is earthquake and tsunami warning in the Cascadia subduction zone.
Use Case: Long-term Events. Natural catastrophes occur at a variety of spatial and temporal scales. In particular, solid earth hazards, such as large earthquakes and volcanic eruptions, often have very long interevent times, which makes their behavior difficult to forecast. This part of the project pulls in multiple data sets to address the long-, intermediate-, and short-term forecasting of these types of events. Test sites include the Yellowstone magmatic center and the Hawaiian island volcanoes.
GeoSCIFramework will provide scientists and researchers the capability to instantly recognize that a tsunamigenic earthquake has occurred or to identify longer term subtle motions of the earth's surface on previously unrealized scales. Trained in this multi-data environment and informed by physical models, machine learning algorithms and spatio-temporal analyses, this project’s approach is extensible not just to detection and characterization of earthquakes but also to the onset of other geophysical signals like slow-slip events or magmatic intrusion, expanding the potential for new scientific discoveries.
This 4-year award by the NSF Office of Advanced Cyberinfrastructure is jointly supported by the Cross-Cutting Program and Division of Earth Sciences within the NSF Directorate for Geosciences, the Big Data Science and Engineering Program within the Directorate for Computer and Information Science and Engineering, and the EarthCube Program jointly sponsored by the NSF Directorate for Geosciences and the Office of Advanced Cyberinfrastructure.
Corsa, B.D., Tiampo, K.F., Kelevitz, K., Baker, S., Meertens, C. (2019). Automated processing, streaming, and integration of InSAR time-series and GNSS data; as part of the collaborative GeoSciFramework research project. Poster Presented at American Geophysical Union 2019 Fall Meeting, Session G13C-0574, 2019 09 Dec, San Francisco, CA.
Corsa, B., Tiampo, K., Kelevitz, K., Baker, S., Meertens, C. (2020). Comparison of InSAR time series generation techniques as part of the collaborative GeoSCIFramework research project. Presented at 2020 EarthCube Annual Meeting, 2020 18 Jun, Virtual Conference.
Fauvel, K., Balouek-Thomert, D., Melgar, D., Silva, P., Simonet, A., Antoniu, G., Costan, A., Masson, V., Parashar, M., Rodero, I., Termier, A. (2020). A Distributed Multi-Sensor Machine Learning Approach to Earthquake Early Warning. Proceedings of the AAAI Conference on Artificial Intelligence, 34(01), 403–411. DOI: 10.1609/aaai.v34i01.5376 Retrieved from https://aaai.org/ojs/index.php/AAAI/article/view/5376 08 Jun 2020.
Abstract:
Our research aims to improve the accuracy of Earthquake Early Warning (EEW) systems by means of machine learning. EEW systems are designed to detect and characterize medium and large earthquakes before their damaging effects reach a certain location. Traditional EEW methods based on seismometers fail to accurately identify large earthquakes due to their sensitivity to the ground motion velocity. The recently introduced high-precision GPS stations, on the other hand, are ineffective to identify medium earthquakes due to its propensity to produce noisy data. In addition, GPS stations and seismometers may be deployed in large numbers across different locations and may produce a significant volume of data, consequently affecting the response time and the robustness of EEW systems.
In practice, EEW can be seen as a typical classification problem in the machine learning field: multi-sensor data are given in input, and earthquake severity is the classification result. In this paper, we introduce the Distributed Multi-Sensor Earthquake Early Warning (DMSEEW) system, a novel machine learning-based approach that combines data from both types of sensors (GPS stations and seismometers) to detect medium and large earthquakes. DMSEEW is based on a new stacking ensemble method which has been evaluated on a real-world dataset validated with geoscientists. The system builds on a geographically distributed infrastructure, ensuring an efficient computation in terms of response time and robustness to partial infrastructure failures. Our experiments show that DMSEEW is more accurate than the traditional seismometer-only approach and the combined-sensors (GPS and seismometers) approach that adopts the rule of relative strength.
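The stacking idea in the abstract, where station-level predictions become the input to a second-level classifier, can be sketched in a few lines of Python. This is an illustration only and not the published DMSEEW algorithm: the class-frequency features, the nearest-centroid meta-classifier, and all toy values below are assumptions chosen to keep the example self-contained.

```python
from collections import Counter
from math import dist

CLASSES = ("normal", "medium", "large")
SENSOR_TYPES = ("gps", "seismometer")


def class_frequencies(predictions):
    """Map {sensor_type: [station-level labels]} to per-type class frequencies."""
    features = []
    for sensor_type in SENSOR_TYPES:
        labels = predictions.get(sensor_type, [])
        counts = Counter(labels)
        total = len(labels) or 1  # avoid dividing by zero if a type reports nothing
        features.extend(counts[c] / total for c in CLASSES)
    return features


def fit_centroids(examples):
    """Meta-level training (assumed): average the feature vectors of each event class."""
    sums, counts = {}, Counter()
    for predictions, label in examples:
        vec = class_frequencies(predictions)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(predictions, centroids):
    """Assign the event class whose centroid is nearest in Euclidean distance."""
    vec = class_frequencies(predictions)
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Toy, made-up training events: station-level labels per sensor type, plus the
# true event class. Seismometers here never report "large", echoing the abstract.
training = [
    ({"gps": ["normal", "normal"],
      "seismometer": ["normal", "normal", "normal"]}, "normal"),
    ({"gps": ["normal", "normal"],
      "seismometer": ["medium", "medium", "normal"]}, "medium"),
    ({"gps": ["large", "large"],
      "seismometer": ["medium", "medium", "medium"]}, "large"),
]
centroids = fit_centroids(training)

new_event = {"gps": ["large", "normal", "large"], "seismometer": ["medium", "medium"]}
print(classify(new_event, centroids))  # large
```

The point of the second level is that neither sensor type decides alone: the seismometer votes in the new event look like a medium earthquake, and only their combination with the GPS votes moves the result to "large".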
Kelevitz, K.; Tiampo, K.F.; Corsa, B.D. Improved Real-Time Natural Hazard Monitoring Using Automated DInSAR Time Series. Remote Sens. 2021, 13, 867.
Meertens, C., Mencin, D., Baker, S., Hodgkinson, K., Olds, S., Melgar, S., Rodero, I., Simonet, A., Villalobos, J.J., Tiampo, K., Corsa, B. (2019). When seconds matter – Big Data real-time streaming analytics and machine learning for geoscience and hazards research. Presentation at October 2019 GeoSciFramework Annual Project Meeting, Alexandria, VA.
Last modified: 2021-04-01 13:41:27 America/Denver