Poster + Paper
Joint ALMA Observatory data science exploration on the cloud
25 August 2022
Sergio Pavez, Ignacio Toledo, Tomas Staig, Nicolás Ovando, Gastón Vélez, Jorge Ibsen, Jorge Sierra, Agustin Grangetto
Conference Poster
Abstract
The Joint ALMA Observatory (JAO) decided some years ago to become a data-centric operational facility, basing its operational decision-making on evidence and undertaking several efforts to adopt data science practices in its daily operations. Key non-profit collaborations allowed ALMA to work with Dataiku, enabling us to design projects that explore large data volumes and to prepare solutions that support informed operational decisions. To increase the capabilities of the data science platform, JAO invested in an in-house infrastructure providing a Hadoop ecosystem, which allowed big datasets to be processed in reasonable time. However, provisioning such an ecosystem is laborious and expensive in terms of system administration effort, highlighting the need to explore alternatives. JAO therefore sought to collaborate with cloud providers and decided to experiment with Amazon Web Services (AWS). Key elements in this decision were the flexibility provided and a practical, hands-on exploratory approach that was close to JAO's vision. The relationship, formalized through a Memorandum of Understanding, enabled the development of a proof of concept (PoC) aiming to replicate the existing system on the cloud. Although the PoC might not seem an ambitious goal, designing an architecture from the broad set of AWS technologies that works seamlessly with Dataiku was a non-trivial challenge, made harder by the limited six weeks available to complete it and the need to continuously learn new technologies and concepts. This paper summarizes our results, lessons learned, and key insights gained during this focused and successful rapid prototyping effort.
© (2022) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Sergio Pavez, Ignacio Toledo, Tomas Staig, Nicolás Ovando, Gastón Vélez, Jorge Ibsen, Jorge Sierra, and Agustin Grangetto "Joint ALMA Observatory data science exploration on the cloud", Proc. SPIE 12186, Observatory Operations: Strategies, Processes, and Systems IX, 121861Q (25 August 2022); https://doi.org/10.1117/12.2630375
KEYWORDS
Clouds
Observatories
Data processing