The U.S. Virtual Astronomical Observatory (VAO; http://www.us-vao.org/) has been in operation since May 2010. Its goal is to enable new science through efficient integration of distributed multi-wavelength data. This paper describes the management and organization of the VAO, and emphasizes the techniques used to ensure efficiency in a distributed organization. Management methods include using an annual program plan as the basis for establishing contracts with member organizations, regular communication, and monitoring of processes.
SkyServer is an Internet portal to data from the Sloan Digital Sky Survey, the largest online archive of astronomy data in the world. It provides free access to hundreds of millions of celestial objects for science, education and outreach purposes. Logs of accesses to SkyServer comprise around 930 million hits, 140 million web service accesses and 170 million submitted SQL queries, collected over the past 10 years. These logs also contain indications of compromise attempts on the servers. In this paper, we show some threats that were detected in ten years of stored logs, and compare them with known threats in those years. We also present an analysis of the evolution of those threats over these years.
The 3.2-gigapixel LSST camera will produce approximately half a petabyte of archive images every month. These data need to be reduced in under a minute to produce real-time transient alerts, and then added to the cumulative catalog for further analysis. The catalog is expected to grow by about three hundred terabytes per year. The data volume, the real-time transient alerting requirements of the LSST, and its spatio-temporal aspects require innovative techniques to build an efficient data access system at reasonable cost. As currently envisioned, the system will rely on a database for catalogs and metadata. Several database systems are being evaluated to understand how they perform at these data rates, data volumes, and access patterns. This paper describes the LSST requirements, the challenges they impose, the data access philosophy, results to date from evaluating available database technologies against LSST requirements, and the proposed database architecture to meet the data challenges.
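As a rough sense of scale for the volumes quoted in the LSST abstract above, the following sketch converts them to average rates. It assumes decimal units (1 PB = 10^15 bytes), a 30-day month, and data spread evenly over the whole period; none of these assumptions comes from the abstract, and real acquisition is concentrated in observing time, so peak rates would be higher.

```python
SECONDS_PER_DAY = 86_400


def average_rate_mb_per_s(bytes_per_month, days_per_month=30):
    """Average throughput in MB/s if the monthly volume arrived uniformly.

    days_per_month=30 is an illustrative assumption, not an LSST figure.
    """
    return bytes_per_month / (days_per_month * SECONDS_PER_DAY) / 1e6


# "Approximately half a petabyte of archive images every month"
image_rate = average_rate_mb_per_s(0.5e15)

# "About three hundred terabytes per year" of catalog growth, per day
catalog_growth_tb_per_day = 300 / 365

print(f"average image rate: {image_rate:.0f} MB/s")
print(f"catalog growth: {catalog_growth_tb_per_day:.2f} TB/day")
```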
We describe the Galaxy Evolution Explorer (GALEX) satellite that was launched in April 2003 specifically to accomplish far ultraviolet (FUV) and near ultraviolet (NUV) imaging and spectroscopic sky surveys. GALEX is currently providing new and significant information on how galaxies form and evolve over a period that encompasses 80% of the history of the Universe. This is being accomplished by the precise measurement of the UV brightness of galaxies, which is a direct measurement of their rate of star formation. We briefly describe the design of the GALEX mission followed by an overview of the instrumentation that comprises the science payload. We then focus on a description of the development of the UV sealed tube micro-channel plate detectors and provide data that describe their on-orbit performance. Finally, we provide a short overview of some of the science highlights obtained with GALEX.
PRIME (The Primordial Explorer) is a proposed Explorer-class mission. It will carry out a deep sky survey from space in four near-infrared bands between ~0.9-3.5 μm. It will survey a quarter of the sky to AB magnitude of ~24, which is ~600 times deeper than 2MASS and ~five million times deeper than COBE at long wavelengths. Deeper surveys in selected sky regions are also planned. PRIME will reach an epoch during which the first quasars, galaxies and clusters of galaxies were formed in the early universe, map the large-scale structure of the dark matter, discover Type-Ia supernovae to be used in measuring the acceleration of the expanding universe, and detect thousands of brown dwarfs and even Jupiter-size planets in the vicinity of the solar system. Most of these objects are so rare that they may be identified only in large and deep surveys. PRIME will serve as the precursor for the Next Generation Space Telescope (NGST), supplying rare targets for its spectroscopy and deep imaging. It is more than capable of providing targets for the largest ground-based telescopes (10-30m). Combining PRIME with other surveys (SDSS, GALEX) will yield the largest astronomical database ever built.
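The depth comparisons in the PRIME abstract above are flux ratios. A minimal sketch of how such ratios translate into magnitude differences, using the standard Pogson relation (a flux ratio F corresponds to 2.5 log10 F magnitudes), is given below. The function names are illustrative, and the sketch does not assert any particular limiting magnitude for 2MASS or COBE; it only converts the ratios quoted in the abstract.

```python
import math


def mag_difference(flux_ratio):
    """Magnitude difference corresponding to a given flux ratio."""
    return 2.5 * math.log10(flux_ratio)


def flux_ratio(delta_mag):
    """Flux ratio corresponding to a given magnitude difference."""
    return 10 ** (0.4 * delta_mag)


# Ratios quoted in the abstract: ~600x (vs 2MASS), ~5 million x (vs COBE)
for label, ratio in [("2MASS", 600), ("COBE", 5e6)]:
    print(f"{label}: {ratio:g}x fainter = {mag_difference(ratio):.1f} mag deeper")
```

By this conversion, a factor of ~600 in flux is about 6.9 magnitudes and a factor of ~five million is about 16.7 magnitudes.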
The Galaxy Evolution Explorer (GALEX), a NASA Small Explorer Mission planned for launch in Fall 2002, will perform the first Space Ultraviolet sky survey. Five imaging surveys in each of two bands (1350-1750Å and 1750-2800Å) will range from an all-sky survey (limit mAB~20-21) to an ultra-deep survey of 4 square degrees (limit mAB~26). Three spectroscopic grism surveys (R=100-300) will be performed with various depths (mAB~20-25) and sky coverage (100 to 2 square degrees) over the 1350-2800Å band. The instrument includes a 50 cm modified Ritchey-Chrétien telescope, a dichroic beam splitter and astigmatism corrector, and two large sealed tube microchannel plate detectors to simultaneously cover the two bands and the 1.2 degree field of view. A rotating wheel provides either imaging or grism spectroscopy with transmitting optics. We will use the measured UV properties of local galaxies, along with corollary observations, to calibrate the UV-global star formation rate relationship in galaxies. We will apply this calibration to distant galaxies discovered in the deep imaging and spectroscopic surveys to map the history of star formation in the universe over the redshift range zero to two. The GALEX mission will include an Associate Investigator program for additional observations and supporting data analysis. This will support a wide variety of investigations made possible by the first UV sky survey.
The IFA and collaborators are embarking on a project to develop a 4-telescope synoptic survey instrument. While somewhat smaller than the 6.5m class telescope envisaged by the decadal review in their proposal for an LSST, this facility will nonetheless be able to accomplish many of the LSST science goals. In this paper we will describe the motivation for a 'distributed aperture' approach for the LSST, the current concept for Pan-STARRS -- a pilot project for the LSST proper -- and its performance goals and science reach. We will also discuss how the facility may be expanded.
Science is becoming very data intensive. Today's astronomy datasets with tens of millions of galaxies already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s, and database tools were immature. Such a task today is trivial, almost manageable with a laptop. We think that the issue of a PB database will be very similar in six years. In this paper we scale our current experiments in data archiving and analysis on the Sloan Digital Sky Survey data six years into the future. We analyze these projections and look at the requirements of performing data mining on such data sets. We conclude that the task scales rather well: we could do the job today, although it would be expensive. There do not seem to be any show-stoppers that would prevent us from storing and using a Petabyte dataset six years from today.
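The 1996 example in the abstract above can be made concrete with a back-of-the-envelope scan-time calculation. This sketch assumes decimal units and perfectly parallel, purely sequential reads; the per-disk rate and target time used for the petabyte case are hypothetical placeholders, not figures from the abstract.

```python
import math

MB, GB, PB = 1e6, 1e9, 1e15


def scan_time_seconds(size_bytes, rate_bytes_per_s, n_disks=1):
    """Time to read the whole dataset once, with n_disks scanning in parallel."""
    return size_bytes / (rate_bytes_per_s * n_disks)


def disks_needed(size_bytes, rate_bytes_per_s, target_seconds):
    """Smallest number of parallel disks that meets the target scan time."""
    return math.ceil(size_bytes / (rate_bytes_per_s * target_seconds))


# Figures from the abstract: a 100 GB database scanned at 30 MB/s
t_1996 = scan_time_seconds(100 * GB, 30 * MB)
print(f"100 GB at 30 MB/s: {t_1996 / 60:.0f} minutes per full scan")

# Hypothetical: 1 PB, 100 MB/s per disk, one-hour scan target
print("disks for 1 PB in one hour:", disks_needed(1 * PB, 100 * MB, 3600))
```

A single full scan of the 1996 database takes a little under an hour at the quoted rate; the same function shows how many disks a petabyte would need under whatever per-disk rate one is willing to assume.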
Datasets with tens of millions of galaxies present new challenges for the analysis of spatial clustering. We have built a framework that integrates a database of object catalogs, tools for creating masks of bad regions, and a fast (N log N) correlation code. This system has enabled unprecedented efficiency in carrying out the analysis of galaxy clustering in the SDSS catalog. A similar approach is used to compute the three-dimensional spatial clustering of galaxies on very large scales. We describe our strategy to estimate the effect of photometric errors using a database. We discuss our efforts as an early example of data-intensive science. While it would have been possible to get these results without the framework we describe, it will be infeasible to perform these computations on the future huge datasets without using this framework.
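Pair counting is the core operation behind correlation estimators, and the abstract above does not say how its N log N code works. As a toy illustration only, and not the authors' algorithm, the sketch below counts pairs within a separation r in one dimension by sorting and binary search, which costs O(N log N), and checks it against the O(N^2) brute-force count. Real clustering analyses work on the sphere or in three dimensions and typically use tree or grid structures instead.

```python
from bisect import bisect_right


def count_pairs_within(points, r):
    """Count unordered pairs with |xi - xj| <= r via sort + binary search."""
    xs = sorted(points)
    total = 0
    for i, x in enumerate(xs):
        # j is one past the last point whose value is <= x + r
        j = bisect_right(xs, x + r)
        # points at indices i+1 .. j-1 are the partners of xs[i]
        total += j - i - 1
    return total


def count_pairs_brute(points, r):
    """Reference O(N^2) count, used only to validate the fast version."""
    n = len(points)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if abs(points[i] - points[j]) <= r
    )


sample = [0, 1, 3, 6, 10, 10]  # integer positions avoid rounding at the bin edge
print(count_pairs_within(sample, 3), count_pairs_brute(sample, 3))
```

Counting pairs in a separation bin (r1, r2] follows by subtracting two such cumulative counts.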
Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever -- this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.
Web Services form a new, emerging paradigm to handle distributed access to resources over the Internet. There are platform-independent standards (SOAP, WSDL), which make the developers' task considerably easier. This article discusses how web services could be used in the context of the Virtual Observatory. We envisage a multi-layer architecture, with interoperating services. A well-designed lower layer consisting of simple, standard services implemented by most data providers will go a long way towards establishing a modular architecture. More complex applications can be built upon this core layer. We present two prototype applications, SdssCutout and SkyQuery, as examples of this layered architecture.