In astronomy, most of the data that scientists work with are collected by telescopes and large sky surveys. The volume of new data is on the order of terabytes, and it will keep growing as future sky surveys come online. This challenges current methods of astronomical research and, as a result, encourages new approaches to studying the data. By incorporating data mining techniques, scientists can more easily organize, classify, and find trends in these growing data sets. This push toward more data-driven astronomical research has called for a new discipline that combines astronomy and data mining: astroinformatics.
Why is astroinformatics so important? Take the Sloan Digital Sky Survey (SDSS) as an example. The SDSS uses a dedicated optical telescope to produce multi-color images covering more than a quarter of the sky. In addition to these images, it has produced three-dimensional maps containing large numbers of galaxies and quasars, as well as the largest picture of the sky to date. The SDSS collects a few hundred gigabytes of raw data every night. Since the survey began, the total amount of raw data has grown to a few terabytes, and it continues to grow as the survey goes on. With this much data, processing it becomes increasingly difficult and time-consuming.
But the SDSS isn't the only sky survey out there. A future survey, the Large Synoptic Survey Telescope (LSST), which is planned to be operational by 2022, will produce up to 30 terabytes of data per night. That's about as much data as the SDSS collects over its entire operation! Overall, the LSST is expected to produce about 2 petabytes of uncompressed data per year. That is far more than humans are able to review, and reviewing the data is the most difficult part of the project. This is why astroinformatics is important.
With effective data mining techniques, reviewing the enormous data output from the survey becomes much simpler. Well-tested, straightforward data mining algorithms for reviewing data and classifying objects allow the data to be placed in searchable, downloadable databases in real time. This gives scientists and researchers around the world immediate access to large, high-quality data sets that would otherwise take months or years to publish if the data had to be reviewed by humans alone.
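To make "classifying objects" with data mining a little more concrete, here is a minimal sketch of one simple technique, k-nearest-neighbors classification, written in Python. It is not the method used by the SDSS, the LSST, or any particular survey. The two features, the labels, the training values, and the `classify`/`classify_batch` helpers are all hypothetical, invented only for illustration.

```python
import math
from collections import Counter

# Hypothetical training set of (features, label) pairs.
# The two features stand in for two photometric colors; the numbers are
# made up for illustration and are NOT real survey measurements.
TRAINING = [
    ((0.2, 0.1), "quasar"),
    ((0.3, 0.2), "quasar"),
    ((0.1, 0.2), "quasar"),
    ((1.3, 0.5), "star"),
    ((1.2, 0.4), "star"),
    ((1.4, 0.6), "star"),
    ((1.8, 0.9), "galaxy"),
    ((1.9, 1.0), "galaxy"),
    ((1.7, 0.8), "galaxy"),
]


def classify(point, training=TRAINING, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    # Sort the labeled examples by Euclidean distance to the query point
    # and keep the k closest ones.
    neighbors = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    # Count the labels of those neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def classify_batch(points, k=3):
    """Classify a batch of new detections (a hypothetical nightly catalog)."""
    return [classify(p, k=k) for p in points]


if __name__ == "__main__":
    new_objects = [(0.25, 0.15), (1.3, 0.45), (1.85, 0.95)]
    print(classify_batch(new_objects))  # ['quasar', 'star', 'galaxy']
```

In a real pipeline the training set would come from objects whose classes are already known, and each night's new detections would be run through something like `classify_batch` before being written to a searchable database, so no human has to inspect every object by hand.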