Microsoft Research Faculty Summit: eScience

Microsoft Research Faculty Summit 2007
Microsoft Conference Center, Redmond, Washington, July 16, 2007

eScience: Data Capture to Scholarly Publication
Tony Hey, Microsoft Research (Chair)

Research Communication, Navigation, Evaluation, and Impact in the Open Access Era
Stevan Harnad, University of Southampton

The global research community is moving toward the optimal and inevitable outcome in the online age: all research articles, as well as the data on which they are based, will be freely accessible to everyone on the web, deposited in researchers’ own OAI-compliant Institutional Repositories, and mandated by their institutions and funders. Research users, funders, evaluators, and analysts, as well as teachers and the general public, will have an unprecedented capacity not only to read, assess, and use research findings, but to comment upon them, entering into the global knowledge growth process. Prepublication preprints, published postprints, data, analytic tools, and commentary will all be fully and navigably interlinked. Scientometrics will generate powerful new ways to navigate, analyze, rank, and evaluate this Open Access corpus, its past history, and its future trajectory. A vast potential for providing services that mine and manage this rich global research database will be open to the academic community as well as to enterprising industries. [See: “Publication-Archiving, Data-Archiving and Scientometrics,” forthcoming in CTWatch]

The Digital Data Universe
Chris Greer, National Science Foundation

CyberInfrastructure to Support Scientific Exploration and Collaboration
Dennis Gannon, Indiana University

Funding for experimental and computational science has undergone a dramatic shift, from being dominated by single-investigator research projects to large, distributed, multidisciplinary collaborations tied together by powerful information technologies. Because cutting-edge science now requires access to vast data resources, extremely high-powered computation, and state-of-the-art tools, the individual researcher with a great idea or insight is at a serious disadvantage compared to large, well-financed groups. However, just as the Web is now able to provide most of humanity with access to nearly unlimited data, theory, and knowledge, a transformation is also underway that can broaden participation in basic scientific discovery and empower entirely new communities with the tools needed to bring about a paradigm shift in basic research techniques.
   The roots of this transformation can be seen in the emergence of on-demand supercomputing and vast data storage from companies like Amazon, and in the National Science Foundation’s TeraGrid Science Gateways program, which takes the concept of a Web portal and turns it into an access point for state-of-the-art data archives and scientific applications that run on back-end supercomputers. However, this transformation is far from complete. What we are now seeing emerge is a redefinition of the “computational experiment,” from simple reporting of the results of simulations or data analysis to a documented and repeatable workflow in which every derived data product has an automatically generated provenance record. This talk extrapolates these ideas to the broader domain of scholarly workflow and scientific publication, to qualitative as well as quantitative data, and ponders the possible impact of multicore processors, ubiquitous gigabit bandwidth, and personal exabyte storage.
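
The idea of a workflow whose every derived data product carries an automatically generated provenance record can be illustrated with a minimal sketch. The decorator, step names, and fields below are hypothetical, not drawn from any system discussed in the talk; real workflow systems record far richer provenance (software versions, input datasets, execution environment):

```python
import hashlib
import time

PROVENANCE = []  # illustrative in-memory provenance log

def tracked(step):
    """Wrap a workflow step so each derived product gets a provenance entry."""
    def wrapper(*args, **kwargs):
        result = step(*args, **kwargs)
        PROVENANCE.append({
            "step": step.__name__,
            "inputs": repr((args, kwargs)),
            # Digest of the output lets a later run verify repeatability.
            "output_digest": hashlib.sha256(repr(result).encode()).hexdigest(),
            "timestamp": time.time(),
        })
        return result
    return wrapper

# Two hypothetical analysis steps in a small workflow:
@tracked
def calibrate(readings):
    return [r * 0.98 for r in readings]

@tracked
def mean(values):
    return sum(values) / len(values)

raw = [10.0, 12.0, 11.0]
result = mean(calibrate(raw))
# PROVENANCE now holds one entry per derived product, in execution order,
# so the chain from raw data to final result is documented and checkable.
```

Serializing such a log alongside the published result is one way a "computational experiment" becomes a repeatable, documented artifact rather than a bare report of numbers.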