A slide presentation by Mercè Crosas, March 22, 2019.
“As a researcher who is trying to understand the structure of the Milky Way, I often deal with very large astronomical datasets (terabytes of data, representing almost two billion unique stars). Every single dataset we use is publicly available to anyone, but the primary challenge in processing them is just how large they are. Most astronomical data hosting sites provide an option to remotely query sources through their web interface, but it is slow and inefficient for our science….
To circumvent this issue, we download all the catalogs locally to Harvard Odyssey, with each independent survey housed in a separate database. We use a special python-based tool (the “Large-Survey Database”) developed by a former post-doctoral scholar at Harvard, which allows us to perform fast queries of these databases simultaneously using the Odyssey computing cluster….
To extract information from each hdf5 file, we have developed a sophisticated Bayesian analysis pipeline that reads in our curated hdf5 files and outputs best fits for our model parameters (in our case, distances to local star-forming regions near the sun). Led by a graduate student and co-PI on the paper (Joshua Speagle), the python codebase is publicly available on GitHub with full API documentation. In the future, it will be archived with a permanent DOI on Zenodo. Also on GitHub users will find full working examples of the code, demonstrating how users can read in the publicly available data and output the same style of figures seen in the paper. Sample data are provided, and the demo is configured as a jupyter notebook, so interested users can walk through the methodology line-by-line….”
“In celebration of OA Week, the Harvard Library Office for Scholarly Communication will share some great news about OA and the Harvard Community:
- The OSC will launch a new OA policy for staff, researchers, and scholars to use open-access licensing
- We will share our annual statistics from around the world, highlighting the impact of Harvard’s scholarship
- We will reveal the new and improved Harvard open-access repository, DASH (Digital Access to Scholarship at Harvard).
In addition, the Harvard Library OSC and the Research Data Management Program are teaming up to co-sponsor a series of events during OA week, including an open-access open house, interactive workshops on ORCID, reproducibility, Dataverse, and more. See the schedule for more details….”
“Dataverse’s latest update adds more metadata to dataset landing pages, using a community-driven vocabulary supported by major search engines to make it even easier to find open data online.
Search results account for a large portion of traffic to datasets published online. For example, since Dataverse 4 was released in June 2015, at least a fifth of the traffic to dataset pages in the largest Dataverse installation, Harvard Dataverse, has come from search engines, mostly Google. Giving search engines and other systems richer metadata to index datasets will help people find data faster….”
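To make the idea of richer, search-engine-readable metadata concrete, here is a minimal sketch of a dataset landing page embedding a structured record. It assumes the vocabulary in question is schema.org expressed as JSON-LD, which the excerpt does not name. The function names, field selection, and sample values are hypothetical illustrations and are not taken from Dataverse's code.

```python
import json


def dataset_jsonld(name, description, identifier, authors, date_published):
    """Build a schema.org-style Dataset record as a plain dict.

    Assumption: the field names follow the public schema.org Dataset type;
    which fields Dataverse actually emits is not stated in the excerpt.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "identifier": identifier,
        "datePublished": date_published,
        "author": [{"@type": "Person", "name": a} for a in authors],
    }


def as_script_tag(record):
    """Serialize the record into a JSON-LD script element for an HTML page."""
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so the JSON stays parseable.
    body = json.dumps(record, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, for illustration only.
record = dataset_jsonld(
    name="Example survey data",
    description="Placeholder description of a hypothetical dataset.",
    identifier="doi:10.0000/EXAMPLE",
    authors=["A. Researcher"],
    date_published="2017-01-01",
)
tag = as_script_tag(record)
print(tag)
```

A crawler that understands JSON-LD can read the name, identifier, and authors directly from this element instead of inferring them from the page layout.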
“Among those researchers that do archive and share data, GitHub is indeed the most often used, but just as many people indicate using ‘others’ (i.e. tools not mentioned as one of the preselected options). Figshare comes in third, followed by Bitbucket, Dryad, Dataverse, Zenodo and Pangaea (Figure 3)….Another surprising finding is the overall low use of Zenodo – a CERN-hosted repository that is the recommended archiving and sharing solution for data from EU-projects and -institutions. The fact that Zenodo is a data-sharing platform that is available to anyone (thus not just for EU project data) might not be widely known yet….”
“As a result of collaborations with the Office of the Vice Provost for Research, Harvard University Information Technology, and IQSS, Harvard Library has launched a customized version of DMPTool, an online data management planning tool, for Harvard University. Data management plans—documents that outline what researchers will do with data during and after a project—are becoming increasingly required by funding agencies such as the National Institutes of Health and the National Science Foundation. The online tool provides step-by-step guidance for creating data management plans that include templates and examples; it also helps researchers create and share their plans, assisting them in how to address requirements specific to Harvard….”
“On Oct. 1, researchers at HMS and Harvard University received a three-year, $1.6 million grant from the Leona M. and Harry B. Helmsley Charitable Trust to help solve the problem by developing a global open-source system that can manage large biomedical datasets….The endeavor expands on the Dataverse, an open-source, web-based research storage and sharing application led by Crosas. The Dataverse was originally designed for the social sciences and will now be augmented to better accommodate bigger datasets from structural biology, cell biology and other fields….”
1. An Open-Access Policy for Harvard Medical School (October 23, 2014)
Harvard Medical School adopted an open-access policy on June 18, 2014, by a unanimous vote of the Faculty Council. The new policy covers both “quad”-based and clinical faculty. As a result, all Harvard schools now have open-access policies. Like the other Harvard policies, the Medical School policy ensures that faculty members automatically retain a license to share their research papers freely through DASH (Digital Access to Scholarship at Harvard), the University’s open-access repository. Faculty also have the option to waive this license for any article, preserving their freedom to submit new work to the journals of their choice. (Read more.)
2. Harvard’s School of Engineering and Applied Sciences Recommends Open-Access Deposit for Faculty Review Process (October 22, 2014)
Harvard’s School of Engineering and Applied Sciences (SEAS) announced a pilot project recommending to faculty engaged in a review, promotion, or tenure process to use Harvard’s open-access repository DASH (Digital Access to Scholarship at Harvard) as part of their preparations. SEAS is part of the Harvard Faculty of Arts and Sciences, which unanimously adopted an open-access policy in 2008, asking faculty to deposit their new scholarly articles in DASH. SEAS strongly supports this policy and sees this program as one more incentive to help implement the policy. (Read more.)
3. Harvard’s Berkman Center for Internet & Society Adopts an Open-Access Policy (October 22, 2014)
The Berkman Center for Internet & Society announced that the Center’s faculty directors and staff have adopted an open-access policy. In a landmark unanimous vote, the Berkman Center became the first research center at Harvard to adopt an open-access policy, and the first to extend the scope of Harvard’s open-access policies beyond the faculty. (Read more.)
4. Harvard Library Lifts Restrictions on Digital Reproductions of Works in the Public Domain (October 21, 2014)
The Harvard Library announced a new policy on the use of digital reproductions of works in the public domain. When the Library makes such reproductions and makes them openly available online, it will treat the reproductions themselves as objects in the public domain and will not try to restrict what users can do with them. For additional detail, see the policy FAQ. (Read more.)
5. Peerless Preservation for Harvard’s Open-Access Repository (October 20, 2014)
The Harvard Library Office for Scholarly Communication and Harvard University Archives announced two initiatives to preserve Harvard’s open-access research in the Library’s state-of-the-art digital preservation system, Digital Repository Services (DRS). One initiative will cover electronic theses and dissertations (ETDs) and one will cover the scholarly articles in DASH, Harvard’s open-access repository. (