As part of my current research on organizational change, my research team has been working to set up a new repository for social scientists to access Internet Archive data. I’m happy to announce that we’re now live with a beta version of our data repository, ArchiveHub. ArchiveHub (archivehub.rutgers.edu) is a community data sharing site built on the HUBzero platform.
In the short run, the site is a work in progress. Under the resources tab, you’ll find data on congressional websites and Occupy Wall Street. The datasets are currently structured for social network analysis and share a standard link structure: source, destination, date, frequency, and any associated text that describes the link. Each dataset includes a README file with additional details. In the coming months, we’ll be releasing more datasets; a file under the FAQ section describes the general datasets that are available and provides additional information about them.
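For anyone curious how a link file with this structure might be read for network analysis, here is a minimal Python sketch. It assumes a comma-separated file with a header row naming the five fields exactly as listed above; the actual column names, delimiter, and date format may differ, so check each dataset's README first. The sample rows and domain names are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows. The real files may use different column names,
# delimiters, and date formats (see each dataset's README).
SAMPLE = """source,destination,date,frequency,text
example-a.gov,example-b.gov,2011-10-01,3,Committee page
example-a.gov,example-c.org,2011-10-01,1,Press release
example-b.gov,example-c.org,2011-11-01,2,News
example-a.gov,example-b.gov,2011-11-01,4,Committee page
"""


def load_edges(handle):
    """Sum link frequencies for each (source, destination) pair across dates."""
    weights = defaultdict(int)
    for row in csv.DictReader(handle):
        weights[(row["source"], row["destination"])] += int(row["frequency"])
    return dict(weights)


def weighted_in_degree(edges):
    """Total frequency of links pointing at each destination."""
    totals = defaultdict(int)
    for (_, destination), weight in edges.items():
        totals[destination] += weight
    return dict(totals)


edges = load_edges(io.StringIO(SAMPLE))
in_degree = weighted_in_degree(edges)
print(in_degree)
```

Collapsing the rows into one weighted edge per pair discards the date field; to study change over time, one option would be to key the dictionary on (source, destination, date) instead.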
The long-term vision is for this site to serve as a repository where social scientists interested in archival Internet data can both (a) access hosted datasets and (b) post their own versions of the data as they continue to manipulate and modify them to fit their needs.
In the meantime, we’re always looking for suggestions about what would be useful to others. I’m working on getting data from our Media dataset live very soon; that repository contains 20 TB of data, so it’s taking a while to plow through it.