World Historical Dataverse Workshop, 23 February 2011

World-Historical Dataverse (www.dataverse.pitt.edu)
University of Pittsburgh: Pittsburgh Athletic Association

February 23, 2011

Project Vision: The World-Historical Dataverse (WHD) project (www.dataverse.pitt.edu) aims to create a comprehensive set of social-scientific, health, and environmental data for the world as a whole, as well as for its constituent regions and localities, covering the last four or five centuries. This global dataset would be of immense value for historical study and for policymaking, but major research breakthroughs are required to advance the project. In particular, creating a methodology for “Space-Time Analysis” may be the most challenging research issue.

Workshop Purposes: introduce the project; identify transformational research issues in large-scale historical datasets; select specific deliverables; build a community of collaborators. This will be a day of brainstorming.

Participants: Researchers working with large-scale datasets in GIS, in distributed computing, and in cyberinfrastructure.

Workshop Outcome: Report highlighting key research issues and challenges and outlining short-term and long-term projects to address them.

Please note: online videos of each talk may be accessed through the links on this page

Workshop Schedule
9:30. Coffee and Refreshments.
9:45. Introductions.
10:00 – 11:00.  Presentations by Dataverse project principals:

World-Historical Dataverse project objectives and overview
– Patrick Manning, Univ. of Pittsburgh

Assembling and analyzing historical data
– Siddharth Chandra, Michigan State Univ.

Space-time analysis and modeling
– Hassan Karimi, Univ. of Pittsburgh

11:00 – 12:00.  Presentations by collaborators working on related projects:

Historical GIS: experience and vision.
– Humphrey Southall, University of Portsmouth

Archiving and temporal analysis.
– Merrick Lex Berman, Harvard University

Lessons of previous large-scale projects.
– Ruth Mostern, University of California - Merced

12:00 – 1:00. Lunch

1:00 – 1:45. Presentation by NSF program officer:

Geography and Spatial Sciences
– Dr. Thomas Baerwald

1:45 – 2:00. Break

2:00 – 3:30. General discussion: Continuing research; potential funding opportunities

3:30. Workshop ends

The World-Historical Dataverse: Project Vision

Imagine a dynamic, global representation of the human social system. Such a representation would illustrate population growth, decline, and redistribution; social transformations in class, community, and occupations; shifts in the politics of empires and nations; and conflicts through war and social movements. It would document improvements in health and spikes in disease, and trace the course of environmental degradation. It would demonstrate the effects of industrialization, expanding trade, economic cycles, and frightening growth in economic inequality. Most importantly, it would permit the display of these variables at the global level, thereby revealing the dynamics of past global change.

The World-Historical Dataverse project lays the groundwork for just such a representation. It is an open-source, open-access project that will be used by scholars from various disciplines to document global trends; by policy-making groups to acquire background information; and by social scientists who seek to analyze global and local trends. The project is neither a deterministic model equivalent to climate models nor a giant empirical dataset, but includes aspects of each. This combination of data repository and analytical system, while valuable in itself for providing the as-yet-unseen global view of our society, is even more important for identifying the global dynamics in human society that should inform the potentially irrevocable policy decisions now being made.

Challenges and potential breakthroughs for this work fall into five primary categories:

  • Collecting and archiving data. The project’s electronic archive will be accessible and permanent, and will incorporate citation of authors. Data are currently online at the World-Historical Dataverse (www.dataverse.pitt.edu) and on the Harvard-based Dataverse Network (dvn.iq.harvard.edu/dvn/dv/worldhistorical).
  • Defining data. Metadata and data structures; ontologies and search engines in space, time, and topic.
  • Integrating data. Data integration involves aggregating small datasets into large datasets; it includes transformations to account for location, time, languages, topical categories, weights and measures; it resolves conflicts among constituent datasets. 
  • Constructing data. For data-poor domains, the project will devise means of extending existing data through interpolation, simulation, and estimation, in order to highlight important variables.
  • Visualization. A user interface providing data by space, time, and topic as requested by the user.  It includes constituent datasets, citations of authors, and details on all data transformations.
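
As a rough illustration of the "Integrating data" and "Constructing data" items above, the Python sketch below merges two small commodity series reported in different units and then fills the missing years. Everything in it is an assumption made for illustration: the unit table, the rule of averaging conflicting reports, the choice of linear interpolation, the function names, and the sample figures. The project description does not specify its methods or data at this level of detail.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record format: (year, quantity, unit).
Record = Tuple[int, float, str]

# Conversion factors to kilograms (an illustrative unit table).
KG_PER_UNIT: Dict[str, float] = {"kg": 1.0, "tonne": 1000.0}


def integrate(datasets: Iterable[List[Record]]) -> Dict[int, float]:
    """Merge small datasets into one year -> kilograms series.

    Units are normalized to kilograms. When several sources report
    the same year, the conflict is resolved by averaging (one simple
    rule among many possible ones).
    """
    collected: Dict[int, List[float]] = {}
    for records in datasets:
        for year, quantity, unit in records:
            collected.setdefault(year, []).append(quantity * KG_PER_UNIT[unit])
    return {year: sum(vals) / len(vals) for year, vals in collected.items()}


def interpolate(series: Dict[int, float], start: int, end: int) -> Dict[int, Optional[float]]:
    """Fill missing years between known years by linear interpolation.

    Years outside the range of known data are left as None rather
    than extrapolated.
    """
    known = sorted(series)
    filled: Dict[int, Optional[float]] = {}
    for year in range(start, end + 1):
        if year in series:
            filled[year] = series[year]
            continue
        earlier = [y for y in known if y < year]
        later = [y for y in known if y > year]
        if not earlier or not later:
            filled[year] = None
            continue
        y0, y1 = earlier[-1], later[0]
        fraction = (year - y0) / (y1 - y0)
        filled[year] = series[y0] + fraction * (series[y1] - series[y0])
    return filled


# Made-up sample figures, not real historical data.
source_a: List[Record] = [(1800, 1.0, "tonne"), (1804, 4000.0, "kg")]
source_b: List[Record] = [(1800, 3000.0, "kg")]

merged = integrate([source_a, source_b])
filled = interpolate(merged, 1800, 1805)
for year, value in filled.items():
    print(year, value)
```

In this sketch the two reports for 1800 are averaged after unit conversion, the years 1801 to 1803 are estimated from their neighbors, and 1805 stays empty because no later observation exists. A real system would also need to record which values are observed and which are constructed, in line with the commitment above to document all data transformations.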

The ultimate deliverable is an open, online data resource, updated regularly, which will provide consistent local and global data on social-science, health, and environmental impacts over the past four centuries. The project will provide a steady stream of deliverables of inherent value throughout each stage of its development (see bulleted list below). Project leaders will seek to collaborate with any parallel projects. In terms of data, the project focuses initially on selected variables in order to illustrate the range of variables and the links among them that can be discovered over time. In design and programming, project directors are committed to identifying modules within the overall project that can be funded and delivered separately, yet built so that they will all work together toward the long-term objective. Among these deliverables are:

  • On-line Digital Repository. Archival holdings of data and related papers as well as the data structures to integrate diverse data, updated continuously.
  • Index of datasets. Lists of major online historical data collections, updated annually.
  • Data syntheses. Global aggregations of selected commodity data, to be updated annually: for example, opium (2010), silver (2011), rice (2012).
  • Tools. Ontologies, standards, and other project tools, released as completed.