Data Science and Computational Art History
Invited article for the first issue of the International Journal for Digital Art History, published in June 2015.
I present a number of core concepts from data science that are relevant to digital art history and, more generally, to the use of quantitative methods to study cultural artifacts and processes. These concepts are objects, features, data, feature space, and dimension reduction. They enable computational exploration of both large and small visual cultural data. We can analyze relations between works by a single artist, works by many artists, all digitized production from a whole historical period, holdings in museum collections, collection metadata, or writings about art. The same concepts allow us to study contemporary vernacular visual media using massive social media content. (In our lab, we have analyzed works by van Gogh, Mondrian, and Rothko; 6,000 paintings by French Impressionists; 20,000 photographs from the MoMA photography collection; one million pages from manga books; one million artworks by contemporary non-professional artists; and over 13 million Instagram images from 16 global cities.) While data science techniques do not replace other art historical methods, they allow us to see familiar art historical material in new ways, and also to study contemporary digital visual culture.
In addition to their relevance to art history and digital humanities, these concepts are also important in their own right. Anybody who wants to understand how our society “thinks with data” needs to understand them. They are used in tens of thousands of quantitative studies of cultural patterns in social media carried out by computer scientists in the last few years. More generally, these concepts underlie data mining, predictive analytics, and machine learning, and their numerous industry applications. In fact, they are as central to our “big data society” as the older cultural techniques we use to represent and reason about the world and each other: natural languages, material technologies for preserving and accessing information (paper, printing, digital media, etc.), counting, calculus, and lens-based photo and video imaging. In short, these concepts form the data society’s “mind”: the particular ways of encountering, understanding, and acting on the world and on other humans that are specific to our era.
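The five concepts named above (objects, features, data, feature space, and dimension reduction) can be made concrete with a small sketch. Everything in it is an illustrative assumption, not the procedure used in the lab's projects: random arrays stand in for digitized images, the five features (brightness, contrast, mean color per channel) are arbitrary choices, the function names `extract_features` and `reduce_dimensions` are hypothetical, and principal component analysis is used only as one common example of dimension reduction.

```python
import numpy as np


def extract_features(image):
    """Describe one object (an RGB array with values in 0..1) by a feature vector."""
    brightness = image.mean(axis=-1)       # per-pixel brightness
    return np.array([
        brightness.mean(),                 # overall brightness
        brightness.std(),                  # contrast
        image[..., 0].mean(),              # mean red
        image[..., 1].mean(),              # mean green
        image[..., 2].mean(),              # mean blue
    ])


def reduce_dimensions(data, n_components=2):
    """Project rows of `data` onto their top principal components (PCA via SVD)."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Objects: ten synthetic "images" standing in for digitized artworks (assumption).
rng = np.random.default_rng(0)
objects = [rng.random((32, 32, 3)) for _ in range(10)]

# Data: one row per object, one column per feature.
# Each row is a point in a five-dimensional feature space.
data = np.stack([extract_features(img) for img in objects])

# Dimension reduction: map the five-dimensional points to two dimensions
# so that the objects can be plotted and compared visually.
coords = reduce_dimensions(data)
```

Here `data` is a table with one row per image, and `coords` gives each image a position on a two-dimensional map in which images with similar feature values tend to lie close together.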