Cultural Analytics of Large Datasets from Flickr
Daniela Ushizima, Todd Margolis, and Jeremy Douglass.
The Proceedings of the Sixth International AAAI Conference on Weblogs and Social Media, Trinity College in Dublin, Ireland, June 4–8, 2012.
Our goal is to use computational methods to analyze large samples of cultural fields and to map these samples according to visual style, content, and other dimensions, using metrics in high-dimensional spaces and visualization tools. To the best of our knowledge, there is no system that provides “exploratory data analysis” for large image sets created by users and available on social media networks. We propose a framework for identifying clusters based on visual and/or semantic similarity that uses digital image analysis to describe the content. Unlike content-based image retrieval, and without assuming existing labels such as “styles” and “genres,” we want to map the full visual variability of large cultural datasets. Such datasets seldom allow a well-defined hypothesis with clear labels to be formulated before analysis, and so require alternative approaches (Yamaoka et al. 2011). This is the grand idea of our project, and this paper lays out some preliminary but fundamental steps towards clustering without “labels.”
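To make the idea of clustering without labels concrete, the following is a minimal sketch and not the pipeline used in this work. It assumes a deliberately crude visual descriptor (the mean RGB color of each image) and a plain k-means loop; the function names `mean_rgb`, `kmeans`, and `make_image`, the synthetic images, and the choice of k = 2 are all illustrative assumptions.

```python
import random

def mean_rgb(image):
    """Describe an image (rows of (R, G, B) pixels) by its mean color.
    Assumption: a stand-in for richer image-analysis descriptors."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    return tuple(sum(px[c] for px in pixels) / n for c in range(3))

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: no labels are used, only distances in feature space."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers

# Synthetic stand-ins for real photographs: noisy reddish and bluish images.
_rng = random.Random(42)

def make_image(base, size=4, noise=20):
    """Build a size x size image whose pixels jitter around a base color."""
    return [
        [
            tuple(max(0, min(255, base[c] + _rng.randint(-noise, noise)))
                  for c in range(3))
            for _ in range(size)
        ]
        for _ in range(size)
    ]

images = [make_image((220, 30, 30)) for _ in range(5)] + \
         [make_image((30, 30, 220)) for _ in range(5)]
features = [mean_rgb(img) for img in images]
labels, centers = kmeans(features, k=2)
```

In this toy setting, `labels` groups the images purely by their position in feature space; no “style” or “genre” tag is consulted at any point. A real analysis would presumably replace `mean_rgb` with higher-dimensional descriptors and choose the number of clusters from the data.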
Our image datasets come from two Flickr groups: a) Art Now, described as “A group for displaying, fostering awareness and discussing the emerging relevant art and artists of today”; b) Graphic Design, described as “Anything from drawings you did in Paint to photoshoped images.” In our project, we compare the same number of images (170,000) from each group.