****************************
Princeton Evocation Data 0.4
Original release: July 2, 2007
This version released: July 10, 2009
****************************

OVERVIEW

This dataset attempts to give a large-scale measurement of synset similarity gathered using human subjects. We used human judgment to collect two datasets of about 100,000 directed synset pairs. The first dataset is a carefully controlled and calibrated study using trained undergraduates; pairs were randomly selected (thus many of the ratings are zero). The second dataset was collected using Mechanical Turk, leveraging the first study to find pairs likely to have higher evocation scores. These data, however, are slightly noisier. We attempted to control for this noise by selecting responses consistent with the first study (the level of this control is described in the 2009 paper).

USAGE

Anyone is free to use these data; we only ask that you 1) do not pass them off as your own and 2) if you publish something that uses these data, please reference:

Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. Adding Dense, Weighted Connections to WordNet. In Proceedings of the Third International WordNet Conference. Masaryk University Brno, 2006.

This paper also describes the motivation for this work, an initial analysis of the data, and some attempts at prediction. It is included in the archive as evocation_introduction.pdf.

If you use the Mechanical Turk data, please cite:

Sonya Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modelling, Learning and Processing of Text-Technological Data Structures (Springer Studies in Computational Intelligence). 2009.
CONTENTS

In this archive are:

1) README.TXT (this file)

2) Four sets of ratings:
   controlled (trained undergraduates)
   all (Mechanical Turk - very strict consistency with trained undergraduates)
   most (Mechanical Turk - strict consistency with trained undergraduates)
   some (Mechanical Turk - only extreme outliers removed)

   For each of these sets of ratings there are:

   a) synsets - The sense keys of the synset pairs.
   b) raw - Raw evocation ratings provided by users. Each line (corresponding to the pair in evocation.synsets) contains the values supplied by human subjects rating how much the first synset brought to mind the second synset. Within a line, the values are separated by spaces. The included paper uses data computed by taking the median of these scores for each synset pair.
   c) word-pos-num - Synsets identified by part of speech and sense number.
   d) standard (controlled only) - The same as controlled.raw except that scores have been normalized for the individual raters. That is, each individual's mean and variance have been used to rescale all of their responses to mean zero and variance one.

3) evocation.pdf and mt.pdf - Papers describing the initial evocation data collection and the Mechanical Turk followup, respectively.

CONTACT

Please address questions and comments to Jordan Boyd-Graber (jbg@princeton.edu)
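As a sketch of how the raw ratings can be reduced to the per-pair medians used in the paper, the reader below assumes only the format described above (one synset pair per line, space-separated numeric ratings); the function name and the example filename controlled.raw are illustrative, not part of the distribution:

```python
from statistics import median

def pair_medians(raw_path):
    """Read a .raw ratings file (one synset pair per line, space-separated
    human ratings) and return the median rating for each pair, in file order."""
    medians = []
    with open(raw_path) as f:
        for line in f:
            scores = [float(s) for s in line.split()]
            if scores:  # ignore any blank lines
                medians.append(median(scores))
    return medians

# Example: medians = pair_medians("controlled.raw")
# medians[i] then corresponds to line i of the matching .synsets file.
```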