****************************
Princeton Evocation Data 0.4
Original release: July 2, 2007
This version released: July 10, 2009
****************************

OVERVIEW

This dataset attempts to give a large-scale measurement of synset similarity gathered using human subjects. We used human judgment to collect two datasets of about 100,000 directed synset pairs. The first dataset is a carefully controlled and calibrated study using trained undergraduates; pairs were randomly selected (thus many of the ratings are zero). The second dataset was collected using Mechanical Turk, leveraging the first study to find pairs likely to have higher evocation scores. These data, however, are slightly noisier. We attempted to control for this noise by selecting responses consistent with the first study (the level of this control is described in the 2009 paper).

USAGE

Anyone is free to use these data; we only ask that you 1) do not pass them off as your own and 2) if you publish something that uses these data, please reference:

Jordan Boyd-Graber, Christiane Fellbaum, Daniel Osherson, and Robert Schapire. Adding Dense, Weighted Connections to WordNet. In Proceedings of the Third International WordNet Conference. Masaryk University Brno, 2006.

This paper also describes the motivation for this work, an initial analysis of the data, and some attempts at prediction. It is included in the archive as evocation_introduction.pdf.

If you use the Mechanical Turk data, please cite:

Sonya Nikolova, Jordan Boyd-Graber, and Christiane Fellbaum. Collecting Semantic Similarity Ratings to Connect Concepts in Assistive Communication Tools. Modelling, Learning and Processing of Text-Technological Data Structures (Springer Studies in Computational Intelligence). 2009.
CONTENTS

In this archive are:

1) README.TXT (this file)

2) Four sets of ratings:
   controlled (trained undergraduates)
   all (Mechanical Turk - very strict consistency with trained undergraduates)
   most (Mechanical Turk - strict consistency with trained undergraduates)
   some (Mechanical Turk - only extreme outliers removed)

   For each of these sets of ratings there are:

   a) synsets - The sense keys of the synset pairs.
   b) raw - Raw evocation ratings provided by users. Each line (corresponding to the pair in evocation.synsets) contains the values supplied by human subjects rating how much the first synset brought to mind the second synset. Within a line, the values are separated by spaces. The included paper uses data computed by taking the median of these scores for each synset pair.
   c) word-pos-num - Synsets identified by part of speech and sense number.
   d) standard (controlled only) - The same as controlled.raw except that scores have been normalized for the individual raters. That is, each individual's mean and variance have been used to rescale all of their responses to mean zero and variance one.

3) evocation.pdf and mt.pdf - Papers describing the initial evocation data collection and the Mechanical Turk followup, respectively.

CONTACT

Please address questions and comments to Jordan Boyd-Graber (jbg@princeton.edu)
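As a sketch of how the raw ratings can be reduced to the per-pair medians used in the paper, the reader below assumes only the format described above (one synset pair per line, space-separated numeric ratings); the function name and the example filename controlled.raw are illustrative, not part of the distribution:

```python
from statistics import median

def pair_medians(raw_path):
    """Read a .raw ratings file (one synset pair per line, space-separated
    human ratings) and return the median rating for each pair, in file order."""
    medians = []
    with open(raw_path) as f:
        for line in f:
            scores = [float(s) for s in line.split()]
            if scores:  # ignore any blank lines
                medians.append(median(scores))
    return medians

# Example: medians = pair_medians("controlled.raw")
# medians[i] then corresponds to line i of the matching .synsets file.
```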