The new dataset for evaluating STSS measures is now available on the datasets page. Years in the making, it was produced using the best methods currently available. The paper from which it is extracted, “A new benchmark dataset with production methodology for Short Text Semantic Similarity algorithms”, is groundbreaking in establishing the measurement-theoretic and statistical validity of the methods used.
The dataset is more representative of the English language and more demanding than STSS-65, so be prepared for lower correlation coefficients between your algorithms and this dataset than you obtained with STSS-65. Both STASIS and LSA score considerably lower. This is a virtue of the dataset: it has much more headroom to demonstrate future improvements in STSS algorithms.
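For readers who want to check how their own algorithm fares, the correlation mentioned above can be computed along the following lines. This is only a minimal sketch: it assumes Pearson's product-moment coefficient (the announcement does not say which coefficient to use), and the function name `pearson` and all the ratings shown are made up for illustration, not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: one human rating and one algorithm score per
# sentence pair, in the same order. Replace with real data.
human_ratings = [0.1, 1.2, 2.5, 3.1, 3.9]
algorithm_scores = [0.05, 0.30, 0.55, 0.70, 0.95]

r = pearson(human_ratings, algorithm_scores)
print(f"Pearson r = {r:.3f}")
```

Running the same function on the ratings from both datasets gives a like-for-like comparison of how much lower an algorithm scores on the new one.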