Our new Arabic Word Semantic Similarity benchmark dataset is now available; it is contained in the paper of the same name on my publications page. This work was carried out with my PhD student Faaza Almarsoomi. We expect it to be useful to scientists who wish to evaluate and compare Arabic Word Semantic Similarity measures.
I have added a white paper on a security application for Silent Talker (ST), which shows how it goes beyond being a simple lie detector. Some images are redacted because it is adapted from an earlier document with controlled circulation.
Occasionally I get enquiries through this blog about Silent Talker, because of my papers on it in the publications list.
I am one of the 4 inventors of Silent Talker (with Bandar, McLean & Rothwell). The team of inventors was originally led by Dr Zuhair Bandar, who had the original “Eureka moment” that led to its creation. Zuhair has since left academia to pursue commercial development of Silent Talker and is the point of contact for such enquiries. I continue to lead research activities in academia.
More information on Silent Talker can be found on Wikipedia:
The new dataset for evaluating Short Text Semantic Similarity (STSS) measures is now available on the datasets page. Years in the making, it has been produced using the best methods currently available, and the paper from which it is extracted, “A new benchmark dataset with production methodology for Short Text Semantic Similarity algorithms”, is groundbreaking in establishing the measurement-theoretic and statistical validity of the methods used.
The dataset is more representative of the English language and more demanding than STSS-65, so be prepared for lower correlation coefficients between your algorithms and this dataset than with STSS-65. Both STASIS and LSA score considerably lower. This is a virtue of the dataset: it has much more headroom to demonstrate future improvements in STSS algorithms.
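For readers who want to check how their own measure fares, the comparison can be sketched as below. This is a minimal illustration that assumes Pearson's product-moment correlation is the coefficient of interest; the function name `pearson` and the ratings shown are made up for the example and are not taken from either dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers for illustration only (NOT real dataset values):
# mean human similarity ratings and one algorithm's scores for the same pairs.
human_ratings = [0.1, 0.5, 2.0, 3.2, 3.9]
algorithm_scores = [0.2, 0.3, 0.4, 0.7, 0.9]

r = pearson(human_ratings, algorithm_scores)
print(f"r = {r:.3f}")
```

Running the same calculation against both datasets, with the algorithm's scores for each dataset's own sentence pairs, gives the two coefficients to compare.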
I have just added some variants of function word lists used in my algorithms for Dialogue Act classification (see publications page).
I must admit to not being very productive recently. The reason is that my work on a new paper has been disrupted by a complete system failure on the machine I was using. This was a very nice, stable XP + Office 2003 installation, far superior to anything from Windows Vista onwards for getting work done (as opposed to social networking).
Anyway, today I seem to have made some progress and, in particular, I believe I have found the cause of the ATI2DVAG blue screen problem (at least on my machine).
In this case installations went fine until I allowed it to search for updates for ATI drivers. Once they were loaded, the blue screen crashes started occurring. After several re-installations of different combinations of drivers, I believe I have pinned it down to the High Resolution Sound device drivers. Unfortunately Windows kept prompting me to install them, but I overcame this by disabling the device itself in the Control Panel.
I have just added “A Multi-Classifier Approach to Dialogue Act Classification Using Function Words” to the publications list. This is a pre-print of the paper due in July 2012. It extends the technique first described in the papers on classifying question and instruction DAs.
Recognising the DA of each short text is a crucial element in measuring the similarity of 2 DAs.
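To give a flavour of how a function word list can feed a classifier, here is a rough sketch. It assumes a simple binary "is this function word present?" encoding, which is only one possible choice; the papers on the publications page describe the encoding actually used. The word list and the name `function_word_features` are hypothetical and are not taken from the published lists.

```python
import re

# Hypothetical, tiny list for illustration only; NOT one of the published lists.
FUNCTION_WORDS = ["what", "is", "the", "do", "you", "please", "can", "to"]


def function_word_features(text, function_words=FUNCTION_WORDS):
    """Return one 0/1 feature per function word: 1 if it occurs in the text."""
    # Lower-case the text and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return [1 if word in tokens else 0 for word in function_words]


features = function_word_features("What is the time?")
print(features)
```

A vector like this could then be passed to any standard classifier trained on labelled examples of each DA.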