In January 2019, three teams of year-two and year-three Computer Science and Technology (CST) students carried out research on artificial intelligence in natural language processing. The three teams entered the 13th International Workshop on Semantic Evaluation (SemEval-2019) and achieved good results in three main tasks, competing against some of the best graduate-level students in this field from around the world.

Task: OffensEval: Identifying and Categorizing Offensive Language in Social Media
Participating students: Wang Jianming, Wu Zhenghao, Zheng Hao
Rank: Subtask A (offensive language identification): 6th out of 103 teams
      Subtask B (automatic categorization of offense types): 61st out of 75 teams
      Subtask C (offense target identification): 39th out of 65 teams

Task: EmoContext: Contextual Emotion Detection in Text
Participating students: Huang Zihao, Long Yuepeng, Xu Zimu
Rank: 27th out of 165 teams

Task: Hyperpartisan News Detection
Participating students: Lin Yuanzhen, Ning Zhiyuan, Zhong Ruichao
Rank: -


The International Workshop on Semantic Evaluation is organized by the SIGLEX special interest group of the Association for Computational Linguistics (ACL). ACL is the most influential organization for computational linguistics and natural language processing in the world. Its annual international semantic evaluation competition attracts researchers from laboratories at the world's top universities and technology companies. Well over 100 teams, consisting mostly of PhD-level students, participated in the competition.

In this competition, the three tasks chosen by the CST students focus on detecting emotions, opinions, and abusive language in messages posted on social media. For example, in the task OffensEval: Identifying and Categorizing Offensive Language in Social Media, given a text posted on Twitter, participating teams need to use the given training data set to build an artificial intelligence model that can identify whether the posted text is offensive or not. Because of the diversity of human language, it is difficult for a machine to classify such text correctly. Moreover, users on social media often write in a casual, grammatically non-standard way, making it even harder for a machine to understand the text.
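To make the task setup concrete, the following is a minimal sketch of what the classification problem looks like. The example tweets, the toy lexicon, and the trivial keyword baseline are all invented for illustration; the actual shared-task data is much larger, and the students' models (described below) are far more sophisticated than a word list.

```python
# Toy illustration of OffensEval Subtask A: each example pairs a tweet
# with a label, "NOT" (not offensive) or "OFF" (offensive).
# Examples and lexicon are invented for illustration only.

train = [
    ("have a great day everyone", "NOT"),
    ("you are a total idiot", "OFF"),
]

OFFENSIVE_WORDS = {"idiot", "stupid", "moron"}  # hypothetical toy lexicon

def baseline_predict(text: str) -> str:
    """Trivial lexicon baseline: predict OFF if any listed word appears."""
    tokens = text.lower().split()
    return "OFF" if any(t in OFFENSIVE_WORDS for t in tokens) else "NOT"
```

A baseline like this fails on exactly the difficulties the task highlights: sarcasm, misspellings, and casual phrasing all slip past a fixed word list, which is why learned models are needed.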

At first, the students found such a complex natural language processing task daunting and did not know how to tackle it. Professor Weifeng Su, the director of the CST program, and Dr. Jefferson Fong, an assistant professor in the program, held weekly seminars to help the students quickly grasp the concepts of artificial intelligence, machine learning, deep learning, and natural language processing, and to familiarize them with the methods and tools used in the research. Whenever the students had a question, the instructors would patiently discuss it with them and offer feasible suggestions toward a solution.

In the end, all the students in the research groups chose to use the Bidirectional Encoder Representations from Transformers (BERT) model released by Google Research in October 2018. Because training the BERT model requires substantial computational power, the Data Science program provided the research teams with computational resources during the competition, which allowed them to make smooth progress. The students used the training data provided for each task to fine-tune the pre-trained model, and tried many different approaches to improve the results.
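The fine-tuning step described above can be sketched as follows, using the Hugging Face `transformers` library as one common way to fine-tune BERT for classification. The model name, hyperparameters, and training loop here are assumptions for illustration; the source does not describe the teams' actual training setup.

```python
# Sketch of fine-tuning a pre-trained BERT model for binary text
# classification, in the style of OffensEval Subtask A.
# Hyperparameters and setup are illustrative assumptions.

LABELS = {"NOT": 0, "OFF": 1}  # Subtask A label set

def encode_label(label: str) -> int:
    """Map a string label from the task data to an integer class id."""
    return LABELS[label]

def finetune(texts, labels, epochs=3):
    """Fine-tune pre-trained BERT on task-specific labelled examples.

    Calling this requires the `torch` and `transformers` packages and
    downloads the pre-trained weights, so it is defined but not run here.
    """
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(LABELS)
    )
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Tokenize the tweets and convert labels to a tensor of class ids.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    targets = torch.tensor([encode_label(l) for l in labels])

    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        out = model(**batch, labels=targets)  # loss is returned when labels are given
        out.loss.backward()
        optimizer.step()
    return model
```

The key idea is that the pre-trained model already encodes general language knowledge, so only a small number of epochs on the task's training data is needed to adapt it, which is also why the pre-training compute matters far more than the fine-tuning compute.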

Through the competition, the students learned much about cutting-edge technology in artificial intelligence and natural language processing. They also improved their coding and teamwork skills.

(From CST)