TOKYO, Feb 20, 2019 – (JCN Newswire) – Prince of Songkla University Phuket Campus (PSU) and Hitachi, Ltd. (TSE: 6501) have commenced a program of joint research in the field of Thai natural language processing(1). As the first phase of this research, both parties have jointly developed an AI-driven prototype sentiment analysis engine capable of classifying documents written in the Thai language into positive, negative, and neutral sentiment categories.
The engine analyses expressions using a sentiment dictionary refined with approximately 100 million words of Thai-language data gathered from various social media platforms. Although Thai has numerous distinctive colloquial expressions and is known to be more difficult to process than many other languages, the engine performs high-precision sentiment analysis and supports the colloquial Thai expressions used on social media.
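The release does not disclose how the engine applies its dictionary. As a rough illustration of dictionary-based three-way classification only, the Python sketch below scores a post against a tiny, made-up lexicon. `TOY_LEXICON`, `longest_match_terms` and `classify` are hypothetical names, and the entries are not PSU's dictionary. Because Thai is written without spaces between words, the sketch uses greedy longest matching in place of a real word segmenter, which also lets a longer entry such as ไม่ดี ("not good") take precedence over ดี ("good").

```python
# Hypothetical toy lexicon: word -> polarity score (NOT PSU's sentiment dictionary).
TOY_LEXICON: dict[str, int] = {
    "ดี": 1,        # good
    "ชอบ": 1,       # like
    "สวย": 1,       # beautiful
    "แย่": -1,      # bad
    "เกลียด": -1,   # hate
    "ไม่ดี": -1,     # not good (longer entry, so it wins over "ดี")
}


def longest_match_terms(text: str, lexicon: dict[str, int]) -> list[str]:
    """Scan left to right, taking the longest lexicon entry at each position.

    Thai has no spaces between words, so this greedy longest-match scan
    stands in for a proper word segmenter in this sketch.
    """
    max_len = max(map(len, lexicon))
    terms: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                terms.append(candidate)
                i += length
                break
        else:
            i += 1  # no lexicon entry starts here; skip one character
    return terms


def classify(text: str, lexicon: dict[str, int] = TOY_LEXICON) -> str:
    """Sum the polarity of matched terms and map the total to three classes."""
    score = sum(lexicon[term] for term in longest_match_terms(text, lexicon))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, `classify("อาหารดี")` ("the food is good") returns `"positive"` and `classify("อาหารไม่ดี")` ("the food is not good") returns `"negative"`. A production system would also need proper segmentation, negation scope and context handling, which this sketch omits.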
PSU and Hitachi will work towards practical applications of the engine by jointly evaluating the prototype with real-time data posted on Facebook, Twitter and other social media, with plans to offer it as part of Hitachi's Sentiment Analysis Service(2) from April 2019.
In Thailand, mobile devices are spreading rapidly, and social media penetration across the country is high. Of Thailand's population of around 69 million, approximately 50 million use Facebook and 12 million use Twitter(3). There is demand for products and services that reflect the messaging patterns unique to Thai social media users. Spoken Thai has numerous distinctive expressions, and many users frequently write with non-standard spellings, newly coined onomatopoeic words and emoticons, all of which can be difficult to process. As a result, extensive data pre-processing is required to compensate for inconsistencies in how these expressions are written.
PSU has closely analysed around 100 million words of Thai-language data gathered from various social media and used it to construct a large-scale, highly accurate Thai sentiment dictionary, establishing itself as one of the leading research institutions in the field of Thai language processing.
In October 2018, Hitachi began offering its Sentiment Analysis Service, a technology that classifies and visualises the voices of customers (gathered from Japanese-language media, conversation records and various other sources) into around 1,300 types of topics, feelings and intentions.
The engine is the first product of the collaboration between PSU and Hitachi, combining the results of PSU's research with the system architecture design expertise and data processing technologies that Hitachi has developed through its large-scale systems development activities. Using a hybrid noise-removal function (which combines machine learning with several other technologies) together with PSU's large-scale, highly accurate sentiment dictionary, the engine can analyse Thai-language social media posts while handling a diverse range of unique expressions and spellings.
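The hybrid noise-removal function is not described in detail, and the release notes that it combines machine learning with other technologies. The Python sketch below shows only a rule-based fragment of the kind of pre-processing described above, under hypothetical rules: runs of the digit 5 (pronounced "ha" in Thai, so "555" is written laughter) become a `<laugh>` token, and characters repeated three or more times for emphasis are collapsed, so "มากกกก" becomes "มาก" ("very"). `normalise`, `LAUGH_RUN`, `REPEATED_CHAR` and the `<laugh>` token are illustrative names, not part of Hitachi's service.

```python
import re

# Hypothetical rules for illustration only; this is NOT the engine's hybrid
# noise-removal function, which also uses machine learning.
LAUGH_RUN = re.compile(r"5{3,}")                # "555..." reads as "hahaha" in Thai
REPEATED_CHAR = re.compile(r"([^\d\s])\1{2,}")  # a non-digit, non-space char repeated 3+ times


def normalise(post: str) -> str:
    """Apply two toy normalisation rules to a social media post."""
    # Replace laughter first; otherwise collapsing would not touch digits,
    # but a later rule change could easily lose the "555" cue.
    post = LAUGH_RUN.sub(" <laugh> ", post)
    # Collapse elongated spellings such as "มากกกก" -> "มาก" or "!!!" -> "!".
    post = REPEATED_CHAR.sub(r"\1", post)
    # Tidy up the extra whitespace introduced by the substitutions.
    return " ".join(post.split())
```

For example, `normalise("ดีมากกกก555")` returns `"ดีมาก <laugh>"`. Digits are excluded from the collapsing rule so that numbers such as 1000 survive, although a price such as 555 baht would still be misread as laughter, an ambiguity that simple rules like these cannot resolve.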
PSU and Hitachi will continue working together to validate the engine's performance on real-time data and to further improve its analysis accuracy (for example, by supporting seven-stage sentiment evaluation(4)), with the aim of launching the service through Hitachi in April 2019. Beyond that, both parties will continue joint development to further enhance the analysis engine, for example by adding sentiment judgements that take context into account.
(1) Computer processing of language that is typically used for everyday communication (i.e. natural language).
(2) News release (October 1, 2018): Hitachi Launches Sentiment Analysis Service to Classify and Visualize Voices of Customers into Around 1,300 Types of Topics, Feelings, and Intentions Utilising AI
(3) As of January 2019. Source: StatCounter “Social Media Stats Thailand”
(4) Seven-stage evaluation consists of three positive stages, three negative stages and one neutral stage.