      Contacts TRS

      Tel: 86-10-64848899
      Fax: 86-10-64889088
      Email: trs@trs.com.cn

 
 
 
 
     
 

TRS CKM implements low-level text mining algorithms and exposes them through stable, encapsulated APIs for Chinese text mining applications. So far, TRS CKM has 8 components: taxonomy, similarity search, abstracting, filtration, phrase search, clustering, segmentation and extraction.


TRS CKM can be applied to enterprise portals, value-added information services, intelligent search engines, digital libraries, intelligence analysis, information security and filtering, and e-commerce.

 
 

Overview of component functions

TRS Text Taxonomy
Text taxonomy classifies text without human intervention, improving the efficiency of processing unstructured information resources.
Text taxonomy sorts text into categories based on its contents and can be used to classify news, resumes, email and office documents. It provides two ways to classify text: content-based and rule-based.

 

Functions and advantages
High accuracy: after sufficient training, the accuracy rate reaches 86%-90%, and rule-based classification can reach 95%.
High-speed classification: about 40 articles per second.
Users can combine the two methods to get higher accuracy.
Adaptive learning and complementary training are supported: users give feedback based on real results and supplement the training material and rules, so that accuracy improves continually.
Scalability: several taxonomy standards and systems are supported, as is classification of both English and Chinese text.
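
The combination of rule-based and content-based classification described above can be sketched roughly as follows. This is only an illustration, not the TRS CKM API: the names RULES, train and classify are hypothetical, and the content-based part is a simple word-frequency profile standing in for whatever statistical model the product actually uses.

```python
from collections import Counter

# Hypothetical rules (not from TRS): a keyword that decides the category outright.
RULES = {"invoice": "finance", "resume": "hr"}


def train(samples, profiles=None):
    """Add (text, label) samples to per-category word counts.

    Passing existing profiles back in models complementary training.
    """
    profiles = {} if profiles is None else profiles
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(text.lower().split())
    return profiles


def classify(text, profiles):
    """Apply the rules first, then fall back to the content-based profiles."""
    words = text.lower().split()
    for word in words:
        if word in RULES:
            return RULES[word], "rule"

    def score(profile):
        # Share of the category's word mass that the text matches.
        total = sum(profile.values())
        return sum(profile[w] for w in words) / total if total else 0.0

    best = max(profiles, key=lambda label: score(profiles[label]))
    return best, "content"
```

Calling train again with the returned profiles and a few corrected samples is one simple way to model the feedback loop mentioned in the list.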

 

Similarity search
Duplicate detection technology based on fingerprints, with good performance at the scale of 10,000,000 web pages.
Duplicate search finds the articles similar to a given sample within a very large collection of articles. Practice shows that TRS duplicate search performs very well when processing unstructured data.

 

Functions and advantages
High accuracy: the technology is based on the characteristic vector of each article, so small changes to the title or contents, or even a replaced title, do not cause a misjudgment. It can be adapted to other languages.
Configurable threshold: users can set the similarity threshold at which two articles count as duplicates, according to their needs.
High speed: a similarity search returns within one second over 200,000 articles, and within 2-3 seconds over more than 1 million articles.
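
The source does not say which fingerprint algorithm TRS uses. As one possible illustration of fingerprint-based duplicate detection with a configurable threshold, the sketch below uses a SimHash-style fingerprint; the function names and the default threshold of 0.9 are assumptions made for the example.

```python
import hashlib


def fingerprint(text, bits=64):
    """Compute a SimHash-style fingerprint from the words of a text."""
    counts = [0] * bits
    for word in text.lower().split():
        # Take the first 8 bytes of MD5 as a stable 64-bit hash of the word.
        h = int.from_bytes(hashlib.md5(word.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def similarity(a, b, bits=64):
    """Fraction of fingerprint bits on which two texts agree (1.0 = identical)."""
    distance = bin(fingerprint(a, bits) ^ fingerprint(b, bits)).count("1")
    return 1 - distance / bits


def is_duplicate(a, b, threshold=0.9):
    """Treat two texts as duplicates when their similarity reaches the threshold."""
    return similarity(a, b) >= threshold
```

Because every word contributes to every bit, changing a few words moves only some bits of the fingerprint, which is why small edits tend to leave the similarity high.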

 

TRS Text clustering
The technology is based on a similarity algorithm and provides visual analysis of massive document collections.
Text clustering separates massive document collections into categories and produces keywords for each category. The technology can be applied to organizing news topics, tracking headlines, and visual intelligence analysis.

 

Functions and advantages
High speed and accuracy: the accuracy rate is above 75%; the speed is about 100 articles per second, and 10,000 articles take about 5 minutes.
Scalability: can be used with other languages.
Hierarchical clustering is supported, and several keywords can be produced for each category.
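
To make similarity-based clustering with per-category keywords concrete, here is a minimal sketch. It assumes cosine similarity over word counts and a flat single-pass grouping, so it does not show the hierarchical clustering mentioned above; all names and the threshold of 0.3 are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc indices]}
    for i, doc in enumerate(docs):
        vec = Counter(doc.lower().split())
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["centroid"].update(vec)
            best["members"].append(i)
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return clusters


def keywords(group, n=3):
    """The n most frequent words of a cluster serve as its keywords."""
    return [w for w, _ in group["centroid"].most_common(n)]
```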

 

Text abstracting
Text abstracting is based on statistical techniques and produces abstracts automatically.
It extracts keywords and an abstract automatically, which greatly reduces the workload of editors.

 

Functions and advantages
Keyword indexing: the system indexes the keywords of an article based on a keyword table, and currently supports the Chinese government standard keyword table.
Weighted abstracts: an abstract can be weighted by keywords that the user provides, producing a keyword-based abstract. Users can also configure the length of the abstract.
It can be used with English and other languages.
High performance: about 100 articles per second.
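
A keyword-weighted abstract with a configurable length can be illustrated with a simple extractive approach: score each sentence by the weights of the keywords it contains and keep the best ones. The sketch below is an assumption about how such a feature might work, not the TRS implementation; the function name and parameters are hypothetical.

```python
import re


def abstract(text, keywords, max_sentences=2):
    """Pick the sentences with the highest keyword weight, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def weight(sentence):
        words = re.findall(r"\w+", sentence.lower())
        return sum(keywords.get(w, 0) for w in words)

    ranked = sorted(range(len(sentences)), key=lambda i: weight(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Here keywords maps each user-supplied keyword to its weight, and max_sentences plays the role of the configurable abstract length.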

 

TRS text filtration
The technology is based on content understanding.
It uses statistical and machine learning algorithms to recognize and filter harmful and junk information, protecting users from harassment. It can be applied to Internet monitoring, email screening and similar tasks.

 

Functions and advantages
Positive and negative information filtering: unlike traditional filtering based on keyword matching, TRS text filtering is based on statistical and machine learning algorithms and can distinguish positive from negative information very accurately.
High performance: average speed of up to 40 articles.
Adaptive: users can apply different templates to filter content for different topics.
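
The difference from plain keyword matching can be shown with a small statistical filter that learns word weights from examples. This sketch assumes a naive Bayes-style log-odds score with add-one smoothing and treats each learned weight table as one "template"; the names and the scoring method are illustrative assumptions only.

```python
import math
from collections import Counter


def train_template(acceptable, unwanted):
    """Learn per-word log-odds (unwanted vs. acceptable) for one topic."""
    ok = Counter(w for t in acceptable for w in t.lower().split())
    bad = Counter(w for t in unwanted for w in t.lower().split())
    vocab = set(ok) | set(bad)
    ok_total, bad_total = sum(ok.values()), sum(bad.values())
    # Add-one smoothing keeps words seen on only one side from producing log(0).
    return {
        w: math.log((bad[w] + 1) / (bad_total + len(vocab)))
        - math.log((ok[w] + 1) / (ok_total + len(vocab)))
        for w in vocab
    }


def should_block(text, template, threshold=0.0):
    """Block the text when its summed log-odds exceed the threshold."""
    score = sum(template.get(w, 0.0) for w in text.lower().split())
    return score > threshold
```

Training a separate template per topic, each from its own examples, corresponds to the adaptive use described in the list.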

 

Phrase searching
This technology combines the advantages of data mining and manual curation, making it a very good assistant when users search for content.
It finds related phrases based on semantic information, phrase structure and a phrase dictionary, and recommends related phrases when the user types a phrase into the search box.


Functions and advantages
Supports a very large dictionary of about one million entries. The phrase dictionary can be maintained automatically from search logs, and entries can be added or changed manually.
High performance: up to 100 searches per second.
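
A phrase dictionary that is fed automatically from search logs and also edited by hand might look roughly like the sketch below. It is an illustration under simplifying assumptions: related phrases are found by substring match and ranked by frequency, whereas the product is described as also using semantic information and phrase structure. The class and method names are hypothetical.

```python
from collections import Counter


class PhraseDictionary:
    def __init__(self):
        self.counts = Counter()

    def learn_from_log(self, queries):
        """Automatic maintenance: count each logged query."""
        self.counts.update(q.strip().lower() for q in queries if q.strip())

    def add(self, phrase, weight=1):
        """Manual maintenance: add or boost a phrase by hand."""
        self.counts[phrase.strip().lower()] += weight

    def suggest(self, text, limit=5):
        """Return known phrases containing the input, most frequent first."""
        text = text.strip().lower()
        matches = [(p, c) for p, c in self.counts.items() if text in p and p != text]
        matches.sort(key=lambda pc: (-pc[1], pc[0]))
        return [p for p, _ in matches[:limit]]
```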
 

Text segmentation
Based on rules and statistics to reduce ambiguity efficiently.
The segmentation technology uses rules and statistics to segment text into meaningful words or characters.

 
Functions and advantages
Embedded library of ambiguity rules.
Recognizes the names of people, places and organizations.
Supports multiple languages and the GB18030 and UTF-8 encodings.
High speed: above 300 KB per second.
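
As a point of reference for dictionary-based Chinese segmentation, the sketch below shows forward maximum matching, a classic baseline. It is not the TRS algorithm: it has no ambiguity rules, statistics or name recognition, and the function name, dictionary and max_len value are assumptions for the example.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the loop always advances.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

Greedy matching like this is exactly where ambiguity arises, since the longest match at one position can split a later word incorrectly; that is the problem the ambiguity rules and statistics are meant to reduce.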
 

Text extraction
Based on rule and statistical algorithms that extract factual information from text.
The technology extracts meaningful facts from unstructured information and presents them in structured form.
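
Turning unstructured text into a structured record can be illustrated with rule-based extraction alone. In the sketch below the rules are regular expressions chosen for the example; the field names and patterns are hypothetical, and the statistical side of the technology is not shown.

```python
import re

# Hypothetical extraction rules: field name -> pattern with one capture group.
RULES = {
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
    "amount": re.compile(r"\b(\d+(?:\.\d+)?) (?:yuan|dollars)\b"),
    "email": re.compile(r"\b([\w.+-]+@[\w-]+(?:\.[\w-]+)+)\b"),
}


def extract(text):
    """Return every match of every rule as a structured record."""
    return {field: pattern.findall(text) for field, pattern in RULES.items()}
```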

 
 

TRS CKM operating environment

Supported operating systems include

Windows NT/2000/XP/2003, Linux

 
 
TRS provides a local dynamic link library (C API) and a SOAP API (Web Services) so that users can develop applications rapidly or embed CKM into other applications easily.
 
 
© Copyright 2010 Beijing TRS Information Technology Co., Ltd. All rights reserved.