PCST Network

Public Communication of Science and Technology

 

Frogenstein
The SAPO (Scientific Automatic Press Observer) system and beyond

Carlos Vogt, State University of Campinas (Unicamp); Virtual University of the State of São Paulo (Univesp)

Ana Paula Morales, State University of Campinas (Unicamp); Virtual University of the State of São Paulo (Univesp)

Daniel Carnelossi, Virtual University of the State of São Paulo (Univesp)

Angelo Grossi, Virtual University of the State of São Paulo (Univesp)

In recent years, Labjor/Unicamp has developed a computer system known as SAPO (Scientific Automatic Press Observer), which automatically collects, selects, organizes and measures content on scientific topics published in non-specialized online media. Besides searching for and retrieving articles that contain certain keywords and/or were published within a set period of time, the system produces indicators of the presence of science and technology (S&T) in online media. Articles extracted from each media outlet are analyzed and classified as S&T content or not. The automatic S&T/non-S&T classification introduced by SAPO has inspired other groups to create similar systems in different languages. The existing classifiers rely on a set of keywords (a thesaurus) related to S&T, but the method has been a subject of debate among the collaborating research groups. In this article we briefly introduce SAPO and present work in progress on enhancing the classification method through text mining.
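
To make the keyword-based approach concrete, the following is a minimal sketch of a thesaurus classifier and a simple presence indicator. It is not SAPO's implementation: the abstract does not give the actual thesaurus, decision rule or indicator definitions, so the keyword set, the `min_hits` threshold and the names `Article`, `count_hits`, `is_st` and `st_share` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-thesaurus; SAPO's real keyword list is not given here.
THESAURUS = {
    "research", "scientist", "scientists", "laboratory",
    "genome", "vaccine", "university",
}


@dataclass
class Article:
    title: str
    body: str


def tokenize(text):
    """Lowercase the text and return its runs of letters (accents included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def count_hits(article, thesaurus=THESAURUS):
    """Count the tokens in title and body that appear in the thesaurus."""
    tokens = tokenize(article.title + " " + article.body)
    return sum(1 for token in tokens if token in thesaurus)


def is_st(article, min_hits=2, thesaurus=THESAURUS):
    """Assumed rule: label an article S&T if it has at least min_hits keyword hits."""
    return count_hits(article, thesaurus) >= min_hits


def st_share(articles):
    """Illustrative indicator: fraction of the articles labelled S&T."""
    if not articles:
        return 0.0
    return sum(is_st(a) for a in articles) / len(articles)
```

Calling `st_share` on the articles collected from one outlet over a set period would give one possible indicator of the presence of S&T in that outlet. A fixed keyword threshold of this kind is easy to audit but sensitive to the choice of thesaurus, which is the sort of limitation a text mining approach would aim to address.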

Download the full paper (PDF, 210.30 kB)
