PCST Network

Public Communication of Science and Technology

 

The science in the media monitor (SMM) system

Federico Neresini, University of Padua, Italy; Observa Science in Society

Andrea Lorenzet, University of Padua, Italy; Observa Science in Society

In this talk we present the features of Science in the Media Monitoring (SMM), a system for automatically tracking science and technology issues in digital media. The system’s architecture can be adapted to different languages and topics, and can collect content from different kinds of sources (such as online newspapers and blogs). At present the database collects content from mainstream Italian online newspapers starting from 2008 (with consistent and coherent coverage from 2010) and, since 2013, from several Italian blogs and from three international English- and French-language newspapers; it can be extended to further sources, such as Twitter and other social media. Collected texts are parsed, stripped of HTML, automatically tagged by a classifier that relies on a thesaurus of weighted keywords, and stored in the database. The system can thus automatically distinguish relevant from non-relevant content with respect to the selected topics (for example, content on science and technology in general, but also more specific topics such as food safety and nanotechnology). The system has a user interface for Boolean search of the database, with charts and word clouds, and returns results based on metadata such as the length of each entry, its source, date and link, and other basic information useful for analysis. Users can export results in TXT and Excel formats.
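
The cleaning and tagging steps described above can be illustrated with a minimal sketch. It is not the SMM implementation, which the abstract does not detail: the thesaurus entries, their weights, the threshold, and all function names below are hypothetical, and the scoring rule (summing the weight of every keyword occurrence and comparing the total with a threshold) is only one plausible way a classifier based on weighted keywords might work.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the content of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_html(raw_html):
    """Return the visible text of an HTML page with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


# Hypothetical thesaurus: keyword -> weight (illustrative values only).
THESAURUS = {
    "nanotechnology": 3.0,
    "researchers": 1.5,
    "laboratory": 1.5,
    "university": 0.5,
}

# Hypothetical cut-off above which a text is tagged as relevant.
THRESHOLD = 3.0


def relevance_score(text, thesaurus=THESAURUS):
    """Sum the weights of all keyword occurrences found in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(thesaurus.get(token, 0.0) for token in tokens)


def is_relevant(text, thesaurus=THESAURUS, threshold=THRESHOLD):
    """Tag a text as relevant when its score reaches the threshold."""
    return relevance_score(text, thesaurus) >= threshold
```

In this sketch an article would first pass through `clean_html` and then through `is_relevant`, and the resulting tag would be stored with the text. A production classifier would probably also normalise for article length and handle multi-word keywords, neither of which is attempted here.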

Download the full paper (PDF 270.99 kB)
