Arquivo.pt preserves millions of files collected from the web since 1996 and provides a public search service over this information, which spans several languages. It periodically collects and stores information published on the web, then processes the collected data to make it searchable, providing a "Google-like" service for searching the past web (English user interface available at archive.pt). This preservation workflow runs on a large-scale distributed information system.
In this project we aim to provide creative and innovative ways of exploring the data preserved by Arquivo.pt. "Conta-me Historias" offers a temporal narrative view that gives users a historical perspective on their searches. To guarantee plurality and diversity of information, we draw on 24 Portuguese news providers. Users can thus construct their own narrative, following either a more credible source or a more sensationalist one. Such an approach offers journalists a privileged environment for researching past events, historians the possibility of revisiting the past, and citizens democratic and plural access to an enormous wealth of information.
To showcase the Arquivo.pt data, we show the user the most important excerpts (namely text titles) of a topic over time. To select the best news titles we rely on YAKE!, a keyword extractor developed by our team, which recently received the Best Short Paper Award at the 40th edition of the European Conference on Information Retrieval (ECIR'18). Additionally, we use SentiLex-PT01, a sentiment analysis tool for the Portuguese language developed by a national team of researchers, to analyze the sentiment of the titles that YAKE! selects as relevant.
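The title-selection and sentiment steps described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the project's actual implementation: the `relevance` values stand in for whatever ranking is derived from YAKE! (here higher means more relevant, whereas YAKE! itself assigns lower scores to more relevant keywords), `TOY_LEXICON` is a hypothetical four-word polarity table rather than real SentiLex-PT01 entries, and the function names and sample titles are invented for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy polarity lexicon; these are NOT real SentiLex-PT01 entries.
TOY_LEXICON = {"vitória": 1, "sucesso": 1, "crise": -1, "derrota": -1}


def title_sentiment(title, lexicon=TOY_LEXICON):
    """Sum the polarity of every lexicon word found in the title."""
    words = (w.strip(".,;:!?\"'") for w in title.lower().split())
    return sum(lexicon.get(w, 0) for w in words)


def best_title_per_year(items):
    """Pick the most relevant title of each year and attach its sentiment.

    items: iterable of (date, title, relevance) tuples, where relevance is an
    assumed precomputed score (higher is better in this sketch).
    Returns a chronologically ordered list of (year, title, sentiment).
    """
    by_year = defaultdict(list)
    for day, title, relevance in items:
        by_year[day.year].append((relevance, title))

    timeline = []
    for year in sorted(by_year):
        _, title = max(by_year[year])  # highest relevance wins
        timeline.append((year, title, title_sentiment(title)))
    return timeline


if __name__ == "__main__":
    # Made-up sample titles, for illustration only.
    sample = [
        (date(2010, 5, 1), "Crise financeira agrava-se", 0.9),
        (date(2010, 8, 3), "Governo anuncia medidas", 0.4),
        (date(2016, 7, 10), "Vitória histórica de Portugal", 0.8),
    ]
    for year, title, sentiment in best_title_per_year(sample):
        print(year, title, sentiment)
```

Grouping by year is only one possible choice of time interval; a finer or adaptive granularity would follow the same pattern with a different grouping key.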