Ph.D. DISSERTATION: Past, Present, and Future on News Stream: Discovering Story Chains, Selecting Public Front-page and Predicting Public Events
Çağrı Toraman
Ph.D. Candidate
(Supervisor: Prof. Dr. Fazlı Can)
Computer Engineering Department
Abstract
News streams pose several challenges concerning the past, present, and future of events. The past hides relations among events and actors; the present reflects the needs of news readers; and the future waits to be predicted. The thesis has three parts, one for each of these time periods: we discover story chains in the past using a zigzagged search, select front-page news for the public from current news, and predict future public reactions to events.
In the first part, given an input document, we develop a framework for discovering story chains in a text collection. A story chain is a set of related news articles that reveal how different events are connected. The framework has three complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure the similarity among news articles. For scanning, we apply a novel text-mining method based on a zigzagged search that reinvestigates past documents whenever the chain is updated. We also utilize social networks of news actors to reveal connections among news articles. We conduct two user studies that evaluate four effectiveness measures: relevance, coverage, coherence, and the ability to disclose relations. The first user study compares several versions of the framework, obtained by varying its parameters, to set a guideline for use. The second compares the framework with three baselines. The results show that our method provides a statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method.
In the second part, we select news articles for public front pages using raw text only, without any meta-attributes such as click counts. Front-page news selection is the task of finding important news articles in news aggregators. We introduce a novel algorithm that jointly considers the importance and diversity of the selected news articles and the length of the front page. We estimate the importance of news based on topic modelling, which also provides the required diversity. Then, we select important documents from important topics using a priority-based method that helps fit the news content into the length of the front page. A user study is conducted to measure effectiveness and diversity. Annotation results show that up to 7 of 10 selected news articles are important, and up to 9 of them are from different topics. Challenges in selecting public front-page news are discussed with an emphasis on future research.
In the last part, we aim to predict future public reactions to news events by exploiting related microblog texts, specifically tweets. Microblog environments such as Twitter are becoming increasingly important for leveraging people's opinions on news events. We define public reactions in terms of their dimension and direction. Our system collects and preprocesses tweets, creates an inverted index to search tweets efficiently, filters them with various methods according to news events, and then uses temporal, spatial, and textual features to build predictive classifiers. We create a public-reaction dataset called BilPredict-2017, which contains 80 events, including terrorist attacks in Turkey from 2015 to 2017. We plan to build ensemble classifiers and evaluate the success of our system on BilPredict-2017.
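As a rough illustration of the inverted-index step mentioned in the last part of the abstract, the following Python sketch maps each term to the set of tweets that contain it and answers conjunctive keyword queries. It is not the system described in the thesis: the tokenizer, the function names (tokenize, build_inverted_index, search), and the sample tweets are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(tweets):
    """Map each term to the set of ids of the tweets that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for term in tokenize(text):
            index[term].add(tweet_id)
    return index


def search(index, query):
    """Return the ids of tweets that contain every term of the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # index.get avoids creating empty entries for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical tweets, invented for this example only.
tweets = {
    1: "Explosion reported in Ankara city centre",
    2: "Traffic is heavy in Ankara today",
    3: "Explosion near the station, police on site",
}
index = build_inverted_index(tweets)
matching_ids = search(index, "explosion ankara")
```

With such an index, a query only touches the posting sets of its own terms instead of scanning every tweet, which is the usual reason for building one before filtering tweets by news event.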
DATE: 20 September 2017, Wednesday @ 15:40
PLACE: EA-409