changing-times.github.io - The Times, They Are A-Changin'


This work studies how often news articles are changed after publication. We collect articles over a period of 9 months from news publishers of varying popularity and political bias, and show that 165k out of 600k articles exhibit some post-publication change. We also use Natural Language Processing to measure the semantics of these changes, such as whether a change alters the meaning of the paragraph in which it occurs, which it does in 22% of cases.
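As a rough illustration of what detecting a post-publication change can look like, the sketch below compares two snapshots of an article at paragraph level using Python's standard `difflib` module. It is not the pipeline used in this work: the function name `changed_paragraphs`, the assumption that paragraphs are separated by blank lines, and the sample text are all hypothetical.

```python
import difflib


def changed_paragraphs(old_text, new_text):
    """Return (tag, old_paragraphs, new_paragraphs) for every region that differs.

    Assumption: paragraphs are separated by blank lines in both snapshots.
    """
    old = [p.strip() for p in old_text.split("\n\n") if p.strip()]
    new = [p.strip() for p in new_text.split("\n\n") if p.strip()]

    # Compare the two snapshots as sequences of whole paragraphs.
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Hypothetical snapshots of the same article at two crawl times.
v1 = "Officials said 10 people were injured.\n\nThe cause is unknown."
v2 = (
    "Officials said 12 people were injured.\n\n"
    "The cause is unknown.\n\n"
    "This story has been updated."
)

changes = changed_paragraphs(v1, v2)
for tag, before, after in changes:
    print(tag, before, after)
```

A diff like this only shows that a paragraph changed. Deciding whether the change alters the paragraph's meaning would need a separate semantic comparison, which this sketch does not attempt.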

Parsers: You can find the Python code used to parse the article HTML from the various crawled articles here.

Mapping of changes to categories: You can download the aforementioned CSVs that we created from this folder.

R Notebooks used for analysis and graphs: Finally, a collection of notebooks we used to arrive at various metrics and create the graphs present in the paper can be found here.

This work was done at the PragSec Lab at Stony Brook University. For any questions, contact Chris at: [email protected]