“Spatial, Temporal, and Content Analysis of Twitter for Wildfire Hazards”

In their 2016 study, Wang et al. use social media data to reveal where Twitter activity occurred before, during, and after California wildfire events. With Twitter and other social media networks becoming more available as a means of examining people’s behavioral patterns, this was an important study for understanding the ways that “space and time are strongly related to situation awareness in emergency situations” (Wang et al. 2016).

The research team starts with a “first phase” that identifies tweets containing the keywords “wild” and “fire”. These tweets are then used to generate a holistic overview of where people were tweeting about the fires before, during, and after the disaster. The “second phase” then focuses more narrowly on Twitter activity about specific fires.
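As a rough illustration of the first-phase keyword filter, here is a minimal Python sketch. It is not the authors' code. The tweet dictionaries, the function names, and the `require_all` switch are hypothetical, and because this summary does not say whether a tweet had to contain both keywords or only one of them, the sketch supports both readings.

```python
from typing import Dict, List, Sequence

# Keywords named in the summary above.
KEYWORDS = ("wild", "fire")


def matches_keywords(text: str,
                     keywords: Sequence[str] = KEYWORDS,
                     require_all: bool = True) -> bool:
    """Case-insensitive substring match against the keyword list.

    require_all=True keeps tweets containing every keyword (so "wildfire"
    and "wild fire" both match); require_all=False keeps tweets containing
    at least one keyword. Which rule the paper used is an assumption here.
    """
    lowered = text.lower()
    hits = [kw in lowered for kw in keywords]
    return all(hits) if require_all else any(hits)


def filter_tweets(tweets: List[Dict],
                  keywords: Sequence[str] = KEYWORDS,
                  require_all: bool = True) -> List[Dict]:
    """Return only the tweets whose text passes the keyword test."""
    return [t for t in tweets
            if matches_keywords(t["text"], keywords, require_all)]


# Hypothetical tweets, invented for illustration only.
sample_tweets = [
    {"id": 1, "text": "Wildfire smoke is visible from downtown"},
    {"id": 2, "text": "Wild night at the concert"},
    {"id": 3, "text": "Evacuating now, the FIRE jumped the road"},
]

strict = filter_tweets(sample_tweets)                     # both keywords
loose = filter_tweets(sample_tweets, require_all=False)   # either keyword

print([t["id"] for t in strict])
print([t["id"] for t in loose])
```

The looser rule also picks up the unrelated “wild night” tweet, which shows why the exact keyword rule matters for anyone attempting a reproduction.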

The major techniques used in this paper are kernel density estimation (KDE), text mining, and social network analysis. The KDE is performed to find the most concentrated “hot spots” of Twitter activity about the fires: the points of Twitter activity were turned into a raster map and adjusted for population density. For the content analysis, text mining is conducted with the “tm” package for R. URLs were removed, words were converted to their base forms, and “meaningless” words were removed. After text mining, social network analysis is used to consider the locations of retweets; it was conducted with another R package, “igraph.”
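To make the population-adjusted KDE step more concrete, the sketch below estimates a Gaussian kernel density for tweet points and for population points on the same grid, then divides one by the other. This is only one plausible way to adjust for population. The ratio form, the Gaussian kernel, the bandwidth, the small `eps` constant, and all function names are assumptions, since the summary does not give the parameters the authors used.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def kde_at(x: float, y: float, points: Sequence[Point], bandwidth: float,
           weights: Optional[Sequence[float]] = None) -> float:
    """Weighted 2D Gaussian kernel density estimate at location (x, y)."""
    if weights is None:
        weights = [1.0] * len(points)
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    acc = 0.0
    for (px, py), w in zip(points, weights):
        dist_sq = (x - px) ** 2 + (y - py) ** 2
        acc += w * math.exp(-dist_sq / (2.0 * bandwidth ** 2))
    # Normalise so the surface integrates to 1 over the plane.
    return acc / (2.0 * math.pi * bandwidth ** 2 * total_weight)


def dual_kde_grid(tweet_pts: Sequence[Point], pop_pts: Sequence[Point],
                  pop_weights: Sequence[float], xs: Sequence[float],
                  ys: Sequence[float], bandwidth: float,
                  eps: float = 1e-9) -> List[List[float]]:
    """Raster of tweet density divided by population density.

    eps avoids division by zero in unpopulated cells (an assumption).
    """
    grid = []
    for y in ys:
        row = []
        for x in xs:
            tweet_density = kde_at(x, y, tweet_pts, bandwidth)
            pop_density = kde_at(x, y, pop_pts, bandwidth, pop_weights)
            row.append(tweet_density / (pop_density + eps))
        grid.append(row)
    return grid


# Hypothetical data: tweets cluster near (0, 0), while population is
# split evenly between (0, 0) and (10, 0).
tweet_points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)]
population_points = [(0.0, 0.0), (10.0, 0.0)]
population_weights = [1.0, 1.0]

ratio_grid = dual_kde_grid(tweet_points, population_points,
                           population_weights, xs=[0.0, 10.0], ys=[0.0],
                           bandwidth=1.0)
print(ratio_grid)
```

In this toy example the cell near (0, 0) has a ratio above 1, meaning more tweeting than its population share would suggest, while the equally populated cell at (10, 0) has a ratio near 0. A raw tweet KDE would not separate “many people” from “unusually active people” in the same way.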

Results were conveyed in a series of visuals, charts, and graphs. Term frequency was visualized in a bar chart, while the retweet network was shown in a line graph. The dual kernel density estimations were visualized in a heat map, and the spatial distribution of the geotagged tweets from phase one was visualized in a dot density map. Lastly, Figure 10 displayed the spatial distribution of the retweet network.
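The counts behind a term-frequency bar chart could be produced along the following lines. This Python sketch mirrors the cleaning steps described in this summary (remove URLs, reduce words to base forms, drop “meaningless” words), but it is not the authors' “tm” workflow. The stopword list is a tiny illustrative one, `crude_stem` is a deliberately rough stand-in for a real stemmer, and the sample tweets are invented.

```python
import re
from collections import Counter
from typing import Iterable, List

URL_PATTERN = re.compile(r"https?://\S+")

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "is", "a", "of", "to", "and", "in", "are", "rt"}


def crude_stem(word: str) -> str:
    """Rough stand-in for stemming: strip one common English suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tokens(text: str) -> List[str]:
    """Lowercase, drop URLs, keep alphabetic tokens, remove stopwords, stem."""
    without_urls = URL_PATTERN.sub(" ", text.lower())
    tokens = re.findall(r"[a-z]+", without_urls)
    return [crude_stem(tok) for tok in tokens if tok not in STOPWORDS]


def term_frequencies(tweets: Iterable[str]) -> Counter:
    """Count cleaned terms across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(clean_tokens(text))
    return counts


# Hypothetical tweets, invented for illustration only.
sample_texts = [
    "Fires are spreading in the hills http://example.com/a",
    "Evacuated! The fire is near the road",
    "RT fire crews evacuating homes",
]

freq = term_frequencies(sample_texts)
print(freq.most_common(2))
```

Because the stemmer is so crude, “evacuated” and “evacuating” collapse to the non-word “evacuat”. That is typical of stemming output, and it is one of the undocumented choices that can change a term-frequency chart.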

This analysis could be both reproduced and replicated. For a reproduction, which would use the same data and techniques to produce the same outputs as Wang et al., we have access through Twitter developer accounts to the data that were used as inputs, and we have access to the R packages that were used during the procedure. The article does become vague, however, about what the kernel density parameters were and what changes were made during the text mining stage. Additionally, it is unclear how many of the tweets were used in the analysis, and what sources of uncertainty or changes the authors may have accounted for during the procedure without explicitly mentioning them in the paper. What is the 1% sample limitation, and how does this uncertainty potentially limit the validity of the overall results?

A replication of this study could be completed by using new data and a different study context to test the applicability of the original findings and methods. The uncertainty in the KDE parameters, the text mining steps, and the 1% sample limitation would still limit a true replication. The first steps in the methodology, which identify keywords, are a useful starting point for research that replicates a similar process in a different area or field of study. Replications using API data from social media networks will likely become increasingly popular as a means of understanding human interaction with disasters and large-scale events, as they can help report mass trends in reactions in a temporally and spatially sensitive manner.

Readings for the Week: Wang, Z., X. Ye, and M. H. Tsou. 2016. Spatial, temporal, and content analysis of Twitter for wildfire hazards. Natural Hazards 83 (1):523–540.
