The dataset provided with this assignment is called 'CMU-MisCOV19', and it comes from a research project at the Center for Machine Learning and Health at Carnegie Mellon University. The work was presented at CIKM 2020 under the title 'Characterizing COVID-19 Misinformation Communities Using a Novel Twitter Dataset'. Part of the abstract from this paper is reproduced below to explain why the data was collected and annotated.
"From conspiracy theories to fake cures and fake treatments, COVID-19 has become a hotbed for the spread of misinformation online. It is more important than ever to identify methods to debunk and correct false information online. In this paper, we present a methodology and analyses to characterize the two competing COVID-19 misinformation communities online: (i) misinformed users or users who are actively posting misinformation, and (ii) informed users or users who are actively spreading true information, or calling out misinformation. The goals of this study are twofold: (i) collecting a diverse set of annotated COVID-19 Twitter dataset that can be used by the research community to conduct meaningful analysis; and (ii) characterizing the two target communities in terms of their network structure, linguistic patterns, and their membership in other communities."
To create this dataset, the authors used a diverse set of keywords to retrieve tweets through the Twitter Search API. For the annotation process, 17 categories were identified, and the tweets were annotated manually. A codebook on the annotations and categories, created by the authors, has been provided; please refer to it to familiarise yourself with the categories.
The categories used to annotate these tweets are listed below:
1. Irrelevant
2. Conspiracy
3. True Treatment
4. True Prevention
5. Fake Cure
6. Fake Treatment
7. False Fact or Prevention
8. Correction/Calling out
9. Sarcasm/Satire
10. True Public Health Response
11. False Public Health Response
12. Politics
13. Ambiguous/Difficult to classify
14. Commercial Activity or Promotion
15. Emergency Response
16. News
17. Panic Buying
The study reports that 4,573 tweets were annotated, and the annotations were made publicly available. However, at the time of data extraction for this assignment, some of these tweets or their authors' accounts had been suspended or taken down by Twitter, or their privacy settings had changed. The number of tweets provided for this assignment is therefore slightly smaller than the number annotated in the study.
The aim is to analyse the tweets and provide insights into the general trends and patterns of tweets by annotation. To achieve this, a series of specific tasks has been outlined below.
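As a starting point, a minimal sketch of loading the data and checking how the tweets are distributed across annotations is given below. The file name `CMU_MisCov19.csv` and the column names `text` and `annotation` are assumptions; adjust them to match the file you have been given.

```r
# Minimal sketch: load the provided dataset and inspect the class balance.
# File name and column names ("text", "annotation") are assumptions.
library(readr)
library(dplyr)

tweets <- read_csv("CMU_MisCov19.csv")

# Count how many tweets fall into each annotation category
tweets %>%
  count(annotation, sort = TRUE)
```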
Task A – Text Mining (25%)
Over the last four weeks you have seen a range of text pre-processing techniques. You are required to utilise these techniques to clean the textual data, extract knowledge and
produce informative visualisations.
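One possible pre-processing pipeline, sketched with the tidytext package, is shown below. This is only one of several valid approaches; the column names `text` and `annotation` and the added `tweet_id` are assumptions, and the cleaning steps (URL and mention removal, stop-word filtering) should be adapted to what was covered in lectures.

```r
# Illustrative text-cleaning and tokenisation pipeline (tidytext approach).
library(dplyr)
library(stringr)
library(tidytext)
library(ggplot2)

clean_tokens <- tweets %>%
  mutate(tweet_id = row_number(),                        # per-tweet id, reused in later tasks
         text = str_remove_all(text, "https?://\\S+"),   # drop URLs
         text = str_remove_all(text, "@\\w+")) %>%       # drop @mentions
  unnest_tokens(word, text) %>%                          # tokenise and lower-case
  anti_join(stop_words, by = "word") %>%                 # remove common stop words
  filter(str_detect(word, "[a-z]"))                      # drop tokens with no letters

# Simple visualisation: the most frequent terms overall
clean_tokens %>%
  count(word, sort = TRUE) %>%
  slice_max(n, n = 20) %>%
  ggplot(aes(x = n, y = reorder(word, n))) +
  geom_col() +
  labs(x = "Frequency", y = NULL, title = "Most frequent terms")
```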
Task B – Sentiment Analysis (20%)
Sentiment analysis is “the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral.”
You are required to identify comparisons between annotations and investigate the 'sentiment' in the provided textual data. You will need to utilise the sentiment analysis techniques covered during lectures to extract the required information.
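A lexicon-based comparison across annotation categories might look like the sketch below. It assumes the tidytext "bing" lexicon and reuses the `clean_tokens` data frame from the Task A sketch; substitute whichever lexicon or method was covered in lectures.

```r
# Sketch: lexicon-based sentiment, compared across annotation categories.
library(dplyr)
library(tidytext)
library(ggplot2)

sentiment_by_annotation <- clean_tokens %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep sentiment-bearing words
  count(annotation, sentiment) %>%
  group_by(annotation) %>%
  mutate(share = n / sum(n)) %>%                       # positive/negative share per category
  ungroup()

# Compare the positive/negative balance of each annotation category
ggplot(sentiment_by_annotation,
       aes(x = share, y = annotation, fill = sentiment)) +
  geom_col() +
  labs(x = "Share of sentiment-bearing words", y = NULL)
```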
Task C – Topic Modelling (15%)
You are required to cluster different word groups and expressions from the tweets that best characterise the information; you will need to uncover hidden trends within the text by annotation. To perform this task you will need to utilise the topic modelling techniques covered in lectures.
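A minimal sketch using LDA from the topicmodels package is given below. The number of topics (`k = 8`), treating each tweet as one document via the assumed `tweet_id` from the Task A sketch, and the choice of LDA itself are all assumptions to adapt to the techniques taught.

```r
# Illustrative LDA topic model over the cleaned tokens.
library(dplyr)
library(tidytext)
library(topicmodels)

# Build a document-term matrix: one document per tweet
dtm <- clean_tokens %>%
  count(tweet_id, word) %>%
  cast_dtm(tweet_id, word, n)

lda_fit <- LDA(dtm, k = 8, control = list(seed = 1234))

# Top terms per topic, useful when labelling the uncovered themes;
# topic proportions per tweet can later be joined back to the annotation column.
tidy(lda_fit, matrix = "beta") %>%
  group_by(topic) %>%
  slice_max(beta, n = 10) %>%
  ungroup()
```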
Task D – Further exploration (5%)
You are required to utilise any further techniques shown to you in lectures or from your
own research in order to draw meaningful insight from the text.
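One possible direction among many is a bigram analysis per annotation category, which can surface recurring phrases. The sketch below assumes the same `tweets` data frame and column names as earlier; it is illustrative, not a required technique.

```r
# Sketch: most frequent bigrams per annotation category.
library(dplyr)
library(tidyr)
library(tidytext)

tweet_bigrams <- tweets %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  filter(!is.na(bigram)) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word,
         !word2 %in% stop_words$word) %>%
  count(annotation, word1, word2, sort = TRUE)

head(tweet_bigrams, 20)
```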
Presentation of Code (10%)
You are required to submit your code in a programming notebook (R Markdown report). You will need to submit your .Rmd file along with an HTML or PDF version. Marks are available for students who do the following:
• Break their code into small, meaningful chunks and functions
• Declare all variables using an appropriate naming convention
• Comment code in an appropriate, useful manner
• Create a presentable, professional, easy-to-follow R Markdown report outlining the analysis performed for each task (a minimal example chunk is sketched below).
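For illustration, a small, named, commented chunk might look like the sketch below; the function name and the reuse of the assumed `tweets` data frame and `annotation` column are purely illustrative.

```r
# Illustrative helper: count tweets per annotation category.
library(dplyr)

count_by_annotation <- function(tweet_df) {
  tweet_df %>%
    count(annotation, sort = TRUE)
}

annotation_counts <- count_by_annotation(tweets)
annotation_counts
```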
Task E - Demonstration (25%)
You are required to deliver a 10-minute demonstration of your code, summarising your main findings for each task. This demonstration will take place after the submission deadline. The lecturer will interact with you and ask questions to gauge understanding. At this point, you may be asked to provide information or explain parts of your code. The demonstration is used to test that you a) understand your code and b) can explain the algorithms utilised.