The Department of Justice (DOJ) Office of the Inspector General (OIG) conducted a comprehensive review of the FBI’s Crossfire Hurricane investigation, which looked into possible coordination between the Donald J. Trump for President Campaign and the Russian government’s interference in the 2016 U.S. presidential election. The review examined a range of actions, relationships, and procedures, including the decision to open the investigation, the FBI’s relationship with key individuals like Christopher Steele, the FISA surveillance applications, and overall adherence to Department and FBI policies.
The primary objective of this use case is to provide a data-driven analysis and visualization of the Crossfire Hurricane investigation report to enhance understanding, transparency, and accountability. By using advanced text analysis and word cloud visualization techniques, the aim is to identify key themes, terms, and patterns within the extensive documentation of the OIG review. This will aid in summarizing the vast amount of information, highlighting the most frequent and pivotal elements of the investigation, and providing a digestible format for stakeholders.
The proposed solution uses an R script to process, analyze, and visualize the content of the OIG’s report. The process includes:
Text Extraction and Preprocessing: Using pdftools, the text of the report will be extracted from the PDF file. This text will undergo preprocessing, including stemming and stopword removal, to refine the data for analysis.
Tokenization and Frequency Analysis: The text will be tokenized and a document-feature matrix will be created. This matrix will facilitate the analysis of term frequencies, helping to identify the most commonly used words and phrases within the report.
Data Visualization with Word Cloud: Using the wordcloud2 package, a word cloud will be generated. This visual representation conveys the most prominent terms in the report at a glance, with each term sized by its frequency. The color and size options are customized to improve legibility.
Frequency Statistics Analysis: Besides the word cloud, the script also provides detailed frequency statistics of terms, offering deeper insights into the specific language and terminology used in the report.
# Load required libraries
library(pdftools)
library(tm)
library(quanteda)
library(wordcloud2) # Load Wordcloud2 library
library(quanteda.textstats) # Load the textstats package for frequency statistics
# Data source: extract the report text, one character string per page
pdf <- pdf_text("https://www.justice.gov/storage/120919-examination.pdf")
head(pdf)
## [1] " REDACTED FOR PUBLIC RELEASE\n\n\n\n\n Office of the Inspector General\n U.S. Department of Justice\n OVERSIGHT INTEGRITY GUIDANCE\n\n\n\n\n Review of Four FISA Applications and\n Other Aspects of the FBI’s Crossfire\n Hurricane Investigation\n\n\n\n\nOversight and Review Division 20-012 December 2019 (Revised)\n\n\n\n REDACTED FOR PUBLIC RELEASE\n"
## [2] " NOTICE\n\n\nThis report was originally issued on December 9, 2019. The report was updated on\nDecember 11 and December 20, 2019, with the following changes (page references are to\nthe public version of the report):\n\n <U+F0B7> On pages iv, xvi, 400, and 407, we changed the phrase “before and after” to “both\n during and after the time.” In all instances, the phrase appears in connection to the\n time period during which we found that the Crossfire Hurricane team used\n Confidential Human Sources (CHSs) to interact and consensually record\n conversations with Page and Papadopoulos. The corrected information appearing in\n this updated report reflects the accurate information concerning these time periods\n that previously appeared, and still appears, on pages 305 and 313 (e.g., the\n statement on page 305 that “the Crossfire Hurricane team tasked CHSs to interact\n with Page and Papadopoulos both during the time Page and Papadopoulos were\n advisors to the Trump campaign, and after Page and Papadopoulos were no longer\n affiliated with the Trump campaign”).\n\n <U+F0B7> On pages ix, 164, 165, 214, and 364 we removed redactions of certain information\n related to Person 1. We also removed redactions throughout the report related to\n the dates the Carter Page FISA applications were filed and the dates FISA authority\n expired for each application. These changes to previously-redacted text were made\n in response to subsequent decisions made by the Department of Justice and the FBI\n about the classification of the underlying information. 
See page 14, footnote 24.\n\n <U+F0B7> On pages xi, 242, 368, and 370, we changed the phrase “had no discussion” to “did\n not recall any discussion or mention.” On page 242, we also changed the phrase\n “made no mention at all of” to “did not recall any discussion or mention of.” On page\n 370, we also changed the word “assertion” to “statement,” and the words “and\n Person 1 had no discussion at all regarding WikiLeaks directly contradicted” to “did\n not recall any discussion or mention of WikiLeaks during the telephone call was\n inconsistent with.” In all instances, this phrase appears in connection with\n statements that Steele’s Primary Sub-source made to the FBI during a January 2017\n interview about information he provided to Steele that appeared in Steele’s election\n reports. The corrected information appearing in this updated report reflects the\n accurate characterization of the Primary Sub-source’s account to the FBI that\n previously appeared, and still appears, on page 191, stating that “[the Primary Sub-\n Source] did not recall any discussion or mention of Wiki[L]eaks.”\n\n <U+F0B7> On page 57, we added the specific provision of the United States Code where the\n Foreign Agents Registration Act (FARA) is codified, and revised a footnote in order to\n reference prior OIG work examining the Department’s enforcement and\n administration of FARA.\n\n <U+F0B7> On page 413, we changed the word, “three” to “second and third.” The corrected\n information appearing in this updated report reflects the accurate description of the\n Carter Page FISA applications that did not contain the information the FBI obtained\n from Steele’s Primary Sub-source in January 2017 that raised significant questions\n about the reliability of the Steele reporting. This information previously appeared,\n and still appears, accurately on pages xi, xiii, 368, and 372.\n"
## [3] "[PAGE INTENTIONALLY LEFT BLANK]\n"
## [4] " Executive S u m ma ry\n Revie w of Four FISA Applications and Other Aspects of the FBI's Crossfire\n Hurricane In vestigation\n\n\n\n\nBackg ro u nd OIG Methodo logy\n\n The Depa rtment of J u stice ( Depa rtm e n t ) Offi ce T h e O I G exa m i ned m o re t h a n o n e m i l l i o n\nof the I n spector Gen e ra l ( OIG) u n d e rtoo k th is revi ew to d o c u m ents t h a t w e re i n the D e p a rt m e nt's a n d F B I 's\nexa m i n e certa i n act i o n s by the Federa l B u re a u of possession a n d co n d u cted ove r 1 70 i nte rv i ews i nvolv i n g\nInvestigati o n ( F B I ) a nd the De pa rt m e n t d u ri n g a n FBI m o re t h a n 1 0 0 w itnesses . T h ese w i t n esses i n c l u ded\ni nvesti gati o n o pened on J u l y 3 1 , 20 1 6, k n own as fo r m e r FBI D i rect o r Co rn e y , fo rm e r Atto rn ey G e n e ra l\n\"Crossfi re H u rrica n e, \" i nto whether i n d iv i d u a l s ( A G ) Lo retta Lyn c h , fo rm e r Deputy Attorney G e n e ra l\nassoci ated with t h e D o n a l d J . Tru m p fo r Presid e nt ( DA G ) S a l l y Yates, fo r m e r D A G R o d Rosenste i n, fo rm e r\nCa m pa i g n were coo rd i nati n g , witti ng l y o r u nwitti n g ly, Acti n g A G a nd Act i n g DAG a n d cu rre nt F B I G e n e ra l\nwith the Russ i a n govern m e nt's efforts to i nte rfe re in the Cou n s e l D a n a Boe n te , fo rm e r F B I D e p uty D i rector\n20 1 6 U . S . presi d e nti a l elect i o n . O u r rev i e w i n c l u d ed A n d rew M cCa be, fo rm e r FBI Genera l Cou n s e l J a m e s\nexa m i n i ng : B a k e r, a nd D e p a rt m e n t a tto r n ey B ruce O h r a n d h i s\n w ife . T h e O I G a l so i n te rv i ewed Ch risto p h e r Stee l e a nd\n • The decision to o pen Crossfi re H u rri ca n e a nd fo u r cu rre nt a n d form e r e m p l oyees of oth e r U . S .\n i n d iv i d u a l cases on cu rrent a n d fo rm e r m e m bers gove rn ment agenc i e s . 
Two w i t n ess e s , G l e n n Si m pson\n of the Tru m p ca m pa i g n , G e o rg e Pa pado pou los, a nd J on a t h a n W i n e r ( a fo r m e r D e p a rt m e n t of State\n Carter Pag e , Pa u l Ma n afort, a nd M i chael Flyn n ; offici a l ) , d ecl i n ed o u r req u ests for vo l u nta ry i nte rv i e w s ,\n the ea rl y i n vesti g ative steps taken ; a n d whet h e r a n d we w e re u n a b l e to co m pe l t h e i r testi m o n y .\n t h e open i n g s a n d ea rly steps com p l i e d w i t h\n Departm ent a n d F B I pol icies ; We were g iven b ro a d a ccess t o rel e v a nt\n m a te r i a ls by t h e D e p a rt m e n t a n d the FBI . I n a d d iti o n ,\n • The FBI's re l ati onsh i p w i t h Ch risto p h e r Ste e l e ,\n we rev i ewed re lev a n t i nform ati o n t h a t oth e r U . S .\n w h o m t h e F B I co n s i d e red t o b e a co nfidential\n govern m ent a g e nci es p rovi d e d t h e FBI i n t h e co u rse o f\n human sou rce (CH S ) ; its re ce i pt, use, a nd\n t h e Crossfi re H u rr i c a n e i n vesti g a t i o n . H oweve r,\n eva l u ati o n of e l ect i o n reports fro m Stee l e ; a n d its\n bec a u se the activities of oth e r a g e n ci e s a re o utsi d e o u r\n deci s i o n to close Steele as a n FBI CH S ;\n j u r i s d i ctio n , we d i d n o t s e e k t o o bta i n records from\n • Fou r FBI a p p l i cati o n s fi l e d with the Foreig n t h e m t h a t the FBI n e v e r rece i v e d o r revi ewed , exce pt\n Inte l l i ge nce S u rve i l la nce Co u rt ( FISC) i n 20 1 6 a n d fo r a l i m ited a m o u nt of State D e p a rt m e n t reco rd s\n 20 1 7 to co n d u ct Foreig n I ntel l i g e nce S u rv e i l l a nce relati ng t o Steel e ; w e a l so d i d n ot s e e k t o a s sess a n y\n Act ( FISA) s u rve i l l a nce targ eti ng Ca rte r P a g e ; a n d act i o n s o t h e r a g e nc i e s m ay have taken . 
A d d i ti o n a l l y ,\n w heth e r these a p p l icati o n s co m p l ied with o u r rev i e w d i d n ot i n d ep e n d entl y s e e k to dete rm i ne\n Depa rt m e n t a n d FBI po l i cies a n d satisfi ed the wheth e r co rro borati o n e x i sted for the Ste e l e election\n gove rn ment's o b l i gati o n s to the FISC ; re p o rt i n g ; rat h e r, ou r revi ew was focused on\n i nfo r m a t i o n that w a s ava i l a b l e to the FBI co n c e r n i n g\n • The i nte racti o n s of D e pa rt m e n t a ttorney Bru ce Stee l e ' s re ports p r i o r t o a n d d u ri n g t h e p e n d e n cy of t h e\n Oh r with Steele, t h e FBI, G l e n n S i m ps o n of F u s i o n Ca rte r P a g e F I S A a uth o rity .\n G PS, a nd t h e State D e p a rtm e n t ; w h et h e r w o rk\n O h r 's s pouse pe rfo rmed fo r Fu s i o n G P S i m pl icated O u r ro l e i n th i s rev i ew was not to seco n d -g u ess\n eth ica l ru les a p p l ica b l e to O h r ; and O h r's d i scretio n a ry j u d g ments by D e p a rt m e n t perso n n e l\n i nteract i o n s with Depa rtm e n t attorneys rega rd i n g a b o u t w h et h e r to o p e n a n i n vesti g a t i o n , or s p ecific\n the M a nafort cri m i na l case ; a n d j u d g ment ca l ls m a d e d u ri n g the cou rse of a n\n • The F B I 's use of U ndercov e r E m p l oyees ( U CEs) i nvesti gati o n , w h e re those d ecis i o n s co m p l i ed w i t h o r\n and CHSs oth e r tha n Steele in th e Crossfi re w e re a u t h o rized by Depa rtm ent ru l es, po l i cies, o r\n H u rrica ne i n vestigation ; w h e t h e r the FBI p l a ced p roced u re s . 
We d o n o t criti c i z e pa rti cu l a r d e c i s i o n s\n a n y C H S s with i n the Tru m p ca m pa i g n o r tasked m e re l y beca u s e we m i g ht h a v e reco m m e n d ed a\n a n y C H S s to re po rt on the Tru m p ca m pa i g n ; d i ffe re nt i nvest i g at i v e strateg y o r ta ct i c b a sed o n the\n whethe r the use o f C H S s a n d UCEs co m p l ied with fa cts l e a rned d u r i n g o u r i n vesti g a ti o n . T h e q u esti o n we\n D e pa rt m e n t and FBI po l i cies ; and the atte nd a n ce consid e red was not wheth e r a p a rt i c u l a r i n vesti g a tive\n of a Crossfi re H u rrica ne s u pe rv i s o ry agent at d e ci s i o n was i d e a l or cou l d have been h a n d l e d m o re\n cou nteri ntel l i g ence b ri efi ng s g i v e n to t h e 20 1 6 effectively, b u t rat h e r wh et h e r the D e p a rt m ent a n d the\n p res i d e nti a l ca nd i d ates a n d certa i n ca m pa i g n FBI com pl i ed with a p p l i c a b l e l e g a l req u i rem e n ts,\n advisors. p o l i c i es, and p roce d u res in ta k i n g the act i o n s we\n rev i e w e d o r, a lte rnatively, w heth e r the ci rcu msta n ces\n su rrou nd i n g the d e c i s i o n i n d i ca ted that it w a s ba sed on\n"
## [5] " Executive Summary\n Review of Four FISA Applications and Other Aspects of the FBI's Crossfire\n Hurricane Investigation\n\n\n\n\ninaccurate or incomplete information, or considerations analysis, the Crossfire Hurricane team opened individual\nother than the merits of the investigation. If the cases in Aug ust 2016 on four U.S. persons-\nexplanations we were given for a particular decision Papadopoulos, Carter Page, Paul Manafort, and Michael\nwere consistent w ith legal requirements, policies, Flynn-all of whom were affiliated with the Trump\nprocedures, and not unreasonable, we did not conclude campaign at the time the cases were opened.\nthat the decision was based on improper considerations\nin the absence of documentary or testimonial evidence As detailed in Chapter Two, the Attorney\nto the contrary. General's Guidelines for Domestic Operations (AG\n Guidelines) and the FBI's Domestic Investigations\nThe Opening of Crossfire Hurricane and Operations Guide (DIOG) both require that FBI\nFour Related Investigations, and Early investigations be undertaken for an \"authorized\n purpose\"-that is, \"to detect, obtain information about,\nInvestigative Steps or prevent or protect against federal crimes or threats\n to the national security or to collect fo reign\nThe Opening of Crossfire Hurricane and Four Individual intelligence. 
\" Additionally, both the AG Guidelines and\nCases t he DIOG permit the FBI to conduct an investigation,\n even if it might impact First Amendment or other\n As we describe in Chapter Three, the FBI constitut ionally protected activity, so long as there is\nopened Crossfire Hurricane on Ju ly 31, 2016, just days some legitimate law enforcement purpose associated\nafter its receipt of information from a Friendly Foreign with the investigation.\nGovernment (FFG) reporting that, in May 2016, during\na meeting with the FFG, then Trump campaign foreign In addition to requiring an authorized purpose,\npolicy advisor George Papadopoulos \"suggested the FBI investigations must have adequate factual\nTrump team had received some kind of suggestion from predication before being initiated. The predication\nRussia that it could assist this process with the requirement is not a legal requirement but rat her a\nanonymous re lease of information during the campaign prudential one imposed by Department and FBI policy.\nthat woul d be damaging to Mrs. Clinton (and President The DIOG provides for two types of investigations,\nObama).\" The FBI Electronic Commu nication (EC) Preliminary Investigations and Full Investigations. A\nopening the Crossfire Hurricane investigation stated Preliminary I nvestigation may be opened based upon\nthat, based on the FFG information, \"this investigation \"any allegation or information\" indicative of possible\nis being opened to determine whether individual (s) criminal activity or t hreats to the national security. 
A\nassociated with the Trump campa ign are witting of Full Investigation may be opened based upon an\nand/or coordinating activities w ith the Government of \"articulable factual basis\" that \"reasonably ind icates\"\nRussia.\" We did not find information in FBI or any one of three defined circumstances exists,\nDepartment ECs, emails, or other documents, or including:\nthrough witness testimony, indicating that any\ninformation other than the FFG information was relied An activity consti t uting a federal crime\nupon to predicate the opening of the Crossfire Hurricane or a threat to the national security has\ninvestigation. Alt hough not m entioned in the EC, at t he or may have occurred, is or may be\ntime, FBI officials involved in opening the investigation occurring, or wil l or may occur and the\nhad reason to believe t hat Russia may have been investigation may obtain information\nconnected to the Wikileaks disclosures that occurred relating to the activit y or the\nearl ier in July 2016, and were aware of information involvement or role of an individual,\nregarding Russia's efforts to interfere with the 2016\n group, or organ ization in such activity.\nU.S. elections. These officia ls, though, did not become\naware of Steele's election reporting until weeks later In Ful l Investigations such as Crossfire\nand we therefore determined that Steele's reports Hurricane, al l lawful investigat ive methods are allowed.\nplayed no role in the Crossfire Hurricane opening. 
In Preliminary Investigations, all lawful investigative\n methods (including the use of CHSs and UCEs) are\n The FBI assembled a Headquarters-based permitted except fo r mail opening, physical searches\ninvestigative team of special agents, analysts, and req uiring a search warrant, electronic surveillance\nsupervisory special agents (referred to throughout this req uiring a judicial order or warrant (Title III wiretap or\nreport as \"the Crossfire Hurricane team \" ) who a FISA order), or requests under Title VII of FISA. An\nconducted an initial analysis of links between Trump investigation opened as a Preliminary Investigation may\ncampaign members and Russia. Based upon this be converted subsequently to a Full Investigation if\n\n\n ii\n"
## [6] " Executive Summary\n Review of Four FISA Applications and Other Aspects of the FBI's Crossfire\n Hurricane Investigation\n\n\n\n\ninformation becomes available t hat meets the Add it ionally, given t he low threshold for\npredication standard. As we describe in the report, all predication in the AG Gu idelines and the DIOG, we\nof the investigative actions taken by the Crossfire concluded that the FFG informa t ion, provided by a\nHurricane team, from the date the case was opened on government the United Stat es Intelligence Community\nJuly 31 until October 21 (the date of the first FISA (USIC) deems trustworthy, and describing a first- hand\norder) would have been permitted whether the case account from an FFG employee of a conversation with\nwas opened as a Preliminary or Full Investigation. Papadopoulos, was sufficient to predicate the\n investigation. This information provided the FBI with an\n The AG Guidelines and the DIOG do not provide articulable factua l basis that, if t rue, reasonably\nheightened predication standards for sensitive matters, indicated activity const ituting either a federa l crime or a\nor allegations potentially impacting constitutionally threat to national security, or both, may have occurred\nprotected activity, such as First Amendment rights. or may be occurring. For similar reasons, as we detail\nRather, the approval and notification requi rements in Chapter Three, we concluded that the quantum of\ncontained in the AG Guidelines and the DIOG are, in information articu lated by t he FBI to open the individual\npart, intended to provide the means by which such investigations on Papadopou los, Page, Flynn, and\nconcerns can be considered by senio r officials. 
Manafort in August 2016 was sufficient to satisfy t he\nHowever, we were concerned to find that neither the AG low threshold established by the Department and the\nGuidelines nor the DIOG contain a provision requiring FBI.\nDepartment consultation before opening an\ninvestigation such as the one here involving the alleged As part of ou r review, we also sought to\nconduct of individuals associated with a major party determine whether there was evidence that political\npresidential campaign. bias or other improper considerations affected decision\n making in Crossfire Hurricane, including t he decision to\n Crossfire Hurricane was opened as a Full open the investigation. We discussed the issue of\nInvestigation and all of the senior FBI officials who political bias in a prior OIG report, Review of Various\nparticipated in discussions about whether to open a Actions in Advance of the 2016 Election, where we\ncase told us the information warranted opening it. For described text and instant messages between t hen\nexample, then Counterintelligence Division (CD) Special Counsel to the Deputy Director Lisa Page and\nAssistant Director (AD) E.W. \"Bill\" Priestap, who then Section Chief Peter Strzok, among others, that\napproved the case opening, told us that t he included statements of hosti lity toward then candidate\ncombination of the FFG information and the FBI 's Trump and statements of support for then candidate\nongoing cyber intrusion investigation of the July 2016 Hillary Clinton. In this review, we found t hat, while Lisa\nhacks of the Democratic National Committee's (DNC) Page attended some of the discussions regard ing the\nemails, created a counterintelligence concern that the opening of the investigat ions, she did not play a role in\nFBI was \"obligated\" to investigate. Priestap stated that the decision to open Crossfire Hurricane or the four\nhe considered whether the FBI should conduct individual cases. 
We further found t hat w hile Strzok\ndefensive briefings for the Trump campaig n but was directly involved in the decisions to open Crossfire\nultimately decided that providing such briefings created Hurricane and t he four individual cases, he was not the\nthe risk that \"if someone on the campaign was engaged sole, or even the highest-level, decision maker as to\nwith the Russians, he/she would very likely change any of those matters. As noted above, then CD AD\nhis/her tactics and/or otherwise seek to cover-up Priestap, Strzok's supervisor, was the officia l who\nhis/ her activities, thereby preventing us from finding ultimately made the decision to open the investigation,\nt he truth.\" We did not identify any Department or FBI and evidence reflected t hat t his decision by Priestap\npolicy that applied to this decision and therefore was reached by consensus after multiple days of\ndetermined that the decision was a judgment call that discussions and meetings t hat included Strzok and\nDepartment and FBI policy leaves to the discretion of other leadership in CD, the FBI Deputy Director, the FBI\nFBI officials. We also concluded that, under the AG General Counsel, and a FBI Deputy General Cou nsel.\nGuidelines and the DIOG, the FBI had an authorized We concluded that Priestap's exercise of discretion in\npurpose when it opened Crossfire Hurricane to obtain opening the investigation was in compliance with\ninformation about, or protect against, a national Department and FBI policies, and we did not find\nsecurity threat or federa l crime, even though the documentary or testimonial evidence that political bias\ninvestigation also had the potential to impact or improper motivation influenced his decision. We\nconstitutionally protected activity. similarly found that, while the forma l documentation\n opening each of th e four individua l investigations was\n approved by Strzok (as required by the DIOG), the\n\n\n iii\n"
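The fourth page in the output above came out of the PDF with its letters spaced apart ("Executive S u m ma ry"), which later produces many one- and two-letter tokens. A minimal sketch of one way to flag such pages before analysis, using base R only. The helper names `short_chunk_share` and `flag_spaced` and the 0.6 cutoff are assumptions made for illustration, not part of the original script, and the two test strings are toy inputs imitating the output above.

```r
# Hypothetical helper: share of whitespace-separated chunks on a page
# that are only one or two characters long.
short_chunk_share <- function(page) {
  chunks <- strsplit(trimws(page), "\\s+")[[1]]
  chunks <- chunks[nzchar(chunks)]
  if (length(chunks) == 0) return(0)
  mean(nchar(chunks) <= 2)
}

# Toy strings imitating the two kinds of page seen in the output above
spaced_page <- "T h e O I G exa m i ned m o re t h a n o n e m i l l i o n"
normal_page <- "The OIG examined more than one million documents"

short_chunk_share(spaced_page)
short_chunk_share(normal_page)

# Assumed cutoff of 0.6; tune it by checking a few pages by hand
flag_spaced <- function(pages, cutoff = 0.6) {
  vapply(pages, short_chunk_share, numeric(1), USE.NAMES = FALSE) > cutoff
}
flag_spaced(c(spaced_page, normal_page))
```

Applied to the real data, `which(flag_spaced(pdf))` would list candidate pages to inspect or exclude. This is a heuristic, so pages with many initials or abbreviations may also be flagged.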
The next step wraps the extracted text in a quanteda corpus. Because pdf_text() returns one character string per page, each page of the report becomes a separate document (text1, text2, and so on), which is why the later output reports 478 documents. The corpus is the starting point for the tokenization and analysis steps that follow.
# Create a corpus from the pdf text
corp <- corpus(pdf)
This code snippet tokenizes the corpus built from the PDF text above and displays the resulting tokens. Tokenization is a foundational step in natural language processing (NLP) and text analysis, because it turns raw text into discrete units that later steps can filter, count, and compare.
# Create a tokens object from the corpus
tokens <- tokens(corp)
tokens
## Tokens consisting of 478 documents.
## text1 :
## [1] "REDACTED" "FOR" "PUBLIC" "RELEASE" "Office"
## [6] "of" "the" "Inspector" "General" "U.S"
## [11] "." "Department"
## [ ... and 33 more ]
##
## text2 :
## [1] "NOTICE" "This" "report" "was" "originally"
## [6] "issued" "on" "December" "9" ","
## [11] "2019" "."
## [ ... and 608 more ]
##
## text3 :
## [1] "[" "PAGE" "INTENTIONALLY" "LEFT"
## [5] "BLANK" "]"
##
## text4 :
## [1] "Executive" "S" "u" "m" "ma"
## [6] "ry" "Revie" "w" "of" "Four"
## [11] "FISA" "Applications"
## [ ... and 2,407 more ]
##
## text5 :
## [1] "Executive" "Summary" "Review" "of" "Four"
## [6] "FISA" "Applications" "and" "Other" "Aspects"
## [11] "of" "the"
## [ ... and 869 more ]
##
## text6 :
## [1] "Executive" "Summary" "Review" "of" "Four"
## [6] "FISA" "Applications" "and" "Other" "Aspects"
## [11] "of" "the"
## [ ... and 976 more ]
##
## [ reached max_ndoc ... 472 more documents ]
This code snippet takes the previously created tokens, applies stemming to consolidate words to their root forms, removes common stopwords to reduce noise in the data, and then displays the resulting processed tokens. These steps are essential in text preprocessing, setting the stage for more focused and efficient analysis in natural language processing tasks.
# Apply stemming and stopword removal
tokens <- tokens_wordstem(tokens)
tokens <- tokens_remove(tokens, stopwords("en"), padding = FALSE)
tokens
## Tokens consisting of 478 documents.
## text1 :
## [1] "REDACTED" "PUBLIC" "RELEASE" "Office" "Inspector" "Gener"
## [7] "U.S" "." "Depart" "Justic" "OVERSIGHT" "INTEGRITi"
## [ ... and 22 more ]
##
## text2 :
## [1] "NOTICE" "report" "origin" "issu" "Decemb" "9" "," "2019"
## [9] "." "report" "updat" "Decemb"
## [ ... and 396 more ]
##
## text3 :
## [1] "[" "PAGE" "INTENTIONALLi" "LEFT"
## [5] "BLANK" "]"
##
## text4 :
## [1] "Execut" "S" "u" "m" "ma" "ry"
## [7] "Revi" "w" "Four" "FISA" "Applicat" "Aspect"
## [ ... and 1,973 more ]
##
## text5 :
## [1] "Execut" "Summari" "Review" "Four" "FISA" "Applicat"
## [7] "Aspect" "FBI" "Crossfir" "Hurrican" "Investig" "inaccur"
## [ ... and 579 more ]
##
## text6 :
## [1] "Execut" "Summari" "Review" "Four" "FISA" "Applicat"
## [7] "Aspect" "FBI" "Crossfir" "Hurrican" "Investig" "inform"
## [ ... and 616 more ]
##
## [ reached max_ndoc ... 472 more documents ]
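One thing to watch with this ordering: the stopword list holds unstemmed words, so a stopword that the stemmer alters will no longer match the list. A minimal sketch of the effect on a made-up sentence, assuming the same quanteda functions used above. The names `stem_first` and `stop_first` are illustrative only.

```r
library(quanteda)

# Made-up sentence, not taken from the report
toks <- tokens("The report was issued because of concerns")

# Order used in the script above: stem, then remove stopwords
stem_first <- tokens_remove(tokens_wordstem(toks), stopwords("en"))

# Alternative order: remove stopwords, then stem
stop_first <- tokens_wordstem(tokens_remove(toks, stopwords("en")))

# Compare the surviving tokens of the single toy document
as.list(stem_first)[[1]]
as.list(stop_first)[[1]]
```

On this input the stemmed form of "because" should survive in the first result and be absent from the second. Whether the difference matters for the report depends on how many such stems clear the frequency threshold applied later.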
This code snippet keeps only the tokens that contain at least one letter and drops everything else, such as standalone punctuation marks and numbers. Because the regular expression is not anchored, tokens that mix letters with other characters (for example "U.S") are kept as well. This is a common step in text processing: it focuses the analysis on the words themselves and removes elements that are rarely relevant to analyses such as sentiment analysis or topic modeling.
# Keep tokens containing at least one letter (drops punctuation and numbers)
tokens <- tokens_select(tokens, pattern = "\\p{L}+", valuetype = "regex", selection = "keep")
tokens
## Tokens consisting of 478 documents.
## text1 :
## [1] "REDACTED" "PUBLIC" "RELEASE" "Office" "Inspector" "Gener"
## [7] "U.S" "Depart" "Justic" "OVERSIGHT" "INTEGRITi" "GUIDANCE"
## [ ... and 17 more ]
##
## text2 :
## [1] "NOTICE" "report" "origin" "issu" "Decemb" "report" "updat" "Decemb"
## [9] "Decemb" "follow" "chang" "page"
## [ ... and 270 more ]
##
## text3 :
## [1] "PAGE" "INTENTIONALLi" "LEFT" "BLANK"
##
## text4 :
## [1] "Execut" "S" "u" "m" "ma" "ry"
## [7] "Revi" "w" "Four" "FISA" "Applicat" "Aspect"
## [ ... and 1,844 more ]
##
## text5 :
## [1] "Execut" "Summari" "Review" "Four" "FISA" "Applicat"
## [7] "Aspect" "FBI" "Crossfir" "Hurrican" "Investig" "inaccur"
## [ ... and 468 more ]
##
## text6 :
## [1] "Execut" "Summari" "Review" "Four" "FISA" "Applicat"
## [7] "Aspect" "FBI" "Crossfir" "Hurrican" "Investig" "inform"
## [ ... and 513 more ]
##
## [ reached max_ndoc ... 472 more documents ]
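The pattern in the step above is unanchored, so it tests whether a token contains a letter, not whether it consists of letters only. A minimal sketch of the difference, using base R's `grepl()` on toy tokens. Note the assumption: quanteda evaluates regular expressions through a different engine than base R, so this only illustrates the anchoring behaviour, which should agree for a pattern this simple.

```r
# Toy tokens; "U.S" and "1a" mix letters with other characters
toks <- c("FISA", "2019", ",", "U.S", "1a")

# Unanchored pattern, as in the script: TRUE when the token contains a letter
contains_letter <- grepl("\\p{L}+", toks, perl = TRUE)

# Anchored pattern: TRUE only when the token consists of letters alone
letters_only <- grepl("^\\p{L}+$", toks, perl = TRUE)

toks[contains_letter]
toks[letters_only]
```

If strictly alphabetic tokens are wanted, the anchored form is the one to pass as `pattern`. The script's unanchored form is the more permissive choice and is what keeps "U.S" in the output above.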
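The provided R code snippet creates a Document-Feature Matrix (DFM) from the pre-processed tokens and then displays this matrix. Each row is a document (here, one page of the report) and each column is a term, with cells holding the number of times the term occurs on that page. dfm() lowercases terms by default, which is why REDACTED appears as redacted in the output.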
# Create a document-feature matrix
dtm <- dfm(tokens)
dtm
## Document-feature matrix of: 478 documents, 4,975 features (97.04% sparse) and 0 docvars.
## features
## docs redacted public release office inspector gener u.s depart justic
## text1 2 2 2 1 1 1 1 1 1
## text2 0 1 0 0 0 0 0 2 1
## text3 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 0 0 0
## text5 0 0 0 0 0 1 2 2 0
## text6 0 0 0 0 0 2 0 5 0
## features
## docs oversight
## text1 2
## text2 0
## text3 0
## text4 0
## text5 0
## text6 0
## [ reached max_ndoc ... 472 more documents, reached max_nfeat ... 4,965 more features ]
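The sparsity figure in the matrix header (97.04% here) is the share of cells that are zero, that is, document and term pairs where the term never occurs. A minimal sketch of that calculation on a toy count matrix in base R. The matrix values are invented for illustration.

```r
# Toy count matrix: 3 documents (rows) by 4 terms (columns)
m <- matrix(c(2, 0, 0, 1,
              0, 1, 0, 0,
              0, 0, 0, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("text1", "text2", "text3"),
                            c("fbi", "fisa", "page", "report")))

# Sparsity = percentage of cells equal to zero
sparsity_pct <- 100 * mean(m == 0)
round(sparsity_pct, 2)
```

High sparsity is expected when every page is its own document, since most terms appear on only a few pages.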
This R code snippet trims the existing Document-Feature Matrix (DFM) by term frequency and then displays the updated matrix. It uses the dfm_trim() function to retain only those terms that occur at least 100 times across the whole corpus, which reduces the feature count from 4,975 to 284 while leaving all 478 documents in place. The result is stored back in the dtm variable, and the last line outputs the trimmed DFM.
# Trim the document-feature matrix based on term frequency
dtm <- dfm_trim(dtm, min_termfreq = 100)
dtm
## Document-feature matrix of: 478 documents, 284 features (72.40% sparse) and 0 docvars.
## features
## docs public office gener u.s depart review four fisa applicat fbi
## text1 2 1 1 1 1 2 1 1 1 1
## text2 1 0 0 0 2 0 0 3 0 4
## text3 0 0 0 0 0 0 0 0 0 0
## text4 0 0 0 0 0 0 1 2 1 16
## text5 0 0 1 2 2 1 4 3 1 11
## text6 0 0 2 0 5 4 4 2 1 16
## [ reached max_ndoc ... 472 more documents, reached max_nfeat ... 274 more features ]
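Trimming by term frequency amounts to dropping the columns whose total count falls below the threshold. A minimal sketch of that idea on a toy count matrix in base R. The threshold of 2 stands in for the script's `min_termfreq = 100`, and the matrix values are invented.

```r
# Toy count matrix: 3 documents (rows) by 4 terms (columns)
m <- matrix(c(2, 0, 0, 1,
              0, 1, 0, 0,
              0, 0, 0, 3),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("text1", "text2", "text3"),
                            c("fbi", "fisa", "page", "report")))

# Total occurrences of each term across all documents
term_totals <- colSums(m)

# Keep only the terms that reach the (toy) threshold
keep <- term_totals >= 2
trimmed <- m[, keep, drop = FALSE]

colnames(trimmed)
dim(trimmed)
```

Rows are never removed by this kind of trim, which matches the output above: the document count stays at 478 while the feature count drops.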
This R code snippet computes word frequency statistics from the Document-Feature Matrix (DFM) and displays them. It uses the textstat_frequency() function to calculate the frequency of each term in the DFM and stores the resulting data frame in the word_freq variable; the next step reshapes it for the Wordcloud2 package.
# Compute term frequency statistics from the document-feature matrix
word_freq <- textstat_frequency(dtm)
word_freq
This R code snippet creates a data frame from the word frequency statistics, suitable for generating a word cloud using the Wordcloud2 package, and then displays this data frame. It extracts the ‘feature’ (words) and ‘frequency’ columns from the word_freq object, creates a new data frame word_freq_df with these columns, and outputs the contents of word_freq_df.
# Keep only the term and its frequency, in the word/freq layout used by Wordcloud2
word_freq_df <- data.frame(word = word_freq$feature, freq = word_freq$frequency)
word_freq_df
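For readers who want to see the shape of this data frame without running the whole pipeline, here is a minimal sketch that builds the same two-column word/freq layout from a handful of toy stems in base R. The stems and the name `toy_df` are invented for illustration.

```r
# Toy vector of already-stemmed tokens
stems <- c("fbi", "page", "fbi", "steel", "fisa", "fbi", "page")

# Count occurrences and sort from most to least frequent
counts <- sort(table(stems), decreasing = TRUE)

# Same layout as word_freq_df: one row per term, with its frequency
toy_df <- data.frame(word = names(counts),
                     freq = as.integer(counts),
                     stringsAsFactors = FALSE)
toy_df
```

The word column comes first and the freq column second, which is the layout the word cloud step consumes.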
This R code snippet creates the word cloud visualization using the Wordcloud2 package. It passes the word_freq_df data frame of words and frequencies to wordcloud2(), which renders an interactive HTML widget in which each term is sized by its frequency. The size = 0.5 argument scales the fonts down from the default of 1, and color = 'random-dark' draws each term in a randomly chosen dark color.
# Create a word cloud using Wordcloud2
wordcloud2(word_freq_df, size = 0.5, color = 'random-dark')
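A word cloud encodes frequency as font size, which makes exact comparisons between terms hard to read off. As a complement, here is a minimal sketch of a static bar chart of the top terms in base R graphics. The helper `top_terms` and the toy frequencies are assumptions made for illustration and are not part of the original script.

```r
# Hypothetical helper: the n most frequent rows of a word/freq data frame
top_terms <- function(df, n = 10) {
  df <- df[order(-df$freq), , drop = FALSE]
  head(df, n)
}

# Toy frequencies, not taken from the report
toy_df <- data.frame(word = c("page", "fbi", "steel", "fisa"),
                     freq = c(40, 120, 25, 60),
                     stringsAsFactors = FALSE)
top3 <- top_terms(toy_df, 3)

# Horizontal bars with the most frequent term at the top (hence rev())
barplot(rev(top3$freq), names.arg = rev(top3$word),
        horiz = TRUE, las = 1, xlab = "Frequency")
```

With the real data, `top_terms(word_freq_df, 20)` would feed the same plot.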
This R code snippet calculates the frequency statistics of terms in the Document-Feature Matrix (DFM) and then displays these statistics. It uses the textstat_frequency() function to compute the frequencies, stores the results in the frequency_stats variable, and outputs the contents of frequency_stats. This is the same computation that produced word_freq earlier, repeated here so the full table can be inspected on its own.
# Get frequency statistics
frequency_stats <- textstat_frequency(dtm)
frequency_stats
Enhanced Understanding: The visualization and frequency analysis will make the complex and lengthy OIG report more accessible and comprehensible to a broader audience, including department officials, policymakers, journalists, and the general public.
Efficient Information Dissemination: The summarized visual format enables quick dissemination and easier digestion of the key findings and themes of the report, which is crucial in a fast-paced information environment.
Strategic Decision-Making: By clearly identifying the most frequent topics and concerns in the investigation, policymakers and department leaders can make more informed decisions regarding policy changes, future investigations, and resource allocation.
Transparency and Accountability: This approach promotes transparency by openly presenting the main elements of the investigation and supports accountability by highlighting the key areas of focus and concern.
Research and Educational Tool: The analysis can serve as a valuable resource for researchers and educators in fields like political science, law, and public administration, providing an analytical foundation for further study and discussion.