Source text often arrives in Portable Document Format (PDF); scientific research papers, for example, are usually distributed as PDF files. Before you can perform any text mining, you need to import the text from the PDF file into the R environment. In this recipe, you will import text data from a PDF file.
Importing plain text data from a PDF file
Getting ready
To implement this recipe, you will need the pdftools package.
To install it, run the following code:
install.packages("pdftools")
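If you rerun the script often, you may prefer to skip the installation when the package is already present. The following is a minimal sketch of that idea, not part of the original recipe: `ensure_package` is a hypothetical helper name introduced here for illustration, and the CRAN mirror URL is an assumption that you can replace with your preferred mirror.

```r
# Hypothetical helper (not part of pdftools): install a package only
# when it is not already available, then report whether it can be loaded.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # The mirror URL is an assumption; change it to suit your setup.
    install.packages(pkg, repos = repos)
  }
  # TRUE if the package is now available, FALSE otherwise.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Attach pdftools only if it is available after the check above.
if (ensure_package("pdftools")) {
  library(pdftools)
}
```

Calling `requireNamespace()` first avoids downloading the package on every run, and the returned value lets the script stop early if the installation did not succeed.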
The source data for this recipe is provided in three PDF files containing three abstracts. The filenames are as follows:
- abstract_1.pdf
- abstract_2.pdf
- abstract_3...