Data preprocessing
Data preprocessing is one of the most important parts of an analytics or data science pipeline. It covers methods and techniques for sanitizing the data, quick transformations that make the dataset easier to handle, and the removal of unnecessary data so that the dataset stays lightweight and efficient in the rest of the analytics process. For this recipe, we will use the MLBase package of Julia, which is known as the Swiss Army Knife of writing machine learning code. Installation and setup instructions for the library are given in the Getting ready section.
Getting ready
- To get started with this recipe, you have to add the MLBase Julia package, which can be done by running the Pkg.add() function in the REPL. It can be done as follows:
  Pkg.add("MLBase")
- After installing the package, import it with the using ... command in the REPL. It can be done as follows:
  using MLBase
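As a minimal sketch, the two steps above can be combined into a single snippet. It assumes Julia 0.7 or later, where Pkg is a standard library that has to be loaded before Pkg.add() can be called. The short check at the end uses MLBase's labelmap and labelencode functions on a made-up label vector (the labels data is purely illustrative and not part of the recipe) to confirm that the package loaded.

```julia
# Assumption: Julia 0.7 or later, where Pkg must be loaded explicitly.
import Pkg

# Download and install MLBase (needs network access the first time).
Pkg.add("MLBase")

# Bring the exported MLBase functions into scope.
using MLBase

# Illustrative smoke test on made-up labels (not part of the recipe's data).
labels = ["spam", "ham", "spam"]
lm = labelmap(labels)            # map each distinct label to an integer
codes = labelencode(lm, labels)  # integer code for every element
println(codes)
```

If the last line prints a vector of integers with one entry per label, the package is installed and imported correctly.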
After importing the package following the preceding steps...