Loading data
In Rattle, you have to explicitly declare the role of each variable. A variable can have five different roles:
- Input: The prediction process will use input variables to predict the value of the target variable.
- Target: The target variable is the output of our model.
- Risk: The risk variable measures the size of the risk associated with the target variable, for example the monetary amount at stake in each case.
- Ident or Identifier: An identifier is a variable that identifies a unique occurrence of an object. In our preceding example, the variable Person is an identifier that identifies a unique person.
- Ignore: A variable marked Ignore is excluded from the model. We'll come back to this role later; some variables add noise and degrade the performance of your predictive model.
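To make the roles concrete, here is a minimal Python sketch of the same idea outside Rattle. It is not how Rattle works internally; it only illustrates how a role assignment separates the columns a model sees from the ones it does not. It assumes that only Input variables feed the model. Apart from Person, every column name and value is hypothetical, as are the helper names `columns_with_role` and `split_by_role`.

```python
import csv
import io

# Hypothetical data set; only "Person" comes from the text, the other
# column names and all values are made up for illustration.
CSV_TEXT = """Person,Age,Income,Comment,LossAmount,Approved
P1,34,52000,ok,0,yes
P2,51,38000,late,1200,no
P3,27,61000,ok,0,yes
"""

# One role per variable, mirroring the five roles described above.
ROLES = {
    "Person": "Ident",
    "Age": "Input",
    "Income": "Input",
    "Comment": "Ignore",
    "LossAmount": "Risk",
    "Approved": "Target",
}

VALID_ROLES = {"Input", "Target", "Risk", "Ident", "Ignore"}


def columns_with_role(roles, role):
    """Return the column names assigned to the given role, in declaration order."""
    return [name for name, r in roles.items() if r == role]


def split_by_role(rows, roles):
    """Split rows into model inputs and target values.

    Assumption: Ident, Ignore, and Risk columns are kept out of the inputs.
    """
    unknown = set(roles.values()) - VALID_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {sorted(unknown)}")
    targets = columns_with_role(roles, "Target")
    if len(targets) != 1:
        raise ValueError("expected exactly one Target variable")
    inputs = columns_with_role(roles, "Input")
    X = [[row[name] for name in inputs] for row in rows]
    y = [row[targets[0]] for row in rows]
    return inputs, X, y


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
input_names, X, y = split_by_role(rows, ROLES)
print(input_names)  # names of the columns used as predictors
print(X[0], y[0])   # first row of inputs and its target value
```

Declaring the roles in one table, separate from the data, makes it easy to move a noisy variable from Input to Ignore and rebuild the model without touching the data file.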
Rattle can load data from many data sources. Here are some options:
- Use the Spreadsheet option to load data from a comma-separated values (CSV) file.
- Use the ODBC option to load data from a database. Open Database Connectivity (ODBC) is a standard interface for connecting to databases, so you can load from most common databases...