Using columnar databases for improved performance
Most relational databases use a row-based data storage architecture—the data is stored in the database row by row. Whenever the database performs a query, it retrieves the relevant rows for the query before processing the query. This architecture is well suited for business transactional uses, where complete records (that is, including all columns) are written, read, updated, or deleted, a few rows at a time. For most statistical or analytical use cases, however, many rows of data, often with only a few columns, need to be read. As a result, row-based databases are sometimes inefficient at analytical tasks because they read entire records at a time regardless of how many columns are actually needed for analysis. The following figure depicts how a row-based database might compute the sum of one column.
The increase in demand for data analysis platforms in recent years has led to the development...