Mastering Julia

You're reading from Mastering Julia: Enhance your analytical and programming skills for data modeling and processing with Julia

Product type Paperback
Published in Jan 2024
Publisher Packt
ISBN-13 9781805129790
Length 506 pages
Edition 2nd Edition
Author (1):
Malcolm Sherrington

Data arrays and data frames

Users of R will be aware of how successful data frames have been for analyzing datasets, a success mirrored in Python by the pandas package.

Julia, too, supports data frames, through the DataFrames package.

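Working with data frames in Julia rests on three basic building blocks, as follows:

  • missing: The value (of type Missing, now provided by Julia Base) indicating that a data value is missing
  • Arrays with element type Union{T, Missing}: Ordinary arrays that can contain missing values, replacing the DataArray type of the now-deprecated DataArrays package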
  • DataFrame: A data structure for representing tabular datasets
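Because missing values are now part of Julia Base, their behavior can be explored without loading any package. The following minimal sketch (Base only; the variable names x and cost are illustrative) shows how missing propagates through arithmetic and comparison, how three-valued logic can still give a definite answer, and which element type a vector acquires when it mixes numbers with missing:

```julia
# `missing` and its type `Missing` are part of Julia Base
x = missing
println(typeof(x))                  # Missing
println(ismissing(x))               # true

# Arithmetic and == propagate missing
println(missing + 1)                # missing
println(missing == 1)               # missing

# Three-valued logic: the result is known when one operand decides it
println(true | missing)             # true
println(false & missing)            # false

# === and isequal return ordinary Bools, even for missing
println(missing === missing)        # true
println(isequal(missing, missing))  # true

# Mixing Float64 values and missing yields a Union element type
cost = [10.1, 7.9, missing, 4.5]
println(eltype(cost))               # Union{Missing, Float64}
```

This propagation is exactly why reductions such as mean over a column containing a missing value return missing rather than a number.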

Data frames are such a large topic that we will look at them in some depth when we consider statistical computing.

However, here’s some code to get a flavor of processing data with these packages:

julia> using DataFrames
julia> df1 = DataFrame(ID = 1:4,
                       Cost = [10.1,7.9,missing,4.5])
4×2 DataFrame
│ Row │ ID │ Cost    │
├─────┼────┼─────────┤
│  1  │  1 │ 10.1    │
│  2  │  2 │ 7.9     │
│  3  │  3 │ missing │
│  4  │  4 │ 4.5     │
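The DataFrames 1.x API offers several ways to get at a column, and the choice affects whether data is copied. The sketch below assumes DataFrames.jl 1.x is installed and rebuilds df1 so that it runs on its own: df1[!, :Cost] and df1.Cost return the stored vector without copying, whereas df1[:, :Cost] returns a copy.

```julia
using DataFrames  # assumes DataFrames.jl 1.x is installed

df1 = DataFrame(ID = 1:4, Cost = [10.1, 7.9, missing, 4.5])

# Three ways to get at the Cost column
c_view = df1[!, :Cost]   # the vector stored in df1, no copy made
c_prop = df1.Cost        # property syntax, equivalent to df1[!, :Cost]
c_copy = df1[:, :Cost]   # a freshly allocated copy of the column

println(c_view === c_prop)        # true: the very same vector object
println(c_view === c_copy)        # false: a different vector object
println(isequal(c_view, c_copy))  # true: isequal treats missing as equal to missing
println(c_view == c_copy)         # missing: == propagates the missing in row 3
```

Because df1[!, :Cost] hands back the stored vector, mutating it also mutates df1, while the copy from df1[:, :Cost] can be modified freely. The older single-argument form df1[:Cost] is not accepted by DataFrames 1.x.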

Common operations, such as computing mean(d) or var(d) of the Cost column, return missing because of the missing value in row 3:

julia> using Statistics
julia> mean(df1[!, :Cost])
missing
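When only a single column is of interest, an alternative to building a new data frame is Base's skipmissing, which wraps a collection in a lazy iterator that omits missing entries. This is not the approach the text takes next; the sketch below works on a plain vector holding the same Cost values and needs only the Statistics standard library.

```julia
using Statistics

cost = [10.1, 7.9, missing, 4.5]

# Reducing over the raw vector: the single missing value poisons the result
println(mean(cost))               # missing

# skipmissing lazily filters out missing entries, so the statistics are
# computed over the three values that are present
μ = mean(skipmissing(cost))
σ = std(skipmissing(cost))
println((μ, σ))
```

The results agree, up to floating-point rounding, with those obtained from the cleaned data frame, but no rows are copied and the other columns are left untouched.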

We can create a new data frame by dropping all rows that contain missing values, after which statistical functions can be applied as normal:

julia> df2 = dropmissing(df1)
3×2 DataFrame
│ Row │ ID │ Cost │
├─────┼────┼──────┤
│  1  │  1 │ 10.1 │
│  2  │  2 │ 7.9  │
│  3  │  4 │ 4.5  │
julia> (μ,σ) = (mean(df2[!,:Cost]),std(df2[!,:Cost]))
(7.5, 2.8213471959331766)
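To make the row-dropping step concrete, here is a minimal sketch of the idea behind dropmissing, written against a NamedTuple of column vectors rather than a DataFrame so that it needs only Base and Statistics. The helper drop_missing_rows is hypothetical and is not how DataFrames implements dropmissing.

```julia
using Statistics

# Hypothetical helper illustrating the idea behind dropmissing: keep only
# the rows in which no column holds `missing`.
function drop_missing_rows(cols::NamedTuple)
    n = length(first(cols))
    # keep[i] is true when every column is non-missing at row i
    keep = [all(col -> !ismissing(col[i]), cols) for i in 1:n]
    # Filter each column with the same mask; map(identity, ...) narrows an
    # element type such as Union{Missing, Float64} to Float64 once no
    # missing values remain
    return map(col -> map(identity, col[keep]), cols)
end

table = (ID = 1:4, Cost = [10.1, 7.9, missing, 4.5])
cleaned = drop_missing_rows(table)

println(cleaned.ID)            # [1, 2, 4]
println(eltype(cleaned.Cost))  # Float64
(μ, σ) = (mean(cleaned.Cost), std(cleaned.Cost))
println((μ, σ))
```

Computing one Boolean mask and applying it to every column keeps the columns aligned row by row. The DataFrames version additionally accepts an optional column selector, as in dropmissing(df1, :Cost), so that only the listed columns are checked.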

We will cover data frames in much greater detail when we consider data I/O in Chapter 6.

For now, we will look at the Tables API, implemented in the Tables.jl package, which is used by a large number of other packages.
