JuliaDB in Julia 1.0

Date Published

Feb 27, 2019

Contributors

Dr. Josh Day

In many data science applications, it is easy to run out of memory when working with data. An analyst working with big data has a few options available to them:

Buy more RAM.
Rent RAM with a cloud-based service.
Use a sample of the dataset.
Buy a SAS license.

None of these options are particularly good solutions.

Introducing JuliaDB

With JuliaDB, one can easily read big data, save it in an efficient binary format, and even run operations out-of-core. Analytics are available via OnlineStats integration, making statistical calculations on big data a breeze.

OnlineStats implements on-line (single-pass) algorithms for statistics and models, meaning you can run analyses like linear regression on data that is too big to fit in memory. Every statistic/model in OnlineStats also supports merging, which enables parallel processing. Together, on-line updating and merging remove the need to load the entire dataset into RAM at once, allowing analyses that would not be possible with traditional methods. Below is a visualization of how JuliaDB integrates with OnlineStats by scheduling the updating and merging operations:

[Figure: JuliaDB scheduling OnlineStats updating and merging operations]
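To make the update-and-merge idea concrete, here is a minimal sketch in plain Julia. It is not the OnlineStats implementation: the `RunningMean` type and the `update` and `combine` functions are hypothetical names chosen for illustration. Each chunk is processed in a single pass, and the per-chunk results are then merged, so no step needs the whole dataset in memory.

```julia
# Hypothetical running-mean accumulator (illustration only, not OnlineStats code)
struct RunningMean
    n::Int          # number of observations seen so far
    mean::Float64   # mean of those observations
end
RunningMean() = RunningMean(0, 0.0)

# Single-pass update: fold one new observation into the current state
update(s::RunningMean, x::Real) =
    RunningMean(s.n + 1, s.mean + (x - s.mean) / (s.n + 1))

# Merge two partial results, weighting each mean by its observation count
function combine(a::RunningMean, b::RunningMean)
    n = a.n + b.n
    n == 0 && return RunningMean()
    return RunningMean(n, (a.n * a.mean + b.n * b.mean) / n)
end

# Pretend each chunk lives in a different file or on a different worker
chunks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

# Update within each chunk, then merge the partial results
partials = [foldl(update, c; init=RunningMean()) for c in chunks]
total = reduce(combine, partials)

println(total.n, " observations, mean ≈ ", total.mean)
```

Because `combine` only needs the count and mean of each part, the chunks can be processed in any order or on separate processes before being merged.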

Example

Kaggle's Huge Stock Market Dataset contains over 7000 CSVs of historical price data, with each stock's history in a separate file. JuliaDB can quickly load them into a distributed dataset and perform group-by operations:

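using Distributed
addprocs(4)  # start 4 worker processes

# Load the packages on every process
@everywhere using JuliaDB, OnlineStats
using Glob  # provides `glob`, used below to list the files

# 7195 CSVs with 14,887,665 rows
files = glob("*.txt", "Stocks")

# Read all files into one distributed table; each row's source file name is stored in :Stock
t = loadtable(files, filenamecol=:Stock)

# Mean of the Volume column for each stock
groupreduce(Mean(), t, :Stock; select=:Volume)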

Main Features

Just-in-Time Compiled

JuliaDB leverages Julia’s just-in-time compiler (JIT) so that table operations – even custom ones – are fast.

Compute in Parallel

Process data in parallel or even calculate statistical models out-of-core through integration with OnlineStats.jl.

Store Any Data Type

JuliaDB supports Strings, Dates, Float64… and any other Julia data type, whether built-in or defined by you.

Fast User-Defined Functions

JuliaDB is written 100% in Julia. That means user-defined functions are JIT compiled.

Fast CSV Parser

CSVs are loaded extremely fast! Many files can be read at the same time to create a single table.

Open Source

JuliaDB is released under the MIT License.

JuliaDB for Time Series

The ability to index (sort) on any number of columns and store any data type makes JuliaDB ideal for time series analysis. For a big data time series example, see the demo here.

Feature	JuliaDB	Pandas	xts (R)	TimeArrays
Distributed Computing
Data larger than memory
Multiple Indexes
Index Type(s)	Any	Built-ins	Time	Time
Value Type(s)	Any	Built-ins	Built-ins	Any
Compiled UDFs
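The multi-column indexing described in this section can be sketched on a small in-memory table. This is a hedged illustration rather than the linked demo: the data and the column names `date`, `ticker`, and `price` are made up, and the calls shown (`table` with `pkey`, `filter`, `groupby`) are assumed to behave as in the JuliaDB/IndexedTables documentation.

```julia
using JuliaDB, Dates

# Hypothetical toy data: two tickers observed on three consecutive days
dates   = repeat(collect(Date(2019, 1, 1):Day(1):Date(2019, 1, 3)), inner=2)
tickers = repeat(["AAA", "BBB"], outer=3)
prices  = [10.0, 20.0, 11.0, 19.5, 12.0, 21.0]

# Index (sort) the table on both the date and the ticker columns
t = table((date=dates, ticker=tickers, price=prices); pkey=[:date, :ticker])

# Filter rows with an ordinary Julia function (a user-defined function)
jan2 = filter(r -> r.date == Date(2019, 1, 2), t)

# Group by one of the index columns and reduce the price column
highs = groupby(maximum, t, :ticker; select=:price)
```

Here the index columns hold a `Date` and a `String`, which illustrates the "any index type" row of the comparison table.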

Resources

Authors

Dr. Josh Day

Senior Software Engineer

