Tabular Data Intro

Dyalog
APL
Tabular
Author

Isaac Flath

Published

July 7, 2022

Tabular Data

This article will get us started reading and working with tabular data. The goal is to read in some csv data, do a few basic operations (filtering, aggregation, etc.), and create some basic plots.

]box on -style=max
┌→────────────────┐ │Was ON -style=max│ └─────────────────┘

Moving Average

We will start with a moving average as the simplest model. To do this we need to accomplish a few things:

  1. Load data from CSV
  2. Filter data
  3. Calculate Moving Average
  4. Plot

Load Data from CSV

APL has a CSV function to read csv data in that we can use. I am using simple stock ticker data for a moving average.

eod ← 'eod-quotemedia.csv'
eod_ar ← ⎕CSV eod '' 4
⍴ eod_ar ⍝ Check Size
┌→───────┐ │490738 3│ └~───────┘

We can get the head of the dataframe by selecting the first 5 rows and 3 columns. It’s handy to have a small sized piece of data as you develop so we store in it’s own array also. Lets us specify the head slice.

⎕←eod_ar_s ← 5 3 ↑ eod_ar
┌→──────────────────────────────────┐ ↓ ┌→───┐ ┌→─────┐ ┌→────────┐ │ │ │date│ │ticker│ │adj_close│ │ │ └────┘ └──────┘ └─────────┘ │ │ ┌→─────────┐ ┌→┐ │ │ │2013-07-01│ │A│ 29.9942 │ │ └──────────┘ └─┘ │ │ ┌→─────────┐ ┌→┐ │ │ │2013-07-02│ │A│ 29.6501 │ │ └──────────┘ └─┘ │ │ ┌→─────────┐ ┌→┐ │ │ │2013-07-03│ │A│ 29.7052 │ │ └──────────┘ └─┘ │ │ ┌→─────────┐ ┌→┐ │ │ │2013-07-05│ │A│ 30.4346 │ │ └──────────┘ └─┘ │ └∊──────────────────────────────────┘

Filter for 1 ticker

Next, we need to filter for a particular ticker. The first thing we need to know is where the column we want to sort on is located. We can of course see that it’s the second column, but let’s pretend we need to use APL for this to get some practice.

⎕ ← ticker_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker'
2

Let’s break this down a bit:

  • 1⌷eod_ar Gets the first row - this is our header row
  • ∊ ⊂'ticker' Determines where ‘ticker’ is and returns a mask ([0 1 0]
  • is applied to get the location of the true values (2)
  • ,/ Applies a concatenation to flatten from an array to a value

The next step is getting a truth mask (ie [0 1 1 0 0]) that tells us which rows contain the value to filter for, in this case AAPL

tickers ← eod_ar[;ticker_loc] ∊ ⊂ 'AAPL'

Similar to our first filtering section, ∊ ⊂ 'AAPL' is checking membership of eod_ar[;ticker_loc] to return the mask.

We can see we found 1009 rows out of 490,738 that had this ticker.

⍝ We have a truth array (0/1) of whether row contains 'AAPL' or not
('size',⍴ tickers) , 'Found',+/tickers 
'min',(⌈/ tickers) , 'max',⌊/tickers
┌→─────────────────────┐ │size 490738 Found 1009│ └+─────────────────────┘
┌→──────────┐ │min 1 max 0│ └+──────────┘

+/ distributes the + sign between each item in the list giving us a sum of the vector. The same approach can be used for and to get minimum and maximum values (0 and 1). gives us our shape.

Next we need to use this mask to actually filter out data. This is quite easy and we can pass our mask with our full array to do that filtering using

AAPL ← tickers ⌿ eod_ar

So when we put that together our full solution is:

col_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker'           ⍝ Column Location
AAPL ← eod_ar ⌿⍨ eod_ar[;col_loc] ∊ ⊂ 'AAPL'  ⍝ Filter Array
5 3 ↑ AAPL                                    ⍝ Head of array
┌→────────────────────────────┐ ↓ ┌→─────────┐ ┌→───┐ │ │ │2013-07-01│ │AAPL│ 53.1092 │ │ └──────────┘ └────┘ │ │ ┌→─────────┐ ┌→───┐ │ │ │2013-07-02│ │AAPL│ 54.3122 │ │ └──────────┘ └────┘ │ │ ┌→─────────┐ ┌→───┐ │ │ │2013-07-03│ │AAPL│ 54.612 │ │ └──────────┘ └────┘ │ │ ┌→─────────┐ ┌→───┐ │ │ │2013-07-05│ │AAPL│ 54.1734 │ │ └──────────┘ └────┘ │ │ ┌→─────────┐ ┌→───┐ │ │ │2013-07-08│ │AAPL│ 53.8658 │ │ └──────────┘ └────┘ │ └∊────────────────────────────┘

Calculate Moving Average

We will calculate a 90 period moving average over the stock prices for the AAPL ticker we just filtered for.

ws ← 90

Let’s start with calculating a moving sum. Instead of using +/ to sum over the full list, we can use +⌿ to get a moving sum based on the ws we give it.

movsum ← ws +⌿ AAPL[;3]

When we have a moving sum, a moving average is easy. Simply divide the moving sum by the window size to get an average.

movavg ← movsum ÷ ws
'movavg length:',⍴ movavg
'AAPL length:', ⍴ AAPL[;3]
┌→─────────────────┐ │movavg length: 920│ └+─────────────────┘
┌→────────────────┐ │AAPL length: 1009│ └+────────────────┘

Our moving average is shorter than our original data because we are not calculating when we don’t have enough data at the beginning of our time period. Let’s pad the beginning with the raw values.

padded ← ((ws-1) ↑ AAPL[;3]) , movavg
'padded length:',⍴ padded
'AAPL length:', ⍴ AAPL[;3]
┌→──────────────────┐ │padded length: 1009│ └+──────────────────┘
┌→────────────────┐ │AAPL length: 1009│ └+────────────────┘

So the full moving average with padding back to original size looks like this

movavg ← (ws ↑ AAPL[;3]) , ws ÷⍨ ws +⌿ AAPL[;3]

Plot

We can not plot our original data with our moving average. It’s a bit annoying I have to pass in movavg twice to get it to work. It is probably something I am not understanding about how plotting works in APL.

]Plot AAPL[;3] movavg movavg
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 50 60 70 80 90 100 110 120 130 140 150 160 for X-axis labels 0 100 200 300 400 500 600 700 800 900 1000 1100 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Points follow ... Line Points follow ... Line Reset to original origin
┌⊖┐ ⌽0│ └~┘

Full Solution

eod_ar ← ⎕CSV 'eod-quotemedia.csv' '' 4
col_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker'       
AAPL ← eod_ar ⌿⍨ eod_ar[;col_loc] ∊ ⊂ 'AAPL'  
movavg ← (ws ↑ AAPL[;3]) , ws ÷⍨ (ws←90) +⌿ AAPL[;3]
]Plot AAPL[;3] movavg movavg
Created by Causeway SVG engine - SharpPlot v3.71.0 Paint the paper ===== Border ===== Region ===== X-Axis Ticks ===== X-Axis tickmarks Y-Axis Ticks ===== Y-Axis tickmarks Axes ===== Y-axis labels 50 60 70 80 90 100 110 120 130 140 150 160 for X-axis labels 0 100 200 300 400 500 600 700 800 900 1000 1100 Heading, subheading and footnotes ===== Start of Line Chart =========== Points follow ... Line Points follow ... Line Points follow ... Line Reset to original origin
┌⊖┐ ⌽0│ └~┘