]box on -style=max
1 Tabular Data
This article will get us started reading and working with tabular data. The goal is to read in some csv data, do a few basic operations (filtering, aggregation, etc.), and create some basic plots.
2 Moving Average
We will start with a moving average as the simplest model. To do this we need to accomplish a few things:
- Load data from CSV
- Filter data
- Calculate Moving Average
- Plot
2.1 Load Data from CSV
APL has a CSV function to read csv data in that we can use. I am using simple stock ticker data for a moving average.
eod ← 'eod-quotemedia.csv'
eod_ar ← ⎕CSV eod '' 4
⍴ eod_ar ⍝ Check Size
We can get the head of the dataframe by selecting the first 5 rows and 3 columns. It’s handy to have a small sized piece of data as you develop so we store in it’s own array also. ↑
Lets us specify the head slice.
⎕←eod_ar_s ← 5 3 ↑ eod_ar
2.2 Filter for 1 ticker
Next, we need to filter for a particular ticker. The first thing we need to know is where the column we want to sort on is located. We can of course see that it’s the second column, but let’s pretend we need to use APL for this to get some practice.
⎕ ← ticker_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker'
Let’s break this down a bit:
1⌷eod_ar
Gets the first row - this is our header row∊ ⊂'ticker'
Determines where ‘ticker’ is and returns a mask ([0 1 0]⍸
is applied to get the location of the true values (2),/
Applies a concatenation to flatten from an array to a value
The next step is getting a truth mask (ie [0 1 1 0 0]
) that tells us which rows contain the value to filter for, in this case AAPL
tickers ← eod_ar[;ticker_loc] ∊ ⊂ 'AAPL'
Similar to our first filtering section, ∊ ⊂ 'AAPL'
is checking membership of eod_ar[;ticker_loc]
to return the mask.
We can see we found 1009 rows out of 490,738 that had this ticker.
⍝ We have a truth array (0/1) of whether row contains 'AAPL' or not
('size',⍴ tickers) , 'Found',+/tickers
'min',(⌈/ tickers) , 'max',⌊/tickers
+/
distributes the +
sign between each item in the list giving us a sum of the vector. The same approach can be used for ⌊
and ⌈
to get minimum and maximum values (0 and 1). ⍴
gives us our shape.
Next we need to use this mask to actually filter out data. This is quite easy and we can pass our mask with our full array to do that filtering using ⌿
AAPL ← tickers ⌿ eod_ar
So when we put that together our full solution is:
col_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker' ⍝ Column Location
AAPL ← eod_ar ⌿⍨ eod_ar[;col_loc] ∊ ⊂ 'AAPL' ⍝ Filter Array
5 3 ↑ AAPL ⍝ Head of array
2.3 Calculate Moving Average
We will calculate a 90 period moving average over the stock prices for the AAPL ticker we just filtered for.
ws ← 90
Let’s start with calculating a moving sum. Instead of using +/
to sum over the full list, we can use +⌿
to get a moving sum based on the ws
we give it.
movsum ← ws +⌿ AAPL[;3]
When we have a moving sum, a moving average is easy. Simply divide the moving sum by the window size to get an average.
movavg ← movsum ÷ ws
'movavg length:',⍴ movavg
'AAPL length:', ⍴ AAPL[;3]
Our moving average is shorter than our original data because we are not calculating when we don’t have enough data at the beginning of our time period. Let’s pad the beginning with the raw values.
padded ← ((ws-1) ↑ AAPL[;3]) , movavg
'padded length:',⍴ padded
'AAPL length:', ⍴ AAPL[;3]
So the full moving average with padding back to original size looks like this
movavg ← (ws ↑ AAPL[;3]) , ws ÷⍨ ws +⌿ AAPL[;3]
2.4 Plot
We can not plot our original data with our moving average. It’s a bit annoying I have to pass in movavg
twice to get it to work. It is probably something I am not understanding about how plotting works in APL.
]Plot AAPL[;3] movavg movavg
2.5 Full Solution
eod_ar ← ⎕CSV 'eod-quotemedia.csv' '' 4
col_loc ← ,/ ⍸ 1⌷eod_ar ∊ ⊂'ticker'
AAPL ← eod_ar ⌿⍨ eod_ar[;col_loc] ∊ ⊂ 'AAPL'
movavg ← (ws ↑ AAPL[;3]) , ws ÷⍨ (ws←90) +⌿ AAPL[;3]
]Plot AAPL[;3] movavg movavg