Reusing Existing C and U Matrices Stored as Cell Arrays

Back to Fan's Matlab Examples Table of Contents

During iterative solution procedures, we sometimes reuse matrices.

There are potentially significant speed gains if one can reuse an N by M matrix rather than fully recomputing it. For example, if the choice and state grids do not change, the current-utility matrix does not change across policy and value function iterations (only EV() changes).

The trade-off is higher memory cost: one might need to store an N by M by J matrix, which in double precision takes 8*N*M*J bytes.
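As a rough check on that memory cost (a minimal sketch using the grid sizes adopted below; whos reports the bytes a variable occupies):

```matlab
% A double takes 8 bytes, so an N by M by J tensor costs 8*N*M*J bytes.
it_rown = 300; it_coln = 3000; it_z = 3;
ts = zeros([it_rown, it_coln, it_z]);
st = whos('ts');
st.bytes / 1e6  % 8*300*3000*3/1e6 = 21.6 megabytes
```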

One might want to store the results of existing calculations, and retrieve the stored results in the next iteration when most values carry over from the previous iteration.

It turns out one has to be very careful in this process: storing data in large 2-dimensional or 3-dimensional matrices and slicing for retrieval does not actually bring speed gains for the utility-evaluation problem. The time it takes to slice and copy a subset of a tensor is as long as the time it takes to evaluate the matrix anew.

We need to save and retrieve without copying during retrieval. This is achieved by using a cell array.

Using cell arrays properly, we achieve dramatic speed gains.

Problem

The set of operations below is common in iterative solution methods for value/policy functions.

  1. loop over iterations
  2. within each iteration, loop over shocks
  3. compute a matrix of consumption values
  4. compute a matrix of utility values based on consumption
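In sketch form, the loop structure above looks as follows (f_u, f_c and the grids are placeholders defined in the sections below):

```matlab
for it_iter_n = 1:it_iter                    % 1. loop over iterations
    for it_z_m = 1:it_z                      % 2. loop over shocks
        mt_c = f_c(ar_coh, ar_kp', ar_bp');  % 3. consumption matrix
        mt_u = f_u(mt_c);                    % 4. utility matrix
        % ... use mt_u to update the value and policy functions ...
    end
end
```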

Data and Parameter Set Up

In [1]:
clear all;
it_z = 3;
it_rown = 300; % roughly 4GB per matrix if set to 1000
it_coln = round(((it_rown-1)*it_rown)/2 + it_rown); % triangular-grid column count
it_coln = 3000; % overridden: fix the column count for this demonstration
it_iter = 50;
it_coll = 0;
% define Arrays
ar_coh = rand([1,it_coln]);
ar_kp = rand([1,it_rown]);
ar_bp = rand([1,it_rown]);

Define Some Functions

In [2]:
f_c = @(coh, kp, bp) coh - (kp + bp/(1+0.02))
f_u = @(c) log(c)
f_c =

  function_handle with value:

    @(coh,kp,bp)coh-(kp+bp/(1+0.02))


f_u =

  function_handle with value:

    @(c)log(c)


Tensor Speed and Matrix Speed

Time it takes to generate a tensor vs. a 2-dimensional matrix, and matrix-subset extraction speed.

In [3]:
f_ts_u_store = @() rand([it_rown, it_coln, it_z]);
f_mt_u_store = @() rand([it_rown, it_coln*it_z]);

fl_exe_time = timeit(f_ts_u_store)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'rand([it_rown, it_coln, it_z])';

fl_exe_time = timeit(f_mt_u_store)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'rand([it_rown, it_coln*it_z])';

ts_u_store = f_ts_u_store();
mt_u_store = f_mt_u_store();
fl_exe_time =

    0.0267


fl_exe_time =

    0.0372


Speed of getting a subset of a stored matrix vs. evaluating and generating same-sized data.

This is the critical timing question.

Is it faster to store and retrieve, or faster to generate anew? Here we purely compare data-retrieval and data-generation times.

In [4]:
f_ts_get = @() ts_u_store(:, :, 1);
f_mt_long_get = @() mt_u_store(:, (it_coln*0+1):(it_coln*1)); % first block of it_coln columns

fl_exe_time = timeit(f_ts_get)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ts_u_store(:, :, 1)';

fl_exe_time = timeit(f_mt_long_get)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'mt_u_store(:, (it_coln*0+1):(it_coln*1))';
fl_exe_time =

    0.0045


fl_exe_time =

    0.0045


In [5]:
% time to compute utility
f_mt_c = @() f_u(f_c(ar_coh, ar_kp', ar_bp'));

fl_exe_time = timeit(f_mt_c)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'f_u(f_c(ar_coh, ar_kp, ar_bp))';
fl_exe_time =

    0.0087


Method 1-New Evaluation within Loop Every Time

Compute the consumption and utility matrices from scratch in every iteration.

In [6]:
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        mt_u = f_u(f_c(ar_coh, ar_kp', ar_bp'));
    end
end
toc;

% time the function above, which is stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(1, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(1): mt_u = f_u(f_c())';
Elapsed time is 1.390194 seconds.

fl_exe_time =

    1.5423


Method 2-Compute Consumption only Once

Is this faster? Actually not. The consumption-evaluation step was not time consuming, so there is little gain from saving the consumption results.

In fact, this is substantially slower; retrieving the stored matrix by slicing the tensor is itself time consuming.

Note that this setup works for the 1-state-variable case here, where the consumption grid is constant across iterations. It does not work for a 2-choice model in which the consumption grid changes, as in the cash-on-hand single-state version of the 2-states, 2-choices model.

In [7]:
ts_c_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            ts_c_store(:,:,it_z_m) = mt_c;
            mt_u = f_u(mt_c);            
        else
            mt_u = f_u(ts_c_store(:,:,it_z_m));
        end
    end
end
toc;

% time the function above, which is stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(2, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(2): f_u(ts_c_store(:,:,it_z_m))';
Elapsed time is 1.950945 seconds.

fl_exe_time =

    1.6480


Method 3-Compute Utility only Once

Let's not update utility either, computing it only once. One might expect significant time savings here, at the cost of a somewhat higher (though not substantial) memory burden.

Yet there are no speed gains at all. In the command structure below, every time the tensor ts_u_store is sliced to retrieve mt_u, a new mt_u array is copied.

The time it takes to generate the new mt_u matrix by slicing is about as long as the time it takes to calculate f_u(f_c()). Just because we stored the data in ts_u_store does not mean there is no computational burden when we slice ts_u_store.

In [8]:
ts_u_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            ts_u_store(:,:,it_z_m) = mt_u;
        else
            mt_u = ts_u_store(:,:,it_z_m);
        end
    end
end
toc;

% time the function above, which is stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(3, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(3): mt_u = ts_u_store(:,:,it_z_m);';
Elapsed time is 1.360896 seconds.

fl_exe_time =

    1.5409


Method 4-Each Shock Own Matrix

Rather than using a tensor, store the values one by one: as a demonstration, we separately name three mt_u matrices. This method is dramatically faster.

But we do not want to hand-name each matrix; that requires hard-coding and does not scale with the number of shocks.

In [9]:
tic;
for it_iter_n=1:1:it_iter
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            if (rem(it_z_m,3) == 0)
                mt_u_1 = mt_u;
            end
            if (rem(it_z_m,3) == 1)
                mt_u_2 = mt_u;
            end
            if (rem(it_z_m,3) == 2)
                mt_u_3 = mt_u;
            end
        else
            if (rem(it_z_m,3) == 0)
                mt_u = mt_u_1;
            end
            if (rem(it_z_m,3) == 1)
                mt_u = mt_u_2;
            end
            if (rem(it_z_m,3) == 2)
                mt_u = mt_u_3;
            end
        end
    end
end
toc;

% time the function above, which is stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(4, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(4): mt_u_1, mt_u_2, mt_u_3;';
Elapsed time is 0.034070 seconds.

fl_exe_time =

    0.0299


Method 5-Cell Array for Storage

It turns out the best method is to store the matrices in a cell array. Retrieving an element of a cell array does not copy the data: under MATLAB's copy-on-write semantics, mt_u shares the stored matrix until either copy is modified.

Method 4 and Method 5 speed costs are almost identical, meaning there is no overhead from the cell array.
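The difference can be checked directly by timing a tensor slice against a cell-element retrieval (a minimal sketch reusing the dimensions defined above; exact times vary by machine):

```matlab
ts_demo = rand([it_rown, it_coln, it_z]);  % tensor storage
cl_demo = cell([it_z, 1]);                 % cell-array storage
for it_z_m = 1:it_z
    cl_demo{it_z_m} = ts_demo(:, :, it_z_m);
end
% Slicing the tensor allocates and copies an it_rown by it_coln matrix.
timeit(@() ts_demo(:, :, 1))
% Retrieving a cell element returns a shared (copy-on-write) reference,
% which is nearly free until the matrix is modified.
timeit(@() cl_demo{1})
```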

In [10]:
cl_u_store = cell([it_z, 1]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            cl_u_store{it_z_m} = mt_u;
        else
            mt_u = cl_u_store{it_z_m};
        end
    end
end
toc;

% time the function above, which is stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(5, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(5): mt_u = cl_u_store{it_z_m}';
Elapsed time is 0.027707 seconds.

fl_exe_time =

    0.0296


Timing Results

In [11]:
tb_exe_times = array2table([ar_fl_exe_times', ar_fl_exe_times'*it_z*it_iter]);
tb_exe_times.Properties.RowNames = ar_fl_exe_desc;
tb_exe_times.Properties.VariableNames = {'speedmat', 'speedfull'};
disp(tb_exe_times);
                                                             speedmat     speedfull
                                                             _________    _________

    rand([it_rown, it_coln, it_z])                            0.026652      3.9979 
    rand([it_rown, it_coln*it_z])                             0.037152      5.5728 
    ts_u_store(:, :, 1)                                      0.0045436     0.68154 
    mt_u_store(:, (it_coln*0+1):(it_coln*1+1))               0.0044851     0.67276 
    f_u(f_c(ar_coh, ar_kp, ar_bp))                           0.0086745      1.3012 
    ff_u_c_repeat_func(1): mt_u = f_u(f_c())                    1.5423      231.35 
    ff_u_c_repeat_func(2): f_u(ts_c_store(:,:,it_z_m))           1.648       247.2 
    ff_u_c_repeat_func(3): mt_u = ts_u_store(:,:,it_z_m);       1.5409      231.13 
    ff_u_c_repeat_func(4): mt_u_1, mt_u_2, mt_u_3;            0.029882      4.4823 
    ff_u_c_repeat_func(5): mt_u = cl_u_store{it_z_m}          0.029611      4.4416