Reusing existing C and U matrixes stored as Cell Arrays with Partial Replace

Back to Fan's Matlab Examples Table of Content

During Iteration Solution Procedure, sometimes only a subset of rows/columns need to be updated for some core matrix after each iteration.

Potentially, there could be significant speed gains if one does not need to fully recompute based on some N by M matrix, but can compute based on some N-n by M-m matrix, and update values in the N by M matrix with new values. See this file for examples when we fully reuse matrixes.

As stated here, we will store existing calculations in cell arrays.

Problem

The set of operations below is common in iterative solution methods for value/policy functions.

  1. Looping over iteration
  2. within iteration looping over shocks
  3. compute some matrix for consumption value
  4. compute some matrix for utility based on consumption.

Data and Parameter Set Up

In [1]:
clear all;
it_z = 15;
it_rown = 300; % 4GB if 1000
it_rown_update = floor(it_rown/7); % 1/10 of the rows are updated
ar_it_rows_replace = sort(datasample(1:it_rown, it_rown_update, 'Replace', false));
it_coln = round(((it_rown-1)*it_rown)/2 + it_rown);
it_coln = 3000;
it_iter = 50;
it_coll = 0;
% define Arrays
ar_coh = rand([1,it_coln]);
ar_kp = rand([1,it_rown]);
ar_bp = rand([1,it_rown]);

In [2]:
% ar_it_rows_replace

Define Some Function

In [3]:
f_c = @(coh, kp, bp) coh - (kp + bp/(1+0.02))
f_u = @(c) log(c)
f_c =

  function_handle with value:

    @(coh,kp,bp)coh-(kp+bp/(1+0.02))


f_u =

  function_handle with value:

    @(c)log(c)


Full and Partial Evaluation

Matrix evaluation, all rows, or some rows. Evaluating only subset of the rows takes less time.

In [4]:
mt_u = f_u(f_c(ar_coh, ar_kp', ar_bp'));
mt_u_partial = f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));
size(mt_u)
size(mt_u_partial)
ans =

         300        3000


ans =

          42        3000


In [5]:
mt_u = f_u(f_c(ar_coh, ar_kp', ar_bp'));
mt_u_partial = f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));

f_full_mt_eval = @() f_u(f_c(ar_coh, ar_kp', ar_bp'));
f_part_mt_eval = @() f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));

fl_exe_time = timeit(f_full_mt_eval)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'f_u(f_c(ar_coh, ar_kp, ar_bp))';

fl_exe_time = timeit(f_part_mt_eval)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace), ar_bp(ar_it_rows_replace)))';
fl_exe_time =

    0.0079


fl_exe_time =

    0.0014


Method 1--New Evaluation within Loop Every time

Compute consumption matrix every time.

In [6]:
ts_c_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        mt_u = f_u(f_c(ar_coh, ar_kp', ar_bp'));
    end
end
toc;

% timing the function above store in file in the same folder
f_ff_subset_update_func = @() ff_u_c_partrepeat_func(1, ar_it_rows_replace, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_partrepeat_func(1): mt_u = f_u(f_c())';
Elapsed time is 7.087624 seconds.

fl_exe_time =

    6.9068


Method 2-Compute Consumption only Once

Is this faster? Actually not. The consumption evaluation step was not time consuming. So there seems to be no gain from saving the consumption results.

In fact, this is substantiallly slowly, perhaps it is time consuming to retrieve stored matrix.

Note that this set up here works with 1 state variable, where our consumption grid is actually constant across periods. It does not work with 2 choice model with consumption grid is changing for the cash-on-hand single state version of 2 states and 2 choices model.

In [7]:
ts_c_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            ts_c_store(:,:,it_z_m) = mt_c;
            mt_u = f_u(mt_c);
        else
            ts_c_store(ar_it_rows_replace,:,it_z_m) = f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)');            
            mt_u = f_u(ts_c_store(:,:,it_z_m));
        end
    end
end
toc;

% timing the function above store in file in the same folder
f_ff_subset_update_func = @() ff_u_c_partrepeat_func(2, ar_it_rows_replace, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_partrepeat_func(2): f_u(ts_c_store(:,:,it_z_m))';
Elapsed time is 8.496887 seconds.

fl_exe_time =

    8.5708


Method 3-Compute Utility only Once

Let's not update utility, compute that only once. Now we have significant time saving. There should be slightly higher memory burden, although not substantial.

There are no speed gains at all, this is because in the following command structure, every time when the tensor ts_u_store is sliced to retrieve mt_u, a new mt_u array is copyied.

The time it takes to generate the new mt_u matrix is as much as it takes to calculate f_u(f_c())). Just because we stored the data in ts_u_store, does not mean there is no computational burden when we slice ts_u_store.

In [8]:
ts_u_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            ts_u_store(:,:,it_z_m) = mt_u;
        else
            ts_u_store(ar_it_rows_replace,:,it_z_m) = f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));
            mt_u = ts_u_store(:,:,it_z_m);
        end
    end
end
toc;

% timing the function above store in file in the same folder
f_ff_subset_update_func = @() ff_u_c_partrepeat_func(3, ar_it_rows_replace, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_partrepeat_func(3): mt_u = ts_u_store(:,:,it_z_m);';
Elapsed time is 7.913867 seconds.

fl_exe_time =

    7.7258


Method 4-Each Shock Own Matrix

Rather than using a tensor, store values one by one, as a demonstration case, we separately name three mt_u matrixes. This method is dramatically faster.

But we do not want to have to name each of the matrix, that would involve hard-coding etc, not good.

In [9]:
tic;
for it_iter_n=1:1:it_iter
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            if (rem(it_z_m,3) == 0)
                mt_u_1 = mt_u;
            end
            if (rem(it_z_m,3) == 1)
                mt_u_2 = mt_u;
            end
            if (rem(it_z_m,3) == 2)
                mt_u_3 = mt_u;
            end
        else
            mt_updates = f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));
            if (rem(it_z_m,3) == 0)
                mt_u_1(ar_it_rows_replace,:) = mt_updates;
                mt_u = mt_u_1;
            end
            if (rem(it_z_m,3) == 1)
                mt_u_2(ar_it_rows_replace,:) = mt_updates;
                mt_u = mt_u_2;
            end
            if (rem(it_z_m,3) == 2)
                mt_u_3(ar_it_rows_replace,:) = mt_updates;
                mt_u = mt_u_3;
            end
        end
    end
end
toc;

% timing the function above store in file in the same folder
f_ff_subset_update_func = @() ff_u_c_partrepeat_func(4, ar_it_rows_replace, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_partrepeat_func(4): mt_u_1, mt_u_2, mt_u_3;';
Elapsed time is 1.299855 seconds.

fl_exe_time =

    1.2896


Method 5-Cell Array for Storage

This was fast when no matrix updating was required, now this is a little slower, but still pretty good.

The less changes we have to make during each iteration, and also the more iterations there are, the better this method becomes.

Note the key replacement command is : cl_u_store{it_z_m}(ar_it_rows_replace,:)

In [10]:
tic;
cl_u_store = cell([it_z, 1]);
tic;
for it_iter_n=1:1:it_iter    
    for it_z_m=1:1:it_z
        if (it_iter_n == 1)
            mt_c = f_c(ar_coh, ar_kp', ar_bp');
            mt_u = f_u(mt_c);
            cl_u_store{it_z_m} = mt_u;
        else        
            cl_u_store{it_z_m}(ar_it_rows_replace,:) = f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace)', ar_bp(ar_it_rows_replace)'));
            mt_u = cl_u_store{it_z_m};            
        end
    end
end
toc;

% timing the function above store in file in the same folder
f_ff_subset_update_func = @() ff_u_c_partrepeat_func(5, ar_it_rows_replace, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_partrepeat_func(5): mt_u = cl_u_store{it_z_m}';
Elapsed time is 1.417557 seconds.

fl_exe_time =

    1.4229


Timing Results

In [11]:
tb_exe_times = array2table([ar_fl_exe_times', ar_fl_exe_times'*it_z*it_iter]);
tb_exe_times.Properties.RowNames = ar_fl_exe_desc;
tb_exe_times.Properties.VariableNames = {'speedmat', 'speedfull'};
disp(tb_exe_times);
                                                                              speedmat     speedfull
                                                                              _________    _________

    f_u(f_c(ar_coh, ar_kp, ar_bp))                                            0.0079088     5.9316  
    f_u(f_c(ar_coh, ar_kp(ar_it_rows_replace), ar_bp(ar_it_rows_replace)))    0.0013723     1.0292  
    ff_u_c_partrepeat_func(1): mt_u = f_u(f_c())                                 6.9068     5180.1  
    ff_u_c_partrepeat_func(2): f_u(ts_c_store(:,:,it_z_m))                       8.5708     6428.1  
    ff_u_c_partrepeat_func(3): mt_u = ts_u_store(:,:,it_z_m);                    7.7258     5794.3  
    ff_u_c_partrepeat_func(4): mt_u_1, mt_u_2, mt_u_3;                           1.2896     967.19  
    ff_u_c_partrepeat_func(5): mt_u = cl_u_store{it_z_m}                         1.4229     1067.2