During iterative solution procedures, we sometimes reuse matrices.
There could be significant speed gains if one does not need to fully recompute some N by M matrix each iteration, but can reuse it instead. For example, if the choice and state grids do not change, current utility does not change across policy and value function iterations (only EV() changes).
This potentially increases the memory cost of the program: one might need to store an N by M by J array.
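For a rough sense of the memory cost, here is a back-of-envelope sketch (the sizes use the grid parameters defined below; this is illustrative arithmetic, not part of the timing test):
% Memory back-of-envelope, assuming double precision (8 bytes per element):
% the 300 by 3000 by 3 tensor used below costs 300*3000*3*8 bytes (~21.6 MB);
% with it_rown = 1000 and the triangular column count
% ((1000-1)*1000)/2 + 1000 = 500500, a single 2D matrix already costs
% 1000*500500*8 bytes (~4 GB), matching the comment in the code below.
fl_mb_tensor = 300*3000*3*8/1e6
fl_gb_matrix = 1000*500500*8/1e9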
One might want to store results from existing calculations, and retrieve the stored results during the next iteration if most values from the previous iteration are unchanged.
It turns out one has to be very careful in this process: storing data in large 2-dimensional or 3-dimensional matrices and slicing to retrieve does not actually bring speed gains for the utility evaluation problem. The time it takes to slice and copy a subset of a tensor is as long as the time it takes to evaluate the matrix directly.
We need to save, and then retrieve without copying. This is achieved by using cell arrays.
Using cell arrays properly, we achieve dramatic speed gains.
The set of operations below is common in iterative solution methods for value/policy functions.
clear all;
it_z = 3;
it_rown = 300; % 4GB if 1000
it_coln = round(((it_rown-1)*it_rown)/2 + it_rown); % triangular choice-grid column count
it_coln = 3000; % overridden here with a fixed column count for this test
it_iter = 50;
it_coll = 0; % counter for collecting timing results below
% define Arrays
ar_coh = rand([1,it_coln]);
ar_kp = rand([1,it_rown]);
ar_bp = rand([1,it_rown]);
f_c = @(coh, kp, bp) coh - (kp + bp/(1+0.02)); % consumption from cash-on-hand and choices
f_u = @(c) log(c); % log utility
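Note that f_c relies on implicit expansion: ar_coh is 1 by it_coln while ar_kp' and ar_bp' are it_rown by 1, so f_c(ar_coh, ar_kp', ar_bp') broadcasts to an it_rown by it_coln matrix. A minimal sketch with tiny made-up vectors (demo values only, not the grids above):
% Implicit-expansion demo: a 1 by 3 row and 2 by 1 columns broadcast to 2 by 3
ar_coh_demo = [1.0, 1.5, 2.0];
ar_kp_demo = [0.1; 0.2];
ar_bp_demo = [0.1; 0.2];
mt_c_demo = f_c(ar_coh_demo, ar_kp_demo, ar_bp_demo)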
First, time how long it takes to generate a tensor vs. a 2-dimensional matrix, and then compare matrix subset extraction speeds.
f_ts_u_store = @() rand([it_rown, it_coln, it_z]);
f_mt_u_store = @() rand([it_rown, it_coln*it_z]);
fl_exe_time = timeit(f_ts_u_store)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'rand([it_rown, it_coln, it_z])';
fl_exe_time = timeit(f_mt_u_store)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'rand([it_rown, it_coln*it_z])';
ts_u_store = f_ts_u_store();
mt_u_store = f_mt_u_store();
This is the critical timing question.
Is it faster to store and retrieve, or is it faster to generate anew? Here we compare pure data-retrieval against pure data-generation times.
f_ts_get = @() ts_u_store(:, :, 1);
f_mt_long_get = @() mt_u_store(:, (it_coln*0+1):(it_coln*1));
fl_exe_time = timeit(f_ts_get)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ts_u_store(:, :, 1)';
fl_exe_time = timeit(f_mt_long_get)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'mt_u_store(:, (it_coln*0+1):(it_coln*1))';
% time to compute utility
f_mt_c = @() f_u(f_c(ar_coh, ar_kp', ar_bp'));
fl_exe_time = timeit(f_mt_c)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'f_u(f_c(ar_coh, ar_kp'', ar_bp''))';
Method 1: compute the consumption and utility matrices every time.
ts_c_store = zeros([it_rown, it_coln, it_z]); % allocated for comparability; not used in this variant
tic;
for it_iter_n=1:1:it_iter
for it_z_m=1:1:it_z
mt_u = f_u(f_c(ar_coh, ar_kp', ar_bp'));
end
end
toc;
% Time the same procedure as a function, stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(1, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(1): mt_u = f_u(f_c())';
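The file ff_u_c_repeat_func is not listed here. A minimal sketch of what such a dispatcher might contain, assuming it simply wraps the loops from this file and switches on the first argument (a guess at the file's structure, not the actual file):
function mt_u = ff_u_c_repeat_func(it_method, ar_coh, ar_kp_t, ar_bp_t, f_u, f_c, it_iter, it_rown, it_coln, it_z)
% Sketch: dispatch over the storage strategies timed in this file;
% returns the last mt_u so that timeit has an output to request.
ts_store = zeros([it_rown, it_coln, it_z]);
cl_u_store = cell([it_z, 1]);
for it_iter_n=1:1:it_iter
    for it_z_m=1:1:it_z
        if (it_method == 1)
            % Method 1: recompute consumption and utility every iteration
            mt_u = f_u(f_c(ar_coh, ar_kp_t, ar_bp_t));
        elseif (it_method == 2)
            % Method 2: store consumption in a tensor, recompute utility
            if (it_iter_n == 1)
                mt_c = f_c(ar_coh, ar_kp_t, ar_bp_t);
                ts_store(:,:,it_z_m) = mt_c;
                mt_u = f_u(mt_c);
            else
                mt_u = f_u(ts_store(:,:,it_z_m));
            end
        elseif (it_method == 3)
            % Method 3: store utility in a tensor, retrieve by slicing (copies)
            if (it_iter_n == 1)
                mt_u = f_u(f_c(ar_coh, ar_kp_t, ar_bp_t));
                ts_store(:,:,it_z_m) = mt_u;
            else
                mt_u = ts_store(:,:,it_z_m);
            end
        elseif (it_method == 5)
            % Method 5: store utility in a cell array, retrieve without copying
            if (it_iter_n == 1)
                mt_u = f_u(f_c(ar_coh, ar_kp_t, ar_bp_t));
                cl_u_store{it_z_m} = mt_u;
            else
                mt_u = cl_u_store{it_z_m};
            end
        end
        % (Method 4, with individually named matrices, follows the loop
        % shown later in this file and is omitted from this sketch.)
    end
end
end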
Is this faster? Actually not. The consumption evaluation step was not time consuming, so there is no gain from saving the consumption results.
In fact, this is substantially slower; retrieving the stored matrix is itself time consuming.
Note that this setup works with 1 state variable, where the consumption grid is constant across iterations. It does not work for a 2-choice model in which the consumption grid changes, as in the cash-on-hand single-state version of the 2-states and 2-choices model. Method 2 below stores the consumption matrix in a tensor and reuses it in later iterations:
ts_c_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter
for it_z_m=1:1:it_z
if (it_iter_n == 1)
mt_c = f_c(ar_coh, ar_kp', ar_bp');
ts_c_store(:,:,it_z_m) = mt_c;
mt_u = f_u(mt_c);
else
mt_u = f_u(ts_c_store(:,:,it_z_m));
end
end
end
toc;
% Time the same procedure as a function, stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(2, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(2): f_u(ts_c_store(:,:,it_z_m))';
Method 3: rather than recomputing utility each iteration, compute it only once and store it. One might expect significant time savings, at a slightly higher (but not substantial) memory burden.
Yet there are no speed gains at all. In the command structure below, every time the tensor ts_u_store is sliced to retrieve mt_u, a new mt_u array is copied.
The time it takes to generate the new mt_u matrix from the slice is about as long as it takes to compute f_u(f_c()) directly. Just because we stored the data in ts_u_store does not mean there is no computational burden when we slice ts_u_store.
ts_u_store = zeros([it_rown, it_coln, it_z]);
tic;
for it_iter_n=1:1:it_iter
for it_z_m=1:1:it_z
if (it_iter_n == 1)
mt_c = f_c(ar_coh, ar_kp', ar_bp');
mt_u = f_u(mt_c);
ts_u_store(:,:,it_z_m) = mt_u;
else
mt_u = ts_u_store(:,:,it_z_m);
end
end
end
toc;
% Time the same procedure as a function, stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(3, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(3): mt_u = ts_u_store(:,:,it_z_m);';
Method 4: rather than using a tensor, store the values one by one. As a demonstration, we separately name three mt_u matrices. This method is dramatically faster.
But we do not want to name each matrix by hand; that would require hard-coding and does not scale with it_z.
tic;
for it_iter_n=1:1:it_iter
for it_z_m=1:1:it_z
if (it_iter_n == 1)
mt_c = f_c(ar_coh, ar_kp', ar_bp');
mt_u = f_u(mt_c);
if (rem(it_z_m,3) == 0)
mt_u_1 = mt_u;
end
if (rem(it_z_m,3) == 1)
mt_u_2 = mt_u;
end
if (rem(it_z_m,3) == 2)
mt_u_3 = mt_u;
end
else
if (rem(it_z_m,3) == 0)
mt_u = mt_u_1;
end
if (rem(it_z_m,3) == 1)
mt_u = mt_u_2;
end
if (rem(it_z_m,3) == 2)
mt_u = mt_u_3;
end
end
end
end
toc;
% Time the same procedure as a function, stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(4, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(4): mt_u_1, mt_u_2, mt_u_3;';
Method 5: it turns out the optimal approach is to store the matrices in a cell array. Retrieving an element of a cell array does not require copying the data.
Methods 4 and 5 have almost identical speed costs, meaning there is no overhead from using the cell array.
cl_u_store = cell([it_z, 1]);
tic;
for it_iter_n=1:1:it_iter
for it_z_m=1:1:it_z
if (it_iter_n == 1)
mt_c = f_c(ar_coh, ar_kp', ar_bp');
mt_u = f_u(mt_c);
cl_u_store{it_z_m} = mt_u;
else
mt_u = cl_u_store{it_z_m};
end
end
end
toc;
% Time the same procedure as a function, stored in a file in the same folder
f_ff_subset_update_func = @() ff_u_c_repeat_func(5, ar_coh, ar_kp', ar_bp', f_u, f_c, it_iter, it_rown, it_coln, it_z);
fl_exe_time = timeit(f_ff_subset_update_func)
it_coll = it_coll + 1;
ar_fl_exe_times(it_coll) = fl_exe_time;
ar_fl_exe_desc{it_coll} = 'ff_u_c_repeat_func(5): mt_u = cl_u_store{it_z_m}';
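As a direct check on the copy-on-slice explanation, one can time pure retrieval from the utility tensor against pure retrieval from the cell array (a minimal sketch, reusing the ts_u_store and cl_u_store already in memory):
% Pure retrieval timing: slicing the tensor copies the slice into a new
% array, while grabbing a cell element returns shared copy-on-write data.
f_ts_u_get = @() ts_u_store(:, :, 1);
f_cl_u_get = @() cl_u_store{1};
fl_time_slice = timeit(f_ts_u_get)
fl_time_cell = timeit(f_cl_u_get)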
% speedmat: seconds per call; speedfull: scaled to it_z*it_iter calls per solution
tb_exe_times = array2table([ar_fl_exe_times', ar_fl_exe_times'*it_z*it_iter]);
tb_exe_times.Properties.RowNames = ar_fl_exe_desc;
tb_exe_times.Properties.VariableNames = {'speedmat', 'speedfull'};
disp(tb_exe_times);