optimizer.sweeper.Sweeper

optimizer.sweeper.Sweeper(self, optimizer)

Sweep optimizer for tensor-train

Methods

Name Description
sweep TT-sweep optimization

sweep

optimizer.sweeper.Sweeper.sweep(
    nsweeps=2,
    maxdim=30,
    cutoff=0.01,
    optax_solver=None,
    opt_maxiter=1000,
    opt_tol=None,
    opt_batchsize=10000,
    opt_lambda=0.0,
    onedot=False,
    use_CG=False,
    use_scipy=False,
    use_jax_scipy=False,
    method='L-BFGS-B',
    wf=1.0,
    ord='fro',
    auto_onedot=True,
)

TT-sweep optimization

Parameters

Name Type Description Default
nsweeps int The number of sweeps. 2
maxdim (int, list[int]) the maximum rank of TT-sweep. 30
cutoff (float, list[float]) the ratio of truncated singular values for TT-sweep. This parameter is not used when a one-dot core is optimized. 0.01
optax_solver optax.GradientTransformation the optimizer for TT-sweep. Defaults to None. If None, the optimizer is not used. None
opt_maxiter int the maximum number of iterations for TT-sweep. 1000
opt_tol (float, list[float]) the convergence criterion of gradient for TT-sweep. Defaults to None, i.e., opt_tol = cutoff. None
opt_batchsize int the size of mini-batch for TT-sweep. 10000
opt_lambda float the L2 regularization parameter for TT-sweep. Supported only when use_CG=True. 0.0
onedot bool whether to optimize one-dot or two-dot core. Defaults to False, i.e. two-dot core optimization. False
use_CG bool whether to use conjugate gradient method for TT-sweep. Defaults to False. CG is suitable for one-dot core optimization. False
use_scipy bool whether to use scipy.optimize.minimize for TT-sweep. Defaults to False. When enabled, the L-BFGS-B method is used by default. GPU is not supported. False
use_jax_scipy bool whether to use jax.scipy.optimize.minimize for TT-sweep. Defaults to False. This optimizer supports only the BFGS method, which exhausts GPU memory. False
method str the optimization method for scipy.optimize.minimize. Defaults to ‘L-BFGS-B’. Note that jax.scipy.optimize.minimize only supports ‘BFGS’. 'L-BFGS-B'
wf float the weight factor of force \(w_f\) in the loss function. 1.0
ord str the norm for scaling the initial core. Defaults to 'fro'. 'max' (maximum absolute value) and 'fro' (Frobenius norm) are supported. 'fro'
auto_onedot bool whether to switch to one-dot core optimization automatically once the maximum rank is reached. Defaults to True. This will cause overfitting at the beginning of the optimization. True

Returns

Name Type Description
pl.DataFrame the optimization trace with columns ['epoch', 'mse_train', 'mse_test', 'tt_norm', 'tt_ranks'].

We recommend using optax_solver for the initial optimization and use_CG=True for the final fine-tuning.
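The two-stage recommendation can be sketched as a small helper. This is only an illustration under stated assumptions: `two_stage_sweep` is a hypothetical name, `sweeper` is assumed to be an already-constructed `Sweeper`, and `solver` is assumed to be an `optax.GradientTransformation` (for example `optax.adam(1e-3)`). Only keyword arguments listed in the signature of `sweep` are used. Setting `onedot=True` in the second stage is a choice made here because the parameter table notes that CG suits one-dot core optimization.

```python
def two_stage_sweep(sweeper, solver, nsweeps_initial=2, nsweeps_final=2,
                    maxdim=30, cutoff=0.01):
    """Hypothetical helper: coarse optax-driven sweeps, then CG fine-tuning.

    `sweeper` is assumed to expose the documented `sweep(...)` method.
    `solver` is assumed to be an optax.GradientTransformation.
    """
    # Stage 1: initial optimization with the optax solver
    # (two-dot cores by default, since onedot defaults to False).
    trace_initial = sweeper.sweep(
        nsweeps=nsweeps_initial,
        maxdim=maxdim,
        cutoff=cutoff,
        optax_solver=solver,
    )
    # Stage 2: final fine-tuning with conjugate gradient.
    # onedot=True is an assumption of this sketch, not a requirement.
    trace_final = sweeper.sweep(
        nsweeps=nsweeps_final,
        maxdim=maxdim,
        cutoff=cutoff,
        use_CG=True,
        onedot=True,
    )
    # Each call returns its own optimization trace; both are handed back.
    return trace_initial, trace_final
```

Both traces are returned separately so that the training and test errors of the two stages can be compared.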

Two-dot optimization algorithm

  1. Construct original two-dot tensor \(B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}}\)

\[ B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}} = \sum_{\beta_p} W^{[p]}_{\beta_{p-1} i_p \beta_p} W^{[p+1]}_{\beta_p i_{p+1} \beta_{p+1}} \]

  2. Shift two-dot tensor \(B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}}\) by \(\Delta B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}}\)

\[ B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}} \leftarrow B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}} + \Delta B\substack{i_p i_{p+1}\\\beta_{p-1} \beta_{p+1}} \]

  3. Execute singular value decomposition (truncate small singular values as needed)

\[ B\substack{i_p i_{p+1}\\ \beta_{p-1} \beta_{p+1}} = \sum_{\beta_p,\beta_p^\prime}^{M^\prime} U\substack{i_p\\ \beta_{p-1}\beta_p} S\substack{\beta_p\enspace \\ \enspace\beta_p^\prime} V\substack{i_{p+1}\\ \beta_p^\prime \beta_{p+1}} \simeq \sum_{\beta_p,\beta_p^\prime}^{M} U\substack{i_p\\ \beta_{p-1}\beta_p} S\substack{\beta_p\enspace \\ \enspace\beta_p^\prime} V\substack{i_{p+1}\\ \beta_p^\prime \beta_{p+1}} \quad (M \le M^\prime) \]

  4. Update parameters

\[ W^{[p]}_{\beta_{p-1} i_p \beta_p} \leftarrow U\substack{i_p\\ \beta_{p-1}\beta_p} \]
\[ W^{[p+1]}_{\beta_p i_{p+1} \beta_{p+1}} \leftarrow \sum_{\beta_p^\prime} S\substack{\beta_p\enspace \\ \enspace\beta_p^\prime} V\substack{i_{p+1}\\ \beta_p^\prime \beta_{p+1}} \]

  5. Shift the center site to the left or right (sweeping)
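One two-dot update for a rightward sweep can be sketched in NumPy as follows. This is a minimal illustration, not the library's implementation: `two_dot_update` is a hypothetical function, `delta_B` stands for whatever shift the chosen optimizer produces, and the truncation rule (keep the smallest rank whose discarded singular values sum to at most `cutoff` times the total, capped at `maxdim`) is an assumed reading of the `cutoff` parameter.

```python
import numpy as np


def two_dot_update(W_p, W_p1, delta_B, maxdim=30, cutoff=0.01):
    """Hypothetical sketch of one two-dot update (rightward sweep).

    W_p:     core p,   shape (b_left, i_p, b_mid)
    W_p1:    core p+1, shape (b_mid, i_p1, b_right)
    delta_B: shift of the two-dot tensor, shape (b_left, i_p, i_p1, b_right)
    """
    b_left, i_p, _ = W_p.shape
    _, i_p1, b_right = W_p1.shape

    # Step 1: contract the shared bond to build the two-dot tensor B.
    B = np.einsum("aib,bjc->aijc", W_p, W_p1)

    # Step 2: shift B by the update supplied by the optimizer.
    B = B + delta_B

    # Step 3: SVD of B reshaped to a matrix (left indices x right indices).
    U, S, Vh = np.linalg.svd(
        B.reshape(b_left * i_p, i_p1 * b_right), full_matrices=False
    )

    # Assumed truncation rule: discarded[k] is the fraction of the
    # singular-value sum dropped when only the k largest values are kept.
    total = S.sum()
    if total == 0.0:
        total = 1.0  # avoid division by zero for an all-zero tensor
    tail = np.cumsum(S[::-1])[::-1]
    discarded = np.append(tail, 0.0) / total
    M = int(np.argmax(discarded <= cutoff))  # smallest k within the cutoff
    M = max(1, min(M, maxdim))

    # Step 4: U becomes core p; S V^T is absorbed into core p+1.
    W_p_new = U[:, :M].reshape(b_left, i_p, M)
    W_p1_new = (S[:M, None] * Vh[:M]).reshape(M, i_p1, b_right)

    # Step 5: the center site is now p+1; the caller moves on to (p+1, p+2).
    return W_p_new, W_p1_new
```

Because the singular values are absorbed into the right-hand core, the left core stays orthonormal and the center site moves to p+1. A leftward sweep would instead absorb the singular values into the left core.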