For very specific uses of solve_triangular, the parallelization performance seems to degrade after 1.4.1. Here is a short example: # Imports from multiprocessing import Pool import time import numpy ...
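Since the script above is cut off, here is a rough sketch of how such a timing comparison could be set up. It is not the reporter's code: the helper names `solve_once` and `time_pool`, the matrix size, the task count, and the process counts are all assumptions made for illustration. It only relies on `scipy.linalg.solve_triangular` and `multiprocessing.Pool`.

```python
import time
from multiprocessing import Pool

import numpy as np
from scipy.linalg import solve_triangular


def solve_once(seed, n=200):
    """Solve one random lower-triangular system; return the residual norm.

    Hypothetical helper, not taken from the original report.
    """
    rng = np.random.default_rng(seed)
    # Adding n * I to the diagonal keeps the triangular matrix well conditioned.
    a = np.tril(rng.standard_normal((n, n))) + n * np.eye(n)
    b = rng.standard_normal(n)
    x = solve_triangular(a, b, lower=True)
    return float(np.linalg.norm(a @ x - b))


def time_pool(n_tasks=32, processes=4):
    """Return (elapsed seconds, residuals) for n_tasks solves run in a Pool."""
    start = time.perf_counter()
    with Pool(processes=processes) as pool:
        residuals = pool.map(solve_once, range(n_tasks))
    return time.perf_counter() - start, residuals


if __name__ == "__main__":
    # Compare wall-clock time for a few (assumed) process counts.
    for procs in (1, 2, 4):
        elapsed, _ = time_pool(processes=procs)
        print(f"{procs} process(es): {elapsed:.3f} s")
```

Running the same script under two installed versions and comparing the printed timings would show whether the slowdown reproduces. Absolute numbers depend on the machine and on how many threads the underlying BLAS/LAPACK build uses.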
When dim > 256, the result in mat is wrong. mat[j]=DeviceArray([[1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.], [1., 1., 1., ..., 1., 1., 1.], ..., [1., 1., 1 ...