@InProceedings{C:LaBRI::sparsedays2016,
author = {Agullo, E.},
title = {Overview of Task-based Sparse and Data-sparse Solvers on Top of Runtime Systems},
booktitle = {Sparse Days},
year = {2016},
address = {Toulouse, France},
month = jun,
KEYWORDS = {Sparse},
ABSTRACT = {The complexity of the hardware architectures of modern supercomputers has led the community of scientific-library developers to adopt new parallel programming paradigms. Among them, task-based programming has certainly become one of the most popular, as it allows for high productivity while ensuring high performance and portability by delegating task management to a runtime system. In this talk, we will present an overview of sparse solvers that have been designed in the context of the Matrices Over Runtime Systems @ Exascale (MORSE) and Solvers for Heterogeneous Architectures (SOLHAR) projects. We will present the design of new direct solvers implementing supernodal (PaStiX) and multifrontal (qr-mumps) methods, new Krylov solvers ensuring pipelining at both the numerical and software levels, new sparse hybrid methods (MaPHyS), as well as data-sparse libraries implementing fast multipole methods (ScalFMM) and hierarchical matrices (hmat, in collaboration with Airbus Group Innovations). For all these methods, we will highlight the challenges we have faced in terms of expressivity, granularity, scheduling, and scalability, and illustrate their performance on large academic and industrial test problems.}
}
@InProceedings{C:LaBRI::PMAA2016,
author = {Faverge, M. and Pichon, G. and Ramet, P.},
title = {Exploiting {K}epler architecture in sparse direct solver with runtime systems},
booktitle = {Proceedings of {PMAA}'2016},
year = {2016},
address = {Bordeaux, France},
month = jul,
KEYWORDS = {Sparse},
ABSTRACT = {Solving sparse linear systems with direct methods is a time-consuming operation required by many scientific applications to simulate physical problems. Due to its significant overall cost, many studies have tried to optimize the time to solution of these solvers on multi-core and distributed architectures. More recently, many works have addressed heterogeneous architectures to exploit accelerators such as GPUs or the Intel Xeon Phi, with interesting speedups. Despite research toward generic solutions for efficiently exploiting those accelerators, their hardware evolution requires continual adaptation of the kernels running on those architectures. Recent Nvidia architectures, such as Kepler, present a larger number of parallel units, thus requiring more data to feed every computational unit. One solution considered to supply enough computation has been to study problems with a large number of small computations. The batched BLAS libraries proposed by Intel, Nvidia, or the University of Tennessee are examples of this solution. In this talk, we discuss the use of the variable-size batched matrix-matrix multiply to improve the performance of the PaStiX sparse direct solver. Indeed, this kernel suits the supernodal method of the solver and the multiple updates of variable sizes that occur during the numerical factorization. Performance results on a spectrum of matrices with different properties will be presented.}
}
@InProceedings{C:LaBRI::pp16b,
author = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
title = {Exploiting {H}-Matrices in Sparse Direct Solvers},
booktitle = {{SIAM} Conference on Parallel Processing for Scientific Computing},
year = {2016},
address = {Paris, France},
month = apr,
KEYWORDS = {Low-rank compression},
ABSTRACT = {In this talk, we describe a preliminary fast direct solver using the HODLR library to compress large blocks appearing in the symbolic structure of the PaStiX sparse direct solver. We will present our general strategy before analyzing the practical gains in terms of memory and floating-point operations with respect to a theoretical study of the problem. Finally, we will discuss ways to enhance the overall performance of the solver.}
}
@InProceedings{C:LaBRI::an16,
author = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
title = {On the use of low rank approximations for sparse direct solvers},
booktitle = {{SIAM} Annual Meeting},
year = {2016},
address = {Boston, USA},
month = jul,
KEYWORDS = {Low-rank compression},
ABSTRACT = {In this talk, we describe a preliminary fast direct solver using the HODLR library to compress large blocks appearing in the symbolic structure of the PaStiX sparse direct solver. We will present our general strategy before analyzing the practical gains in terms of memory and floating-point operations with respect to a theoretical study of the problem. Finally, we will discuss the impact of reordering techniques on enhancing the low-rank compression.}
}
@InProceedings{C:LaBRI::purdue2016,
author = {Pichon, G. and Darve, E. and Faverge, M. and Ramet, P. and Roman, J.},
title = {Sparse Supernodal Solver Using Hierarchical Compression},
booktitle = {Workshop on Fast Direct Solvers},
year = {2016},
address = {Purdue, USA},
month = nov,
KEYWORDS = {Low-rank compression},
ABSTRACT = {In this talk, we present the PaStiX sparse supernodal solver, which uses hierarchical compression to reduce the burden of the large blocks appearing during the nested dissection process. To improve the efficiency of our sparse update kernel for both BLR (block low-rank) and HODLR (hierarchically off-diagonal low-rank) formats, we investigate the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm. We will also discuss ordering strategies to enhance data locality and compressibility.}
}
@InProceedings{C:LaBRI::pp16a,
author = {Pichon, G. and Faverge, M. and Ramet, P. and Roman, J.},
title = {Impact of Blocking Strategies for Sparse Direct Solvers on Top of Generic Runtimes},
booktitle = {{SIAM} Conference on Parallel Processing for Scientific Computing},
year = {2016},
address = {Paris, France},
month = apr,
KEYWORDS = {Sparse},
ABSTRACT = {Among the preprocessing steps of a sparse direct solver, reordering and block symbolic factorization are two major steps for reaching a granularity suitable for BLAS kernel efficiency and runtime management. In this talk, we present a reordering strategy that increases off-diagonal block sizes. It enhances BLAS kernel performance and allows larger tasks to be handled, reducing runtime overhead. Finally, we will comment on the resulting gains in the PaStiX solver implemented over StarPU and PaRSEC.}
}