-
P. Hénon,
B. Nkonga,
P. Ramet,
and J. Roman.
Using of the High Performance Sparse Solver PaStiX for the Complex Multiscale 3D Simulations performed by the FluidBox Fluid Mechanics Software.
In Proceedings of PMAA'2004,
Marseille, France,
October 2004.
Keyword(s): Sparse.
Abstract:
In this paper, we consider a hyperbolic system with multiple time step characteristics. Such a situation arises, for example, in combustion problems when the acoustic time is small compared to the characteristic time associated with the flame propagation. The problems investigated in this paper are characterized by a small Mach number. At the asymptotic limit, the initial hyperbolic system degenerates to an elliptic problem. Therefore, numerical methods designed under the assumption that the system is hyperbolic become ill-conditioned at this limit. As a consequence, the iterative methods used in the numerical algorithm implemented in the FluidBox software converge more slowly. Some physical preconditioning has been proposed to overcome this difficulty. However, in the context of parallel computing, a global preconditioning is unavoidable for performance efficiency.
The parallelization of FluidBox relies on a domain decomposition. A first version of FluidBox used a block Jacobi or a block Gauss-Seidel preconditioner, both of which are easy to implement in this framework. But to solve 3D problems with up to several million unknowns on many processors, this kind of preconditioner becomes inefficient due to its lack of scalability and robustness. Hence, a collaboration inside the INRIA ScAlApplix project has been set up to use the high performance solver library PaStiX, which provides both complete and incomplete factorizations on clusters of SMP nodes for large-scale computations.
The aim of this work is then to investigate the performance of the combination of FluidBox and PaStiX (both developed in the INRIA ScAlApplix project) and also to present the parallel assembly algorithm that allows a good load balance in this context.
@InProceedings{C:LaBRI::PMAA2004b,
author = "H\'enon, P. and Nkonga, B. and Ramet, P. and Roman, J.",
title = "Using of the High Performance Sparse Solver PaStiX for the Complex Multiscale 3D Simulations performed by the FluidBox Fluid Mechanics Software",
booktitle = "Proceedings of {PMAA}'2004",
OPTcrossref = {},
OPTkey = {},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
year = "2004",
OPTorganization = {},
OPTpublisher = {},
address = {Marseille, France},
month = oct,
OPTpages = {},
OPTnote = {},
OPTannote = {},
KEYWORDS = "Sparse",
ABSTRACT = { In this paper, we consider a hyperbolic system with multiple time step characteristics. Such a situation arises, for example, in combustion problems when the acoustic time is small compared to the characteristic time associated with the flame propagation. The problems investigated in this paper are characterized by a small Mach number. At the asymptotic limit, the initial hyperbolic system degenerates to an elliptic problem. Therefore, numerical methods designed under the assumption that the system is hyperbolic become ill-conditioned at this limit. As a consequence, the iterative methods used in the numerical algorithm implemented in the FluidBox software converge more slowly. Some physical preconditioning has been proposed to overcome this difficulty. However, in the context of parallel computing, a global preconditioning is unavoidable for performance efficiency.
The parallelization of FluidBox relies on a domain decomposition. A first version of FluidBox used a block Jacobi or a block Gauss-Seidel preconditioner, both of which are easy to implement in this framework. But to solve 3D problems with up to several million unknowns on many processors, this kind of preconditioner becomes inefficient due to its lack of scalability and robustness. Hence, a collaboration inside the INRIA ScAlApplix project has been set up to use the high performance solver library PaStiX, which provides both complete and incomplete factorizations on clusters of SMP nodes for large-scale computations.
The aim of this work is then to investigate the performance of the combination of FluidBox and PaStiX (both developed in the INRIA ScAlApplix project) and also to present the parallel assembly algorithm that allows a good load balance in this context. }
}
-
P. Hénon,
F. Pellegrini,
P. Ramet,
J. Roman,
and Y. Saad.
Applying parallel direct solver skills to build robust and highly performant preconditioners.
In Proceedings of PARA'2004,
volume 3732 of LNCS,
Copenhagen, Denmark,
pages 601-619,
June 2004.
Springer Verlag.
Keyword(s): Sparse.
Abstract:
The purpose of our work is to provide a method which exploits the parallel blockwise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust preconditioners based on a parallel incomplete factorization. The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers.
@InProceedings{C:LaBRI::para2004,
author = {H\'enon, P. and Pellegrini, F. and Ramet, P. and Roman, J. and Saad, Y.},
title = {Applying parallel direct solver skills to build robust and highly performant preconditioners},
booktitle = {Proceedings of {PARA'2004}},
OPTcrossref = {},
OPTkey = {},
pages = {601--619},
year = {2004},
OPTeditor = {},
volume = {3732},
OPTnumber = {},
series = {LNCS},
address = {Copenhagen, Denmark},
month = jun,
OPTorganization = {},
publisher = "Springer Verlag",
OPTnote = {},
OPTannote = {},
URL = {http://www.labri.fr/~ramet/restricted/para2004.pdf},
ABSTRACT = {The purpose of our work is to provide a method which exploits the parallel blockwise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust preconditioners based on a parallel incomplete factorization. The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers.},
KEYWORDS = "Sparse"
}
-
P. Hénon,
F. Pellegrini,
P. Ramet,
J. Roman,
and Y. Saad.
High Performance Complete and Incomplete Factorizations for Very Large Sparse Systems by using Scotch and PaStiX softwares.
In Eleventh SIAM Conference on Parallel Processing for Scientific Computing,
San Francisco, USA,
February 2004.
Keyword(s): Sparse.
Abstract:
PaStiX is a scientific library that provides a high performance direct supernodal solver for very large sparse linear systems. It relies on a block factorization based on a hybrid ordering (Nested Dissection + Halo Approximate Minimum Degree) obtained using the Scotch library. Efficient static scheduling and memory management are used to solve irregular problems with more than 25 million unknowns on clusters of SMP nodes. In order to solve larger 3D problems, we apply these blockwise algorithms to compute robust and efficient parallel ILU preconditioners.
@InProceedings{C:LaBRI::ppsc2004a,
author = {H\'enon, P. and Pellegrini, F. and Ramet, P. and Roman, J. and Saad, Y.},
title = {High Performance Complete and Incomplete Factorizations for Very Large Sparse Systems by using {Scotch} and {PaStiX} softwares},
booktitle = {Eleventh {SIAM} Conference on Parallel Processing for Scientific Computing},
OPTcrossref = {},
OPTkey = {},
OPTpages = {},
year = {2004},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
address = {San Francisco, USA},
month = feb,
OPTorganization = {},
OPTpublisher = {},
OPTnote = {},
OPTannote = {},
KEYWORDS = "Sparse",
ABSTRACT = { PaStiX is a scientific library that provides a high performance direct supernodal solver for very large sparse linear systems. It relies on a block factorization based on a hybrid ordering (Nested Dissection + Halo Approximate Minimum Degree) obtained using the Scotch library. Efficient static scheduling and memory management are used to solve irregular problems with more than 25 million unknowns on clusters of SMP nodes. In order to solve larger 3D problems, we apply these blockwise algorithms to compute robust and efficient parallel ILU preconditioners. }
}
-
P. Hénon,
P. Ramet,
and J. Roman.
A Blockwise Algorithm for Parallel Incomplete Cholesky Factorization.
In Proceedings of PMAA'2004,
Marseille, France,
October 2004.
Keyword(s): Sparse.
Abstract:
Solving large sparse linear systems by iterative methods has often been quite unsatisfactory when dealing with practical "industrial" problems. The main difficulty encountered by such methods is their lack of robustness and, generally, the unpredictability and inconsistency of their performance over a wide sample of different problems; certain methods work quite well for certain types of problems but can fail completely on others.
Over the past few years, direct methods have made significant progress thanks to both the combinatorial analysis of the Gaussian elimination process and the parallel algorithms of blockwise solvers optimized for modern parallel supercomputers. It is now possible to solve practical three-dimensional problems on the order of several million equations very efficiently with direct solvers that exploit the superscalar effects of modern processors.
However, direct methods may fail to solve very large three-dimensional problems, due to the large amount of memory needed for these cases.
In our work, we consider an approach which, we hope, will bridge the gap between the two classes of methods. The goal is to provide a method which exploits the parallel blockwise algorithms used in the framework of high performance sparse direct solvers to develop robust preconditioners for iterative solvers based on parallel incomplete factorization.
The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Our approach consists in symbolically computing the block structure of the factors that would have been obtained with a complete factorization, and then deciding to drop some blocks of this structure according to relevant criteria. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and therefore be very competitive in CPU time while avoiding the memory limitation encountered by direct methods. In this way, we expect to be able to solve systems on the order of a hundred million unknowns.
@InProceedings{C:LaBRI::PMAA2004a,
author = "H\'enon, P. and Ramet, P. and Roman, J.",
title = "A Blockwise Algorithm for Parallel Incomplete Cholesky Factorization",
booktitle = "Proceedings of {PMAA}'2004",
OPTcrossref = {},
OPTkey = {},
OPTeditor = {},
OPTvolume = {},
OPTnumber = {},
OPTseries = {},
year = "2004",
OPTorganization = {},
OPTpublisher = {},
address = {Marseille, France},
month = oct,
OPTpages = {},
OPTnote = {},
OPTannote = {},
KEYWORDS = "Sparse",
ABSTRACT = { Solving large sparse linear systems by iterative methods has often been quite unsatisfactory when dealing with practical "industrial" problems. The main difficulty encountered by such methods is their lack of robustness and, generally, the unpredictability and inconsistency of their performance over a wide sample of different problems; certain methods work quite well for certain types of problems but can fail completely on others.
Over the past few years, direct methods have made significant progress thanks to both the combinatorial analysis of the Gaussian elimination process and the parallel algorithms of blockwise solvers optimized for modern parallel supercomputers. It is now possible to solve practical three-dimensional problems on the order of several million equations very efficiently with direct solvers that exploit the superscalar effects of modern processors.
However, direct methods may fail to solve very large three-dimensional problems, due to the large amount of memory needed for these cases.
In our work, we consider an approach which, we hope, will bridge the gap between the two classes of methods. The goal is to provide a method which exploits the parallel blockwise algorithms used in the framework of high performance sparse direct solvers to develop robust preconditioners for iterative solvers based on parallel incomplete factorization.
The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Our approach consists in symbolically computing the block structure of the factors that would have been obtained with a complete factorization, and then deciding to drop some blocks of this structure according to relevant criteria. Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and therefore be very competitive in CPU time while avoiding the memory limitation encountered by direct methods. In this way, we expect to be able to solve systems on the order of a hundred million unknowns. }
}