All publications sorted by Books and proceedings
  1. Esragul Korkmaz. Improving the memory and time overhead of low-rank parallel linear sparse direct solvers. Theses, Université de Bordeaux, September 2022. Keyword(s): Low-Rank compression.
    Abstract:
    Through the recent improvements toward exascale supercomputer systems, huge computations can be performed in reasonable times by using massively parallelized operations. Unfortunately, the increase of the computational units in these systems does not lead to a rise in the memory available per core. Therefore, this memory limitation forces scientists and engineers not only to parallelize the operations efficiently but also to minimize the memory used. Many scientific and engineering applications have to solve large sparse linear systems of the type Ax = b. Although direct methods are the most robust solutions for these systems, they are costly in terms of memory usage and time-to-solution. In this respect, low-rank representations have recently been introduced into these solvers to reduce the time and memory footprint. In this work, our goal is to improve the low-rank feature of the block low-rank (BLR) sparse supernodal direct solver PaStiX. For this purpose, we compare some compression methods to determine the fastest kernel, which keeps the representative data with the smallest rank possible. Then, we focus on improving the supernodal solver by reducing the number of re-compressions during the updates. Firstly, we study the separator reordering strategies to identify the poorly compressible blocks involved in these updates and reduce their occurrences. Secondly, we propose an orthogonal solution to predict the compressibility of the blocks before the numerical factorization. This last approach relies on the use of the level of fill of a symbolic block incomplete factorization. Thanks to these optimizations, the memory usage has been reduced more effectively compared to the state-of-the-art solvers while also improving the time-to-solution. This thesis is a requested first step toward an advanced sparse direct solver using hierarchical compression schemes.
    [bibtex-key = korkmaz:tel-03875858] [bibtex-entry]


  2. Grégoire Pichon. On the use of low-rank arithmetic to reduce the complexity of parallel sparse linear solvers based on direct factorization techniques. Theses, Université de Bordeaux, November 2018. Keyword(s): Low-Rank compression.
    Abstract:
    Solving sparse linear systems is a problem that arises in many scientific applications, and sparse direct solvers are a time-consuming and key kernel for those applications and for more advanced solvers such as hybrid direct-iterative solvers. For those reasons, optimizing their performance on modern architectures is critical. However, memory requirements and time-to-solution limit the use of direct methods for very large matrices. For other approaches, such as iterative methods, general black-box preconditioners that can ensure fast convergence for a wide range of problems are still missing. In the first part of this thesis, we present two approaches using a Block Low-Rank (BLR) compression technique to reduce the memory footprint and/or the time-to-solution of a supernodal sparse direct solver. This flat, non-hierarchical compression method makes it possible to take advantage of the low-rank property of the blocks appearing during the factorization of sparse linear systems. The proposed solver can be used either as a direct solver at a lower precision or as a very robust preconditioner. The first approach, called Minimal-Memory, illustrates the maximum memory gain that can be obtained with the BLR compression method, while the second approach, called Just-In-Time, mainly focuses on reducing the computational complexity and thus the time-to-solution. In the second part, we present a reordering strategy that increases the block granularity to better take advantage of the locality for multicores and provide larger tasks to GPUs. This strategy relies on the block-symbolic factorization to refine the ordering produced by tools such as Metis or Scotch, but it does not impact the number of operations required to solve the problem.
    From this approach, we propose in the third part of this manuscript a new low-rank clustering technique that is designed to cluster unknowns within a separator to obtain the BLR partition, and demonstrate its assets with respect to widely used clustering strategies. Both reordering and clustering were designed for the flat BLR representation but are also a first step toward hierarchical formats. We investigate in the last part of this thesis a modified nested dissection strategy that aligns separators with respect to their father to obtain a more regular data structure.
    [bibtex-key = pichon:tel-01953908] [bibtex-entry]


  3. P. Ramet. Heterogeneous architectures, Hybrid methods, Hierarchical matrices for Sparse Linear Solvers. Habilitation à diriger des recherches, Université de Bordeaux, November 2017. Keyword(s): Sparse. [bibtex-key = ramet:tel-01668740] [bibtex-entry]


  4. A. Casadei. Optimizations of hybrid sparse linear solvers relying on Schur complement and domain decomposition approaches. PhD thesis, Université de Bordeaux, October 2015. Keyword(s): Sparse.
    Abstract:
    In this thesis, we focus on the parallel solving of large sparse linear systems. Our main interest is in direct-iterative hybrid solvers such as HIPS, MAPHYS, PDSLIN or SHYLU, which rely on domain decomposition and Schur complement approaches. Although these solvers are not as time and space consuming as direct methods, they still suffer from serious overheads. In a first part, we thus present the existing techniques for reducing the memory consumption, and we present a new method which does not impact the numerical robustness of the preconditioner. This technique reduces the memory peak through a special scheduling of computation, allocation, and freeing tasks, in particular in the Schur coupling blocks of the matrix. In a second part, we focus on the load balancing of the domain decomposition in a parallel context. This problem consists in partitioning the adjacency graph of the matrix into as many domains as desired. We point out that a good load balancing for the most expensive steps of a hybrid solver such as MAPHYS relies on the balancing of both interior nodes and interface nodes of the domains. However, until now, graph partitioners such as METIS or SCOTCH optimized only the first criterion (i.e. the balancing of interior nodes) in the context of sparse matrix ordering. We propose different variations of the existing algorithms to improve the balancing of interface nodes and interior nodes simultaneously. All our changes are implemented in the SCOTCH partitioner. We present our results on a large collection of matrices coming from real industrial cases.
    [bibtex-key = t:LaBRI::AC15] [bibtex-entry]


  5. X. Lacoste. Scheduling and memory optimizations for sparse direct solver on multi-core/multi-gpu cluster systems. PhD thesis, Université de Bordeaux, February 2015. Keyword(s): Sparse.
    Abstract:
    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PaStiX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this thesis, we study the benefits and the limits of replacing the highly specialized internal scheduler of the PaStiX solver by two generic runtime systems: PaRSEC and StarPU. Thus, we have to describe the factorization algorithm as a task graph that we provide to the runtime system. Then it can decide how to process and optimize the graph traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PaStiX solver on top of its original internal scheduler, PaRSEC, and StarPU frameworks is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer. In this thesis, we also study the possibilities to build a distributed sparse linear solver on top of task-based runtime systems to target heterogeneous clusters. To permit an efficient and easy usage of these developments in parallel simulations, we also present an optimized distributed interface aiming at hiding the complexity of the construction of a distributed matrix from the user.
    [bibtex-key = t:LaBRI::XL15] [bibtex-entry]


  6. S. Moustafa. Massively Parallel Cartesian Discrete Ordinates Method for Neutron Transport Simulation. PhD thesis, Université de Bordeaux, December 2015. Keyword(s): Neutron. [bibtex-key = t:LaBRI::SM15] [bibtex-entry]


  7. Mathieu Chanaud. Conception d'un solveur haute performance de systèmes linéaires creux couplant des méthodes multigrilles et directes pour la résolution des équations de Maxwell 3D en régime harmonique discrétisées par éléments finis. PhD thesis, Université de Bordeaux, December 2011. Keyword(s): Sparse. [bibtex-key = t:LaBRI::MC09] [bibtex-entry]


  8. B. Lathuilière. Domain decomposition method for the Simplified Transport Equation in neutronic. PhD thesis, Université Sciences et Technologies - Bordeaux I, February 2010. Keyword(s): Neutron.
    Abstract:
    The reactivity computations are an essential component for the simulation of the core of a nuclear plant. These computations lead to generalized eigenvalue problems solved by the inverse power iteration algorithm. At each iteration, an algebraic linear system is solved through an inner/outer process. With the solver Cocagne developed at EDF, it is difficult to take into account very fine discretisations, due to the memory requirement and the computation time. In this thesis, a domain decomposition method based on the Schur dual technique is studied. Several placements in the inner/outer process are possible. Two of them are implemented and the results analyzed. The second one, which uses the specificities of the Raviart-Thomas finite elements and of the alternating directions algorithm, leads to very promising results. From these results the industrialization of the method can be considered.
    [bibtex-key = t:LaBRI::BL10] [bibtex-entry]


  9. M. Faverge. Static-Dynamic Hybrid Scheduling in sparse linear algebra for large clusters of NUMA and multi-cores architectures. PhD thesis, Université Sciences et Technologies - Bordeaux I, December 2009. Keyword(s): Sparse.
    Abstract:
    New supercomputers incorporate many microprocessors which themselves include one or many computational cores. These new architectures induce strongly hierarchical topologies. These are called NUMA architectures. Sparse direct solvers are a basic building block of many numerical simulation algorithms. They need to be adapted to these new architectures with Non Uniform Memory Accesses. We propose to introduce a dynamic scheduling designed for NUMA architectures in the PaStiX solver. The data structures of the solver, as well as the patterns of communication, have been modified to meet the needs of these architectures and dynamic scheduling. We are also interested in the dynamic adaptation of the computation grain to use multi-core architectures and shared memory efficiently. Experiments on several numerical test cases will be presented to prove the efficiency of the approach on different architectures.
    [bibtex-key = t:LaBRI::MF09] [bibtex-entry]


  10. J. Gaidamour. Design of a parallel hybrid direct/iterative sparse linear solver. PhD thesis, Université Sciences et Technologies - Bordeaux I, December 2009. Keyword(s): Sparse.
    Abstract:
    This thesis presents a parallel resolution method for sparse linear systems which effectively combines techniques of direct and iterative solvers using a Schur complement approach. A domain decomposition is built; the interiors of the subdomains are eliminated by a direct method in order to use an iterative method only on the interface unknowns. The system on the interface (Schur complement) is solved by an iterative method preconditioned by a global incomplete factorization. A special ordering on the Schur complement makes it possible to build a scalable preconditioner. Algorithms minimizing the memory peak that appears during the construction of the preconditioner are presented. The memory is balanced thanks to a parallelization scheme with multiple domains per processor. The methods are implemented in the Hips solver and parallel experimental results are presented on large industrial test cases.
    [bibtex-key = t:LaBRI::JG09] [bibtex-entry]


  11. P. Hénon. Distribution des Données et Régulation Statique des Calculs et des Communications pour la Résolution de Grands Systèmes Linéaires Creux par Méthode Directe. PhD thesis, LaBRI, Université Bordeaux, Talence, France, November 2001. Keyword(s): Sparse.
    Abstract:
    Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. In this work, we consider the block partitioning and scheduling problem for sparse parallel factorization without pivoting. We focus on the scalability of the parallel solver, and on the compromise between memory overhead and efficiency. We validate this study with parallel experiments on a large collection of irregular industrial problems.
    [bibtex-key = t:LaBRI::PH01] [bibtex-entry]


  12. D. Goudin. Mise en œuvre d'une Bibliothèque d'Outils pour la Résolution Parallèle Hautes Performances par Méthode Directe de Grands Systèmes Linéaires Creux et application à un Code de Mécanique des Structures. PhD thesis, LaBRI, Université Bordeaux I, Talence, France, November 2000. Keyword(s): Sparse.
    Abstract:
    This thesis initially concerned the parallelization of the OSSAU software code from CEA/CESTA. The application domain of this software is the vectorized structural mechanics; the code itself is nonlinear in time and in two or three dimensions. This investigation leads to the design and implementation of a parallel high performance software processing chain for the assembly and the resolution of sparse linear systems by direct methods. The final objective is a validation of the OSSAU code for three-dimensional problems with several millions of unknowns.
    [bibtex-key = t:LaBRI::DG2k] [bibtex-entry]


  13. P. Ramet. Optimisation de la Communication et de la Distribution des Données pour des Solveurs Parallèles Directs en Algèbre Linéaire Dense et Creuse. PhD thesis, LaBRI, Université Bordeaux I, Talence, France, January 2000. Keyword(s): Overlap, Sparse.
    Abstract:
    This thesis deals with high performance computation problems and more specifically with those of scientific parallel computation for irregular real-world applications. In the first part, we describe a method for overlapping communications on parallel computers with distributed memory. This method has resulted in a generic computation scheme for the optimal packet size. We also tackle the problem of finding the optimal computation grain for the Cholesky factorization algorithm for dense matrices. The goal of this study is to exploit the irregularity induced by the matrix symmetry. Based on this work we have developed a portable software library providing an efficient application context for these techniques. The second part of this thesis presents and analyses a general algorithm for the computation of an efficient static scheduling of block computations, developed especially for a parallel direct sparse linear factorization based on a combination of 1D and 2D block distributions. Our solver uses a supernodal Fan-In approach and is fully driven by our static scheduling algorithm. Compared to the existing parallel direct solvers our solver shows very favorable performance results.
    [bibtex-key = t:LaBRI::PR2k] [bibtex-entry]


  14. Matthias Hoelzl, Guido Huijsmans, Stanislas Pamela, Marina Bécoulet, Eric Nardon, Francisco Javier Artola, Boniface Nkonga, Calin Vlad Atanasiu, Vinodh Bandaru, Ashish Bhole, Daniele Bonfiglio, Andres Cathey, Olivier Czarny, Anastasia Dvornova, Tamas Fehér, Alexandre Fil, Emmanuel Franck, Shimpei Futatani, Marta Gruca, Hervé Guillard, Willem J. Haverkort, Ihor Holod, Di Hu, S.K. Kim, Sven Q. Korving, Leon Kos, Isabel Krebs, Lukas Kripner, Guillaume Latu, Franklin Liu, Peter Merkel, Dmytro Meshcheriakov, Verena Mitterauer, Serhiy Mochalskyy, Jorge A. Morales, Richard Nies, Nikita Nikulsin, François Orain, Jane Pratt, Rohan Ramasamy, Pierre Ramet, Cédric Reux, Konsta Särkimäki, N. Schwarz, Prabal Singh Verma, Siobhan Smith F., Cristian Sommariva, Erika Strumberger, Daan C. van Vugt, M. Verbeek, Egbert Westerhof, Fabian Wieschollek, and Jeffery Zielinski. The JOREK non-linear extended MHD code and applications to large-scale instabilities and their control in magnetically confined fusion plasmas. Nuclear Fusion, 61(6):065001, May 2021. Keyword(s): Fusion. [bibtex-key = hoelzl:hal-03352509] [bibtex-entry]


  15. Salli Moustafa, François Févotte, Mathieu Faverge, Laurent Plagne, and Pierre Ramet. Efficient Parallel Solution of the 3D Stationary Boltzmann Transport Equation for Diffusive Problems. Journal of Computational Physics, 388:335 - 349, March 2019. [bibtex-key = moustafa:hal-02080624] [bibtex-entry]


  16. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse supernodal solver using block low-rank compression: Design, performance and analysis. International Journal of Computational Science and Engineering, 27:255 - 270, July 2018. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01824275] [bibtex-entry]


  17. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Reordering Strategy for Blocking Optimization in Sparse Linear Solvers. SIAM Journal on Matrix Analysis and Applications, 38(1):226 - 248, 2017. Keyword(s): Sparse. [bibtex-key = pichon:hal-01485507] [bibtex-entry]


  18. S. Moustafa, I. Dutka-Malen, L. Plagne, A. Poncot, and P. Ramet. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver. Annals of Nuclear Energy, 2014. ISSN: 0306-4549. Keyword(s): Neutron. [bibtex-key = A:LaBRI::ane2014] [bibtex-entry]


  19. O. Coulaud, L. Giraud, P. Ramet, and X. Vasseur. Developments in Parallel, Distributed, Grid and Cloud Computing for Engineering, chapter Augmentation and Deflation in Krylov subspace methods, pages 249-275. Saxe-Coburg Publications, Kippen, Stirlingshire, United Kingdom, 2013. ISBN: 978-1-874672-62-3. [bibtex-key = A:LaBRI::PARENG13] [bibtex-entry]


  20. M. Barrault, B. Lathuilière, P. Ramet, and J. Roman. Efficient parallel resolution of the simplified transport equations in mixed-dual formulation. Journal of Computational Physics, 230(5):2004 - 2020, 2011. ISSN: 0021-9991. Keyword(s): Neutron. [bibtex-key = A:LaBRI::jcp2011] [bibtex-entry]


  21. R. Abgrall, R. Huart, and P. Ramet. Numerical simulation of unsteady MHD flows and applications. MagnetoHydroDynamics Journal, 45(2):225-232, 2009. Keyword(s): Fusion. [bibtex-key = A:LaBRI::MHD2009] [bibtex-entry]


  22. G. Huysmans, S. Pamela, E. van der Plas, and P. Ramet. Non-linear MHD simulations of edge localized modes (ELMs). Plasma Physics and Controlled Fusion, 51(12):124012, 2009. Keyword(s): Fusion.
    Abstract:
    Non-linear MHD simulations of edge localized modes (ELMs) show features in qualitative agreement with the experimental observations such as the formation and speed of filaments, features in the radial profiles and the fine structure observed in the power deposition profiles at the divertor target. The density perturbation predominantly follows the ballooning mode convection cells leading to density filaments. The temperature perturbation, due to the large parallel conduction, follows the magnetic field perturbation. Simulations of pellets injected in the H-mode pedestal show that the high pressure in the high density plasmoid can become large enough to drive ballooning type modes forming a single helical structure located at the pellet (plasmoid) position.
    [bibtex-key = A:LaBRI::EPS2009] [bibtex-entry]


  23. P. Hénon, P. Ramet, and J. Roman. On finding approximate supernodes for an efficient ILU(k) factorization. Parallel Computing, 34:345-362, 2008. Keyword(s): Sparse.
    Abstract:
    Among existing preconditioners, the level-of-fill ILU has been quite popular as a general-purpose technique. Experimental observations have shown that, when coupled with block techniques, these methods can be quite effective in solving realistic problems arising from various applications. In this work, we consider an extension of this kind of method which is suitable for parallel environments. Our method is developed from the framework of high performance sparse direct solvers. The main idea we propose is to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. These requirements lead to a robust class of parallel preconditioners based on generalized versions of block ILU techniques.
    [bibtex-key = A:LaBRI::HRR07] [bibtex-entry]


  24. P. Hénon and Y. Saad. A Parallel Multilevel ILU Factorization based on a Hierarchical Graph Decomposition. SIAM Journal on Scientific Computing, 2006. Keyword(s): Sparse. [bibtex-key = A:LaBRI::sisc06] [bibtex-entry]


  25. Olivier Coulaud, Michael Dussère, Pascal Hénon, Erik Lefebvre, and Jean Roman. Optimization of a kinetic laser-plasma interaction code for large parallel systems. Parallel Computing, 29(9):1175-1190, 2003. ISSN: 0167-8191. [bibtex-key = A:LaBRI::PMAA2002c] [bibtex-entry]


  26. P. Hénon, P. Ramet, and J. Roman. PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Definite Systems. Parallel Computing, 28(2):301-321, January 2002. Keyword(s): Sparse.
    Abstract:
    Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. This paper considers the block partitioning and scheduling problem for sparse parallel factorization without pivoting. There are two major aims to this study: the scalability of the parallel solver, and the compromise between memory overhead and efficiency. Parallel experiments on a large collection of irregular industrial problems validate our approach.
    [bibtex-key = A:LaBRI::HRR01a] [bibtex-entry]


  27. E. Caron, S. Chaumette, S. Contassot-Vivier, F. Desprez, E. Fleury, C. Gomez, M. Goursat, E. Jeannot, D. Lazure, F. Lombard, J.M. Nicod, L. Philippe, M. Quinson, P. Ramet, J. Roman, F. Rubi, S. Steer, F. Suter, and G. Utard. Scilab to Scilab//, the OURAGAN Project. Parallel Computing, 27(11):1497-1519, October 2001. [bibtex-key = A:LaBRI::CCC+01] [bibtex-entry]


  28. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Parallel Sparse Linear Algebra and Application to Structural Mechanics. Numerical Algorithms, 24:371-391, 2000. Keyword(s): Sparse.
    Abstract:
    The framework of this paper is the parallelization of a plasticity algorithm that uses an implicit method and an incremental approach. More precisely, we will focus on some specific parallel sparse linear algebra algorithms which are the most time-consuming steps to solve efficiently such an engineering application. First, we present a general algorithm which computes an efficient static scheduling of block computations for parallel sparse linear factorization. The associated solver, based on a supernodal fan-in approach, is fully driven by this scheduling. Second we describe a scalable parallel assembly algorithm based on a distribution of elements induced by the previous distribution for the blocks of the sparse matrix. We give an overview of these algorithms and present performance results on an IBM SP2 for a collection of grid and irregular problems.
    [bibtex-key = A:LaBRI::GHPRRP3] [bibtex-entry]


  29. Robert Falgout, Matthieu Lecouvez, Pierre Ramet, and Clément Richefort. Toward a Multigrid Method for the Indefinite Helmholtz Equation. In SIAM Conference on Computational Science and Engineering (CSE23), Amsterdam, Netherlands, February 2023. Keyword(s): Multigrid. [bibtex-key = falgout:hal-04046630] [bibtex-entry]


  30. Clément Richefort, Matthieu Lecouvez, Robert Falgout, and Pierre Ramet. Toward a multilevel method for the Helmholtz equation. In 21st SIAM Copper Mountain Conference on Multigrid Methods, Copper Mountain, CO, United States, April 2023. Keyword(s): Multigrid. [bibtex-key = richefort:hal-04046622] [bibtex-entry]


  31. Esragul Korkmaz, Mathieu Faverge, Grégoire Pichon, and Pierre Ramet. Deciding Non-Compressible Blocks in Sparse Direct Solvers using Incomplete Factorization. In HiPC 2021 - 28th IEEE International Conference on High Performance Computing, Data, and Analytics, Bangalore, India, pages 1-10, December 2021. IEEE. Keyword(s): Low-rank compression. [bibtex-key = korkmaz:hal-03361299] [bibtex-entry]


  32. P. Ramet. Study of the recent developments around the PaStiX solver for the EoCoE project: distributed memory, runtime systems, and low-rank. In 32nd International Conference on Parallel Computational Fluid Dynamics, Nice, France, May 2021. Keyword(s): Sparse.
    Abstract:
    As the core of a large number of simulation tools, the resolution of large linear systems often represents the dominant part of the computing time. Massively parallel versions are needed to maintain advances in multi-physics and multi-scale simulations, especially when targeting exascale platforms. The aim is therefore to address the major challenge of designing and building numerically robust solvers on runtime systems that can scale up and push back the limits of existing industrial codes of the EoCoE project. In this talk, we will study the recent changes made to the solver with matrices issued from the project such as the block low-rank compression factorization, the capacity to exploit modern GPU accelerators through runtime systems, and the scalability on distributed memory.
    [bibtex-key = C:LaBRI::cfd21] [bibtex-entry]


  33. Changjiang Gou, Ali Al Zoobi, Anne Benoit, Mathieu Faverge, Loris Marchal, Grégoire Pichon, and Pierre Ramet. Improving mapping for sparse direct solvers: A trade-off between data locality and load balancing. In EuroPar 2020 - 26th International European Conference on Parallel and Distributed Computing, Warsaw / Virtual, Poland, pages 1-16, August 2020. Keyword(s): Load balancing. [bibtex-key = gou:hal-02973315] [bibtex-entry]


  34. G. Pichon, E. Korkmaz, M. Faverge, and P. Ramet. Recent Developments Around the Block Low-Rank PaStiX Solver. In 19th SIAM Conference on Parallel Processing for Scientific Computing, Seattle, USA, February 2020. Note: Minisymposium on Low-Rank Compression-Based Fast Sparse Direct Solvers. Keyword(s): Low-rank compression. [bibtex-key = C:LaBRI::pp20] [bibtex-entry]


  35. M. Faverge, G. Pichon, and P. Ramet. Exploiting Parameterized Task-graph in Sparse Direct Solvers. In SIAM Conference on Computational Science and Engineering (CSE19), Spokane, United States, February 2019. Keyword(s): Low-rank compression.
    Abstract:
    Task-based programming models have been widely studied in the context of dense linear algebra, but remain less studied for the more complex sparse solvers. In this talk, we will present the use of two different programming models: Sequential Task Flow from StarPU, and Parameterized Task Graph from PaRSEC to parallelize the factorization step of the PaStiX sparse direct solver. We will present how those programming models have been used to integrate more complex and finer parallelism to take into account new architectures with many computational units. Efficiency of such solutions on homogeneous and heterogeneous architectures with a spectrum of matrices from different applications will be shown. We will also present how such solutions enable, without extra cost to the programmer, better performance on irregular computations such as in the block low-rank implementation of the solver.
    [bibtex-key = faverge:hal-01956963] [bibtex-entry]


  36. Esragul Korkmaz, Mathieu Faverge, Grégoire Pichon, and Pierre Ramet. Rank Revealing QR Methods for Sparse Block Low Rank Solvers. In COMPAS 2019 - Conférence d'informatique en Parallélisme, Architecture et Système, Anglet, France, June 2019. Keyword(s): Randomized. [bibtex-key = korkmaz:hal-02326084] [bibtex-entry]


  37. Esragul Korkmaz, Mathieu Faverge, Grégoire Pichon, and Pierre Ramet. Rank Revealing QR Methods for Sparse Block Low Rank Solvers. In Sparse Days 2019, Toulouse, France, July 2019. Keyword(s): Randomized. [bibtex-key = korkmaz:hal-02326070] [bibtex-entry]


  38. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Block Low-rank Algebraic Clustering for Sparse Direct Solvers. In SIAM Conference on Computational Science and Engineering (CSE19), Spokane, United States, February 2019. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we address the Block Low-Rank (BLR) clustering problem, to cluster unknowns within separators appearing during the factorization of sparse matrices. We show that methods considering only intra-separator connectivity (i.e., k-way or recursive bisection) as well as methods managing only interaction between separators have some limitations. The new strategy we propose considers interactions between a separator and its children to pre-select some interactions while reducing the number of off-diagonal blocks. We demonstrate how this method enhances the BLR strategies in the sparse direct supernodal solver PaStiX, and discuss how it can be extended to low-rank formats with more than one level of hierarchy.
    [bibtex-key = pichon:hal-01956962] [bibtex-entry]


  39. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers. In PMAA 2018 - 10th International Workshop on Parallel Matrix Algorithms and Applications, Zurich, Switzerland, June 2018. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we present new ordering heuristics to perform block low-rank clustering in supernodes issued from the nested dissection. As k-way partitioning within supernodes does not take into account interactions between supernodes, there is room to improve compression rates. We combine k-way partitioning with a reordering strategy that aims at minimizing the number of off-diagonal blocks in the symbolic structure and show that both methods are limited. In addition, we propose a selection of some non-compressible vertices to handle the corresponding blocks in full-rank and reduce the burden of managing low-rank blocks with high ranks.
    [bibtex-key = pichon:hal-01956960] [bibtex-entry]


  40. G. Pichon. Utilisation de la compression Block Low-Rank pour accélérer un solveur direct creux supernodal. In COMPAS 2017, Sophia Antipolis, France, June 2017. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01585660] [bibtex-entry]


  41. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver Using Block Low-Rank Compression. In 18th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2017), Orlando, United States, June 2017. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01502215] [bibtex-entry]


  42. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver Using Hierarchical Compression over Runtime System. In SIAM Conference on Computational Science and Engineering, Atlanta, USA, February 2017. Keyword(s): Sparse.
    Abstract:
    In this talk, we present the PaStiX sparse supernodal solver, using hierarchical compression to reduce the burden on large blocks appearing during the nested dissection process. We compare the numerical stability, and the performance in terms of memory consumption and time to solution of different approaches by selecting when the compression of the factorized matrix occurs. In order to improve the efficiency of the sparse update kernel for both BLR (block low rank) and HODLR (hierarchically off-diagonal low-rank), we investigate the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm.
    [bibtex-key = C:LaBRI::siam2017a] [bibtex-entry]


  43. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver exploiting Low-Rankness Property. In Sparse Days 2017, Toulouse, France, September 2017. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we will present recent advances on PaStiX, a supernodal sparse direct solver, which has been enhanced by the introduction of Block Low-Rank compression. We will describe different strategies leading to memory consumption gain and/or time-to-solution reduction. Finally, the implementation on top of runtime systems (PaRSEC, StarPU) will be compared with the static scheduling used in previous experiments.
    [bibtex-key = pichon:hal-01585622] [bibtex-entry]


  44. G. Pichon, M. Faverge, and P. Ramet. Exploiting Modern Manycore Architecture in Sparse Direct Solver with Runtime Systems. In SIAM Conference on Computational Science and Engineering, Atlanta, USA, February 2017. Keyword(s): Sparse.
    Abstract:
    Sparse direct solvers are a time-consuming component required by many scientific applications to simulate physical problems. Because of their significant overall cost, many studies have tried to optimize the time to solution of those solvers on multi-core and distributed architectures. More recently, many works have addressed heterogeneous architectures to exploit accelerators such as GPUs or Intel Xeon Phi with interesting speedups. Despite research toward generic solutions to efficiently exploit those accelerators, their hardware evolution requires continual adaptation of the kernels running on those architectures. Recent Nvidia architectures, such as Kepler, present a larger number of parallel units, thus requiring more data to feed every computational unit. A solution considered to supply enough computation has been to study problems with a large number of small computations. The batched BLAS libraries proposed by Intel, Nvidia, or the University of Tennessee are examples of this solution. We discuss in this talk the use of the variable-size batched matrix-matrix multiply to improve the performance of the PaStiX sparse direct solver. Indeed, this kernel suits the supernodal method of the solver, and the multiple updates of variable sizes that occur during the numerical factorization. Performance results on a spectrum of matrices with different properties will be presented.
    [bibtex-key = C:LaBRI::siam2017b] [bibtex-entry]


  45. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Impact of Blocking Strategies for Sparse Direct Solvers on Top of Generic Runtimes. In SIAM Conference on Computational Science and Engineering, Atlanta, USA, February 2017. Keyword(s): Sparse.
    Abstract:
    Among the preprocessing steps of a sparse direct solver, reordering and block symbolic factorization are two major steps to reach a suitable granularity for BLAS kernel efficiency and runtime management. In this talk, we present a reordering strategy to increase off-diagonal block sizes. It makes BLAS kernels more efficient and allows larger tasks to be handled, reducing runtime overhead. Finally, we will comment on the resulting gain in the PaStiX solver implemented over StarPU and PaRSEC.
    [bibtex-key = C:LaBRI::siam2017c] [bibtex-entry]


  46. E. Agullo. Overview of Task-based Sparse and Data-sparse Solvers on Top of Runtime Systems. In Sparse Days, Toulouse, France, June 2016. Keyword(s): Sparse.
    Abstract:
    The complexity of the hardware architectures of modern supercomputers has led the community of developers of scientific libraries to adopt new parallel programming paradigms. Among them, task-based programming has certainly become one of the most popular as it allows for high productivity while ensuring high performance and portability by delegating task management to a runtime system. In this talk, we will present an overview of sparse solvers that have been designed in the context of the Matrices Over Runtime Systems @ Exascale (MORSE) and Solvers for Heterogeneous Architectures (SOLHAR) projects. We will present the design of new direct solvers implementing supernodal (PaStiX) and multifrontal (qr-mumps) methods, new Krylov solvers ensuring pipelining both at a numerical and software level, new sparse hybrid methods (MaPHyS) as well as data sparse libraries implementing fast multipole methods (ScalFMM) and hierarchical matrices (hmat, in collaboration with Airbus Group Innovations). For all these methods, we will highlight the challenges we have faced in terms of expressivity, granularity, scheduling and scalability and illustrate their performance on large academic and industrial test problems.
    [bibtex-key = C:LaBRI::sparsedays2016] [bibtex-entry]


  47. M. Faverge, G. Pichon, and P. Ramet. Exploiting Kepler architecture in sparse direct solver with runtime systems. In Proceedings of PMAA'2016, Bordeaux, France, July 2016. Keyword(s): Sparse.
    Abstract:
    Sparse direct solvers are a time-consuming component required by many scientific applications to simulate physical problems. Because of their significant overall cost, many studies have tried to optimize the time to solution of those solvers on multi-core and distributed architectures. More recently, many works have addressed heterogeneous architectures to exploit accelerators such as GPUs or Intel Xeon Phi with interesting speedups. Despite research toward generic solutions to efficiently exploit those accelerators, their hardware evolution requires continual adaptation of the kernels running on those architectures. Recent Nvidia architectures, such as Kepler, present a larger number of parallel units, thus requiring more data to feed every computational unit. A solution considered to supply enough computation has been to study problems with a large number of small computations. The batched BLAS libraries proposed by Intel, Nvidia, or the University of Tennessee are examples of this solution. We discuss in this talk the use of the variable-size batched matrix-matrix multiply to improve the performance of the PaStiX sparse direct solver. Indeed, this kernel suits the supernodal method of the solver, and the multiple updates of variable sizes that occur during the numerical factorization. Performance results on a spectrum of matrices with different properties will be presented.
    [bibtex-key = C:LaBRI::PMAA2016] [bibtex-entry]


  48. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Exploiting H-Matrices in Sparse Direct Solvers. In SIAM Conference on Parallel Processing for Scientific Computing, Paris, France, April 2016. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we describe a preliminary fast direct solver using HODLR library to compress large blocks appearing in the symbolic structure of the PaStiX sparse direct solver. We will present our general strategy before analyzing the practical gains in terms of memory and floating point operations with respect to a theoretical study of the problem. Finally, we will discuss ways to enhance the overall performance of the solver.
    [bibtex-key = C:LaBRI::pp16b] [bibtex-entry]


  49. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. On the use of low rank approximations for sparse direct solvers. In SIAM Annual Meeting, Boston, USA, July 2016. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we describe a preliminary fast direct solver using the HODLR library to compress large blocks appearing in the symbolic structure of the PaStiX sparse direct solver. We will present our general strategy before analyzing the practical gains in terms of memory and floating point operations with respect to a theoretical study of the problem. Finally, we will discuss the impact of reordering techniques to enhance the low-rank compression.
    [bibtex-key = C:LaBRI::an16] [bibtex-entry]


  50. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver Using Hierarchical Compression. In Workshop on Fast Direct Solvers, Purdue, USA, November 2016. Keyword(s): Low-rank compression.
    Abstract:
    In this talk, we present the PaStiX sparse supernodal solver, using hierarchical compression to reduce the burden on large blocks appearing during the nested dissection process. To improve the efficiency of our sparse update kernel for both BLR (block low rank) and HODLR (hierarchically off-diagonal low-rank), we investigate the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm. We will also discuss ordering strategies to enhance data locality and compressibility.
    [bibtex-key = C:LaBRI::purdue2016] [bibtex-entry]


  51. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Impact of Blocking Strategies for Sparse Direct Solvers on Top of Generic Runtimes. In SIAM Conference on Parallel Processing for Scientific Computing, Paris, France, April 2016. Keyword(s): Sparse.
    Abstract:
    Among the preprocessing steps of a sparse direct solver, reordering and block symbolic factorization are two major steps to reach a suitable granularity for BLAS kernel efficiency and runtime management. In this talk, we present a reordering strategy to increase off-diagonal block sizes. It makes BLAS kernels more efficient and allows larger tasks to be handled, reducing runtime overhead. Finally, we will comment on the resulting gain in the PaStiX solver implemented over StarPU and PaRSEC.
    [bibtex-key = C:LaBRI::pp16a] [bibtex-entry]


  52. A. Casadei and P. Ramet. Towards a recursive graph bipartitioning algorithm for well balanced domain decomposition. In Mini-Symposium on Combinatorial Issues in Sparse Matrix Computation at ICIAM'15 conference, Beijing, China, August 2015. Keyword(s): Sparse.
    Abstract:
    In the context of hybrid sparse linear solvers based on domain decomposition and Schur complement approaches, getting a domain decomposition tool leading to a good balancing of both the internal node set size and the interface node set size is a critical point for parallel computation. We propose several variations of the existing algorithms in the multilevel Scotch partitioner and we illustrate the improved results on a collection of graphs coming from numerical scientific applications.
    [bibtex-key = C:LaBRI::iciam15b] [bibtex-entry]


  53. A. Casadei, P. Ramet, and J. Roman. Towards a recursive graph bipartitioning algorithm for well balanced domain decomposition. In Mini-Symposium on Partitioning for Complex Objectives at SIAM CSE'15 conference, Salt Lake City, USA, March 2015. Keyword(s): Sparse. [bibtex-key = C:LaBRI::cse15b] [bibtex-entry]


  54. M. Faverge, G. Pichon, P. Ramet, and J. Roman. Blocking strategy optimizations for sparse direct linear solver on heterogeneous architectures. In Sparse Days, Saint Girons, France, June 2015. Keyword(s): Sparse.
    Abstract:
    In the context of solving sparse linear systems, an ordering process partitions the matrix graph to minimize both fill-in and computational cost. We found that the ordering strategy used within supernodes might be enhanced to reduce the number of off-diagonal blocks, which in turn increases block sizes and kernel performance. This turns out to have the same complexity as the factorization algorithm, but allows for more efficient BLAS kernels. On the other hand, supernodes that are too large need to be split to create more parallelism. The regular splitting strategy, when applied locally, significantly impacts the number of off-diagonal blocks and might have a negative effect on efficiency. In this talk, we present both a new strategy to improve supernode ordering and a splitting strategy, which both enlarge the off-diagonal block sizes without changing the computational cost of the factorization. Performance gains on the supernodal solver PaStiX are shown on multi-core and heterogeneous architectures.
    [bibtex-key = C:LaBRI::sparsedays2015] [bibtex-entry]


  55. M. Faverge, G. Pichon, P. Ramet, and J. Roman. On the use of H-Matrix Arithmetic in PaStiX: a Preliminary Study. In Workshop on Fast Solvers, Toulouse, France, June 2015. Keyword(s): Low-rank compression.
    Abstract:
    When solving large sparse linear systems, both the amount of memory needed and the computational cost represent a burden to efficiency. In order to solve larger systems, low-rank strategies are used to reduce the overall complexity of a solver. In this talk, we present a preliminary study of the use of H-Matrix arithmetic in a supernodal solver. We also present a new feature in PaStiX, a reordering strategy to reduce the number of off-diagonal blocks in the symbolic factorization. It allows BLAS kernels to be more efficient, and those ideas could be explored in the context of a low-rank strategy.
    [bibtex-key = C:LaBRI::CIMI15] [bibtex-entry]


  56. X. Lacoste, M. Faverge, and P. Ramet. A task-based sparse direct solver suited for large scale hierarchical/heterogeneous architectures. In Mini-Symposium on Task-based Scientific Computing Applications at SIAM CSE'15 conference, Salt Lake City, USA, March 2015. Keyword(s): Sparse. [bibtex-key = C:LaBRI::cse15a] [bibtex-entry]


  57. S. Moustafa, M. Faverge, L. Plagne, and P. Ramet. 3D Cartesian Transport Sweep for Massively Parallel Architectures with PARSEC. In 29th IEEE International Parallel & Distributed Processing Symposium, IPDPS'15, Hyderabad, India, pages 581-590, May 2015. ISSN: 1530-2075. Keyword(s): Neutron. [bibtex-key = moustafa:hal-01078362] [bibtex-entry]


  58. G. Pichon, A. Haidar, M. Faverge, and J. Kurzak. Divide and Conquer Symmetric Tridiagonal Eigensolver for Multicore Architectures. In IEEE International Parallel & Distributed Processing Symposium (IPDPS 2015), Hyderabad, India, May 2015. [bibtex-key = pichon:hal-01078356] [bibtex-entry]


  59. P. Ramet. On the design of parallel linear solvers for large scale problems. In Mini-Symposium on Recent advances in matrix computations for extreme-scale computers at ICIAM'15 conference, Beijing, China, August 2015. Keyword(s): Sparse.
    Abstract:
    In this talk we will discuss our research activities on the design of parallel linear solvers for large scale problems that range from dense linear algebra to parallel sparse direct solvers and hybrid iterative-direct approaches. In particular we will describe the implementations designed on top of runtime systems that should provide both code and performance portability. Finally, we will present some preliminary results on the integration of H-matrix kernels in our sparse direct solver framework.
    [bibtex-key = C:LaBRI::iciam15a] [bibtex-entry]


  60. E. Agullo, M. Faverge, L. Giraud, A. Guermouche, P. Ramet, and J. Roman. Toward parallel scalable linear solvers suited for large scale hierarchical parallel platforms. In WCCM-ECCM-ECFD workshop on Enabling Technologies and their Application for Advancing Computational Mechanics, Barcelona, Spain, July 2014. Keyword(s): Sparse. [bibtex-key = C:LaBRI::ecfd14a] [bibtex-entry]


  61. A. Casadei, P. Ramet, and J. Roman. An improved recursive graph bipartitioning algorithm for well balanced domain decomposition. In 21st IEEE International Conference on High Performance Computing (HiPC), Goa, India, pages 1-10, December 2014. Keyword(s): Sparse.
    Abstract:
    In the context of hybrid sparse linear solvers based on domain decomposition and Schur complement approaches, getting a domain decomposition tool leading to a good balancing of both the internal node set size and the interface node set size for all the domains is a critical point for load balancing and efficiency issues in a parallel computation context. For this purpose, we revisit the original algorithm introduced by Lipton, Rose and Tarjan which performed the recursion for nested dissection in a particular manner. From this specific recursive strategy, we propose in this paper several variations of the existing algorithms in the multilevel Scotch partitioner that take into account these multiple criteria and we illustrate the improved results on a collection of graphs corresponding to finite element meshes used in numerical scientific applications.
    [bibtex-key = C:LaBRI::hipc14] [bibtex-entry]


  62. A. Casadei, P. Ramet, and J. Roman. Nested Dissection with Balanced Halo. In SIAM Workshop on Combinatorial Scientific Computing, Lyon, France, July 2014. Keyword(s): Sparse. [bibtex-key = C:LaBRI::CSC14] [bibtex-entry]


  63. C. Dudley, E. Darve, S. Ambikasaran, and A. H. Aminfar. Fast Algorithms for Dense Linear Algebra. In Proceedings of PMAA'2014, Lugano, Switzerland, July 2014. Keyword(s): Sparse.
    Abstract:
    In recent years there has been a resurgence in direct methods to solve linear systems. These methods can have many advantages compared to iterative solvers; in particular their accuracy and performance are less sensitive to the distribution of eigenvalues. However, they typically have a larger computational cost in cases where iterative solvers converge in few iterations. We will discuss a recent trend of methods that address this cost and can make these direct solvers competitive. Techniques involved include hierarchical matrices, hierarchically semi-separable matrices, fast multipole method, etc.
    [bibtex-key = C:LaBRI::PMAA2014] [bibtex-entry]


  64. C. Dudley, E. Darve, S. Ambikasaran, and A. H. Aminfar. Fast direct linear solvers for the boundary element method. In WCCM-ECCM-ECFD workshop on Fast Direct Solvers, Barcelona, Spain, July 2014. Keyword(s): Sparse. [bibtex-key = C:LaBRI::ecfd14b] [bibtex-entry]


  65. X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca. Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes. In Proceedings of HCW'2014 workshop of IPDPS, Phoenix, USA, pages 29-38, May 2014. Keyword(s): Sparse.
    Abstract:
    The ongoing hardware evolution exhibits an escalation in the number, as well as in the heterogeneity, of computing resources. The pressure to maintain reasonable levels of performance and portability forces application developers to leave the traditional programming paradigms and explore alternative solutions. PASTIX is a parallel sparse direct solver, based on a dynamic scheduler for modern hierarchical manycore architectures. In this paper, we study the benefits and limits of replacing the highly specialized internal scheduler of the PASTIX solver with two generic runtime systems: PARSEC and STARPU. The task graph of the factorization step is made available to the two runtimes, providing them the opportunity to process and optimize its traversal in order to maximize the algorithm efficiency for the targeted hardware platform. A comparative study of the performance of the PASTIX solver on top of its native internal scheduler, PARSEC, and STARPU frameworks, on different execution environments, is performed. The analysis highlights that these generic task-based runtimes achieve comparable results to the application-optimized embedded scheduler on homogeneous platforms. Furthermore, they are able to significantly speed up the solver on heterogeneous environments by taking advantage of the accelerators while hiding the complexity of their efficient manipulation from the programmer.
    [bibtex-key = C:LaBRI::hcw14] [bibtex-entry]


  66. S. Moustafa, M. Faverge, L. Plagne, and P. Ramet. Parallel 3D Sweep Kernel with PARSEC. In 16th IEEE International Conference on High Performance Computing and Communications, workshop on HPC-CFD in Energy/Transport Domains, Paris, France, August 2014. Keyword(s): Neutron. [bibtex-key = C:LaBRI::sweep2013] [bibtex-entry]


  67. A. Casadei, L. Giraud, P. Ramet, and J. Roman. Towards Domain Decomposition with Balanced Halo. In Workshop Celebrating 40 Years of Nested Dissection, Waterloo, Canada, July 2013. Keyword(s): Sparse. [bibtex-key = C:LaBRI::ND40a] [bibtex-entry]


  68. O. Coulaud, L. Giraud, P. Ramet, and X. Vasseur. Augmentation and Deflation in Krylov subspace methods. In Proceedings of PARENG'2013, Pecs, Hungary, March 2013. [bibtex-key = C:LaBRI::PARENG13] [bibtex-entry]


  69. X. Lacoste. Work stealing and granularity optimizations for a sparse solver on manycores. In Sparse Days, Toulouse, France, June 2013. Keyword(s): Sparse. [bibtex-key = C:LaBRI::sparsedays2013] [bibtex-entry]


  70. X. Lacoste, M. Faverge, and P. Ramet. Sparse Linear Algebra over DAG Runtimes. In SIAM Conference on Computational Science and Engineering, Boston, USA, February 2013. Keyword(s): Sparse. [bibtex-key = C:LaBRI::siam2013] [bibtex-entry]


  71. S. Moustafa, I. Dutka-Malen, L. Plagne, A. Poncot, and P. Ramet. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver. In Joint International Conference on Supercomputing in Nuclear Applications + Monte Carlo, Paris, France, October 2013. Keyword(s): Neutron. [bibtex-key = C:LaBRI::neutron2013] [bibtex-entry]


  72. P. Ramet. From hybrid architectures to hybrid solvers. In Workshop Celebrating 40 Years of Nested Dissection, Waterloo, Canada, July 2013. Keyword(s): Sparse. [bibtex-key = C:LaBRI::ND40b] [bibtex-entry]


  73. G. Bosilca, M. Faverge, X. Lacoste, I. Yamazaki, and P. Ramet. Toward a supernodal sparse direct solver over DAG runtimes. In Proceedings of PMAA'2012, London, UK, June 2012. Keyword(s): Sparse.
    Abstract:
    The current trend in high performance computing shows a dramatic increase in the number of cores on shared memory compute nodes. Algorithms, especially those related to linear algebra, need to be adapted to these new computer architectures in order to be efficient. PaStiX is a sparse parallel direct solver that incorporates a dynamic scheduler for strongly hierarchical modern architectures. In this work, we study the replacement of this internal highly integrated scheduling strategy by two generic runtime frameworks: DAGuE and StarPU. Those runtimes will give the opportunity to execute the factorization task graph on emerging computers equipped with accelerators. As for previous work done in dense linear algebra, we will present the kernels used for GPU computations inspired by the MAGMA library and the DAG algorithm used with those two runtimes. A comparative study of the performance of the supernodal solver with the three different schedulers is performed on manycore architectures and the improvements obtained with accelerators will be presented with the StarPU runtime. These results demonstrate that these DAG runtimes provide uniform programming interfaces to obtain high performance on different architectures on irregular problems such as sparse direct factorizations.
    [bibtex-key = C:LaBRI::PMAA2012] [bibtex-entry]


  74. A. Casadei and P. Ramet. Memory Optimization to Build a Schur Complement. In SIAM Conference on Applied Linear Algebra, Valencia, Spain, June 2012. Keyword(s): Sparse. [bibtex-key = C:LaBRI::la12a] [bibtex-entry]


  75. M. Faverge and P. Ramet. Fine Grain Scheduling for Sparse Solver on Manycore Architectures. In 15th SIAM Conference on Parallel Processing for Scientific Computing, Savannah, USA, February 2012. Keyword(s): Sparse.
    Abstract:
    The emergence of many-core architectures introduces variations in computation costs, which makes precise cost models hard to realize. Static schedulers based on cost models, like the one used in the sparse direct solver PaStiX, are no longer adapted. We describe the dynamic scheduler developed for the supernodal method of PaStiX to correct the imperfections of the static model. The solution presented exploits the elimination tree of the problem to preserve data locality during the execution.
    [bibtex-key = C:LaBRI::siam2012] [bibtex-entry]


  76. X. Lacoste and P. Ramet. Sparse direct solver on top of large-scale multicore systems with GPU accelerators. In SIAM Conference on Applied Linear Algebra, Valencia, Spain, June 2012. Keyword(s): Sparse. [bibtex-key = C:LaBRI::la12b] [bibtex-entry]


  77. Y. Suzuki, N. Kushida, T. Tatekawa, N. Teshima, Y. Caniou, R. Guivarch, M. Dayde, and P. Ramet. Development of an International Matrix-Solver Prediction System on a French-Japanese International Grid Computing Environment. In Joint International Conference on Supercomputing in Nuclear Applications and Monte Carlo 2010 (SNA + MC2010), Tokyo, Japan, October 2010. [bibtex-key = C:LaBRI::sna-mc-2010] [bibtex-entry]


  78. M. Barrault, B. Lathuilière, P. Ramet, and J. Roman. A Non Overlapping Parallel Domain Decomposition Method Applied to The Simplified Transport Equations. In International Conference on Mathematics, Computational Methods & Reactor Physics, New-York, USA, May 2009. Keyword(s): Neutron. [bibtex-key = C:LaBRI::neutron2009] [bibtex-entry]


  79. M. Faverge. A NUMA Aware Scheduler for a Parallel Sparse Direct Solver. In Journées Informatique Massivement Multiprocesseur et Multicoeur, Rocquencourt, France, February 2009. Keyword(s): Sparse. [bibtex-key = c:LaBRI::i3m] [bibtex-entry]


  80. M. Faverge. Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures. In Sparse Days, Toulouse, France, June 2009. Keyword(s): Sparse. [bibtex-key = C:LaBRI::sparsedays2009] [bibtex-entry]


  81. M. Faverge. Vers un solveur de systèmes linéaires creux adapté aux machines NUMA. In ACTES RenPar'2009, Toulouse, France, September 2009. Keyword(s): Sparse. [bibtex-key = c:LaBRI::renpar09] [bibtex-entry]


  82. G. Huysmans, S. Pamela, E. van der Plas, and P. Ramet. Non-Linear MHD simulations of Edge Localised Modes. In 36th EPS Plasma Physics Conference, Sofia, Bulgaria, June 2009. Keyword(s): Fusion. [bibtex-key = C:LaBRI::EPS2009] [bibtex-entry]


  83. P. Ramet. Dynamic Scheduling for Sparse Direct Solver on NUMA and Multicore Architectures. In ComplexHPC meeting, Lisbon, Portugal, October 2009. Keyword(s): Sparse. [bibtex-key = C:LaBRI::COST2009] [bibtex-entry]


  84. R. Abgrall, O. Coulaud, P. Hénon, R. Huart, G. Huysmans, G. Latu, B. Nkonga, S. Pamela, and P. Ramet. Numerical simulation of tokamak plasmas. In 7th PAMIR International Conference on Fundamental and Applied MHD, Presqu'ile de Giens, France, September 2008. Keyword(s): Fusion. [bibtex-key = C:LaBRI::fusion2008] [bibtex-entry]


  85. M. Barrault, B. Lathuilière, P. Ramet, and J. Roman. A Domain Decomposition Method Applied to Large Eigenvalue Problems in Neutron Physics. In Proceedings of PMAA'2008, Neuchatel, Switzerland, June 2008. Keyword(s): Neutron.
    Abstract:
    The simulation of the neutron transport inside a nuclear reactor leads to the computation of the lowest eigen pair of a simplified transport operator. This computation is done by an inverse power algorithm accelerated by a Chebyshev polynomials based process. At each iteration, a large linear system is solved inexactly by a block Gauss-Seidel algorithm. For our applications, one Gauss-Seidel iteration is already sufficient to ensure the right convergence of the inverse power algorithm. For the approximate resolution of the linear system at each inverse power iteration, we propose a non overlapping domain decomposition based on the introduction of Lagrange multipliers in order to: (1) get a parallel algorithm, which circumvents the memory consumption problem and reduces the computational time; (2) deal with different numerical approximations in each subdomain; (3) minimize the code modifications in our industrial solver. When the Chebyshev acceleration process is switched off, the method performs well on up to 100 processors for an industrial test case. It exhibits good efficiency, which allows us to carry out some computations beyond the reach of standard workstations. Besides, we study the efficiency of the Chebyshev acceleration process in our domain decomposition method.
    [bibtex-key = C:LaBRI::PMAA2008a] [bibtex-entry]


  86. M. Barrault, B. Lathuilière, P. Ramet, and J. Roman. A domain decomposition method applied to the simplified transport equations. In IEEE 11th International Conference on Computational Science and Engineering, Sao Paulo, Brazil, pages 91-97, July 2008. Keyword(s): Neutron.
    Abstract:
    The simulation of the neutron transport inside a nuclear reactor leads to the computation of the lowest eigen pair of a simplified transport operator. Whereas the sequential solution at our disposal today is really efficient, we are not able to run some industrial cases due to the memory consumption and the computational time. This problem brings us to study parallel strategies. In order to re-use an important part of the solver and to bypass some limitations of conforming Cartesian meshes, we propose a non overlapping domain decomposition based on the introduction of Lagrange multipliers. The method performs well on up to 100 processors for an industrial test case.
    [bibtex-key = C:LaBRI::Neutron2008] [bibtex-entry]


  87. M. Barrault, B. Lathuilière, P. Ramet, and J. Roman. A domain decomposition method for the resolution of an eigenvalue problem in neutron physics. In International Symposium on Iterative Methods in Scientific Computing (IMACS), Lille, France, March 2008. Keyword(s): Neutron. [bibtex-key = C:LaBRI::neutron2008] [bibtex-entry]


  88. Y. Caniou, J.-S. Gay, and P. Ramet. Tunable parallel experiments in a GridRPC framework: application to linear solvers. In VECPAR'08, 8th International Meeting High Performance Computing for Computational Science, volume 5336 of LNCS, Toulouse, France, pages 430-436, June 2008. Springer Verlag.
    Abstract:
    The use of scientific computing centers becomes more and more difficult on modern parallel architectures. Users must face a large variety of batch systems (with their own specific syntax) and have to set many parameters to tune their applications (e.g., processors and/or threads mapping, memory resource constraints). Moreover, finding the optimal performance is not the only criterion when a pool of jobs is submitted on the Grid (for numerical parametric analysis for instance) and one must focus on the wall-time completion. In this work we tackle the problem by using the DIET Grid middleware that integrates an adaptable PaStiX service to solve a set of experiments issued from the simulations of the ASTER project.
    [bibtex-key = C:LaBRI::vecpar08-diet] [bibtex-entry]


  89. M. Faverge, X. Lacoste, and P. Ramet. A NUMA Aware Scheduler for a Parallel Sparse Direct Solver. In Proceedings of PMAA'2008, Neuchatel, Switzerland, June 2008. Keyword(s): Sparse.
    Abstract:
    Over the past few years, parallel sparse direct solvers have made significant progress and are now able to efficiently solve industrial three-dimensional problems with several millions of unknowns. A hybrid MPI-thread implementation of our direct solver PaStiX is already well suited for SMP nodes or new multi-core architectures; it drastically reduces the memory overhead and improves scalability. In the context of distributed NUMA architectures, a dynamic scheduler based on a work-stealing algorithm has been developed to fill in communication idle times. On these architectures, it is important to take care of NUMA effects and to preserve memory affinity during the work-stealing. The scheduling of communications also needs to be adapted, especially to ensure the overlap by computations. Experiments on numerical test cases will be presented to prove the efficiency of the approach on NUMA architectures. If memory is not large enough to treat a given problem, disks must be used to store data that cannot fit in memory (out-of-core storage). The idle times due to disk access have to be managed by our dynamic scheduler to prefetch and save datasets. Thus, we design and study specific scheduling algorithms in this particular context.
    [bibtex-key = C:LaBRI::PMAA2008b] [bibtex-entry]


  90. M. Faverge and P. Ramet. Dynamic Scheduling for sparse direct Solver on NUMA architectures. In Proceedings of PARA'2008, Trondheim, Norway, May 2008. Keyword(s): Sparse.
    Abstract:
    Over the past few years, parallel sparse direct solvers have made significant progress and are now able to work efficiently on problems with several million equations. This paper presents some improvements to our sparse direct solver PaStiX for distributed Non-Uniform Memory Access architectures. We show results on two preliminary works: a memory allocation scheme better suited to these architectures and a better overlap of communication by computation. We also present a dynamic scheduler that takes care of memory affinity and data locality.
    [bibtex-key = C:LaBRI::para08] [bibtex-entry]


  91. G. Huysmans, R. Abgrall, M. Becoulet, R. Huart, B. Nkonga, S. Pamela, and P. Ramet. Non-Linear MHD code development for ELM simulations. In Poster session, 35th EPS Plasma Physics Conference, Hersonissos, Greece, June 2008. Keyword(s): Fusion. [bibtex-key = C:LaBRI::EPS2008] [bibtex-entry]


  92. N. Kushida, Y. Suzuki, N. Teshima, N. Nakajima, Y. Caniou, M. Dayde, and P. Ramet. Toward an International Sparse Linear Algebra Expert System by Interconnecting the ITBL Computational Grid with the Grid-TLSE Platform. In VECPAR'08, 8th International Meeting High Performance Computing for Computational Science, volume 5336 of LNCS, Toulouse, France, pages 424-429, June 2008. Springer Verlag. [bibtex-key = C:LaBRI::vecpar08-redimps] [bibtex-entry]


  93. O. Czarny, G. Huysmans, P. Hénon, and P. Ramet. Improvement of existing solvers for the simulation of MHD instabilities. In Numerical flow models for controlled fusion, Porquerolles, France, April 2007. Keyword(s): Fusion. [bibtex-key = C:LaBRI::fusion2007] [bibtex-entry]


  94. P. Hénon, P. Ramet, and J. Roman. A supernode amalgamation algorithm for an efficient block incomplete factorization. In Proceedings of PPAM'2007, CTPSM07 Workshop, Gdansk, Poland, September 2007. Keyword(s): Sparse. [bibtex-key = C:LaBRI::PPAM07] [bibtex-entry]


  95. P. Ramet. High performances methods for solving large sparse linear systems - Direct and Incomplete Factorization. In Second NExt Grid Systems and Techniques, REDIMPS Workshop, Tokyo, Japan, May 2007. Keyword(s): Sparse. [bibtex-key = C:LaBRI::REDIMSOPS] [bibtex-entry]


  96. B. Braconnier, B. Nkonga, M. Papin, P. Ramet, M. Ricchiuto, J. Roman, and R. Abgrall. Efficient solution technique for low Mach number compressible multiphase problems. In Proceedings of PMAA'2006, Rennes, France, September 2006. [bibtex-key = C:LaBRI::PMAA2006b] [bibtex-entry]


  97. P. Hénon, P. Ramet, and J. Roman. On finding approximate supernodes for an efficient ILU(k) factorization. In Proceedings of PMAA'2006, Rennes, France, September 2006. Keyword(s): Sparse. [bibtex-key = C:LaBRI::PMAA2006a] [bibtex-entry]


  98. P. Hénon, P. Ramet, and J. Roman. Partitioning and Blocking Issues for a Parallel Incomplete Factorization. In Proceedings of PARA'2006, volume 4699 of LNCS, Umea, Sweden, pages 929-937, June 2006. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    The purpose of this work is to provide a method which exploits the parallel block-wise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust and efficient preconditioners based on a parallel incomplete factorization.
    [bibtex-key = C:LaBRI::para2006] [bibtex-entry]


  99. P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. Blocking Issues for an Efficient Parallel Block ILU Preconditioner. In International SIAM Conference On Preconditioning Techniques For Large Sparse Matrix Problems In Scientific And Industrial Applications, Atlanta, USA, May 2005. Keyword(s): Sparse. [bibtex-key = C:LaBRI::pre05] [bibtex-entry]


  100. P. Hénon, P. Ramet, and J. Roman. On using an hybrid MPI-Thread programming for the implementation of a parallel sparse direct solver on a network of SMP nodes. In Proceedings of Sixth International Conference on Parallel Processing and Applied Mathematics, Workshop HPC Linear Algebra, volume 3911 of LNCS, Poznan, Poland, pages 1050-1057, September 2005. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    Over the last decade, most supercomputer architectures have been based on clusters of SMP nodes. In those architectures, exchanges between processors are made through shared memory when the processors are located on the same SMP node and through the network otherwise. Generally, the MPI implementations provided by the vendor on those machines are adapted to this situation and take advantage of the shared memory to handle messages between processors in the same SMP node. Nevertheless, this transparent approach to exploiting shared memory does not avoid the storage of buffers needed in asynchronous communications. In parallel direct solvers, the storage of these buffers can become a bottleneck. In this paper, we propose a hybrid thread-MPI implementation of a direct solver and analyse the benefits of this approach in terms of memory and run-time performance.
    [bibtex-key = C:LaBRI::ppam05] [bibtex-entry]


  101. P. Hénon, B. Nkonga, P. Ramet, and J. Roman. Using of the High Performance Sparse Solver PaStiX for the Complex Multiscale 3D Simulations performed by the FluidBox Fluid Mechanics Software. In Proceedings of PMAA'2004, Marseille, France, October 2004. Keyword(s): Sparse.
    Abstract:
    In this paper, we consider a hyperbolic system with multiple time step characteristics. Such a situation arises, for example, in combustion problems when the acoustic time is small compared to the characteristic time associated with the flame propagation. The problems investigated in this paper are characterized by a small Mach number. At the asymptotic limit, the initial hyperbolic system degenerates to an elliptic problem. Therefore, numerical methods that assume hyperbolicity of the system become ill-conditioned at this limit. As a consequence, the iterative methods used in the numerical algorithm implemented in the software FluidBox have a worse convergence behavior. Some physical preconditioning has been proposed to overcome this difficulty. However, in the context of parallel computing, a global preconditioning is unavoidable for performance efficiency. The parallelization of FluidBox relies on a domain decomposition. A first version of FluidBox used a block Jacobi or a block Gauss-Seidel preconditioner, both of which are easy to implement in this framework. But to solve 3D problems with up to several million unknowns on numerous processors, this kind of preconditioner becomes inefficient due to its lack of scalability and robustness. Hence, a collaboration inside the INRIA ScAlApplix project has been set up to use the high performance solver library PaStiX, which provides both complete and incomplete factorizations on clusters of SMP nodes to solve large scale computations. The aim of this work is then to investigate the performance of the combination of FluidBox and PaStiX (both developed in the INRIA ScAlApplix project) and also to present the parallel assembly algorithm that allows a good load balance in this context.
    [bibtex-key = C:LaBRI::PMAA2004b] [bibtex-entry]


  102. P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and Y. Saad. Applying parallel direct solver skills to build robust and highly performant preconditioners. In Proceedings of PARA'2004, volume 3732 of LNCS, Copenhagen, Denmark, pages 601-619, June 2004. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    The purpose of our work is to provide a method which exploits the parallel blockwise algorithmic approach used in the framework of high performance sparse direct solvers in order to develop robust preconditioners based on a parallel incomplete factorization. The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers.
    [bibtex-key = C:LaBRI::para2004] [bibtex-entry]


  103. P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and Y. Saad. High Performance Complete and Incomplete Factorizations for Very Large Sparse Systems by using Scotch and PaStiX softwares. In Eleventh SIAM Conference on Parallel Processing for Scientific Computing, San Francisco, USA, February 2004. Keyword(s): Sparse.
    Abstract:
    PaStiX is a scientific library that provides a high performance direct supernodal solver for very large sparse linear systems. It relies on a block factorization based on a hybrid ordering (Nested Dissection + Halo Approximate Minimum Degree) obtained using the Scotch library. Efficient static scheduling and memory management are used to solve irregular problems with more than 25 million unknowns on clusters of SMP nodes. In order to solve larger 3D problems, we apply these blockwise algorithms to compute robust and efficient parallel ILU preconditioners.
    [bibtex-key = C:LaBRI::ppsc2004a] [bibtex-entry]


  104. P. Hénon, P. Ramet, and J. Roman. A Blockwise Algorithm for Parallel Incomplete Cholesky Factorization. In Proceedings of PMAA'2004, Marseille, France, October 2004. Keyword(s): Sparse.
    Abstract:
    Solving large sparse linear systems by iterative methods has often been quite unsatisfactory when dealing with practical "industrial" problems. The main difficulty encountered by such methods is their lack of robustness and, generally, the unpredictability and inconsistency of their performance over a wide sample of different problems; certain methods work quite well for certain types of problems but can fail completely on other problems. Over the past few years, direct methods have made significant progress thanks to both the combinatorial analysis of the Gaussian elimination process and the parallel algorithms of blockwise solvers optimized for modern parallel supercomputers. It is now possible to solve practical three-dimensional problems on the order of several million equations very effectively with direct solvers that efficiently use the superscalar effects of modern processors. However, direct methods may fail to solve very large three-dimensional problems, due to the large amount of memory needed for these cases. In our work, we consider an approach which, we hope, will bridge the gap between the two classes of methods. The goal is to provide a method which exploits the parallel blockwise algorithms used in the framework of high performance sparse direct solvers to develop robust parallel incomplete factorization based preconditioners for iterative solvers. The idea is then to define an adaptive blockwise incomplete factorization that is much more accurate (and numerically more robust) than the scalar incomplete factorizations commonly used to precondition iterative solvers. Our approach consists in computing symbolically the block structure of the factors that would have been obtained with a complete factorization, and then deciding to drop some blocks of this structure according to relevant criteria.
    Such an incomplete factorization can take advantage of the latest breakthroughs in sparse direct methods and therefore be very competitive in CPU time while avoiding the memory limitation encountered by direct methods. In this way, we expect to be able to solve systems on the order of hundreds of millions of unknowns.
    [bibtex-key = C:LaBRI::PMAA2004a] [bibtex-entry]


  105. O. Beaumont, P. Ramet, and J. Roman. Asymptotically optimal algorithm for Laplace task graphs on heterogeneous platforms. In Proceedings of Fifth International Conference on Parallel Processing and Applied Mathematics, Workshop HeteroPar, volume 3019 of LNCS, Czestochowa, Poland, pages 880-887, September 2003. Springer Verlag. Keyword(s): Overlap.
    Abstract:
    In this paper, we focus on the scheduling of Laplace task graphs on a general platform where both communication links and processing units are heterogeneous. In this context, it is known that deriving an optimal algorithm, in the sense of makespan minimization, is NP-complete, and several inapproximability results have been proved. Nevertheless, we provide an asymptotically optimal algorithm in this general context. Moreover, we expect that this methodology can be extended to more general task graphs, especially for nested loops where the innermost loop is parallel.
    [bibtex-key = C:LaBRI::ppam03] [bibtex-entry]


  106. D. Goudin, P. Hénon, M. Mandallena, K. Mer, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Outils numériques parallèles pour la résolution de très grands problèmes d'électromagnétisme. In Séminaire sur l'Algorithmique Numérique Appliquée aux Problèmes Industriels, Calais, France, May 2003. Keyword(s): Sparse.
    Abstract:
    The emergence of teraflop-class parallel machines now makes it possible to treat very large diffraction problems arising from high-frequency computations with exact methods. A first step of this work consisted in parallelizing existing sequential codes that used classical numerical methods. These codes were based on a strong coupling, through a Schur complement, between an integral formulation (surface finite elements to handle the radiation condition at infinity) and a volume finite element formulation (to compute the fields inside the object). The adaptation to parallelism consisted in parallelizing the factorization of the sparse linear system (using the EMILIO/PaStiX library), the solution of the dense integral system (symmetric, complex), and the Schur complement coupling (which consists of forward-backward substitutions on the associated sparse system). The first two steps were fully satisfactory, but the Schur complement, because of the very large number of forward-backward substitutions tied to the surface unknowns, limits the overall efficiency of this approach to intermediate-size problems. For example, for the "Sphere-Cone" test case with $3 \cdot 10^6$ volume unknowns and $23 \cdot 10^3$ surface unknowns, on 32 ES45 nodes (quad-processor), the volume assembly takes 6 s, the factorization of the sparse system is obtained in 376 s, and the Schur complement in 21500 s (in double-precision complex arithmetic).
    To go beyond these limits, a global evolution of the algorithm was necessary: (1) an iterative coupling between the surface and volume solves was set up; (2) to increase the number of surface unknowns, a parallel multilevel multipole method was developed; (3) finally, to reach problem sizes of several tens of millions of volume unknowns, improvements were made to the EMILIO/PaStiX library. In particular, specific algorithms were put in place to make the best use of clusters of SMP nodes and to preserve reasonable memory scalability. Finally, the current scheme features a multi-level direct-iterative coupling. The planned evolution will be an iterative-iterative coupling, and will then rely on global hybrid methods (intermediate between direct and iterative): an advanced version of PaStiX will make it possible to compute a parallel block ILU(k) preconditioner.
    [bibtex-key = c:LaBRI::GHMMPRRP03] [bibtex-entry]


  107. P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. An efficient hybrid MPI/Thread implementation on a network of SMP nodes for the parallel sparse direct solver PaStiX: ordering / scheduling / memory management / out-of-core issues, and application to preconditioning. In Sparse Days, Saint Girons, France, June 2003. Keyword(s): Sparse. [bibtex-key = C:LaBRI::sparsedays2003] [bibtex-entry]


  108. P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. Towards High Performance Hybrid Direct-Iterative Solvers for Large Sparse Systems. In International SIAM Conference On Preconditioning Techniques For Large Sparse Matrix Problems In Scientific And Industrial Applications, Napa Valley, USA, October 2003. Keyword(s): Sparse. [bibtex-key = C:LaBRI::pre03] [bibtex-entry]


  109. P. Hénon, P. Ramet, and J. Roman. Efficient algorithms for direct resolution of large sparse system on clusters of SMP nodes. In SIAM Conference on Applied Linear Algebra, Williamsburg, USA, July 2003. Keyword(s): Sparse.
    Abstract:
    In previous works, we have described an efficient static scheduling based on a mixed 1D/2D block distribution with local aggregation for a parallel supernodal version of sparse $LL^{T}$ factorization. In this paper, we present new algorithms suited to architectures based on clusters of SMP nodes, as well as techniques to keep good memory scalability. These algorithms are implemented in the PaStiX library, which achieves high performance (solution of a system with $26 \cdot 10^6$ unknowns on 192 ES45 quad-processor nodes at 35 percent of peak performance).
    [bibtex-key = C:LaBRI::siam2003] [bibtex-entry]


  110. P. Hénon and Y. Saad. A Parallel ILU factorization based on a Hierarchical Interface Decomposition algorithm. In International SIAM Conference On Preconditioning Techniques For Large Sparse Matrix Problems In Scientific And Industrial Applications, Napa Valley, USA, October 2003. Keyword(s): Sparse. [bibtex-key = C:LaBRI::pre03ph] [bibtex-entry]


  111. P. Hénon and Y. Saad. PHIDAL: A Parallel Hierarchical Interface Decomposition Algorithm for solving sparse linear systems. In Sparse Days and Grid Computing, Saint Girons, France, June 2003. Keyword(s): Sparse. [bibtex-key = C:LaBRI::sparsedays2003ph] [bibtex-entry]


  112. O. Beaumont, V. Boudet, F. Desprez, P. Ramet, J. Roman, and C. Travers. Modélisation de pipelines hétérogènes. In GRID'2002, Aussois, France, December 2002. Keyword(s): Overlap. [bibtex-key = c:LaBRI::grid2002] [bibtex-entry]


  113. O. Coulaud, M. Dussère, P. Hénon, and J. Roman. Optimisation of a kinetic laser-plasma interaction code for massively parallel systems. In Proceedings of PMAA'2002, Neuchatel, Switzerland, pages 249-275, November 2002. [bibtex-key = C:LaBRI::PMAA2002b] [bibtex-entry]


  114. P. Hénon and P. Ramet. Optimisation de l'occupation mémoire pour un solveur parallèle creux direct hautes performances de type supernodal. In ACTES RenPar'2002, Hammamet, Tunisia, April 2002. Keyword(s): Sparse.
    Abstract:
    This article describes part of our work on the parallel direct solution of large sparse linear systems. It presents its implementation on architectures based on SMP nodes, and more specifically techniques for efficient memory management.
    [bibtex-key = c:LaBRI::HR02] [bibtex-entry]


  115. P. Hénon, P. Ramet, and J. Roman. Parallel factorization of very large sparse SPD systems on a network of SMP nodes. In Proceedings of PMAA'2002, Neuchatel, Switzerland, November 2002. Keyword(s): Sparse. [bibtex-key = C:LaBRI::PMAA2002a] [bibtex-entry]


  116. P. Hénon and P. Ramet. PaStiX: Un solveur parallèle direct pour des matrices creuses symétriques définies positives basé sur un ordonnancement statique performant et sur une gestion mémoire efficace. In ACTES RenPar'2001, Paris, France, April 2001. Keyword(s): Sparse.
    Abstract:
    Solving large sparse linear systems is a crucial step in many industrial and scientific applications. Our work deals with the partitioning and distribution of large sparse matrices for parallel $LDL^t$ factorization on MIMD machines. In this article we present our parallel factorization technique, based on a static scheduling of computations and communications, and we validate it on systems with more than one million unknowns arising from 3D finite element problems.
    [bibtex-key = c:LaBRI::HR01] [bibtex-entry]


  117. P. Hénon, P. Ramet, and J. Roman. PaStiX: A Parallel Direct Solver for Sparse SPD Matrices based on Efficient Static Scheduling and Memory Management. In Tenth SIAM Conference on Parallel Processing for Scientific Computing, Portsmouth, USA, March 2001. Keyword(s): Sparse.
    Abstract:
    Solving large sparse symmetric positive definite systems of linear equations is a crucial and time-consuming step, arising in many scientific and engineering applications. In this work, we consider the block partitioning and scheduling problem for sparse parallel factorization without pivoting. We focus on the scalability of the parallel solver, and on the compromise between memory overhead and efficiency. We validate this study with parallel experiments on a large collection of irregular industrial problems.
    [bibtex-key = C:LaBRI::siam2001] [bibtex-entry]


  118. D. Goudin. Assemblage parallèle d'une matrice et/ou d'un second membre: Application à la Parallélisation d'un Code de Mécanique des Structures. In ACTES RenPar'2000, Besancon, France, 2000. Keyword(s): Sparse.
    Abstract:
    This article describes a parallel assembly algorithm that can be used during the solution phase of large sparse linear systems. The first part deals with the problems raised by a structural mechanics code named OSSAU, developed at CEA. The second part is devoted to the description of our algorithm and its integration into the EMILIO software chain, developed at LaBRI within the ALiENor research theme.
    [bibtex-key = c:LaBRI::dg2ka] [bibtex-entry]


  119. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. Résolution parallèle de grands systèmes linéaires creux. In Proceedings of JSFT'2000, Monastir, Tunisia, October 2000. Keyword(s): Sparse.
    Abstract:
    This article gives an overview of the principles and techniques that can be used to solve large sparse linear systems. In particular, it presents the work carried out at LaBRI within the ALiENor research theme on high performance parallel solution by direct methods.
    [bibtex-key = c:LaBRI::jsft] [bibtex-entry]


  120. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J-J. Pesque. Algèbre Linéaire Creuse Hautes Performances : Application à la Mécanique des Structures. In iHPerf'2000, Aussois, France, December 2000. Keyword(s): Sparse. [bibtex-key = c:LaBRI::ihperf_2k] [bibtex-entry]


  121. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Description of the EMILIO Software Processing Chain and Application to Structural Mechanics. In Proceedings of PMAA'2K, Neuchatel, Switzerland, August 2000. Keyword(s): Sparse. [bibtex-key = C:LaBRI::PMAA_2K2] [bibtex-entry]


  122. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Parallel Sparse Linear Algebra and Application to Structural Mechanics. In European ACTC Workshop, Paris, France, May 2000. Keyword(s): Sparse. [bibtex-key = C:LaBRI::actc] [bibtex-entry]


  123. D. Goudin and J. Roman. A scalable parallel assembly for irregular meshes based on a block distribution for a parallel block direct solver. In Proceedings of PARA'2000, volume 1947 of LNCS, Bergen, Norway, 2000. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    This paper describes a distribution of elements for irregular finite element meshes as well as the associated parallel assembly algorithm, in the context of the parallel solution of the resulting sparse linear system using a direct block solver. These algorithms are integrated in the software processing chain EMILIO being developed at LaBRI for structural mechanics applications. Some illustrative numerical experiments on an IBM SP2 validate this study.
    [bibtex-key = C:LaBRI::gopara] [bibtex-entry]


  124. P. Hénon, P. Ramet, and J. Roman. PaStiX: A High-Performance Parallel Direct Solver for Sparse Symmetric Definite Systems. In Proceedings of PMAA'2K, Neuchatel, Switzerland, August 2000. Keyword(s): Sparse. [bibtex-key = C:LaBRI::PMAA_2K1] [bibtex-entry]


  125. P. Hénon, P. Ramet, and J. Roman. PaStiX: A Parallel Sparse Direct Solver Based on a Static Scheduling for Mixed 1D/2D Block Distributions. In Proceedings of Irregular'2000 workshop of IPDPS, volume 1800 of LNCS, Cancun, Mexico, pages 519-525, May 2000. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    We present and analyze a general algorithm which computes an efficient static scheduling of block computations for a parallel $LDL^{t}$ factorization of sparse symmetric positive definite systems based on a combination of 1D and 2D block distributions. Our solver uses a supernodal fan-in approach and is fully driven by this scheduling. We give an overview of the algorithm and present performance results and comparisons with PSPASES on an IBM-SP2 with 120 MHz Power2SC nodes for a collection of irregular problems.
    [bibtex-key = C:LaBRI::hrr2k] [bibtex-entry]


  126. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Algèbre Linéaire Creuse Parallèle pour les Méthodes Directes : Application à la Parallélisation d'un Code de Mécanique des Structures. In Journées sur l'Algèbre Linéaire Creuse et ses Applications Industrielles, Rennes, France, 1999. Keyword(s): Sparse. [bibtex-key = c:LaBRI::GHPRRP1] [bibtex-entry]


  127. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, J. Roman, and J.-J. Pesque. Parallel Sparse Linear Algebra and Application to Structural Mechanics. In SPWorld'99, Montpellier, France, 1999. Keyword(s): Sparse. [bibtex-key = C:LaBRI::spworld] [bibtex-entry]


  128. P. Hénon, P. Ramet, and J. Roman. A Mapping and Scheduling Algorithm for Parallel Sparse Fan-In Numerical Factorization. In Proceedings of Euro-Par'99, volume 1685 of LNCS, Toulouse, France, pages 1059-1067, September 1999. Springer Verlag. Keyword(s): Sparse.
    Abstract:
    We present and analyze a general algorithm which computes efficient static schedulings of block computations for parallel sparse linear factorization. Our solver, based on a supernodal fan-in approach, is fully driven by this scheduling. We give an overview of the algorithms and present performance results on a 16-node IBM-SP2 with 66 MHz Power2 thin nodes for a collection of grid and irregular problems.
    [bibtex-key = C:LaBRI::hrr99a] [bibtex-entry]


  129. P. R. Amestoy, F. Desprez, P. Ramet, and J. Roman. Optimisation des Communications et Régulation de Charge pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux. In ICaRE'97, Aussois, France, pages 467-488, 1997. Keyword(s): Overlap. [bibtex-key = c:LaBRI::PR4] [bibtex-entry]


  130. P. Ramet. Calcul de la suite optimale de taille de paquets pour la factorisation de Cholesky. In ACTES RenPar'9, Lausanne, Switzerland, pages 111-114, 1997. Springer Verlag. Keyword(s): Overlap.
    Abstract:
    Using distributed-memory parallel machines brings a significant gain in performance and memory size, but in return introduces a communication overhead. To obtain efficient and scalable programs, this overhead should be hidden. Several solutions exist. The first consists in a judicious choice of the data distribution that minimizes the number and size of communications. In addition, if the dependencies allow it, one tries to start communications as early as possible and asynchronously: while these communications are in progress, other computations unrelated to the data exchange are performed.
    [bibtex-key = c:LaBRI::PR3] [bibtex-entry]


  131. F. Desprez, P. Ramet, and J. Roman. Optimal Grain Size Computation for Pipelined Algorithms. In Proceedings of Euro-Par'96, number 1123 of LNCS, Lyon, France, pages 165-172, 1996. Springer Verlag. Keyword(s): Overlap.
    Abstract:
    In this paper, we present a method for overlapping communications on parallel computers for pipelined algorithms. We first introduce a general theoretical model which leads to a generic computation scheme for the optimal packet size. Then, we use the OPIUM library, which provides an easy-to-use and efficient way to compute, in the general case, this optimal packet size, on the column $LU$ factorization; the implementation and performance measures are made on an Intel Paragon.
    [bibtex-key = C:LaBRI::PR1] [bibtex-entry]


  132. P. Ramet. Calcul de la taille optimale des paquets pour les algorithmes macro-pipelines. In ACTES RenPar'8, Bordeaux, France, pages 21-24, 1996. Keyword(s): Overlap.
    Abstract:
    The general setting of this work is distributed-memory MIMD parallel machines. In this context, it is accepted that the right approach to managing parallelism relies on the systematic use of efficient computation and communication libraries, with the possibility of overlapping in order to hide communications, over time, behind computations. The central theme of this work is computation/communication overlap, and in particular the computation, possibly adaptive, of the optimal size of the packets to communicate.
    [bibtex-key = c:LaBRI::PR2] [bibtex-entry]


  133. Esragul Korkmaz, Mathieu Faverge, Grégoire Pichon, and Pierre Ramet. Reaching the Quality of SVD for Low-Rank Compression Through QR Variants. Research Report RR-9476, Inria Bordeaux - Sud Ouest, July 2022. Keyword(s): Randomized. [bibtex-key = korkmaz:hal-03718312] [bibtex-entry]


  134. Esragul Korkmaz, Mathieu Faverge, Grégoire Pichon, and Pierre Ramet. Deciding Non-Compressible Blocks in Sparse Direct Solvers using Incomplete Factorization. Research Report RR-9396, Inria Bordeaux - Sud Ouest, 2021. Keyword(s): Low-rank compression. [bibtex-key = korkmaz:hal-03152932] [bibtex-entry]


  135. Changjiang Gou, Ali Al Zoobi, Anne Benoit, Mathieu Faverge, Loris Marchal, Grégoire Pichon, and Pierre Ramet. Improving mapping for sparse direct solvers: A trade-off between data locality and load balancing. Research Report RR-9328, Inria Rhône-Alpes, February 2020. Keyword(s): Load balancing. [bibtex-key = gou:hal-02491495] [bibtex-entry]


  136. Cédric Augonnet, David Goudin, Matthieu Kuhn, Xavier Lacoste, Raymond Namyst, and Pierre Ramet. A hierarchical fast direct solver for distributed memory machines with manycore nodes. Research Report, CEA/DAM ; Total E&P ; Université de Bordeaux, October 2019. Keyword(s): H-Mat. [bibtex-key = augonnet:cea-02304706] [bibtex-entry]


  137. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Supernodes ordering to enhance Block Low-Rank compression in sparse direct solvers. Research Report RR-9238, Inria Bordeaux Sud-Ouest, December 2018. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01961675] [bibtex-entry]


  138. M. Faverge, S. Moustafa, F. Févotte, L. Plagne, and P. Ramet. Efficient Parallel Solution of the 3D Stationary Boltzmann Transport Equation for Diffusive Problems. Research Report RR-9116, Inria ; EDF Lab, September 2017. Keyword(s): Neutron. [bibtex-key = faverge:hal-01630208] [bibtex-entry]


  139. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver Using Block Low-Rank Compression. Research Report RR-9022, Inria Bordeaux Sud-Ouest, January 2017. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01450732] [bibtex-entry]


  140. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Sparse Supernodal Solver Using Block Low-Rank Compression: design, performance and analysis. Research Report RR-9130, Inria Bordeaux Sud-Ouest, December 2017. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01660665] [bibtex-entry]


  141. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Reordering strategy for blocking optimization in sparse linear solvers. Research Report RR-8860, Inria Bordeaux Sud-Ouest ; LaBRI - Laboratoire Bordelais de Recherche en Informatique ; Bordeaux INP ; Université de Bordeaux, February 2016. Keyword(s): Sparse. [bibtex-key = pichon:hal-01276746] [bibtex-entry]


  142. M. Alaya, M. Faverge, X. Lacoste, A. Péré-Laperne, J. Péré-Laperne, P. Ramet, and T. Terraz. Simul'Elec and PASTIX interface specifications. Technical Report RT-0458, INRIA Bordeaux ; AlgoTech, April 2015. [bibtex-key = alaya:hal-01142204] [bibtex-entry]


  143. M. Faverge, X. Lacoste, P. Ramet, and T. Terraz. Etude de la factorisation directe hétérogène et de la factorisation incomplète sur solveur PaStiX appliquées à des systèmes issus de problèmes du CEA/CESTA. Technical report, C.E.A. / C.E.S.T.A, 2015. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::cesta15] [bibtex-entry]


  144. A. Casadei, P. Ramet, and J. Roman. An improved recursive graph bipartitioning algorithm for well balanced domain decomposition. Research Report RR-8582, August 2014. Keyword(s): Sparse. [bibtex-key = casadei:hal-01056749] [bibtex-entry]


  145. X. Lacoste, M. Faverge, P. Ramet, S. Thibault, and G. Bosilca. Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes. Research Report RR-8446, INRIA, January 2014. Keyword(s): Sparse. [bibtex-key = lacoste:hal-00925017] [bibtex-entry]


  146. H. Sellama, G. Huijsmans, and P. Ramet. Adaptive mesh refinement for numerical simulation of MHD instabilities in tokamaks: JOREK code. Research Report RR-8635, INRIA Bordeaux, November 2014. [bibtex-key = sellama:hal-01088094] [bibtex-entry]


  147. O. Coulaud, L. Giraud, P. Ramet, and X. Vasseur. Deflation and augmentation techniques in Krylov linear solvers. Research Report RR-8265, INRIA, February 2013. [bibtex-key = coulaud:hal-00803225] [bibtex-entry]


  148. A. Casadei and P. Ramet. Memory Optimization to Build a Schur Complement in an Hybrid Solver. Research Report RR-7971, INRIA, 2012. Keyword(s): Sparse. [bibtex-key = astrid:hal-00700053] [bibtex-entry]


  149. X. Lacoste, P. Ramet, M. Faverge, Y. Ichitaro, and J. Dongarra. Sparse direct solvers with accelerators over DAG runtimes. Research Report RR-7972, INRIA, 2012. Keyword(s): Sparse. [bibtex-key = lacoste:hal-00700066] [bibtex-entry]


  150. M. Faverge, X. Lacoste, and P. Ramet. A NUMA Aware Scheduler for a Parallel Sparse Direct Solver. Technical report, 2010. Keyword(s): Sparse.
    Abstract:
    Over the past few years, parallel sparse direct solvers have made significant progress and are now able to efficiently solve industrial three-dimensional problems with several million unknowns. To solve these problems efficiently, solvers such as PaStiX and WSMP provide a hybrid MPI-thread implementation well suited to SMP nodes or multi-core architectures. This drastically reduces the memory overhead of the factorization and improves the scalability of the algorithms. However, modern architectures introduce new hierarchical memory accesses that are not handled in these solvers. In this paper, we present three improvements to the PaStiX solver that increase performance on modern architectures: memory allocation, communication overlap, and dynamic scheduling. Results on numerical test cases are presented to demonstrate the efficiency of the approach on NUMA architectures.
    [bibtex-key = FAVERGE:2010:INRIA-00549827:1] [bibtex-entry]


  151. G. Caramel and P. Ramet. Optimisation des performances des outils de calcul de neutronique des coeurs. Technical report, E.D.F. / SINETICS, 2007. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::cr07b] [bibtex-entry]


  152. G. Caramel, P. Hénon, and P. Ramet. Etude de faisabilité pour la parallélisation d'un code de mécanique des fluides en version non structurée. Technical report, C.E.A. / C.E.S.T.A, 2005. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::hr05b] [bibtex-entry]


  153. F. Huard, P. Hénon, and P. Ramet. Intégration dans ODYSSEE de la chaine logicielle EMILIO. Technical report, C.E.A. / C.E.S.T.A, 2005. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::hr05d] [bibtex-entry]


  154. P. Hénon, P. Ramet, and J. Roman. Evaluation des performances de la version SMP du solveur PaStiX de la chaine logicielle EMILIO dans l'environnement du code ODYSSEE. Technical report, C.E.A. / C.E.S.T.A, 2005. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::hrr05] [bibtex-entry]


  155. A. Goureman, P. Ramet, and J. Roman. Développement de la phase d'assemblage de la chaîne EMILIO (distribution du maillage et multi-threading). Technical report, C.E.A. / C.E.S.T.A, 2004. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::grr04b] [bibtex-entry]


  156. P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. Etude sur l'applicabilité de méthodes itératives nouvelles aux problèmes du CESTA. Technical report, C.E.A. / C.E.S.T.A, 2004. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::hprr04b] [bibtex-entry]


  157. S. Christy, P. Ramet, and J. Roman. Développement de la phase d'assemblage de la chaîne EMILIO pour un solveur parallèle 2D. Technical report, C.E.A. / C.E.S.T.A, 2003. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::crr03b] [bibtex-entry]


  158. P. Hénon, D. Lecas, P. Ramet, and J. Roman. Amélioration et Extension du Solveur Direct Parallèle pour Grandes Matrices Creuses du CESTA. Technical report, C.E.A. / C.E.S.T.A, 2003. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::hlrr03] [bibtex-entry]


  159. D. Goudin, P. Hénon, F. Pellegrini, P. Ramet, and J. Roman. Mise en oeuvre d'une Bibliothèque d'Outils pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux Symétriques Définis Positifs sur Machine Parallèle. Technical report, C.E.A. / C.E.S.T.A, 2001. Note: Manuel utilisateur de la chaîne EMILIO. Keyword(s): Sparse. [bibtex-key = n:LaBRI::all01] [bibtex-entry]


  160. P. Ramet and J. Roman. Analyse et Etude de Faisabilité de la Résolution par Méthode Directe sur Machine Parallèle de Grands Systèmes Linéaires Symétriques Définis positifs pour des Problèmes d'Électromagnétisme avec Couplage Éléments Finis -- Équations Intégrales. Technical report, C.E.A. / C.E.S.T.A, 2001. Note: Rapport Final. Keyword(s): Sparse. [bibtex-key = f:LaBRI::rr01] [bibtex-entry]


  161. D. Goudin. Mise en oeuvre d'une Bibliothèque d'Outils pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux Symétriques Définis Positifs sur Machine Parallèle. Technical report, C.E.A. / C.E.S.T.A, 2000. Note: Rapport Final de la Deuxième Partie. Keyword(s): Sparse. [bibtex-key = f:LaBRI::go2ka] [bibtex-entry]


  162. D. Goudin. Mise en oeuvre d'une Bibliothèque d'Outils pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux Symétriques Définis Positifs sur Machine Parallèle. Technical report, C.E.A. / C.E.S.T.A, 1999. Note: Rapport Final de la Première Partie. Keyword(s): Sparse. [bibtex-key = f:LaBRI::go99a] [bibtex-entry]


  163. D. Goudin. Mise en oeuvre d'une Bibliothèque d'Outils pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux Symétriques Définis Positifs sur Machine Parallèle. Technical report, C.E.A. / C.E.S.T.A, 1999. Note: Rapport Semestriel de la Deuxième Partie. Keyword(s): Sparse. [bibtex-key = f:LaBRI::go99b] [bibtex-entry]


  164. D. Goudin. Mise en oeuvre d'une Bibliothèque d'Outils pour la Résolution par Méthode Directe de Grands Systèmes Linéaires Creux Symétriques Définis Positifs sur Machine Parallèle. Technical report, C.E.A. / C.E.S.T.A, 1998. Note: Premier Rapport Semestriel. Keyword(s): Sparse. [bibtex-key = f:LaBRI::go98a] [bibtex-entry]


  165. Grégoire Pichon, E. Darve, Mathieu Faverge, Esragul Korkmaz, Pierre Ramet, and Jean Roman. Sparse supernodal solver using block low-rank compression: Design, performance and analysis. JOREK development meeting, November 2019. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-02326407] [bibtex-entry]


  166. Pierre Ramet. Utilisation de techniques de compression low-rank pour un solveur parallèle direct creux. Journées Ondes du Sud-Ouest, March 2019. [bibtex-key = ramet:hal-02081108] [bibtex-entry]


  167. G. Pichon, E. Darve, M. Faverge, P. Ramet, and J. Roman. Utilisation de la compression Block Low-Rank pour accélérer un solveur direct creux supernodal. COMPAS 2018 - SOLHAR final meeting, Toulouse, France, July 2018. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01956959] [bibtex-entry]


  168. G. Pichon, M. Faverge, P. Ramet, and J. Roman. Utilisation de la compression low-rank pour réduire la complexité du solveur PaStiX. JCAD 2018 - Journées Calcul et Données, Lyon, France, October 2018. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01956928] [bibtex-entry]


  169. P. Ramet. Heterogeneous architectures, Hybrid methods, Hierarchical matrices for Sparse Linear Solvers. Seminar at Stanford, April 2018.
    Abstract:
    Solving sparse linear systems with a direct solver is a time-consuming operation required by many scientific applications to simulate physical problems. Because of its significant overall cost, many studies have tried to optimize the time to solution of these solvers on multi-core and distributed architectures. In this talk, we will present recent advances on PaStiX (https://gitlab.inria.fr/solverstack/pastix), a supernodal sparse direct solver, which has been enhanced by the introduction of Block Low-Rank compression. We will compare the numerical stability, and the performance in terms of memory consumption and time to solution, of different approaches that differ in when the compression of the factorized matrix occurs. Many works have also addressed heterogeneous architectures to exploit accelerators such as GPUs or Intel Xeon Phi with interesting speedup. The new implementation on top of runtime systems (PaRSEC, StarPU) will be compared with the static scheduling used in previous experiments. Among the preprocessing steps of a sparse direct solver, reordering and block symbolic factorization are two major steps to reach a suitable granularity. In this talk, we will present a reordering strategy to increase off-diagonal block sizes. It enhances BLAS kernels and allows larger tasks to be handled, reducing runtime overhead. Finally, in order to improve the efficiency of the sparse update kernel for both BLR (block low-rank) and HODLR (hierarchically off-diagonal low-rank), we are currently investigating the BDLR (boundary distance low-rank) method to preselect rows and columns in the low-rank approximation algorithm.
    [bibtex-key = c:LaBRI::stanford18] [bibtex-entry]


  170. G. Pichon, E. Darve, M. Faverge, S. Lanteri, P. Ramet, and J. Roman. Sparse supernodal solver with low-rank compression for solving the frequency-domain Maxwell equations discretized by a high order HDG method. Journées jeunes chercheur-e-s - Résolution de problèmes d'ondes harmoniques de grande taille, November 2017. Keyword(s): Low-rank compression. [bibtex-key = pichon:hal-01660653] [bibtex-entry]


  171. H. Beaugendre, L. Lestandi, and P. Ramet. Benchmarking of the linear solver PaStiX for integration in LESCAPE. Internship Inria, March 2015. [bibtex-key = c:LaBRI::lescape] [bibtex-entry]


  172. M. Faverge, X. Lacoste, and P. Ramet. PaStiX: Parallel Sparse Matrix Package. JDEV2015 : Journées Développement Logiciel, July 2015. [bibtex-key = c:LaBRI::JDEV] [bibtex-entry]


  173. M. Faverge, G. Pichon, P. Ramet, and J. Roman. Blocking strategy optimization for sparse direct linear solvers on heterogeneous architectures. SOLHAR meeting, Lyon, France, June 2015. Keyword(s): Sparse. [bibtex-key = c:LaBRI::pastix-solhar3] [bibtex-entry]


  174. M. Faverge, G. Pichon, P. Ramet, and J. Roman. Blocking strategy optimizations for sparse direct linear solver on heterogeneous architectures. Workshop INRIA-CNPq, HOSCAR meeting, Sophia-Antipolis, France, September 2015. Keyword(s): Sparse.
    Abstract:
    In the context of solving sparse linear systems, an ordering process partitions the matrix graph to minimize both fill-in and computational cost. We found that the ordering strategy used within supernodes can be improved to reduce the number of off-diagonal blocks, which in turn increases block sizes and kernel performance. This reordering turns out to have the same complexity as the factorization algorithm, but allows for more efficient BLAS kernels. On the other hand, supernodes that are too large need to be split to create more parallelism. The regular splitting strategy, when applied locally, significantly impacts the number of off-diagonal blocks and may have a negative effect on efficiency. In this talk, we present both a new strategy to improve supernode ordering and a splitting strategy, both of which enlarge the off-diagonal block sizes without changing the computational cost of the factorization. Performance gains on the supernodal solver PaStiX are shown on multi-core and heterogeneous architectures.
    [bibtex-key = c:LaBRI::HOSCAR2015] [bibtex-entry]


  175. Y. Laizet, A. Moreau, J.-M. Frigerio, Ph. Chaumeil, P. Gay, P. Ramet, D. Sherman, and A. Franc. Biodiversiton : application du HPC à l'étude de la biodiversité. Seminar at MCIA (Mésocentre de Calcul Intensif Aquitain), March 2015. [bibtex-key = c:LaBRI::MCIA15] [bibtex-entry]


  176. P. Ramet. On the design of parallel linear solvers for large scale problems. Formation CNRS, Journée problème de Poisson, Paris, France, January 2015. [bibtex-key = c:LaBRI::CNRS15] [bibtex-entry]


  177. P. Ramet. Solveurs Directs. Maison de la Simulation, Formation PATC, Algèbre Linéaire Creuse Parallèle, Paris, France, April 2015. [bibtex-key = c:LaBRI::MDS15] [bibtex-entry]


  178. E. Agullo, M. Faverge, L. Giraud, A. Guermouche, P. Ramet, and J. Roman. Toward parallel scalable linear solvers suited for large scale hierarchical parallel platforms. Workshop INRIA-CNPq, HOSCAR meeting, Gramado, Brazil, September 2014. Keyword(s): Sparse.
    Abstract:
    In this talk we will discuss the current and future research activities on the design of parallel scalable linear solvers for large scale problems, ranging from dense linear algebra to parallel sparse direct solvers and hybrid iterative-direct approaches that attempt to go beyond the best capabilities one can expect from sparse direct solvers. In particular, we will describe the current activities on implementations designed on top of runtime systems, which should provide both code and performance portability across different parallel platforms. Finally, we will present some preliminary results addressing resilience issues on extreme scale computers; for that purpose we consider numerical alternatives that do not rely intensively on checkpoint-restart mechanisms.
    [bibtex-key = c:LaBRI::HOSCAR2014] [bibtex-entry]


  179. E. Agullo and P. Ramet. Task-based linear solvers for modern architectures. 7th ITER International School, High Performance Computing in Fusion Science, Aix-en-Provence, France, August 2014. [bibtex-key = c:LaBRI::ITER14] [bibtex-entry]


  180. E. Darve, A. H. Aminfar, C. Dudley, P. Ramet, and M. Faverge. Fast Algorithms for Dense Linear Algebra. CPU - Cluster of excellence, Bordeaux, France, July 2014. [bibtex-key = c:LaBRI::CPUb] [bibtex-entry]


  181. X. Lacoste, M. Faverge, and P. Ramet. Distributed sparse matrix factorization on top of tasks based runtime systems. SOLHAR meeting, Bordeaux, France, November 2014. Keyword(s): Sparse. [bibtex-key = c:LaBRI::pastix-solhar2] [bibtex-entry]


  182. S. Moustafa, M. Faverge, L. Plagne, and P. Ramet. 3D Cartesian Transport Sweep for Massively Parallel Architectures on top of PaRSEC. 9th Scheduling for Large Scale Systems Workshop, Lyon, France, July 2014. Keyword(s): Neutron. [bibtex-key = c:LaBRI::sweep-scheduling] [bibtex-entry]


  183. S. Moustafa, M. Faverge, L. Plagne, and P. Ramet. 3D Cartesian Transport Sweep for Massively Parallel Architectures on top of PaRSEC. SOLHAR meeting, Toulouse, France, June 2014. Keyword(s): Neutron. [bibtex-key = c:LaBRI::sweep-solhar] [bibtex-entry]


  184. P. Ramet. From hybrid architectures to hybrid solvers. CPU - Cluster of excellence, Bordeaux, France, July 2014. [bibtex-key = c:LaBRI::CPUa] [bibtex-entry]


  185. P. Ramet. Hybrid methods, Hybrid architectures, Hybrid compressions for sparse direct solvers. Seminar at MCIA (Mésocentre de Calcul Intensif Aquitain), February 2014. [bibtex-key = c:LaBRI::MCIA14] [bibtex-entry]


  186. P. Ramet. Solveurs Directs. Maison de la Simulation, Formation PATC, Algèbre Linéaire Creuse Parallèle, Paris, France, March 2014. [bibtex-key = c:LaBRI::MDS14] [bibtex-entry]


  187. X. Lacoste, M. Faverge, and P. Ramet. Sparse Linear Algebra over DAG Runtimes. SOLHAR meeting, Bordeaux, France, November 2013. Keyword(s): Sparse. [bibtex-key = c:LaBRI::pastix-solhar] [bibtex-entry]


  188. P. Ramet. From hybrid architectures to hybrid solvers. Seminar at Stanford, July 2013. [bibtex-key = c:LaBRI::stanford13a] [bibtex-entry]


  189. P. Ramet. Hybrid methods, Hybrid architectures, Hybrid compressions for sparse direct solvers. Seminar at Stanford, November 2013. [bibtex-key = c:LaBRI::stanford13b] [bibtex-entry]


  190. P. Ramet. Méthodes directes et hybrides pour des solveurs creux adaptés aux machines multiCPUs/multiGPUs. 3ième Ecole Thématique de Simulation Numérique, Frejus, France, July 2013. [bibtex-key = c:LaBRI::ETSN2013] [bibtex-entry]


  191. P. Ramet. Parallel Numerical Solvers on top of Runtime Systems. SuperComputing'2013, Denver, USA, November 2013. Keyword(s): Sparse. [bibtex-key = c:LaBRI::SC2013] [bibtex-entry]


  192. P. Ramet. Solveurs Directs. Maison de la Simulation, Formation PATC, Algèbre Linéaire Creuse Parallèle, Paris, France, March 2013. [bibtex-key = c:LaBRI::MDS13] [bibtex-entry]


  193. E. Agullo, G. Bosilca, B. Bramas, C. Castagnede, O. Coulaud, E. Darve, J. Dongarra, M. Faverge, N. Furmento, G. Giraud, X. Lacoste, J. Langou, H. Ltaief, M. Messner, R. Namyst, P. Ramet, T. Takahashi, S. Thibault, S. Tomov, and I. Yamazaki. Matrices over Runtime Systems at Exascale. SuperComputing'2012, Salt Lake City, USA, November 2012. Keyword(s): Sparse. [bibtex-key = c:LaBRI::SC2012] [bibtex-entry]


  194. M. Boulet, G. Meurant, D. Goudin, J.-J. Pesque, M. Chanaud, L. Giraud, P. Hénon, P. Ramet, and J. Roman. Résolution des systèmes linéaires sur calculateurs pétaflopiques. CHOCS volume 41: revue scientifique et technique de la Direction des Applications Militaires, January 2012. [bibtex-key = c:LaBRI::CHOCS] [bibtex-entry]


  195. X. Lacoste, M. Faverge, and P. Ramet. Scheduling for Sparse Solver on Manycore Architectures. Workshop INRIA-CNPq, HOSCAR meeting, Petropolis, Brazil, September 2012. Keyword(s): Sparse.
    Abstract:
    The emergence of many-core architectures introduces variations in computation costs, which makes precise cost models hard to build. Static schedulers based on cost models, like the one used in the sparse direct solver PaStiX, are no longer well suited. We describe the dynamic scheduler developed for the supernodal method of PaStiX to correct the imperfections of the static model. The solution presented exploits the elimination tree of the problem to preserve data locality during execution.
    [bibtex-key = c:LaBRI::HOSCAR2012b] [bibtex-entry]


  196. X. Lacoste, M. Faverge, and P. Ramet. Sparse direct solvers with accelerators over DAG runtimes. Workshop INRIA-CNPq, HOSCAR meeting, Sophia-Antipolis, France, July 2012. Keyword(s): Sparse.
    Abstract:
    The current trend in high performance computing shows a dramatic increase in the number of cores on shared memory compute nodes. Algorithms, especially those related to linear algebra, need to be adapted to these new computer architectures in order to be efficient. PaStiX is a sparse parallel direct solver that incorporates a dynamic scheduler for strongly hierarchical modern architectures. In this work, we study the replacement of this internal, highly integrated scheduling strategy by two generic runtime frameworks: DAGuE and StarPU. These runtimes make it possible to execute the factorization task graph on emerging computers equipped with accelerators. As in previous work done in dense linear algebra, we will present the kernels used for GPU computations, inspired by the MAGMA library, and the DAG algorithm used with these two runtimes. A comparative study of the performance of the supernodal solver with the three different schedulers is performed on manycore architectures, and the improvements obtained with accelerators will be presented with the StarPU runtime. These results demonstrate that these DAG runtimes provide uniform programming interfaces to obtain high performance on different architectures for irregular problems such as sparse direct factorizations.
    [bibtex-key = c:LaBRI::HOSCAR2012a] [bibtex-entry]


  197. P. Ramet. Sparse direct solver on top of large-scale multicore systems with GPU accelerators. CEMRACS'2012, Méthodes numériques et algorithmes pour architectures pétaflopiques, Marseille, France, August 2012. [bibtex-key = c:LaBRI::CEMRACS12] [bibtex-entry]


  198. P. Ramet. Linear algebra and sparse direct methods. Séminaires de l'école MFN 2011 sur les méthodes et algorithmes pour le calcul hautes performances, Roscoff, France, June 2011. [bibtex-key = c:LaBRI::MFN2011] [bibtex-entry]


  199. P. Ramet. PaStiX: sparse direct/hybrid solver on many CPU/GPU clusters. Friday Lunch 11/11/11, ICL, UTK, USA, November 2011. [bibtex-key = c:LaBRI::UTK11] [bibtex-entry]


  200. P. Ramet. Solveurs Directs. Maison de la Simulation, Formation en Algèbre Linéaire Creuse Parallèle, Bordeaux, France, November 2011. [bibtex-key = c:LaBRI::MDS11] [bibtex-entry]


  201. P. Hénon and P. Ramet. Scalable direct and iterative solvers. SuperComputing'2010, New Orleans, USA, November 2010. Keyword(s): Sparse. [bibtex-key = c:LaBRI::SC2010] [bibtex-entry]


  202. P. Hénon and P. Ramet. Scalable direct and iterative solvers, June 2010. Note: Workshop INRIA-UUIC, Bordeaux, France. Keyword(s): Sparse. [bibtex-key = C:LaBRI::UUIC] [bibtex-entry]


  203. P. Ramet. Formation Parallélisme. CEMRACS'2010, Modèles numériques pour la fusion, Marseille, France, August 2010. [bibtex-key = c:LaBRI::CEMRACS10] [bibtex-entry]


  204. P. Ramet. Ordonnancement dynamique dans le solveur PaStiX pour des machines NUMA et multicoeurs. Formation CNRS, Solveurs de Systèmes Linéaires de grande taille : les avancées récentes, Lyon, France, November 2010. [bibtex-key = c:LaBRI::CNRS10] [bibtex-entry]


  205. P. Hénon, P. Ramet, and J. Roman. A supernode amalgamation algorithm for an efficient block incomplete factorization, July 2008. Note: Mini-Workshop on parallel iterative solvers and domain decomposition techniques, Minneapolis, USA. Keyword(s): Sparse. [bibtex-key = C:LaBRI::MN08] [bibtex-entry]


  206. P. Ramet. Résolution de Systèmes Linéaires, Algorithmes et Parallélisme. Formation CNRS, Informatique Scientifique pour le Calcul, Sète, France, October 2008. [bibtex-key = c:LaBRI::CNRS08] [bibtex-entry]


  207. P. Ramet and J. Roman. Méthodes directes hautes performances de résolution en algèbre linéaire creuse. Ecole CEA-EDF-INRIA sur le calcul scientifique intensif, Rocquencourt, France, November 2006. [bibtex-key = c:LaBRI::CEI06] [bibtex-entry]


  208. P. Hénon, P. Ramet, and J. Roman. Parallel Complete and Incomplete Blockwise Factorisations for Very Large Sparse Systems. SuperComputing'2004, Pittsburgh, USA, November 2004. Keyword(s): Sparse. [bibtex-key = c:LaBRI::SC2004] [bibtex-entry]


  209. P. Hénon, P. Ramet, and J. Roman. A Parallel Direct Solver for Very Large Sparse SPD Systems. Poster session at IPDPS'2003, April 2003. Keyword(s): Sparse. [bibtex-key = c:LaBRI::ipdps2003] [bibtex-entry]


  210. P. Hénon, P. Ramet, and J. Roman. A Parallel Direct Solver for Very Large Sparse SPD Systems. SuperComputing'2002, Baltimore, USA, November 2002. Keyword(s): Sparse. [bibtex-key = c:LaBRI::SC2002] [bibtex-entry]







Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.





Last modified: Tue Apr 4 11:58:35 2023
Author: ramet.


This document was translated from BibTEX by bibtex2html