Over the last decade, most supercomputer architectures have been based on clusters of SMP nodes. In these architectures, exchanges between processors go through shared memory when the processors are located on the same SMP node, and through the network otherwise. The MPI implementations provided by the vendors of these machines are generally adapted to this situation and take advantage of shared memory to handle messages between processors within the same SMP node. Nevertheless, this transparent way of exploiting shared memory does not avoid the storage of the buffers needed for asynchronous communications. In parallel direct solvers, the storage of these buffers can become a bottleneck. In this paper, we propose a hybrid thread-MPI implementation of a direct solver and analyse the benefits of this approach in terms of memory usage and run-time performance.
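To fix ideas, the sketch below illustrates the general hybrid scheme the paper builds on, not the solver's actual code: one MPI process per SMP node with POSIX threads inside it, so that intra-node exchanges are plain shared-memory accesses (no communication buffers) and MPI buffers are needed only for inter-node messages. The names `NTHREADS`, `worker`, and `shared_block` are illustrative assumptions.

/* Minimal hybrid thread-MPI sketch: one MPI process per SMP node,
 * NTHREADS POSIX threads per process. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4                       /* assumed cores per SMP node */

static double shared_block[NTHREADS];    /* visible to all threads of a node */

static void *worker(void *arg)
{
    int tid = *(int *)arg;
    /* Intra-node exchange: a direct shared-memory write,
     * no message buffer is allocated. */
    shared_block[tid] = (double)tid;
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank;
    /* Only the main thread calls MPI, hence MPI_THREAD_FUNNELED. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    pthread_t th[NTHREADS];
    int ids[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        ids[i] = i;
        pthread_create(&th[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);

    /* Inter-node exchange: one MPI message per node instead of one per
     * core, reducing the number of asynchronous-communication buffers. */
    double sum = 0.0, total;
    for (int i = 0; i < NTHREADS; i++)
        sum += shared_block[i];
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %g\n", total);

    MPI_Finalize();
    return 0;
}

In this organisation, a message that a pure MPI code would exchange between two processors of the same node never exists as a buffered message at all, which is the memory saving the paper quantifies.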