Why does my MPI job complain about connection errors or addresses already in use?

Depending on the number of MPI processes you attempt to start in your Slurm jobs, you may see errors such as:

mca_btl_tcp_endpoint_start_connect] bind on local address (aaa.bb.cc.dd:0) failed: Address already in use (98)

Or

pmix_ptl_base: send_msg: write failed: Broken pipe (32) [sd = xyz]                                                       

Or words to the effect of:

/usr/share/openmpi/help-mpi-btl-tcp.txt: Too many open files.  Sorry!

This occurs because MPI processes open network sockets to talk to each other. Depending on the MPI communications mechanism your application is configured to use, the number of required network sockets can vary.
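If you want to see how many files and network sockets one of your running processes currently has open, you can count its file descriptors under /proc (the process ID 12345 below is purely illustrative - substitute the PID of one of your own MPI processes):

$ ls /proc/12345/fd | wc -l

The number printed is the total count of open file descriptors for that process, which includes its network sockets.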

Why This Happens

In most cases this is because your job has run out of available open network sockets. Network sockets share the same (consumable) system resources as open files; Linux tracks the use of these resources to try to ensure that the system can still read and write essential system files and respond to incoming network requests (e.g. logins). The maximum number of network sockets on a Linux system is 65536, but this is shared with all of the essential system network services such as SSH, NFS, the email server, etc. The real number available to users on a Linux system is lower.

You can verify how many of these your account can open with the following command:

$ ulimit -n
1024

This shows that you are allowed to open 1024 files or network sockets (this is the normal Linux default).

Requesting Increased Sockets or Files

You may request an increase to the number of open files or sockets by supplying an argument to the ulimit -n command. For example, to increase the limit to 2048:

$ ulimit -n 2048
$ ulimit -n
2048

This limit change only persists within the session in which you run it.
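If you need the increased limit inside a batch job rather than an interactive session, you can run the ulimit command at the top of your submission script so it applies to the job's shell and the processes launched from it. The script below is only a minimal sketch - the node and task counts and the program name my_mpi_program are illustrative, and the requested limit must still be within the per-user maximum described below. Depending on how your MPI library launches processes on the other nodes in the job, the limit may need to be raised on those nodes as well.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64

# Raise the soft limit on open files/sockets for this job's shell
# (this cannot exceed the hard limit reported by 'ulimit -Hn')
ulimit -n 2048

# Launch the MPI program (illustrative name) with the raised limit in place
mpirun ./my_mpi_program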

What is the maximum system limit for files or sockets?

You can check what the system is currently configured to allow with the following command:

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768	60999

This indicates that local ports in the range 32768 - 60999 are available for outgoing connections. That range covers 28232 ports - a seemingly huge number, but it is shared by all users and all processes on the system which want to open a network socket.

If you attempt to request an increase to the open files number (via ulimit -n) beyond what is allowed for individual users, you will see an error. For example:

$ ulimit -n 3001
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Here we can see that 3001 open files or sockets is more than is allowable for a single user.
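The ceiling being hit here is the per-user hard limit: an unprivileged user can only raise the soft limit up to that value. You can check the hard limit with the following command (the value reported depends on how the system is configured; 3000 is shown here simply to be consistent with the example above):

$ ulimit -Hn
3000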

Think about your use case!

It is not normal to allow a single user to use more than a few thousand open files / network sockets. Tens of thousands would be an extremely unusual use case. If you do hit this issue, then you should probably reconsider how you are approaching the problem.

MPI Edge Cases

There are some specific cases where MPI jobs may want to open many thousands of network sockets. If, for example, you are using the alltoall inter-process communication method in your MPI code, then the number of network sockets needed on each node can be calculated as:

c x ((n x c) - c)

Where:

  • c is the number of MPI processes per node
  • n is the number of nodes

So in the case of using all 256 cores on a node, and running on two nodes:

256 x ((2 x 256) - 256) == 65536

Four nodes would increase that to:

256 x ((4 x 256) - 256) == 196608
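If you want to estimate this for your own job sizes, the same calculation can be done directly in the shell, where c and n are the per-node process count and node count defined above:

$ c=256; n=4
$ echo $(( c * ((n * c) - c) ))
196608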

This makes it impossible to use the alltoall inter-process communication method beyond a modest number of MPI processes per node, as it requires more available network sockets on each host than is possible (at an absolute minimum, the first 1024 ports are always reserved for essential system services, so the full 65536 are never available to user processes).

This problem is explored in more detail in an OpenMPI GitHub issue:

  • https://github.com/open-mpi/ompi/issues/7246

Note that, as discussed there, there is no specific fix for this; there are a number of possible workarounds, but as the socket calculation above shows, there is an absolute ceiling on the size of jobs which use alltoall communications.

Configuring your MPI jobs

If any MPI code you are using expects to use alltoall communication, you should configure it to use an alternative - or configure the maximum number of MPI processes to a sensible value which does not require tens of thousands of open network sockets.
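For example, asking Slurm for fewer MPI processes per node (e.g. with #SBATCH --ntasks-per-node=64 instead of using all 256 cores) reduces c in the calculation above, so a two-node alltoall job would need:

64 x ((2 x 64) - 64) == 4096

network sockets per node - a far smaller requirement than the 65536 needed when using all 256 cores per node.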

In most cases, not every MPI process needs to communicate with every other process running across every other node in the job.

