Check if the port for torchrun is open via ncat
Top
Questions to David Rotermund
This is my script (connection_test.sh) that tests if a connection between two computer of a given port is possible:
You need the master_ip and master_port:
master_ip="10.10.10.10"
master_port="40001"
python_file="main.py"
ip_check=`ip addr | grep $master_ip | wc -l`
if [[ $ip_check == "1" ]]
then
echo "Master"
echo "OK OK OK OK OK OK OK" | ncat -l -p $master_port
else
echo "Client"
ncat $master_ip $master_port
The script needs to be started on the computer with the master ip first.
If this fails, the port is already used or your firewall settings is blocking it.
The source code is Open Source and can be found on GitHub.