Containers for Scientific, AI, and Engineering Applications

This post explains how containers can help scientists and engineers get their work done, and how TotalCAE container technology has been used to solve real-world problems for customers.

What is a container?

A container in this context is a way to encapsulate applications and even operating systems into a “file” that is portable across systems.

It is similar to a virtual machine in that it lets you bundle complex software into a single portable file that you can copy, email, and execute on different systems. However, unlike virtual machines, containers are very lightweight and can typically be started in less than a second.

When would I use a container as an engineer?

Running bleeding edge open source scientific codes.

Many open source codes are very difficult to build, and they often depend on the newest operating systems, libraries, and compilers, which may not be available on your on-premise system or may be incompatible with it.

This is especially true of machine learning libraries, which are under rapid change and some of which are pre-packaged as containers. Optimized scientific containers can easily be pulled from a container registry such as the NVIDIA NGC container repository, or from a self-hosted registry managed by TotalCAE.

Users and their IT administrators face a dilemma: they want to run the latest bleeding edge release of a package that requires a new or different operating system and software, but they do not want to risk upgrading a perfectly good working system.

For example, running the latest versions of the popular open source rendering application Blender may require a newer version of the operating system than the one you have installed.

Containerized Application, Operating System, and Libraries in a Single File

In the old days, dual booting, running virtual machines, or dedicating different nodes to different operating systems were all solutions to this conundrum.

Today, however, one can “encapsulate” the application inside a container with a completely different operating system, and run it without changing the host system at all. This avoids the risk of an upgrade, since no changes are made to the host system to run the application.

To run a different operating system for specific codes

In one case, a customer had thousands of nodes running an older operating system on an HPC system that was near end of life. One critical meshing application had to run at its latest release, but that release was only supported on the very latest release of the Linux operating system.

That application was put in a container running a vendor-supported operating system. This avoided the risk, and the thousands of person-hours, of re-certifying, testing, and deploying a large operating system upgrade on a huge HPC system near end of life for the sake of one application, while still enabling this critical application to run on top of the unmodified host operating system, which the application vendor did not support.

Work around lack of internet access

Many of our customers have firewall policies that do not allow external internet access from their HPC systems. This is problematic when many open source installation programs require access to the internet to download various parts and dependencies. Installing a useful program with hundreds of dependencies from various sources and projects is often impractical without internet access.

With a container, the entire application can be installed normally off-premise with internet access, and then the single “file” copied to the customer environment to run as a container.

Distribute a homegrown code internally without having to do a lot of support.

We have customers with extremely smart technical people around the globe who have written homegrown applications on their Linux workstations to solve a problem. Often these engineers wrote the tool for themselves and do not have the time to configure and test it so that it works in other environments, or for external partners who may benefit from it.

A solution in this case is for the engineer to install the application into a container and distribute just the single container file. This frees them from supporting Linux environments that differ from their own, and the program will execute in other environments the same way it does in theirs.

When would I NOT use a container?

You really don’t need a container for commercial engineering codes. With ANSYS, SIMULIA, Siemens products, and the other popular CAE vendors, it is quite easy to install multiple releases on standard supported operating systems.

You do not need a container if the application:

Runs fine in a standard operating system you already have installed.
Is simple to install multiple releases.
Has few or no dependencies.
Does not require internet access during installation (relevant only if your system has none).
Uses the same install for all users.
Is easily supported as a normal install by your CAE IT department.

How do you run a container?

The application and operating system are bundled into a single file that can be executed like a normal application. For example, to run the Blender rendering application discussed earlier, packaged in a container file called blender.img, type:

singularity run blender.img -b file.blend -a

The “singularity run blender.img” is the part that runs the container, and “-b file.blend -a” are options passed to blender.

If you do a process listing, you will just see “blender”. From your IT administrator’s point of view, the container looks like any other process.

How do I fetch an existing container?

Perhaps you want to grab a container already created by NVIDIA or others that is pre-compiled and optimized. For example, the popular PyTorch open source machine learning library can be grabbed and executed from the NVIDIA NGC repository in two steps:

singularity build pytorch.simg docker://nvcr.io/nvidia/pytorch:19.07-py3
singularity exec --nv pytorch.simg python pytorch.py
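
The build step downloads and converts the image, so it is usually worth doing only once. The helper below is a minimal sketch of that idea and is not part of Singularity itself: fetch_image is a hypothetical function name, and it simply skips the build when the image file already exists.

```shell
# fetch_image IMAGE SOURCE
# Build IMAGE from SOURCE (for example a docker:// URI) only when IMAGE
# is not already present. fetch_image is a hypothetical helper name.
fetch_image() {
    image=$1
    source_uri=$2
    if [ -f "$image" ]; then
        echo "reusing existing image: $image"
    else
        singularity build "$image" "$source_uri"
    fi
}
```

With this in place, the first step above could be written as fetch_image pytorch.simg docker://nvcr.io/nvidia/pytorch:19.07-py3, followed by the same singularity exec command.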

How do you create a container?

While using a container is as easy as executing a command like those shown above, creating a container is a bit more involved and requires some Linux skills. Normally either TotalCAE containerizes your custom or open source application, or a developer creates the container for other users to use.

For a detailed technical look at how the Blender container in this example was created, see our more technical deep dive on Singularity containers.
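
To give a rough idea of what creating a container involves, the sketch below writes a minimal Singularity definition file and builds an image from it. Everything specific here is an assumption for illustration: the tool mytool.py, the /opt/mytool path, the ubuntu:22.04 base image, and the helper names write_definition and build_image are all hypothetical, and a real application usually needs many more installation steps.

```shell
# write_definition DEF_FILE
# Write a minimal definition file for a hypothetical Python tool.
write_definition() {
    cat > "$1" <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%files
    mytool.py /opt/mytool/mytool.py

%post
    apt-get update && apt-get install -y python3

%runscript
    exec python3 /opt/mytool/mytool.py "$@"
EOF
}

# build_image IMAGE DEF_FILE
# Build the container image from the definition file. Building from a
# definition file typically needs root privileges or the --fakeroot option.
build_image() {
    singularity build "$1" "$2"
}
```

A developer might call write_definition mytool.def, then build_image mytool.sif mytool.def, and hand the resulting mytool.sif to colleagues, who would start it with singularity run mytool.sif followed by the tool's own options.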

Why not just Docker?

It is important to note that the containers used in the engineering solutions described in this post are often different from the Docker containers your enterprise IT may be using. Containers used in engineering (such as Singularity) meet the special needs of multi-tenant HPC. A standard Docker setup runs containers through a root-owned daemon, so a user who can launch containers can effectively gain root on the host system, which is generally not allowed on multi-tenant HPC clusters.
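
One way to see the difference is to compare user IDs inside and outside a container. With Singularity, a process in the container runs by default as the same user who launched it, so the check below should succeed for an ordinary user. This is a small sketch: same_user_inside is a hypothetical helper name, and the image argument is whatever container file you have on hand.

```shell
# same_user_inside IMAGE
# Succeeds when the numeric user ID seen inside the container matches
# the ID of the user who launched it on the host.
same_user_inside() {
    inside=$(singularity exec "$1" id -u)
    outside=$(id -u)
    [ "$inside" = "$outside" ]
}
```

For example, same_user_inside blender.img && echo "same user inside and outside" would print the message when the container process keeps the caller's identity.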

Can I run containers in the cloud?

YES! Containers run on AWS, Azure, and GCP without modification.

Running distributed MPI jobs across multiple containers

Running tightly coupled distributed CAE applications across multiple containers is possible, but note that the application typically requires modification to run this way, especially in the case of commercial engineering codes.

While we rarely see customers having a need for this, TotalCAE has enabled numerous commercial applications to run this way on specialized container platforms.

Next Steps

If you want to run the latest AI/ML/DL open source applications, need to package a complex internal code to distribute internally, or need open source packages but cannot install them because you have no internet access, reach out to TotalCAE and we can help you adopt this solution.