This post is for clients weighing a Build vs. Buy decision when adopting HPC for their CAE simulation.

There are four options when deciding how to go from a workstation environment to an HPC cluster environment: two on-premise and two in the cloud.

On-Premise Build vs. Buy Options

  1. BUILD on a general-purpose HPC cluster from a vendor like Dell, HPE or Lenovo, and integrate, update and manage the system and your CAE software on your own.
  2. BUY a completely managed turn-key TotalCAE HPC Appliance that gives you an on-premise managed SaaS solution.

Cloud Build vs. Buy Options

  1. BUILD a cloud system using various cloud toolkits from AWS or Azure, and handle the system administration, security, billing and CAE software yourself.
  2. BUY a completely managed turn-key SaaS solution for your engineering applications, such as TotalCAE Cloud on AWS or Azure, or a single-vendor cloud solution such as Dassault Systèmes 3DEXPERIENCE cloud or ANSYS Cloud.

The following sections dive into the pros and cons of these choices.

Option 1: On-Premise Build

HPC is Complex with Lots of Pieces that Must Fit Together

Historically, for on-premise builds one could simply buy a general-purpose HPC cluster from any hardware vendor. Those vendors may leave you with something that initially works, give you a five-minute tour once the cluster is off the ground, and then basically leave you on your own to figure out how to run your applications. Some might check in a few times a year, but they are not an active partner in your daily success in using the system to complete your engineering.

Many HPC vendors drop HPC systems off and depart.

Invariably, something breaks or changes just as you face a tight deadline: your model is not solving properly, and it is unclear whether the issue is the cluster, the operating system, the CAE application version, InfiniBand or the model, or even where to start to get it fixed.

Something will eventually go wrong at the worst time if the system is left unmanaged.

This of course leaves engineers frustrated, as they just want to focus on engineering, not on understanding all the HPC IT and application complexity needed to solve their problems.

Deadlines, Deadlines…

Inevitably, many clients in this build scenario end up with a system that is unreliable and untrusted, and they are often afraid to change it to meet new business requirements because doing so may destabilize a system that is not fully understood.

Option 2: On-Premise BUY using a TotalCAE Managed HPC Appliance

TotalCAE makes turn-key HPC systems that are purpose-built for the CAE applications you are running. TotalCAE does not drop off the system and leave. TotalCAE manages your owned or leased HPC cluster using a managed subscription for the life of the system. For any issue, TotalCAE is there proactively working with your team to ensure your engineers keep engineering.

Single Point of Contact

The TotalCAE platform is included with each system and provides everything you need for job submission, scheduling, monitoring and reporting. This allows us to respond to any unique business challenges with a system designed and created by TotalCAE for engineers. TotalCAE is not just assembling random third-party components and hoping they work together; the whole system is purpose-built by TotalCAE to support engineering workloads in just a few mouse clicks.

Option 3: Build on “The Cloud”

Cloud Unicorn

Clients often assume that the cloud will make HPC easier and simpler than on-premise HPC, which is not the case. Cloud DIY can be even more complex than on-premise HPC unless you are using a Software as a Service (SaaS) offering like the one TotalCAE provides.

A “simple” base HPC cloud architecture is shown below but all these components must be managed and working together for this to be a usable solution for engineers.

AWS and Azure provide programmable building blocks as a base for HPC platforms, but you still need to know what to do with those blocks. Very few IT teams can, in a reasonable time frame, build a turn-key SaaS HPC environment from the ground up with the required infrastructure, job and project billing, CAE application integration, a mix of on-demand, reserved and spot access, and license-aware scheduling, and have it be reliable and production-worthy.
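To make just one of these pieces concrete, here is a minimal Python sketch of license-aware scheduling: a job starts only when both compute cores and enough application license tokens are free. The names (`Job`, `Pool`, `schedule`) and the numbers are hypothetical and are not drawn from any AWS, Azure or TotalCAE API; a production scheduler would also have to handle priorities, job completion, and license server outages.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int      # CPU cores requested (hypothetical unit)
    licenses: int   # solver license tokens requested (hypothetical unit)


@dataclass
class Pool:
    free_cores: int
    free_licenses: int


def schedule(queue: list[Job], pool: Pool) -> tuple[list[Job], list[Job]]:
    """Start queued jobs in order when both cores and licenses are free.

    Returns (started, waiting). A job that does not fit is left waiting,
    and later, smaller jobs may still start (a very simple backfill).
    """
    started: list[Job] = []
    waiting: list[Job] = []
    for job in queue:
        if job.cores <= pool.free_cores and job.licenses <= pool.free_licenses:
            # Reserve both resources before the job is launched.
            pool.free_cores -= job.cores
            pool.free_licenses -= job.licenses
            started.append(job)
        else:
            waiting.append(job)
    return started, waiting


if __name__ == "__main__":
    pool = Pool(free_cores=128, free_licenses=40)
    queue = [Job("crash", 64, 32), Job("cfd", 32, 16), Job("modal", 16, 8)]
    started, waiting = schedule(queue, pool)
    print("started:", [j.name for j in started])
    print("waiting:", [j.name for j in waiting])
```

In this run the `cfd` job waits even though cores are still free, because the license pool cannot cover it. A scheduler that ignores licenses would launch it anyway and the job would fail at license checkout, which is the kind of detail a DIY build has to get right.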

TotalCAE provides a three-click “upload model, download result” experience on top of these AWS/Azure building blocks, as described in Option 4.

Option 4: Buy TotalCAE Cloud (SaaS)

With TotalCAE, the engineer doesn’t see or even need to know about the behind-the-scenes complexity. Engineers simply upload their model, push a button and receive an email when the results are ready. Everything else is handled automatically.

You can get started with a completely working system, typically in less than one day, so engineers can focus on engineering, not IT. The TotalCAE Portal makes it simple to submit, run and manage hundreds of CAE applications.

The TotalCAE Portal

Builders Beware: “Volunteer HPC Admin”

Please, just let me do engineering.

A number of our clients initially relished the excitement of building their own HPC system, having been convinced that a senior engineer or an existing IT person could easily add managing the HPC environment to their daily tasks.

These “volunteer HPC admins” quickly become overwhelmed as the go-to person for all HPC upkeep, application updates, troubleshooting and development, while still trying to keep up with their day job (and your core business).

Overworked staff and an underperforming system often drive clients to find someone like TotalCAE, who focuses solely on HPC for engineers, resulting in an improved HPC simulation environment for the whole team.

Our Engineers are Building Self Driving Cars, so what is so hard about Building HPC SaaS?

Our engineering clients are super smart, talented folks who are experts in building space rockets, designing autonomous electric cars and creating next-generation medical devices. Building, updating and maintaining a reliable on-premise HPC system or cloud HPC SaaS requires a different kind of domain knowledge, and it takes a lot of dedicated time to become expert enough to create, manage and keep up a reliable system. There is an alphabet soup of things to learn, such as:

  1. Programming Experience (Python, YAML, Terraform, JSON, APIs, Git, SQL)
  2. HPC Experience (InfiniBand, MPI, OpenMP, HPC Schedulers)
  3. Expert Linux Systems Administration Experience
  4. Cloud (AMI, S3, VPC, IAM, NAT, ARN, AZ, ASG, EFA, EBS, etc.)
  5. HPC Application Experience (ISV applications like ANSYS HFSS: how to run distributed, MPI, HPC support, RSM, Workbench integration, using the native HPC fabric, application-aware monitoring)

TotalCAE has spent the last 15 years, with thousands of hours of development and client feedback, refining our HPC SaaS platform into the easiest way to run CAE workloads on our managed HPC clusters and cloud. Your company can put our expertise to work, so you can focus on building the next self-driving car, space rocket, or life-saving medical device instead of doing your own HPC IT.

Even if your company has the talent, time, and resources to do it yourself, what competitive advantage does reinventing an existing commercial solution give the company? Will the company be able to maintain and evolve a solution that is not its core focus?

“You should build the things that differentiate your company, and buy all else”

These opportunity costs can be a big drag on engineering companies, when the most talented individuals are not focused on building new products, but on building supporting tools that already exist in the marketplace.

Conclusion

Many factors go into your company’s decision-making process when reviewing the options for HPC build vs. buy, on-premise or in the cloud. Ultimately, the HPC and Cloud technologies available today are very powerful building blocks, but by themselves they do not make a stable, operational, easy-to-use HPC environment for engineers.

By combining best-of-breed on-premise HPC and Cloud while hiding the complexity, TotalCAE provides a low-risk, high-reward solution for engineering departments that want to take their HPC simulation to the next level without the IT hassle, and at a lower cost than doing it themselves.

If your company is looking to adopt HPC, either on-premise or on-demand in the Cloud, for your engineering simulation goals, reach out to us at info@totalcae.com and we can put our 15+ years of HPC, CAE and Cloud experience to work for you.