Accelerating Cancer Research: How Cloud303 Optimized Stem Pharm’s Bioinformatics Workflows on AWS and Cut Costs by 40%

High-Performance Computing

  • 30 September 2023
AWS Funding Secured by Cloud303
  • Partner Opportunity Acceleration
  • Well-Architected
  • Migration Acceleration Program 2.0

About the Customer

Stem Pharm is a biotechnology company specializing in cancer research. Its primary focus is cancer genomics, proteomics, and cellular analysis, with the goal of developing new solutions for diagnosing and treating various forms of cancer. Facing challenges in data storage, processing, and security, the company sought to optimize its bioinformatics workflows in a cost-effective, secure, and scalable environment.

Summary

Stem Pharm is a biotechnology company focused on developing cutting-edge solutions to diagnose and treat cancer. They specialize in cancer genomics, proteomics, and cell analysis. They needed a cost-effective and elastic environment for processing large amounts of BCL files, and Cloud303, an AWS partner, was brought in to help.

Problem Statement

Stem Pharm was struggling to balance cost and performance while processing large volumes of BCL files in its bioinformatics workflows. The company required an elastic environment that could scale with growing demand while remaining cost-effective. It also needed to store and process genomic data and metadata securely, keeping them both accessible and confidential for the research team. Finally, Stem Pharm required a solution that could run different applications in sequence and provide a secure, authenticated portal for access to images, genomic files, and metadata stored in S3.

Why Cloud303?

  • Demonstrated Expertise in HPC: Cloud303 possesses specialized expertise in HPC, which is crucial for applications that require complex computational processes. This includes genomics sequencing, molecular modeling, and advanced simulations.
  • Robust Infrastructure: The infrastructure provided by Cloud303 is tailored to meet the stringent performance, reliability, and scalability needs of HPC. Our team offers a robust ecosystem that can handle large-scale and intricate computations.
  • Exceptional Support and Security: Cloud303 offers round-the-clock support, along with proven security protocols, to ensure that sensitive data and complex workloads are managed in compliance with industry standards.
  • Proven Track Record: Cloud303 has a strong history of successful partnerships within the life sciences industry. Our commitment to excellence, reliability, and client-focused solutions has made us a trusted partner.

Engagement Overview

Cloud303's engagements follow a streamlined five-phase lifecycle: Requirements, Design, Implementation, Testing, and Maintenance. Initially, a comprehensive assessment is conducted through a Well-Architected Review to identify client needs. This is followed by a scoping call to fine-tune the architectural design, upon which a Statement of Work (SoW) is agreed and signed.

The implementation phase kicks in next, closely adhering to the approved designs. Rigorous testing ensures that all components meet the client's specifications and industry standards. Finally, clients have the option to either manage the deployed solutions themselves or to enroll in Cloud303's Managed Services for ongoing maintenance, an option many choose due to their high satisfaction with the services provided.

Solution Provided

To address Stem Pharm's requirements for handling their advanced human neural organoids and the associated complex data, Cloud303, an AWS Premier Consulting Partner, devised a comprehensive solution leveraging various AWS life sciences services. The architecture was designed to support the storage, processing, and analysis of imaging, genomic, transcriptomic, and metadata from Stem Pharm's organoid models.

Data Storage and Management

Amazon S3 was used as the primary storage solution for Stem Pharm's data, including images, genomic files, transcriptomic data, and metadata. S3's scalability, durability, and security features ensured that data was stored safely and was accessible when needed. Data within the S3 buckets was organized under a hierarchical prefix structure to enable efficient retrieval and management.
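As an illustration of a hierarchical layout, the sketch below builds S3 object keys from a project, a data type, and a sample identifier. The prefix order, the data type names, and the example project and file names are assumptions made for this example; the case study does not describe the actual layout.

```python
# Hypothetical key layout: <project>/<data_type>/<sample_id>/<filename>
# The data types mirror the categories named in the case study.
DATA_TYPES = {"images", "genomic", "transcriptomic", "metadata"}


def prefix_for(project: str, data_type: str) -> str:
    """Return the key prefix that holds every object of one data type."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return f"{project}/{data_type}/"


def build_object_key(project: str, sample_id: str, data_type: str, filename: str) -> str:
    """Return the full object key for a single file."""
    return f"{prefix_for(project, data_type)}{sample_id}/{filename}"


if __name__ == "__main__":
    print(build_object_key("organoid-study", "S001", "genomic", "run1.bcl"))
```

Because every object of one type shares a prefix, a single prefix-scoped listing or access rule can cover, for example, all genomic files in a project.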

Data Processing and Analysis

AWS Batch was employed to manage and optimize the execution of various bioinformatics applications in sequence. AWS Batch enabled dynamic allocation of compute resources based on the workload requirements, ensuring cost-effective and efficient processing.
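Running applications in sequence on AWS Batch is typically expressed through job dependencies. The sketch below chains jobs so that each one waits for the previous one. The request fields (jobName, jobQueue, jobDefinition, dependsOn) follow the AWS Batch SubmitJob API, but the step names, queue name, and job definitions are hypothetical, and the submit function is passed in so the example runs without AWS credentials.

```python
from typing import Callable, Dict, List


def submit_in_sequence(
    steps: List[Dict[str, str]],
    queue: str,
    submit: Callable[..., Dict[str, str]],
) -> List[str]:
    """Submit each step so that it depends on the step before it.

    `submit` is expected to behave like the boto3 Batch client's
    `submit_job`: it accepts keyword arguments and returns a dict
    that contains a "jobId".
    """
    job_ids: List[str] = []
    previous_id = None
    for step in steps:
        request = {
            "jobName": step["name"],
            "jobQueue": queue,
            "jobDefinition": step["definition"],
        }
        if previous_id is not None:
            # The job stays pending until the previous job succeeds.
            request["dependsOn"] = [{"jobId": previous_id}]
        response = submit(**request)
        previous_id = response["jobId"]
        job_ids.append(previous_id)
    return job_ids
```

With real credentials, the same function could be called as `submit_in_sequence(steps, "my-queue", boto3.client("batch").submit_job)`, where the queue name is a placeholder.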

Workflow Orchestration and Optimization

Nextflow pipeline scripts were developed to orchestrate data processing from S3 buckets. The pipeline scripts were designed to automatically manage the execution of multiple applications in sequence, such as bulk and single-cell transcriptomic analysis, image processing, and cell type-specific analysis, while also handling error recovery and parallelization for optimal processing.
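Nextflow provides sequencing, retries, and parallelism declaratively. The Python sketch below is not the pipeline itself; it only illustrates those three ideas: each sample passes through the steps in order, a failed step is retried, and samples are processed in parallel. The function names and the retry limit are assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Sequence

Step = Callable[[Any], Any]


def run_with_retry(step: Step, value: Any, max_attempts: int = 3) -> Any:
    """Call one step, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return step(value)
        except Exception as error:  # broad on purpose: this is a sketch
            last_error = error
    raise RuntimeError(f"step failed after {max_attempts} attempts") from last_error


def run_pipeline(samples: Iterable[Any], steps: Sequence[Step], max_workers: int = 4) -> List[Any]:
    """Run every sample through all steps in order, with samples in parallel."""

    def process(sample: Any) -> Any:
        value = sample
        for step in steps:
            value = run_with_retry(step, value)
        return value

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input samples.
        return list(pool.map(process, samples))
```

In Nextflow itself, the comparable behavior comes from process directives such as `errorStrategy 'retry'` and `maxRetries`, with parallelism derived from the channels that feed each process.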

Compute Resources

Amazon EC2 instances, including c5.9xlarge, p3.2xlarge, and g4dn.xlarge, were provisioned to handle the diverse processing requirements of Stem Pharm's organoid data. Instance types were selected based on the specific requirements of each application in the Nextflow pipeline, balancing performance against cost.
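One way to picture per-application instance selection is a small lookup that returns the smallest listed instance type satisfying a step's needs. The vCPU, memory, and GPU counts below are the published specifications for these three instance types; the selection rule and the example requirements are assumptions for illustration, not the criteria used in the engagement.

```python
# Published specs for the three instance types named in the case study.
INSTANCE_SPECS = {
    "c5.9xlarge": {"vcpus": 36, "memory_gib": 72, "gpus": 0},
    "p3.2xlarge": {"vcpus": 8, "memory_gib": 61, "gpus": 1},
    "g4dn.xlarge": {"vcpus": 4, "memory_gib": 16, "gpus": 1},
}


def pick_instance(min_vcpus: int, min_memory_gib: int, needs_gpu: bool) -> str:
    """Return the smallest instance type that meets the requirements.

    GPU instances are only considered when a GPU is required, on the
    assumption that CPU-only steps should not pay for an idle GPU.
    """
    candidates = [
        (spec["vcpus"], spec["memory_gib"], name)
        for name, spec in INSTANCE_SPECS.items()
        if spec["vcpus"] >= min_vcpus
        and spec["memory_gib"] >= min_memory_gib
        and (spec["gpus"] > 0) == needs_gpu
    ]
    if not candidates:
        raise ValueError("no listed instance type satisfies the requirements")
    return min(candidates)[2]


if __name__ == "__main__":
    # Hypothetical requirements for a CPU-bound step and a GPU-bound step.
    print(pick_instance(16, 32, needs_gpu=False))
    print(pick_instance(4, 8, needs_gpu=True))
```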

Data Security and Access Control

A secure portal was developed to provide authentication and access control for Stem Pharm's data stored in Amazon S3. AWS Identity and Access Management (IAM) was used to define user roles and permissions, ensuring that only authorized personnel could access specific data, images, and files.
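Role-based access to specific data can be expressed as an IAM policy scoped to a key prefix. The sketch below generates a read-only policy document in standard IAM JSON policy syntax. The bucket name and prefix are hypothetical, and the actual roles and permissions used in the portal are not described in the case study.

```python
import json
from typing import Any, Dict


def read_only_policy(bucket: str, prefix: str) -> Dict[str, Any]:
    """Build an IAM policy granting read-only access to one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, narrowed by a condition.
                "Sid": "ListPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is an object-level action, narrowed by the ARN.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-research-bucket" is a placeholder, not a real bucket name.
    print(json.dumps(read_only_policy("example-research-bucket", "organoid-study/images"), indent=2))
```

Attaching a policy like this to a role gives its users read access to one data category while leaving the rest of the bucket inaccessible.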

Monitoring and Reporting

Amazon CloudWatch and AWS CloudTrail were implemented to monitor the performance and usage of AWS resources. Custom dashboards were created to provide real-time insights into the status of the data processing pipeline, enabling Stem Pharm to quickly identify and address any issues.

With this architecture, Stem Pharm was able to securely store, process, and analyze the complex data generated from their advanced human neural organoids. The combination of AWS life sciences services and Nextflow pipeline scripts enabled seamless data processing, while the integration of machine learning and AI technologies empowered Stem Pharm to gain deeper insights into their organoid models for neurological drug discovery research.

Engineer Quote

It was incredibly rewarding to work with Stem Pharm. We customized an AWS architecture that really hones in on their specific challenges—cost optimization and high-throughput data processing for cancer genomics. By using AWS Batch and Nextflow, we've automated and streamlined their bioinformatics pipelines. It's not just about technology; it's about enabling faster, more impactful cancer research.

Tim Furlong Principal Solutions Architect (HPC and Life Sciences), Cloud303

Outcomes

Stem Pharm was able to optimize its costs significantly by moving its bioinformatics processing to the AWS cloud. The company was able to achieve a cost reduction of 40% compared to their on-premises infrastructure. This cost optimization was achieved through the use of AWS Batch, which allowed Stem Pharm to utilize only the computing resources they needed at any given time, and also helped reduce operational costs associated with maintaining an on-premises infrastructure.

The combination of AWS Batch and Nextflow pipeline scripts enabled Stem Pharm to significantly increase its bioinformatics processing speed. The average processing time for BCL files fell from 12 hours to 5 hours, a reduction of roughly 60%, enabling Stem Pharm to deliver results to its clients much faster.

The secure portal ensured that data was accessible only to authorized personnel. Its authentication and access control mechanisms allowed Stem Pharm to control who could access the data, images, and genomic files stored in S3 buckets.
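The processing-time figures above can be checked with a short calculation. The sketch below uses only the 12-hour and 5-hour averages reported in this case study; the function names are illustrative.

```python
def percent_reduction(before_hours: float, after_hours: float) -> float:
    """Percentage by which the processing time dropped."""
    return (before_hours - after_hours) / before_hours * 100


def speedup(before_hours: float, after_hours: float) -> float:
    """How many times faster the new pipeline is."""
    return before_hours / after_hours


if __name__ == "__main__":
    print(f"reduction: {percent_reduction(12, 5):.1f}%")  # 58.3%
    print(f"speedup:   {speedup(12, 5):.1f}x")            # 2.4x
```

Going from 12 hours to 5 hours is a reduction of about 58%, which rounds to the 60% headline figure, and corresponds to a 2.4x speedup.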

The EC2 instance types used were c5.9xlarge, p3.2xlarge, and g4dn.xlarge, with gp3 EBS volumes and the S3 Standard storage class. The AWS Batch compute environment was optimized for EC2 instances, with pricing stated as $0.01 per vCPU-second. The 12-month TCO analysis broke down as follows:
  • EC2 instances: $44,618.92
  • EBS volumes: $2,400.00
  • S3 storage: $2,760.00
  • AWS Batch: $60,000.00
  • Total 12-month TCO: $109,778.92
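The 12-month TCO total can be reproduced from its four line items. The sketch below uses the dollar figures reported above and Decimal arithmetic to avoid floating-point rounding; the variable and function names are illustrative.

```python
from decimal import Decimal

# 12-month cost per category, as reported in the TCO analysis.
TCO_LINE_ITEMS = {
    "EC2 instances": Decimal("44618.92"),
    "EBS volumes": Decimal("2400.00"),
    "S3 storage": Decimal("2760.00"),
    "AWS Batch": Decimal("60000.00"),
}


def total_tco(items: dict) -> Decimal:
    """Sum all line items into the total cost of ownership."""
    return sum(items.values(), Decimal("0"))


def share_of_total(items: dict, name: str) -> Decimal:
    """Percentage of the total attributable to one line item."""
    return items[name] / total_tco(items) * 100


if __name__ == "__main__":
    print(f"Total 12-month TCO: ${total_tco(TCO_LINE_ITEMS):,.2f}")
    for name in TCO_LINE_ITEMS:
        print(f"{name}: {share_of_total(TCO_LINE_ITEMS, name):.1f}% of total")
```

The line items sum to $109,778.92, matching the reported total, with the AWS Batch line accounting for a little over half of it.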
