StemPharm is a biotechnology company focused on developing cutting-edge solutions to diagnose and treat cancer. It specializes in cancer genomics, proteomics, and cell analysis. StemPharm needed a cost-effective, elastic environment for processing large volumes of BCL files, and Cloud303, an AWS partner, was brought in to help.
StemPharm was struggling to balance cost and performance while processing large volumes of BCL files in its bioinformatics workflows. It needed an elastic environment that could scale with growing demand while remaining cost-effective. The company also had to store and process genomic data and metadata securely, keeping it accessible to its research team while preserving confidentiality. Finally, StemPharm required a solution that could run different applications in sequence, along with a secure, authenticated portal for accessing the images, genomic files, and metadata stored in Amazon S3.
Cloud303's engagements follow a streamlined five-phase lifecycle: Requirements, Design, Implementation, Testing, and Maintenance. First, a Well-Architected Review provides a comprehensive assessment of the client's needs. A scoping call then refines the architectural design, after which both parties agree on and sign a Statement of Work (SoW).
Implementation follows, closely adhering to the approved design. Rigorous testing then confirms that all components meet the client's specifications and industry standards. Finally, clients can either manage the deployed solution themselves or enroll in Cloud303's Managed Services for ongoing maintenance, an option many clients choose because of their high satisfaction with the services provided.
To address StemPharm's requirements for handling its advanced human neural organoids and the complex data they generate, Cloud303, an AWS Premier Consulting Partner, devised a comprehensive solution built on a range of AWS services for life sciences workloads. The architecture was designed to support the storage, processing, and analysis of the imaging, genomic, transcriptomic, and metadata produced by StemPharm's organoid models.
Data Storage and Management
Amazon S3 served as the primary storage for StemPharm's data, including images, genomic files, transcriptomic data, and metadata. S3's scalability, durability, and security features kept the data safe and accessible when needed. Objects in the S3 buckets were organized with a hierarchical key-prefix structure to make data retrieval and management efficient.
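S3 has a flat key namespace, so a "hierarchy" is really a naming convention for key prefixes. The sketch below shows one possible convention, assuming a `<data_type>/<project>/<sample_id>/<filename>` layout; the layout, the data-type names, and the project and sample identifiers are hypothetical, not the scheme actually deployed for StemPharm.

```python
from dataclasses import dataclass

# Hypothetical top-level prefixes, one per data type described above.
DATA_TYPES = {"images", "genomic", "transcriptomic", "metadata"}


@dataclass(frozen=True)
class DataObject:
    project: str    # hypothetical study identifier
    data_type: str  # one of DATA_TYPES
    sample_id: str
    filename: str


def s3_key(obj: DataObject) -> str:
    """Build a key of the form <data_type>/<project>/<sample_id>/<filename>."""
    if obj.data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {obj.data_type!r}")
    parts = [obj.data_type, obj.project, obj.sample_id, obj.filename]
    for part in parts:
        # Reject empty components and embedded slashes so the prefix levels stay predictable.
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return "/".join(parts)


def prefix_for(data_type: str, project: str) -> str:
    """Prefix that matches every object of one data type within one project."""
    return f"{data_type}/{project}/"
```

A prefix such as `prefix_for("genomic", "study01")` can be passed as the `Prefix` argument of an S3 `ListObjectsV2` call to retrieve one slice of the data, and the same prefixes can scope IAM permissions.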
Data Processing and Analysis
AWS Batch was employed to manage and optimize the execution of various bioinformatics applications in sequence. AWS Batch enabled dynamic allocation of compute resources based on the workload requirements, ensuring cost-effective and efficient processing.
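Running applications "in sequence" on AWS Batch is typically done with job dependencies. A minimal sketch, assuming the Python SDK (boto3): the function chains jobs so that each one names its predecessor in the `dependsOn` parameter of the Batch `SubmitJob` API. The queue name, job names, and job definitions are hypothetical, and `submit` is injected so the logic can be exercised without AWS credentials; in practice you would pass `boto3.client("batch").submit_job`.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Shape of boto3's Batch client `submit_job`: keyword arguments in,
# a response dict containing "jobId" out.
SubmitFn = Callable[..., Dict[str, Any]]


def submit_chain(
    submit: SubmitFn,
    job_queue: str,
    steps: Sequence[Tuple[str, str]],
) -> List[str]:
    """Submit (job_name, job_definition) steps so each waits for the previous one.

    Returns the job IDs in submission order.
    """
    job_ids: List[str] = []
    for job_name, job_definition in steps:
        kwargs: Dict[str, Any] = {
            "jobName": job_name,
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
        }
        if job_ids:
            # AWS Batch starts this job only after the previous job succeeds.
            kwargs["dependsOn"] = [{"jobId": job_ids[-1]}]
        response = submit(**kwargs)
        job_ids.append(response["jobId"])
    return job_ids
```

Because every job is submitted up front, the whole chain is queued immediately, and Batch provisions compute for each step only when that step becomes runnable.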
Workflow Orchestration and Optimization
Nextflow pipeline scripts were developed to orchestrate data processing from S3 buckets. The pipeline scripts were designed to automatically manage the execution of multiple applications in sequence, such as bulk and single-cell transcriptomic analysis, image processing, and cell type-specific analysis, while also handling error recovery and parallelization for optimal processing.
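Nextflow expresses this with processes, channels, and directives such as `errorStrategy 'retry'` and `maxRetries`. The Python sketch below is not Nextflow code and not StemPharm's pipeline; it only illustrates the same three ideas (ordered stages, per-sample parallelism, retry on failure) under the assumption that each stage is a function mapping an input reference to an output reference. The stage functions and sample names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Stage = Callable[[str], str]  # takes an input reference, returns an output reference


def run_with_retries(stage: Stage, item: str, max_retries: int) -> str:
    """Call stage(item), retrying up to max_retries times after the first failure."""
    last_error: Exception = RuntimeError("stage never ran")
    for _ in range(max_retries + 1):
        try:
            return stage(item)
        except Exception as exc:  # sketch: treat every error as retryable
            last_error = exc
    raise last_error


def run_pipeline(
    stages: Sequence[Stage], samples: List[str], max_retries: int = 2, workers: int = 4
) -> Dict[str, str]:
    """Run stages in order; within each stage, process all samples in parallel."""
    current = {s: s for s in samples}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            futures = {
                sample: pool.submit(run_with_retries, stage, ref, max_retries)
                for sample, ref in current.items()
            }
            # Barrier: the next stage starts only after every sample finishes this one.
            current = {sample: f.result() for sample, f in futures.items()}
    return current
```

One design difference: Nextflow is dataflow-driven, so a sample can move to the next process as soon as its own upstream task finishes, whereas this sketch waits for every sample at each stage boundary for simplicity.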
Compute Resources
Amazon EC2 instances, including c5.9xlarge, p3.2xlarge, and g4dn.xlarge, were provisioned to handle the diverse processing requirements of StemPharm's organoid data. EC2 instances were selected based on the specific requirements of each application in the Nextflow pipeline, ensuring optimal performance and cost-efficiency.
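In practice, AWS Batch performs placement itself from the vCPU, memory, and GPU requirements declared on a job, choosing among the instance types allowed in the compute environment. The function below only makes that matching logic explicit for the three instance types named above, using their published sizes; the "smallest fit by vCPUs, then memory" rule and the no-GPU-for-CPU-tasks rule are assumptions for illustration, not a description of StemPharm's configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    gpus: int


# Published sizes of the three instance types named above.
CATALOG = [
    InstanceType("g4dn.xlarge", vcpus=4, memory_gib=16, gpus=1),
    InstanceType("p3.2xlarge", vcpus=8, memory_gib=61, gpus=1),
    InstanceType("c5.9xlarge", vcpus=36, memory_gib=72, gpus=0),
]


def pick_instance(vcpus: int, memory_gib: int, gpus: int = 0) -> Optional[InstanceType]:
    """Return the smallest catalog entry (by vCPUs, then memory) that fits the task.

    CPU-only tasks are never placed on GPU instances.
    """
    candidates = [
        it for it in CATALOG
        if it.vcpus >= vcpus
        and it.memory_gib >= memory_gib
        and it.gpus >= gpus
        and (gpus > 0 or it.gpus == 0)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda it: (it.vcpus, it.memory_gib))
```

For example, a GPU image-processing task needing 4 vCPUs and 8 GiB lands on g4dn.xlarge, while the same task needing 32 GiB moves to p3.2xlarge.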
Data Security and Access Control
A secure portal was developed to provide authentication and access control mechanisms for StemPharm's data stored in Amazon S3. AWS Identity and Access Management (IAM) was used to define user roles and permissions, ensuring that only authorized personnel could access specific data, images, and files.
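A common way to restrict a role to specific data in S3 is an IAM policy scoped to key prefixes. The sketch below generates such a policy, assuming read-only access is wanted; the bucket name and prefixes are hypothetical, and the roles and permissions actually defined for StemPharm are not described in the source.

```python
import json
from typing import Any, Dict, List


def read_only_prefix_policy(bucket: str, prefixes: List[str]) -> Dict[str, Any]:
    """IAM policy letting a role list and read objects only under the given prefixes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, narrowed by the s3:prefix condition key.
                "Sid": "ListAllowedPrefixes",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{p}*" for p in prefixes]}},
            },
            {
                # Reading is an object-level action, narrowed by the object ARN.
                "Sid": "ReadAllowedObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": [f"arn:aws:s3:::{bucket}/{p}*" for p in prefixes],
            },
        ],
    }


def policy_json(bucket: str, prefixes: List[str]) -> str:
    """Serialize the policy to the JSON document IAM expects."""
    return json.dumps(read_only_prefix_policy(bucket, prefixes), indent=2)
```

Granting a research group `["images/study01/"]` lets it browse and download that study's images while every other prefix in the bucket stays denied by default.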
Monitoring and Reporting
Amazon CloudWatch was implemented to monitor the performance and usage of AWS resources, and AWS CloudTrail was enabled to record API activity across the environment. Custom dashboards provided real-time insight into the status of the data processing pipeline, enabling StemPharm to quickly identify and address issues.
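Pipeline-level dashboards usually rely on custom metrics. A minimal sketch, assuming stage durations are published as a custom CloudWatch metric: the function builds the `MetricData` list accepted by the CloudWatch `PutMetricData` API. The metric name, dimension names, and pipeline and stage names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def stage_duration_metrics(
    pipeline: str,
    durations_s: Dict[str, float],
    timestamp: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Build one MetricData entry per pipeline stage for CloudWatch PutMetricData."""
    ts = timestamp or datetime.now(timezone.utc)
    return [
        {
            "MetricName": "StageDurationSeconds",
            "Dimensions": [
                {"Name": "Pipeline", "Value": pipeline},
                {"Name": "Stage", "Value": stage},
            ],
            "Timestamp": ts,
            "Value": seconds,
            "Unit": "Seconds",
        }
        for stage, seconds in durations_s.items()
    ]
```

With boto3, the result would be sent as `boto3.client("cloudwatch").put_metric_data(Namespace="...", MetricData=metrics)`, where the namespace is whatever the team chooses; a dashboard widget can then graph the metric per `Stage` dimension.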
By utilizing this architecture, StemPharm was able to securely store, process, and analyze the complex data generated from their advanced human neural organoids. The combination of AWS life sciences services and Nextflow pipeline scripts enabled seamless data processing, while the integration of machine learning and AI technologies empowered StemPharm to gain deeper insights into their organoid models for neurological drug discovery research.
It was incredibly rewarding to work with StemPharm. We customized an AWS architecture that really homes in on their specific challenges: cost optimization and high-throughput data processing for cancer genomics. By using AWS Batch and Nextflow, we've automated and streamlined their bioinformatics pipelines. It's not just about technology; it's about enabling faster, more impactful cancer research.
StemPharm significantly reduced its costs by moving its bioinformatics processing to the AWS cloud, achieving a 40% cost reduction compared with its on-premises infrastructure. The savings came from AWS Batch, which let StemPharm use only the computing resources it needed at any given time, and from lower operational costs, since the company no longer had to maintain on-premises infrastructure.
Together, AWS Batch, the Nextflow pipeline scripts, and the supporting applications and workflows significantly increased StemPharm's bioinformatics processing speed. The average processing time for BCL files fell by roughly 58%, from 12 hours to 5 hours, so StemPharm could deliver results to its clients much faster.
The secure portal ensured that StemPharm's data remained accessible only to authorized personnel. Its authentication and access control mechanisms let StemPharm control who could access the data, images, and genomic files stored in its S3 buckets.
The deployment used c5.9xlarge, p3.2xlarge, and g4dn.xlarge EC2 instances, gp3 EBS volumes, and the S3 Standard storage class. The AWS Batch compute environment was EC2-based, and the analysis priced AWS Batch compute at $0.01 per vCPU-second. The resulting 12-month TCO was $109,778.92: $44,618.92 for EC2 instances, $2,400.00 for EBS volumes, $2,760.00 for S3 storage, and $60,000.00 for AWS Batch.
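The TCO figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers reported above: it confirms that the four line items sum to the stated total and, at the stated $0.01 per vCPU-second, converts the $60,000.00 AWS Batch line into the vCPU-hours it implies. `Decimal` is used so currency values are added without floating-point rounding.

```python
from decimal import Decimal
from typing import Dict

# 12-month TCO line items as reported above (USD).
LINE_ITEMS: Dict[str, Decimal] = {
    "EC2 instances": Decimal("44618.92"),
    "EBS volumes (gp3)": Decimal("2400.00"),
    "S3 storage (Standard)": Decimal("2760.00"),
    "AWS Batch": Decimal("60000.00"),
}

# Rate the analysis applied to AWS Batch compute (USD per vCPU-second).
BATCH_RATE_PER_VCPU_SECOND = Decimal("0.01")


def total_tco(items: Dict[str, Decimal]) -> Decimal:
    """Sum all line items exactly."""
    return sum(items.values(), Decimal("0"))


def implied_vcpu_hours(batch_cost: Decimal, rate_per_vcpu_second: Decimal) -> Decimal:
    """vCPU-hours that a Batch cost corresponds to at a per-vCPU-second rate."""
    return batch_cost / rate_per_vcpu_second / 3600
```

`total_tco(LINE_ITEMS)` reproduces the $109,778.92 total, and the Batch line works out to 6,000,000 vCPU-seconds, or about 1,666.67 vCPU-hours over the year.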