The FDA’s Data-Exploring Tools: ‘ArrayTrack HCA and PCA Packages’

by Govind Yatnalkar | Dec 31, 2020 | Data, FDA, Medical Devices, Quality

Clustering is an approach for segregating and grouping data based on similar attributes. It is used in many technologies, such as AI (machine learning) and cloud computing, to derive hypotheses, support analysis, and add intelligence to ‘search’. Examples of such tools are the Hierarchical Cluster Analysis (HCA) and Principal Component Analysis (PCA) packages of the FDA’s ArrayTrack. ArrayTrack is a database developed by the National Center for Toxicological Research (NCTR) for storing microarray data and experimental attributes derived during pharmacogenomic or toxicogenomic studies. Both packages are used primarily to cluster gene-related data elements and to draw conclusions by analyzing the clustered datasets.

Gene expression is the process by which the information in a gene is used to direct the synthesis of a protein. These proteins, in turn, dictate how the cell functions. Hence, analyzing a large number of genes (thousands or more) in a specific cell can help reveal how that cell actually functions. In such cases, the HCA package is particularly useful. It groups genes that have similar expression patterns or similar data components. When scientists analyze such clustered genomic data, the results might reveal novel molecular relationships and cell behaviors that could be used to devise new medical food and drug products.1
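To make the grouping idea concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. It is not ArrayTrack's implementation, whose internals are not described here. The gene names, expression values, Euclidean distance, and average linkage are all assumptions chosen for illustration.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the genes of two clusters."""
    dists = [euclidean(profiles[g], profiles[h]) for g in c1 for h in c2]
    return sum(dists) / len(dists)

def hierarchical_clusters(profiles, n_clusters):
    """Start with one cluster per gene and repeatedly merge the two
    closest clusters until only n_clusters remain."""
    clusters = [[name] for name in profiles]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], profiles)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical expression values for four genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.8, 3.1, 2.2, 0.9],
}

groups = hierarchical_clusters(expression, n_clusters=2)
print(groups)
```

In this toy dataset, geneA and geneB rise across the samples while geneC and geneD fall, so the two pairs end up in separate clusters. A real analysis would more likely rely on an established statistics library than on hand-written loops.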

The second package, PCA, presents its results as 2D and 3D visuals. The principal components (PCs) are linear combinations of the original variables (here, gene expression values), ordered so that each successive PC captures as much of the remaining variance as possible. Variance is a statistical measure of how far the values in a dataset spread around their mean, and the package uses it to show how the genes are distributed across the current dataset.2 A plot of the first three PCs typically captures the majority of the variance in the dataset, and from it scientists can draw conclusions about which genes form a majority and which sets are loosely or tightly distributed. Therefore, besides being an analytical tool, the PCA package is also a powerful data-exploring tool.
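The relationship between variance and principal components can be sketched in a few lines of plain Python. This is an illustrative approximation only, not the method ArrayTrack uses. The data values are hypothetical, and power iteration is just one simple way to find the first principal component.

```python
from math import sqrt

def covariance_matrix(rows):
    """Sample covariance between columns (genes); rows are samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    p = len(means)
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(p)] for i in range(p)]

def mat_vec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_principal_component(cov, iterations=100):
    """Power iteration: repeatedly apply the covariance matrix and
    renormalize; the vector converges to the direction of most variance."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v (the largest eigenvalue of cov).
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, variance

# Hypothetical data: 4 samples (rows) x 3 genes (columns).
samples = [
    [2.0, 1.9, 0.5],
    [4.0, 4.1, 0.4],
    [6.0, 5.9, 0.6],
    [8.0, 8.2, 0.5],
]

cov = covariance_matrix(samples)
pc1, pc1_variance = first_principal_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = pc1_variance / total_variance

print("PC1 loadings:", [round(x, 3) for x in pc1])
print("Share of variance explained by PC1:", round(explained, 3))
```

Here the first two genes vary together while the third barely changes, so the first component is dominated by those two genes and accounts for almost all of the variance. Plotting each sample's coordinates along the first two or three components gives the kind of 2D or 3D view described above.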

When data clustering tools such as HCA or PCA are integrated with medical devices, manufacturers should make sure the software follows the applicable FDA guidelines. First, the clustering tool needs to be documented from a design perspective; the documentation at this stage reflects the entire software architecture and the software requirements. Also, to ensure the software fulfills its intended use with the highest levels of quality, safety, and efficiency, manufacturers should perform code inspections and software testing activities such as unit, regression, system, and acceptance testing. Testing and code inspections help identify and mitigate risks that may be associated with the clustering software or the associated medical device.

To summarize, ArrayTrack’s HCA and PCA packages play a key role in the research and development of novel medical products. By analyzing clustered genomic datasets, scientists can develop theories that may later be used to develop life-saving medical innovations.3 Such tools need to be validated to ensure they are safe, secure, and qualified for commercial use. Do you have a clustering or analytics-based application that needs FDA approval? Our regulatory and software experts at EMMA International can help your medical device become FDA compliant. Contact us at 248-987-4497 or info@emmainternational.com for additional information.


1 FDA (February 2019). ArrayTrack™ HCA-PCA Standalone Package – powerful data-exploring tools. Retrieved on December 27, 2020 from https://www.fda.gov/science-research/bioinformatics-tools/arraytracktm-hca-pca-standalone-package-powerful-data-exploring-tools

2 Yatnalkar, Govind, Husnu S. Narman, and Haroon Malik. “An Enhanced Ride Sharing Model Based on Human Characteristics and Machine Learning Recommender System.” Procedia Computer Science 170 (2020): 626-633.

3 Alexandra Trammer, Jim Damicis (December 2018). The Healthcare Industry as A Critical Driver of Cluster Development. Retrieved on December 27, 2020 from https://www.camoinassociates.com/healthcare-industry-critical-driver-cluster-development

Govind Yatnalkar

