HTC Project
High Throughput Flow Cytometry is a technique for analyzing flow cytometry data from large datasets.
Flow cytometry analysis consists of measuring a set of parameters on cells passing in a continuous flow. These parameters can be classified into different categories:
- Shape. Cell size and cytoplasmic complexity (granularity), read from forward and side scatter (FSC, SSC).
- Population. Number of cells that match a given set of characteristics (gated).
- Fluorescence. Intensity read simultaneously at different wavelengths: FL1, FL2, etc.
Measuring all of these parameters simultaneously for each sample is what defines High Content Screening (HCS): the measurement of a large amount of information on the same sample at once.
One unique feature of flow cytometry is that it measures fluorescence per cell or particle. This contrasts with spectrophotometry, in which the percent absorption and transmission of specific wavelengths of light are measured for a bulk volume of sample.
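To make the per-cell measurements and the gated "Population" parameter concrete, here is a minimal Python sketch, assuming each event (one cell) is stored as a record with forward scatter (FSC, related to size), side scatter (SSC, related to granularity) and two fluorescence channels. The `Event` and `RectGate` types, their field names and the rectangular gate shape are hypothetical illustrations, not an instrument file format; real gates are often polygons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical per-cell record: scatter plus fluorescence channels."""
    fsc: float  # forward scatter, related to cell size
    ssc: float  # side scatter, related to granularity
    fl1: float  # fluorescence channel FL1
    fl2: float  # fluorescence channel FL2


@dataclass(frozen=True)
class RectGate:
    """Hypothetical rectangular gate on two parameters (bounds are inclusive)."""
    x_param: str
    y_param: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, event: Event) -> bool:
        x = getattr(event, self.x_param)
        y = getattr(event, self.y_param)
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def gate_population(events, gate):
    """Return the gated events, their count and their percentage of all events."""
    gated = [e for e in events if gate.contains(e)]
    pct = 100.0 * len(gated) / len(events) if events else 0.0
    return gated, len(gated), pct


def mean_fluorescence(events, channel):
    """Average of one fluorescence channel over individually measured cells."""
    values = [getattr(e, channel) for e in events]
    return sum(values) / len(values) if values else float("nan")


events = [Event(500, 200, 10.0, 3.0), Event(520, 210, 30.0, 4.0),
          Event(80, 40, 1.0, 0.5), Event(900, 800, 2.0, 1.0)]
gate = RectGate("fsc", "ssc", 300, 700, 100, 400)
gated, count, pct = gate_population(events, gate)
print(count, pct)                       # 2 50.0
print(mean_fluorescence(gated, "fl1"))  # 20.0
```

Keeping data at the event level, rather than only the gated summary, is what allows a sample to be re-gated later with a different gate definition.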
Main Goals
The project consists of developing a tool that helps scientists store and analyze the output of HCS experiments. The main goals of our research are described below:
- Storage of information (integrity of data).
- Analysis of the results.
- Visualization of the results.
Storage of information
Each well of the plate has its own content, which falls into one of two main categories: samples or controls (positive or negative).
More than one sample can of course be placed on a plate, and each sample can also appear several times on the same plate (replicates: duplicate or triplicate). More than one control can also be used, either of different nature or of the same nature at different concentrations.
Each well therefore needs a unique identifying number (UIN), and that UIN must be linked to any replicates. Some samples, such as positives, may be tested again for confirmation: the retest may sit at a different position on a different plate and produce a different set of data, yet both results need to be compared and must therefore be linked.
Each sample is connected to a set of measurements.
In previous analysis procedures, these samples were populated from an Excel file.
The experiment layout (definition of gates) also needs to be linked to the data.
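One possible way to model these storage requirements is sketched below in Python, assuming each well gets a string UIN and that replicates and confirmation retests are linked through a shared sample identifier. The `Well`, `WellType` and `PlateStore` names, the "plate:position" UIN scheme and the `layout_id` field (pointing at the experiment layout with its gate definitions) are hypothetical; a real tool would more likely back this with a database than with in-memory dictionaries.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class WellType(Enum):
    SAMPLE = "sample"
    POSITIVE_CONTROL = "positive_control"
    NEGATIVE_CONTROL = "negative_control"


@dataclass
class Well:
    uin: str            # unique identifying number (hypothetical "plate:position" scheme)
    plate_id: str
    position: str       # e.g. "A01"
    well_type: WellType
    sample_id: str      # wells sharing a sample_id are replicates or retests of one sample
    layout_id: str      # experiment layout (gate definitions) used to analyze this well
    concentration: Optional[float] = None  # e.g. one control at several concentrations
    measures: dict[str, float] = field(default_factory=dict)  # parameter name -> value


class PlateStore:
    """In-memory store that enforces UIN uniqueness and indexes wells by sample."""

    def __init__(self) -> None:
        self._wells: dict[str, Well] = {}
        self._by_sample: dict[str, list[str]] = defaultdict(list)

    def add(self, well: Well) -> None:
        if well.uin in self._wells:
            raise ValueError(f"duplicate UIN {well.uin}")
        self._wells[well.uin] = well
        self._by_sample[well.sample_id].append(well.uin)

    def replicates(self, uin: str) -> list[Well]:
        """All wells, on any plate, holding the same sample as `uin` (itself included)."""
        sample_id = self._wells[uin].sample_id
        return [self._wells[u] for u in self._by_sample[sample_id]]


store = PlateStore()
store.add(Well("P1:A01", "P1", "A01", WellType.SAMPLE, "S42", "L1"))
store.add(Well("P1:A02", "P1", "A02", WellType.SAMPLE, "S42", "L1"))  # duplicate on the same plate
store.add(Well("P7:C05", "P7", "C05", WellType.SAMPLE, "S42", "L2"))  # confirmation retest
print([w.uin for w in store.replicates("P7:C05")])  # ['P1:A01', 'P1:A02', 'P7:C05']
```

Linking through a sample identifier, rather than encoding replicate information in the UIN itself, keeps each UIN stable while still letting a retest on another plate be retrieved and compared with the original wells.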
Analysis of results
The data analysis may consist of:
- Sorting data (by plate, by sample, by control, etc.).
- Statistical analysis (average, SD, quartiles, hypothesis-driven statistics, etc.).
- Combining different subsets for derived calculations (e.g. Z and Z′ factors) and/or normalization/standardization of data.
- Data mining: identifying patterns and structure in the data.
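The statistical and control-based calculations listed above can be sketched with Python's standard `statistics` module. The sketch assumes one numeric readout per well (for example a gated percentage or a mean fluorescence) and uses the usual screening definitions: Z′ = 1 − 3(σp + σn)/|μp − μn| from positive and negative controls, the Z factor as the same expression with sample wells in place of the positive control, and percent-of-control normalization between the two control means. The function names and control values are illustrative only.

```python
import statistics


def describe(values):
    """Mean, sample SD and quartiles (inclusive method) of one readout."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values),
            "q1": q1, "median": median, "q3": q3}


def z_prime(pos, neg):
    """Z' factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = statistics.stdev(pos) + statistics.stdev(neg)
    window = abs(statistics.mean(pos) - statistics.mean(neg))
    return 1 - 3 * spread / window


def z_factor(samples, controls):
    """Z factor: the same expression, comparing sample wells with control wells."""
    return z_prime(samples, controls)


def percent_of_control(x, pos, neg):
    """Normalize so the negative-control mean maps to 0 % and the positive to 100 %."""
    mu_p, mu_n = statistics.mean(pos), statistics.mean(neg)
    return 100.0 * (x - mu_n) / (mu_p - mu_n)


def z_score(values):
    """Standardize a readout to mean 0 and sample SD 1."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


pos = [100.0, 102.0, 98.0]  # hypothetical positive-control wells
neg = [10.0, 12.0, 8.0]     # hypothetical negative-control wells
print(round(z_prime(pos, neg), 3))         # 0.867
print(percent_of_control(55.0, pos, neg))  # 50.0
```

A Z′ above 0.5 is commonly read as an excellent assay window; that threshold is a screening rule of thumb, not something the code decides.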
Note that the data in this context are multivariate by nature, since several different measurements are taken on each sample at the same time.
The results of the sample analysis also need to be combined with the properties of the samples (possibly through an OLAP approach).
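A minimal way to combine analysis results with sample properties, assuming results are stored as a "fact" list (one row per well and parameter) and properties as a per-sample "dimension" table, is a join followed by a grouped aggregation: the roll-up operation that OLAP tools provide. The table layouts, field names (`compound_class`, `conc_uM`) and values below are hypothetical; a production tool would more likely run this in SQL or an OLAP engine.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows: one per well and measured parameter.
measurements = [
    {"uin": "P1:A01", "sample_id": "S1", "plate": "P1", "param": "FL1", "value": 120.0},
    {"uin": "P1:A02", "sample_id": "S1", "plate": "P1", "param": "FL1", "value": 130.0},
    {"uin": "P1:B01", "sample_id": "S2", "plate": "P1", "param": "FL1", "value": 40.0},
    {"uin": "P2:A01", "sample_id": "S3", "plate": "P2", "param": "FL1", "value": 50.0},
    {"uin": "P2:A01", "sample_id": "S3", "plate": "P2", "param": "FL2", "value": 15.0},
]

# Hypothetical dimension table: properties of each sample.
sample_props = {
    "S1": {"compound_class": "A", "conc_uM": 10},
    "S2": {"compound_class": "B", "conc_uM": 10},
    "S3": {"compound_class": "A", "conc_uM": 1},
}


def join(facts, props):
    """Inner join: attach the sample's properties to every measurement row."""
    return [{**row, **props[row["sample_id"]]} for row in facts if row["sample_id"] in props]


def roll_up(rows, dims, param):
    """Mean of one parameter grouped by the given dimensions (OLAP-style roll-up)."""
    groups = defaultdict(list)
    for row in rows:
        if row["param"] == param:
            groups[tuple(row[d] for d in dims)].append(row["value"])
    return {key: mean(vals) for key, vals in groups.items()}


rows = join(measurements, sample_props)
print(roll_up(rows, ["compound_class"], "FL1"))           # {('A',): 100.0, ('B',): 40.0}
print(roll_up(rows, ["plate", "compound_class"], "FL1"))  # {('P1', 'A'): 125.0, ('P1', 'B'): 40.0, ('P2', 'A'): 50.0}
```

Changing the `dims` list is the equivalent of drilling down or rolling up along a different combination of sample properties.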
Another concern is aberrant points (outliers): how to identify them rationally (Dixon's Q test), and how to exclude them from calculations, either manually (when a point is aberrant because of a technical problem during the experiment) or automatically.
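Dixon's Q test compares the gap between the most extreme value and its nearest neighbour with the range of the data, Q = gap / range, and flags the value when Q exceeds a tabulated critical value for the sample size. The Python sketch below tests a single suspect value in a small replicate group (3 to 10 values; duplicates with n = 2 cannot be tested) and embeds commonly tabulated 95 % critical values, which should be checked against a reference table before use. `exclude_outlier` is a hypothetical helper for the automatic path; manual exclusion would simply drop the wells flagged by the experimenter.

```python
# Commonly tabulated 95 % critical values of Dixon's Q for n = 3..10.
# Assumed here; verify against a reference table before relying on them.
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_test(values, q_crit=Q_CRIT_95):
    """Test the most extreme value of a small sample.

    Returns (suspect, q, is_outlier) with q = gap / range, where gap is the
    distance between the suspect value and its nearest neighbour.
    """
    n = len(values)
    if n not in q_crit:
        raise ValueError(f"no critical value for n={n}")
    data = sorted(values)
    spread = data[-1] - data[0]
    if spread == 0:
        return data[0], 0.0, False
    q_low = (data[1] - data[0]) / spread
    q_high = (data[-1] - data[-2]) / spread
    suspect, q = (data[0], q_low) if q_low > q_high else (data[-1], q_high)
    return suspect, q, q > q_crit[n]


def exclude_outlier(values, q_crit=Q_CRIT_95):
    """Automatic exclusion: drop one occurrence of the suspect value if flagged."""
    suspect, _, is_outlier = dixon_q_test(values, q_crit)
    result = list(values)
    if is_outlier:
        result.remove(suspect)
    return result


replicates = [10.1, 10.2, 10.3, 10.2, 14.0]
print(exclude_outlier(replicates))  # [10.1, 10.2, 10.3, 10.2]
```

The test is meant to be applied once per group; running it repeatedly on the remaining values is not what the tabulated critical values assume.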
Visualization
This is probably the most difficult part. Because the data are multidimensional (for one sample X there are several measurements Y1, Y2, Y3, …, Yn), informative visualization becomes complex. This point is related to the data mining approach described above. [CyVisualization]
Research can be done in this field using different visualization techniques, such as dot plots, 3D dot plots, parallel coordinates, scatter plots, and so on.
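Parallel coordinates address the Y1, …, Yn problem by drawing one vertical axis per measured parameter and one polyline per sample (or per cell). Because the parameters have unrelated units, each axis is usually rescaled independently; the Python sketch below performs only that geometric step, assuming min-max scaling to [0, 1] and rows stored as dictionaries keyed by parameter name (both assumptions). The resulting vertices can be drawn with any plotting library; pandas, for instance, also ships a ready-made `pandas.plotting.parallel_coordinates` function.

```python
def parallel_coordinates(rows, axes):
    """Turn each row into a polyline of (axis_index, scaled_value) vertices.

    Every axis is min-max scaled to [0, 1] on its own, so parameters with
    different units (scatter, FL1, FL2, ...) share one vertical scale.
    A constant axis is placed at 0.5 to avoid dividing by zero.
    """
    lo = {a: min(r[a] for r in rows) for a in axes}
    hi = {a: max(r[a] for r in rows) for a in axes}

    def scale(axis, value):
        span = hi[axis] - lo[axis]
        return 0.5 if span == 0 else (value - lo[axis]) / span

    return [[(i, scale(a, r[a])) for i, a in enumerate(axes)] for r in rows]


samples = [{"FSC": 500, "SSC": 200, "FL1": 10},
           {"FSC": 700, "SSC": 100, "FL1": 30},
           {"FSC": 600, "SSC": 150, "FL1": 20}]
for line in parallel_coordinates(samples, ["FSC", "SSC", "FL1"]):
    print(line)
# [(0, 0.0), (1, 1.0), (2, 0.0)]
# [(0, 1.0), (1, 0.0), (2, 1.0)]
# [(0, 0.5), (1, 0.5), (2, 0.5)]
```

Min-max scaling is sensitive to outliers, so it may be worth applying it after the outlier handling step or switching to a robust (quantile-based) scaling.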
Finally, it should be possible to export data visualizations for communication or for the lab notebook.
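For export, one dependency-free option is SVG, a vector format that stays sharp in slides and can be printed or pasted into an electronic lab notebook. The Python sketch below renders a minimal 2D dot plot (for example FSC vs SSC) as an SVG string; the function name, sizes and styling are hypothetical, and a real tool would more likely rely on a plotting library's own export (e.g. matplotlib's `savefig`).

```python
from xml.sax.saxutils import escape


def dot_plot_svg(points, x_label, y_label, width=300, height=300, margin=30):
    """Render (x, y) points as a minimal standalone SVG dot plot."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)

    def sx(x):
        # Map data x into the plotting area; "or 1" guards against a zero range.
        return margin + (x - x0) / ((x1 - x0) or 1) * (width - 2 * margin)

    def sy(y):
        # SVG y grows downwards, so the data axis is flipped.
        return height - margin - (y - y0) / ((y1 - y0) or 1) * (height - 2 * margin)

    bottom = height - margin
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'<line x1="{margin}" y1="{bottom}" x2="{width - margin}" y2="{bottom}" stroke="black"/>',
        f'<line x1="{margin}" y1="{margin}" x2="{margin}" y2="{bottom}" stroke="black"/>',
        f'<text x="{width / 2}" y="{height - 5}" text-anchor="middle">{escape(x_label)}</text>',
        f'<text x="10" y="{height / 2}" transform="rotate(-90 10 {height / 2})" '
        f'text-anchor="middle">{escape(y_label)}</text>',
    ]
    parts += [f'<circle cx="{sx(x):.1f}" cy="{sy(y):.1f}" r="2" fill="steelblue"/>'
              for x, y in points]
    parts.append("</svg>")
    return "\n".join(parts)


svg = dot_plot_svg([(0, 0), (10, 10), (5, 2)], "FSC", "SSC")
print(svg.count("<circle"))  # 3
```

Saving the result is then a single `pathlib.Path(...).write_text(svg)` call, and the file opens in any browser or vector editor.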