This past week I spent much of my time on my summer project, which aims to construct brain maps based purely on blood flow or neural activity. The concept is quite simple: use correlation algorithms to compare the brain wave at each pixel against those at all the other pixels, then group together the pixels with similar patterns. However, as with every other project, countless issues came up when I tried to implement the algorithms from a reference paper. At the same time, the project has been a lot of fun, since it gave me a chance to explore and experiment with algorithms and concepts such as PCA, SVD, and correlation coefficients that I learned during my undergrad and first year of grad school.
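To make the idea concrete, here is a minimal sketch of pixel-by-pixel correlation on toy data. It is not the reference paper's algorithm or my actual project code; the six synthetic traces, the two sine-wave "regions", and the 0.5 threshold are all assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time = 500
t = np.arange(n_time)

# Two hypothetical underlying signals, one per toy "region".
signal_a = np.sin(2 * np.pi * t / 50)
signal_b = np.sin(2 * np.pi * t / 23)

# Six toy pixels: the first three follow signal_a, the last three follow
# signal_b, each with a little independent noise added.
traces = np.vstack([signal_a] * 3 + [signal_b] * 3)
traces = traces + 0.2 * rng.standard_normal(traces.shape)

# Pearson correlation of every pixel's trace against every other pixel's.
corr = np.corrcoef(traces)  # shape (6, 6)

# Group pixels with pixel 0 if their correlation exceeds a threshold.
threshold = 0.5  # arbitrary illustrative cutoff
same_as_pixel_0 = np.flatnonzero(corr[0] > threshold)
print(same_as_pixel_0)
```

With clean toy signals like these, the pixels that share a signal end up grouped together, which is the behavior the real pipeline is hoping for.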
At one point during this project, I doubted the authenticity of the paper, because its initial plot looked far nicer than mine. Even in the later stages, my plots were nowhere close to the paper's. However, when I finished the last part of my code, which clusters the brain sections segmented by my earlier code, I found the problem by examining the distances among the clusters. Unlike the paper, which reported cluster distances of about 0.4 to 0.6, all of my cluster distances were around 0.1 to 0.2. It may not be obvious why those numbers matter, but they pointed to an important factor that I had ignored: noise. I expected noise to exist in the measured brain waves, but I did not expect it to have a significant impact on the results. My results showed how naive I was. Background noise that is constantly present, once mixed into the measured brain waves, gives all of them a certain similarity, which shows up as high correlation values. In other words, the noise at one brain pixel is very similar to the noise at the other pixels. Those high correlation values ultimately left the algorithms unable to tell the pixels apart.
After identifying the problem, I could not help but think that in science we cannot avoid data analysis, which in the end is just working with a group of numbers. Those numbers can make a case very complicated, but sometimes they can make it significantly easier. Different approaches to the same data can lead to completely different results. Powerful techniques not only help us as scientists reach more convincing results, they also give us the power to manipulate the data and publish only what we believe to be true.
Even with reviewers checking that a particular analytical method is used correctly, it is up to us to decide how to handle those numbers so as to produce the most meaningful result without bias.
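For anyone curious about the noise effect described earlier, it is easy to reproduce on synthetic data. The sketch below is only an illustration under my own assumptions: two toy pixels with independent activity, a hypothetical background noise term that is identical in both, and a distance defined as 1 minus the correlation (I am assuming that definition here; the paper's distance measure may differ). With a shared noise variance of 4 and a signal variance of 1, the expected correlation is 4 / (1 + 4) = 0.8, even though the underlying activity is unrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_time = 2000

# Two toy pixels with independent activity (true correlation near 0).
pixel_a = rng.standard_normal(n_time)
pixel_b = rng.standard_normal(n_time)

# Hypothetical background noise that is identical in every pixel.
shared_noise = 2.0 * rng.standard_normal(n_time)

r_clean = np.corrcoef(pixel_a, pixel_b)[0, 1]
r_noisy = np.corrcoef(pixel_a + shared_noise, pixel_b + shared_noise)[0, 1]

# Assumed distance measure for illustration: 1 - correlation.
print(f"without shared noise: r = {r_clean:.2f}, distance = {1 - r_clean:.2f}")
print(f"with shared noise:    r = {r_noisy:.2f}, distance = {1 - r_noisy:.2f}")
```

The shared term pushes the correlation up and the distance down for every pair of pixels at once, which is the same pattern as my cluster distances collapsing toward 0.1 to 0.2.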