A researcher applies to access specific parts of the Airwaves dataset, and once their research is approved, they access the data using the Dementias portal. They are not physically situated in the clinical setting and each participant’s data is pseudonymised under the 7-digit participant ID.
Because of this pseudonymisation, and because researchers often work with hundreds of thousands of data records during their analysis, the chance that a researcher could ever identify a participant is very small. Researchers are not interested in the individual participant; they ‘stay far away’ from the participant when undertaking their research.
However, while far removed from the participant, a researcher is also in many ways very near to them, as the data offers insights into a participant’s health in ways that they wouldn’t be able to discover on their own. The cohort is typically represented as aggregated data, and an individual’s data might only be noticed in aggregate form if it is an outlier (a data point that is very different to other values within a dataset). Outliers are common and can result from errors in the measuring and reporting process. Using standard data handling procedures, these outlying data points will be transformed in order to ‘clean’ and ‘wrangle’ the data into a form ready for statistical analysis, though the researcher will not look at an individual participant’s data as part of this process.
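To make the ‘cleaning’ step concrete, here is a minimal sketch of one common way an outlier can be transformed without anyone inspecting the individual record. It is an illustration only: the function name `clip_outliers`, the interquartile-range rule with a multiplier of 1.5, and the sample readings are all assumptions made for this example, not the procedure used with the Airwaves data.

```python
import statistics


def clip_outliers(values, k=1.5):
    """Pull values outside the IQR fences back to the nearest fence.

    Hypothetical helper: the k=1.5 fence rule is a common convention,
    assumed here for illustration.
    """
    # Quartiles of the data (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    # Values inside the fences pass through unchanged.
    return [min(max(v, low), high) for v in values]


# Invented readings: six similar values and one that sits far from the rest.
readings = [118, 121, 125, 119, 122, 120, 310]
cleaned = clip_outliers(readings)
print(cleaned)
```

The rule is applied to the whole column at once, so the outlying value is ‘trimmed’ by the same blanket operation that leaves every other value untouched.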
I chose to represent the cohort using a more standard representation of data (a 3D scatterplot) to highlight the ‘data is data’ viewpoint of the researcher. Here, the issue of an outlying individual within the dataset is resolved through ‘trimming’ the outlier down to size (representing the data ‘wrangling’ process) and setting it to one side. I was drawn to using a blunt approach to represent the removal of an outlier to reinforce how, for the researcher, the individual data point is secondary to the ‘cleanliness’ of the aggregated cohort dataset.