The Black Box Body: Cultural Context and Machine Learning in Medical Diagnostics

Miranda Marcus
Jun 20, 2021

A summary of an ethnographic analysis looking at medical uses of AI. It documents a San Francisco-based deep learning startup building a radiological diagnostic application, and was conducted as part of the Digital Anthropology MSc at UCL in 2018. It focuses on the materiality of data, looking at the relationship between the machine learning model and the physical bodies it interprets.

Technology and medicine have always been closely tied, but developments in medical technology increasingly lie in the hands of data scientists. Medicine is a highly informational discipline based on standards, codes, and categorisations that allow doctors to understand biology as ordered systems. A primary example is the International Classification of Diseases (ICD), the diagnostic manual periodically revised by the World Health Organisation. But the use of large data sets marks a shift from traditional medical informational structures to a different, statistical approach. In 2017 the UN hosted the ‘AI for Good Summit’, at which Margaret Chan, Director-General of the World Health Organisation, said: “AI is a new frontier for the health sector… But we do not have the answers to many questions around AI — we’re not even sure we know all the questions that need to be asked” (Clark 2017). Increased computing power and the growing availability of data over the last ten years have meant that machine learning (ML) and artificial intelligence (AI) technologies have moved from theory to application. Instead of finding a needle in the haystack, this software finds the patterns in the hay (Sulivan 2013). These patterns create a set of ‘rules’ used to diagnose the patient, rules developed entirely by data-driven algorithmic models so complex that the reasoning behind the results cannot be recovered.

These technologies rely on the common conception that data is the same as fact, and that the use of algorithms to sort and interpret it enables objective, unbiased decision making. This leads to the idea that these data-driven decision-making models sit apart from the messy, situated world in a clean, Euclidean realm that is able to sift the relevant from the irrelevant and provide us with something like automatic truth. The principle of using data to reveal and therefore control is reflected in previous ethnographic analysis of medical imaging, which is understood as a mechanism of making “visible truth” (Wood 2016:771). Adrian Mackenzie’s work on bioinformatic algorithms points to how the use of algorithms transforms the body into “a mutable, sortable, comparable set of elements” in which the living, physical body seemingly becomes “a somewhat abstract relational entity, potentially open to many different determinations” (Mackenzie 2003:317). He documents how algorithms are often understood as abstract entities of pure, algebraic logic which manifest a “context-free grammar” (Mackenzie 2006:4).

However, much has been written to demonstrate how data, code, and things created by code such as algorithms are themselves cultural objects, subject to all of the social complexities of any other social object, and therefore cannot be objective (Forsythe 2001; Mackenzie 2006; Helmreich 2007; Kelty 2008; Coleman 2008; Wilf 2013; Seaver 2014; Richardson 2015). This reflects the historical discourses on the political nature of data-driven, categorical technologies. The recursive relationship between classifications and the physical realities they create and therefore control is a recurring theme (Foucault 1988; Hacking 1990; Rabinow 1996; Bowker and Star 1999; Clarke 2010). Annemarie Mol points out that medical classification systems “inform and are informed by our bodies, the organisation of our health care systems, the rhythms and pains of our diseases, and the shape of our technologies” (Mol 2002:7). Deborah Lupton highlights the seemingly binary contrast between the “clean orderliness” of data and the messy, situated physical body, pointing to the belief that data has the ability to “contain and control the inherent and mysterious tendency towards disorder (disease, disability, pollution, and early death) of the body” (Lupton 2013). Her reference to control chimes with the conclusions of the many scholars who have shown how technologies of biological measurement, such as the census, are shaped into tools to produce hierarchy, discipline, and control.

Ian Hacking’s work on the ‘looping effects of human kinds’ (1996) demonstrates how the process of categorising people directly affects how we define and understand ourselves in a social context. Once a phenomenon or attribute is isolated and named, it becomes a category that people associate themselves with. This informs their behaviour, and a reflexive relationship begins between social definitions and social realities. Hacking terms this ‘making up people’, a phenomenon he says “changes the space of possibilities for personhood” (Hacking 2006:193). Hacking gives the example of how homosexuality was medically defined in the late 19th century, resulting in the establishment of homosexual and heterosexual as ‘kinds of being’, thus enabling laws to be written which could officially discriminate against it (Hacking 2006:163). In the medical context, Bowker and Star’s analysis of the ICD characterises it as a “treaty”, a “bloodless set of numbers obscuring the behind-the-scenes battles informing its creation” (Bowker and Star 1999:68). In a similar vein to Hacking’s ‘making up people’, their analysis highlights the political consequences of medical categorisation, giving the example of the distinction between a legal and an illegal abortion. They demonstrate how the social, cultural, and political negotiations involved in the creation of such a distinction are obscured once the categorisation is published. It becomes accepted, taken for granted as a set of “natural facts” (Bowker and Star 1999:54), creating a standardised, naturalised body based on an informational structure.

In the context of AI and machine learning technologies, the consequence of this recursive, situated relationship between those creating AI applications and the effects of the applications’ use is often discussed as encoded bias. Bias is an inaccuracy that is systematically repeated, a phenomenon that is the same whether it is cognitive or statistical, conscious or unconscious. An example of encoded bias can be seen in the Google Translate service. If you translate “he is a nurse, she is a doctor” from English to Hungarian and then back again, the nurse becomes female and the doctor male (Fig.1). Hungarian pronouns do not mark gender, so the return translation forces the system to guess, and it guesses along stereotypical lines. Google Translate has not decided to be biased against female doctors in Hungary, but the existing social structures which equate the role of nurse with female and doctor with male are amplified and reinforced through production and use. When examples like this are placed in the context of Hacking’s ideas of ‘making up people’, one can see how these types of errors can begin to reinforce social dynamics in subtle and implicit ways. This is why encoded bias is a topic often at the forefront of the discourse surrounding these technologies.
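
The round-trip test itself is mechanically simple. The sketch below shows its shape in Python; the translate() helper is hypothetical, a stand-in for whichever translation service is being probed, rather than a reference to any real API.

```python
# A minimal sketch of the round-trip test described above. The translate()
# helper is hypothetical: it stands in for whatever translation service is
# being probed and is not a real API call.

def round_trip(text: str, translate, pivot: str = "hu") -> str:
    """Translate English text into a pivot language and back again."""
    pivoted = translate(text, src="en", dest=pivot)   # e.g. Hungarian, which drops the gendered pronouns
    return translate(pivoted, src=pivot, dest="en")   # the service must now guess the genders back

sentence = "he is a nurse, she is a doctor"
# result = round_trip(sentence, translate)
# If result comes back as "she is a nurse, he is a doctor", the gap left by the
# ungendered pivot language has been filled with a stereotype rather than the input.
```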

And yet those building these technologies claim that the use of specific ‘deep’ or ‘unsupervised’ learning techniques means that the scale of the data used and the complexity of the models circumvent the socially situated, context-dependent biases evident in other machine learning technologies.

Figure 1. Google Translate

Image credit: https://translate.google.com/

In this article, I look to investigate how these issues are reflected in the engineering practices of those making these technologies. The scale of data produced by the field of radiology has led to it being dubbed ‘the Silicon Valley of medicine’ (Brouillette 2017), as it provides huge datasets of medical imaging with which to explore such technologies. I shall discuss the findings of a short ethnographic study of DataScan, a San Francisco-based technology startup building biomedical diagnostic technologies with machine learning techniques that are commonly associated with artificial intelligence. The key software I shall focus on is an application to automatically diagnose lung cancer directly from chest CT scans. The DataScan application is a hybrid mix of software, algorithms, and data, some of which cannot be scrutinised. Adrian Mackenzie points out that “software designates a multidimensional and mutating object of analysis” (2006:2). This is true at the best of times, and it is made more complex by the black box factor. So how is one to practically engage with such a shifting target?

Ethnographic analysis isn’t going to open the black box, but Annemarie Mol cautions against getting “mesmerised by the numbers that pour out of machines” (Mol 2002:64). Instead, she advises that one takes “a step back in order to consider how such numbers are created” (Mol 2002:64). Her ethnography of the disease atherosclerosis recommends a conceptual and methodological focus on the practices of various medical environments to understand the body as an “enacted” entity (Mol 2002:32). Taking a similar approach, I therefore focus on the practices and attitudes of the engineering team to investigate this application and what it enacts. I look to understand how the use of AI techniques in medical technology intervenes in the traditional anthropological discourses surrounding the relationship between information structures, algorithms, and data. I look to understand whether these kinds of deep learning models create a new relationship between information and the body, or whether they are an extension of traditional categorical technologies.

Data Doctrine

There’s an old saying that San Francisco is a 7x7 grid surrounded by reality. A vast amount of money, expertise, and power is sloshing around the Bay Area, leading to a magnification of possibilities and a self-perpetuating culture that, unsurprisingly, prioritises technology over all else. The concentration of money, power, and expertise in the area supports and promotes a cultural norm in which tech companies and their employees see themselves as being able to provide innovative new solutions to old problems. There is a pervasive sense that existing solutions to established problems must be ‘optimised’ or ‘disrupted’. DataScan’s sales presentation states that 90 million patients worldwide are being misdiagnosed through radiological errors, and 110 million will have a diagnosis missed every year. They claim their software is capable of finding cancer two years earlier than a radiologist. Early detection of cancer is hugely important, with five-year survival rates dropping from 31% for patients diagnosed with Stage 1 small cell lung cancer (SCLC) to 2% for those diagnosed with Stage 4 SCLC (American Cancer Society 2016).

DataScan is a small team of engineers, data scientists, and business development staff. There is also a radiologist on staff, Dr. Irving, who works in medical practice for half the week and with DataScan for the other half. They are all young, affluent, and male, a demographic consistent with the much-documented monoculture of the Silicon Valley tech industry. There is an important and rich discourse on the link between encoded bias and a lack of diversity in engineering teams. However, in this article, I shall focus on a different dynamic in the group. One day David, one of the lead engineers, looked like he’d had a tough day, and when I asked him about it he replied: “we’re in the business of saving lives, it’s not all going to be easy.” And yet, apart from Dr. Irving, none of the team building the software in the DataScan office has any medical or healthcare training, any experience with medical software, or any experience of working in a healthcare environment. They have conviction in their ability to save more lives through the practice of data science and the use of machine learning than through the traditional methods used by a radiologist.

There is an assumption ingrained in the DataScan team’s work that medical problems and data problems are the same thing. DataScan’s mission statement reads: ‘Every time a doctor sees a patient, they are solving a complex data problem. The goal of each case is to arrive at an optimal treatment decision based on many forms of clinical information…our mission [is] to improve patient outcomes by using this data to its maximum potential.’ Martin, the founder of DataScan, has described the inception of the company as his realisation that “figuring out what’s wrong with you and how to make you better is just a data problem. So I was like ‘oh I can do data problems, I don’t know anything about medicine, but I know about data problems’.” Peter, another team member, explained to me that the reason DataScan understands the radiological diagnostic process as a ‘complex data problem’ is because radiology is already fundamentally informational. Almost all clinical specialties have come to rely heavily on radiology in order to be able to function (Royal College of Radiologists). There are over 20,000 potential medical issues affecting the chest cavity alone which can be diagnosed through radiological methods. The role of the diagnostic radiologist is to non-invasively identify medical abnormalities, providing diagnostic recommendations through the visual analysis of medical images such as CT scans or X-rays. The ability to identify anatomical form through layers of relatively abstract greyscale imagery and apply rigorous medical knowledge to them requires huge skill and training (Fig.2).

In Dr. Irving’s description, diagnostic radiology in the US healthcare system deals primarily with medical imagery and written information about bodies. The attending physician sends the patient for a scan; the radiologist receives the scan and the symptoms report provided by the attending physician. The radiologist reviews the material, comes to some diagnostic recommendations based on their significant training and expertise, and verbally dictates a report which is either transcribed by a person or by a voice recognition system. This report goes to the attending physician and the patient to inform the treatment decisions. It is also sent to a team of people who manually code the report with an ICD (International Classification of Diseases) code for billing purposes. This process creates a lot of data. DataScan’s sales material states that by 2020 there will be 25,000 petabytes of medical data available worldwide, and claims they can use deep learning techniques to exploit the data ‘to its maximum potential’ and improve patient outcomes.

In an internal study of the malignancy classification of 1,000 examples, DataScan achieved results that surpassed the performance of a panel of four radiologists by 50%. The study showed that the panel had a precision of 33.7% and a recall of 93.0%, while DataScan’s model exhibited a precision of 56.3% and a recall of 100%. Studies of this nature suggest the model can ‘out-perform’ a radiologist. This is the question that the team is grappling with on a daily basis: is the model better at diagnosing lung cancer than radiologists, and if so, how much better?
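
For readers unfamiliar with the metrics, precision is the proportion of cases flagged as malignant that really were malignant, and recall is the proportion of truly malignant cases that were flagged at all. A generic calculation, not DataScan’s evaluation code, looks like this:

```python
# Generic precision/recall calculation for a binary malignancy classifier.
# Labels: 1 = malignant, 0 = benign. Nothing here is specific to DataScan.

def precision_recall(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correctly flagged cancers
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # missed cancers
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of everything flagged, how much was right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of all real cancers, how many were caught
    return precision, recall
```

Read this way, the model’s 100% recall and 56.3% precision mean it missed no cancers in the test set but raised a false alarm on roughly four in ten of its positive calls, while the panel’s 93.0% recall and 33.7% precision mean it both missed some cancers and raised proportionally more false alarms.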

Figure 2: Chest CT scan example

Image credit: DataScan

Liam, the data scientist on the team, shows me the process of training the model: a dense stream of numbers with no syntax, punctuation, or columns, a wall of fluctuating figures (Fig.3). Once the processing is complete, he points to a chain of numbers at the bottom and happily informs me that the model is 91.7% accurate (Fig.4).

Figure 3 & 4: The DataScan model

The team holds that the statistical rigour of the model enables them to circumvent the human tendency to error, and therefore to out-perform doctors. Returning once again to the mission statement, their claim to use the available data to its ‘maximum potential’ carries the implication that doctors cannot. Taken in tandem with the idea that the data fully represents the physical body, the further implication is that the body can speak for itself through the model, freed of the human bias of medical interpretation. The physical body of the radiologist therefore also starts to recede, replaced by the model.

There is a sense that despite the training and expertise of radiologists, the doctors cannot escape their human bias, the limits of their physical bodies. Kyle and the rest of the team regularly refer to the subjectivity of radiologists. Frederick, James, and Kyle all separately tell me that radiologists have a 20% error rate with themselves: that is, if they give a report and are then shown the same data a year later, 20% will give a different answer to the one they stated the first time. Another example was the comparison drawn between the memory capacity of a human brain, around 2.5 petabytes (Fischer 2017), and the 25,000 petabytes of medical data that will be available by 2020 according to their sales material. This betrays a commitment to statistical scale over the prioritisation of relevant information above irrelevant. To them, it is about statistical certainty gained through scale rather than the ability to identify the right information for a specific case. Both examples reflect the belief that the diagnostic process of a radiologist evaluating a scan and coming to a diagnosis can be simplistic and misleading, while the use of large datasets and machine learning guarantees statistical certainty and outputs which are accordingly far more reliable and representational.

However, while their efficacy rating would imply that this is correct, in order for medical problems and data problems to be the same thing, the body has to be translated into data without loss, not just into a “model of the essence” (Johnson 2008:108). This happens through the act of scanning the patient and is part of a long and established relationship between the biological and the informational in medicine. Adrian Mackenzie points to how the body increasingly lost its “corpuscular, organismic character” throughout the 20th century (2003:316). In his work on bioinformatics, he refers to the process of rationalising the body into an algorithmic context as the “vectorising” of the body into a single “high-dimensional” space (Mackenzie 2015:431). The historian Daniel Rosenberg’s etymological analysis of the word ‘data’ shows that its meaning has transformed from its seventeenth-century definition as an axiomatic premise, the given of an experiment, to the modern interpretation of ‘data’ as the outcome of an experiment. In this modern incarnation, ‘data’ is therefore seen as having an inalienable relationship to fact and truth (Rosenberg in Gitelman 2013:17). This is certainly the case at DataScan. Liam tells me that “data is fact when data covers a lot of different things and is representative enough.” In day-to-day conversation, Liam refers to the data and the model using highly material language. When quizzed on this he says: “I always thought of data as something very concrete that we can digest. It’s also very relevant to the fact — it depends on how people interpret it. But data is data, it’s solid, it’s like the table and chair, it’s an object. It’s not meant to be distorted. For me, data is more like fact.”

The physical body of the patient is not overly present in the DataScan offices. And why would it be? Dr. Irving is the only person on the core team with any medical experience, and, as he explained, he understands his practice to be very much concerned with the informational body over the physical one. Instead, the team is focused on the size and quality of the data and the performance of the model. Any discussion of the physical body is seen as irrelevant because it is understood to be totally represented by the scans. This chimes with the documented widespread understanding of medical information and images as objective (Beaulieu 2001) and transparent (Joyce 2008), granting them the capacity to “make visible truth” (Wood 2016:771). In the case of DataScan, this is twinned with the parallel notion that data can be manipulated to understand bodies as “knowable and transformable” through “cross-linking and reformatting biological information sources so that connections between them become accessible on the space of a single computer screen” (Mackenzie 2006:59).

Statistical Intuition

But if radiology is already informational, and bioinformatics has been collapsing the body onto a single computer screen for years, why does the team understand their application to be doing something new? According to Martin, one of the key capabilities of the model is that it has the capacity to gain “intuition, like a doctor”. This carries the implication that while other diagnostic classification systems are rigid and hard, the model’s AI capacity can replicate the nuance of a medical mind, as opposed to just regurgitating the structures of a classification system such as the ICD. The team sees the black box and the unsupervised learning techniques as the real point of difference. The definition of intuition is “a thing that one knows or considers likely from instinctive feeling rather than conscious reasoning” (OED Online 2018). In some ways that rings true: the model isn’t conscious and it certainly isn’t reasoning, but it is establishing a likelihood. However, that likelihood is established through statistical evaluation, with no feeling and no instinct. These are highly subjective qualities to be trying to emulate if one is building something to be reliably accurate. It is fundamentally counter to the statistical rigour demonstrated to me by Liam’s 91.7% accuracy rating. This raises the question of how the model actually works, and how the team trains it to be like a doctor, but also not like a doctor.

The DataScan team uses two types of data to train the lung cancer diagnostic model: local and global. Global data comes from sets of CT scans that have an ICD code attached to them, supplied by their partners and clients. This indicates to the model whether there is an abnormality or not, but not where it is. Local data is gathered through the manual processing and labelling of scans from a major longitudinal study into lung cancer. A group of radiologists, recruited online from across the world, use a custom online interface to label these CT scans with diagnostic information remotely. The radiologists not only indicate the presence of a disease, they also demonstrate to the model where the diagnostic indicators (nodules) are in the scan, pointing out both those which are malignant and those which are benign. This practice effectively crowdsources a highly detailed set of annotated training data which is intended to enable the model to distinguish between cancerous tissue and all other anatomies, regardless of any variation in the imagery. This enables the model to look for anything that is a ‘signal’: any part of the underlying data that is significant in getting the information they want. Kyle, another of the engineers, tells me that if it isn’t a signal, “everything else is noise.”
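
To make the distinction concrete, the two kinds of training example might be represented along the lines of the following sketch. The field names are illustrative inventions rather than DataScan’s actual schema: a ‘global’ example carries only a scan-level label derived from an attached ICD code, while a ‘local’ example carries the radiologists’ nodule annotations.

```python
# Illustrative containers for the two kinds of training example described above.
# Field names are hypothetical, not DataScan's schema.

from dataclasses import dataclass, field

@dataclass
class NoduleAnnotation:
    centre_mm: tuple[float, float, float]  # where in the scan the radiologist marked the nodule
    diameter_mm: float
    malignant: bool                        # malignant or benign, as labelled remotely

@dataclass
class GlobalExample:
    scan_path: str
    icd_code: str      # the billing code attached to the scan
    abnormal: bool     # scan-level label only: something is present, but not where

@dataclass
class LocalExample:
    scan_path: str
    nodules: list[NoduleAnnotation] = field(default_factory=list)  # the 'signal' the model learns to find
```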

The radiologists label the scans based on a set of labelling protocols written by DataScan in collaboration with radiological advisors. Kyle explains that in training the radiologists in the labelling protocols, DataScan is essentially asking them to “throw away your definition of clinically relevant, this is your new definition of how to read an image. Point out everything that is of interest in these new guidelines.” Kyle adds that this process is the “special sauce” of the technology and the part of the engineering process which is most commercially sensitive. This is because there is a significant difference between interpreting an image to reach a diagnosis and interpreting the same image to train an AI.

This difference comes from the structure of the algorithm. Many of the algorithms we are exposed to on a regular basis, such as automated spell check, operate as a chain of categories and commands processed in a linear manner. Each command is specifically programmed and can therefore be explained. But the DataScan model relies on a convolutional neural network (CNN). CNNs are discussed in biological terms: a network of distributed mathematical processing units designed to emulate the structure of neurons in the human brain. The input data, in this case the CT scan, is broken into pixels and distributed between layers of small computational units connected in a matrix (Fig.5). The processing units in the first layer establish patterns and correlations with other data the network has encountered and feed their output up to the next layer. These units perform analysis at ever-increasing levels of sophistication until the network can classify the image as a whole as ‘normal’ or ‘abnormal’, ‘cancer’ or ‘not cancer’.

Figure 5: Convolutional Neural Network

Image credit: http://www.ais.uni-bonn.de/deep_learning/
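
To give a sense of the layered structure just described, here is a minimal convolutional classifier in PyTorch. It is an illustrative toy, not DataScan’s architecture: raw pixels go in, each convolutional layer builds on the patterns found by the one below it, and a final layer reduces everything to a single ‘cancer’/‘not cancer’ score.

```python
# A minimal convolutional image classifier in PyTorch, purely to illustrate
# the layered structure described above. This is a toy, not DataScan's model.

import torch
import torch.nn as nn

class TinyCTClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # first layer: local edges and textures
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # second layer: combinations of those patterns
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # summarise each feature map
            nn.Flatten(),
            nn.Linear(32, 1),          # a single logit for 'abnormal'
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One greyscale 512x512 CT slice, batch size of one.
slice_batch = torch.randn(1, 1, 512, 512)
probability = torch.sigmoid(TinyCTClassifier()(slice_batch))  # interpretable as P(abnormal)
```

A production system of the kind described would use far deeper networks over full 3D volumes, but the principle of stacked layers of increasing abstraction is the same.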

Despite the notion that they are training software to ‘think like a doctor’ and using models that are intended to mirror the structure of biological neurons, the team is in fact training doctors to think like an AI. The model is doing something very different to a doctor. The model doesn’t know what a vein is, or a lung, or a nodule. It doesn’t know what a patient is, or a radiologist. It breaks the CT scans down into their most granular, abstract form and finds statistical correlations between the pixels in the images, creating mathematical rules that synthesise the medical knowledge and align data with outputs by trying “every possible combination” (Kyle). This represents a move from medical expertise to statistical certainty. British and American radiologists are taught, when looking at a CT scan, to follow the Fleischner criteria or the British Thoracic Society guidelines. These guidelines say, for example, that if a nodule is between 6 and 8 mm in diameter, the patient has previous scans within the last year showing that it was between 4 and 6 mm, and it has a set of suspicious characteristics, then there is a 10% chance it is malignant. In contrast, Kyle describes the model as trying “every possible combination, looking at the entire feature space of all of those CT scans, as well as the known outcome of whether or not it was cancer, and it’ll come up with its own sequence of patterns.” In doing so, it is suggested, the model will learn to look at different features than those identified by the Fleischner criteria. Kyle suggests the model has “probably figured out who cares if it’s between 4mm and 6mm? What really matters is if it has 4 solid white specks throughout it and previously it did not.” Essentially the model is defining its own rules irrespective of medical categories, processes, and definitions, which the team propose will end up getting better results.

Therefore, from a technical perspective, what Martin means by ‘intuition’ is that when the model encounters something it has not encountered before, instead of returning an error it will “adjust the learning environment” (Kyle). This means it will factor those new things into its analysis. While this ability to self-adjust may seem like intuition, in fact the model is forming a set of what Kyle refers to as “concrete rules that define what a finding is based on all the examples that it has looked at.” These are rules one can “do real math on to determine whether [a CT] falls into a category” (Kyle). So in essence, despite the mystery of the black box, the processes and practices of the DataScan team are still closely linked to the processes and practices of other information structures such as the ICD. Both are a set of “concrete rules” (Kyle) obscured by a black box that creates ‘natural facts’ (Bowker and Star 1999). The difference is that the black box constructed by the ICD is one of bureaucracy, not statistical complexity.
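
The contrast Kyle draws can be made concrete with a sketch. The first function below is a hand-written rule in the spirit of the guideline paraphrased above (a crude paraphrase for illustration, not the actual Fleischner criteria); the second defers entirely to a trained model whose learned ‘rules’ live in its weights and can only be queried, not read.

```python
# Hand-written guideline versus learned rule. The thresholds below crudely
# paraphrase the example in the text and are not the real Fleischner criteria.

def guideline_risk(diameter_mm: float, prior_diameter_mm: float,
                   suspicious_features: bool) -> float:
    """Explicit rule a human wrote down and can explain."""
    grew = diameter_mm > prior_diameter_mm
    if 6 <= diameter_mm <= 8 and 4 <= prior_diameter_mm <= 6 and grew and suspicious_features:
        return 0.10   # the ~10% malignancy chance quoted in the text
    return 0.01       # illustrative baseline risk

def model_risk(scan_pixels, trained_model) -> float:
    """Learned rule: no diameters, no named features, just pixels in and a score out."""
    return float(trained_model(scan_pixels))  # the reasoning is dispersed through the weights
```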

The Bureaucracy of Black Boxes

The team’s belief in the viability of statistical certainty through the use of big data sets and unsupervised learning leads to the notion that the model can modulate around any variables apparent in the scans. I asked James how they allow for culturally specific variations in the data. James replied that their application is “culturally agnostic” and explained that “just because we don’t know or understand the culture doesn’t mean we can’t find abnormalities [in the CT scans].” Similarly, whilst explaining the categorisation processes of the model to me, Kyle mentioned that the definition of ‘diameter’, in the context of the diameter of a nodule, has various interpretations depending on which school of medical practice you adhere to. These small distinctions are translated into the local and global training data implicitly but are not directly acknowledged. This is due to the scale of the data and the understanding that statistical rigour will circumvent the need to address them. One of the other engineers, when I raised a similar question, said to me that “context is one of the variables”, implying that the discrepancies between bodies, cultures, environments, and medical practices were all just features to be included in the model; in other words, with enough data one can model the world and account for all variables.
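
One practical reading of ‘context is one of the variables’ is sketched below: the site and its acquisition settings are simply encoded and appended to the input features, on the assumption that enough data will let the model absorb the variation. The feature names are hypothetical.

```python
# One reading of "context is one of the variables": encode the hospital and the
# acquisition settings as extra input features. Names here are hypothetical.

from dataclasses import dataclass

SITES = ["hospital_a", "hospital_b", "hospital_c"]  # illustrative list of data sources

@dataclass
class ScanContext:
    site: str                  # which hospital or clinic produced the scan
    slice_thickness_mm: float  # one of many acquisition parameters that vary by site

def context_features(ctx: ScanContext) -> list[float]:
    """One-hot encode the site and append the numeric acquisition setting."""
    one_hot = [1.0 if ctx.site == s else 0.0 for s in SITES]
    return one_hot + [ctx.slice_thickness_mm]

# context_features(ScanContext("hospital_b", 1.25)) -> [0.0, 1.0, 0.0, 1.25],
# to be concatenated onto the image-derived features.
```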

However, in their analysis of the ICD, Bowker and Star go on to discuss the tension that arises when “the messy flow of bodily and natural experience must be ordered against a formal, neat set of categories” (1999:68). They discuss how, in practice, doctors take the apparently rigid parameters of the ICD and apply them in different ways within different environmental contexts. The role the ICD plays is to provide some sense of cohesion in international medicine and a “stabilising force between the natural and social worlds” (Bowker and Star 1999:86). The designers of the ICD acknowledge this directly, claiming they have attempted to “paint a fluid picture of the world of disease — one that is sensitive to changes in the world, to socio-technical conditions, and to the work practices of statisticians and record keepers” (Bowker and Star 1999:79).

Similarly, Kyle directly acknowledged that the model will “change depending on the data set that it is trained on, the guidelines used to prepare that data set, even how Liam and I are feeling that day.” This reference to the data collectors and their own feelings and moods brings the data and the engineers themselves to the fore. The code which creates the model and the application around it bears traces of those who built it. Mackenzie goes on to critique his own definition of code as a “context-free grammar”, pointing out that the formalism applied to code is both “appealing and mystifying” (2006:4). Appealing because it simplifies code into a highly pervasive and convenient cultural imaginary. Mystifying because it does not take into account that the “actual code that programmers read and write… encounters lives, institutions and events” (Mackenzie 2006:4). This tendency for the code that programmers write to encounter lives, institutions, and events is described by Liam as the “human noise impact”. It’s the data that drives the algorithm, and data carries context. As Liam puts it: “any improvement [of the model] is conditioned on the data, and in the real world, people collect data from different hospitals and clinics, use different equipment and imaging techniques… Any data comes with a condition, especially if you think about how the data is collected, presented, or gathered. Data is basically a summary of something, and that something is living in an environment so any data we have is on the condition of the environment it is collected in and any errors of the data collector. It is true that you can think of [data] as mostly factual, but you have to think of them as in context. They have to be very much in context.” Lupton summarises this by saying: “The ways in which phenomena are quantified and interpreted, and the purposes to which these measurements are put, are always implicated in social relationships, power dynamics and ways of seeing” (Lupton 2013:399). Thus, the outputs of the model are inalienably linked to all the processes and people which contribute to its functioning. It is the data that drives the model, and the data that encodes and embodies the social and cultural values of the environment it describes.

The mindset that leads to ‘culturally agnostic’ systems relies on the idea that technology sits apart from the world and acts upon it, shaping and informing. In order for the DataScan model to be culturally agnostic, both the knowledge and practices of the radiologists and the bodies being diagnosed need to become de-situated too. The translation of the physical body into data de-situates and de-prioritises it, the use of the application de-situates the radiologist from the process of diagnosis, and the use of big data sets and unsupervised learning de-situates the diagnosis from medical information structures. Liam constructed the model, but its complexity is so great that the rules it produces cannot simply be reverse engineered. The reasoning is dispersed through thousands of computational units arranged in hundreds of intricately interconnected layers. Because of the impenetrable nature of the black box, we cannot find out what rules define this knowledge, even if it is superior to the traditional medical classifications. The knowledge is “baked into the network” (Castelvecchi 2016:23) rather than into the doctors, as it is with the ICD. This de-situation results in a reification of the technology which makes it appear to be operating independently from the social world it inhabits, a reification enabled by the opacity of the black box. As Brian Pfaffenberger puts it, “what is in reality produced by relations among people appears before us in a fantastic form as relations among things” (Pfaffenberger 1988:242). It is our limited ability to perceive what is happening inside the model that leads us to attribute organic terms such as ‘learning’ and ‘thinking’ to it.

It is the same tendency for categorisation structures to produce ‘natural facts’ which makes the potential for encoded bias in machine learning technologies such an important issue. Stefan Helmreich concludes in his ethnography of Artificial Life developers that “when form is decoupled from life, we are left with free-floating form” (2007:8). Similarly, the idea of de-situated, objective, statistical knowledge implies that it floats freely in that clean, Euclidean realm on the other side of the screen. But as discussed, while the model is potentially creating new categories and new connections through the scale and complexity of its processing, it is still essentially sorting cancer from not cancer. It is an amplification of traditional knowledge structures like the ICD. And as such, it is highly connected to those who make it, meaning the knowledge it creates cannot simply float off into an impenetrable void, no matter how unsupervised it is.

Magic Boxes

So how does the team rationalise this apparent paradox between the culturally agnostic model and the highly situated data? Liam clearly sees the need for data to be situated in order to be useful, but also believes that it is inalienably related to fact, leading to his conception of data as “solid, it’s like the table and chair, it’s an object.” According to the team, the model is totally ephemeral and de-situated, yet it has a scale: Liam at one point describes it to me as “bigger than the British Library.” One analogy he used to describe the relationship between the model and the data was the relationship between molecules and the body, “except you can’t see the body” (Liam). These are contradictory binaries, but the use of a contained metaphor for something highly complex acts as a mechanism to bridge the gaps between them: the vectorised body and the physical one, the tangible and the intangible, the situated and the de-situated. Nick Seaver’s ethnography of music recommendation engineers shows how they discuss their work through spatial metaphors. They express their roles as being akin to park rangers who enable people to follow their own paths, or gardeners tending and taming the wilderness of music metadata. Seaver shows how his informants wielded metaphors ambivalently, suggesting they use them to “locate their work at the interface of the natural, cultural, and technical” (2016:2). The same can be said here: the use of metaphor enables the DataScan team to locate their work between the notion of the model as an “objectively existing cultural order” of medical information and their own “interpretive invention” through their engineering practices (2016:2). They use many metaphors, but the one that sits at the heart of the work they do is the black box.

Black box technologies are often discussed as a type of oracle, an entity to which one can put a question and have the truth revealed. But the role which the black box metaphor seems to play in the culture of the DataScan team is as a conceptual buffer between the messy, physical, social world and the clean, objective informational world, creating a space to move between the two. Essentially, the developers treat the model more like a TARDIS of truth than a woman trapped in a temple on a tectonic fault, hallucinating from exposure to natural gas. The TARDIS (Time And Relative Dimension In Space) is the name of the time machine from the television series Doctor Who (Fig.6). It appears as a small police box on the outside but is enormous on the inside, and enables the protagonists to do impossible things such as travel through time. It is a place of refuge and sanctuary, as the outside cannot get in. Whatever context the TARDIS is in, it maintains its boundaries and forms an impenetrable bubble of metaphysical incongruence. In the same way, the black box maintains its edges and creates a barrier between the cultural context surrounding it and the model within it. The culture of de-situation and the reification of data have the consequence of establishing the black box as a metaphorical construct as well as a set of mathematical processes obscured by complexity. This construct creates a ‘space’ in which the floating of knowledge can conceptually occur. The behaviour of the team suggests that this ‘space’ is required in order to project their need for a clean, structured, Euclidean version of the messy social world, one which can be used to reflect back natural truths about ourselves. The metaphor enables the team to dip in and out of the parallel realities enacted by their various practices without encountering conflict.

Figure 6. The TARDIS

Image credit: Dr Who, BBC, The Waters of Mars Season Four

Conclusion

In conclusion, attending to the practices of the DataScan team makes clear that the bodies enacted through this technology are as much a co-creation of the natural and the social as those of any other medical categorisation system, and should be conceptualised accordingly. The informational body enacted through this technology is just as situated and politicised as that enacted through any other medical practice. That does not mean it is not highly effective and useful, but these technologies should be treated as having similar limitations, which is why they are becoming increasingly regulated. Medicine is a highly sophisticated and scientific practice, but very few doctors treat it as infallible. To do so with technologies of this kind would be a mistake that would undermine their huge potential. Healthcare professionals must be able to ensure that the values the model is delivering are commensurate with the values and needs of the environment it is being used in, and must benefit from the wider impact of the knowledge and rules being generated. The practice of weighing hard data against situated evidence has always been at the heart of medicine, and whilst bias can be encoded and amplified, in the main it comes from historical social constructs, not technical ones. We may not be able to open the black box, but refining our understanding of the practices, tensions, and trade-offs involved in the creation of these applications could help us identify new affinities and relationships between the natural and the social.

It is likely that deciding what data matters and what doesn’t, and understanding how that data was collected and in what context, will become central to many professions as technologies of this nature become ubiquitous. There is a growing need for people to be able to balance qualitative and quantitative, thick and thin data in order to make informed decisions about their own lives and, in this case, the lives of others. Of course, the black box metaphor has many more interpretations than that of a truth TARDIS. Those who distrust the disruption technologies of this kind have the potential to cause may think of it as a Pandora’s box, waiting to unleash the singularity if and when it is opened. Personally, I think of it as the box containing Schrödinger’s cat, a space that is designed to contain a paradox. The provision of a space that holds a paradox is required in order to establish and build this technology. Opening the box will answer the question, but there is radioactive material in there as well as a cat, so I urge caution.

Bibliography

American Cancer Society, 2016. Small Cell Lung Cancer Survival Rates. https://www.cancer.org/cancer/small-cell-lung-cancer/detection-diagnosis-staging/survival-rates.html (accessed 30.08.19)

Beaulieu, A. (2004) ‘From brainbank to database: The informational turn in the study of the brain’. Studies in History and Philosophy of Biological and Biomedical Sciences, 35(2), 367–390

Bowker, G., & Star, S. L., 1999. Sorting things out : classification and its consequences, Cambridge, Mass. ; London: MIT Press.

Brouillette, M., 2017. Deep Learning Is a Black Box, but Health Care Won’t Mind. MIT Technology Review. https://www.technologyreview.com/s/604271/deep-learning-is-a-black-box-but-health-care-wont-mind/ (accessed 30.08.17)

Castelvecchi, D., 2016. Can we open the black box of AI? Nature, 538(7623), 20–23.

Clarke, A., 2010. Biomedicalization: technoscience, health, and illness in the U.S., Durham, NC: Duke University Press.

Clark, L., 2017. AI in healthcare is being built by and for the wealthiest: we need a wider perspective, warns WHO. Wired Magazine. http://www.wired.co.uk/article/margaret-chan-un-ai-health (accessed 30.08.17)

Coleman E.G., Golub A. 2008. Hacker practice: moral genres and the cultural articulation of liberalism. Anthropol. Theory 8(3):255–77

Etiope, G., Papatheodorou, G., Christodoulou, D., Geraga, M., & Favali, P. 2006. The geological links of the ancient Delphic Oracle (Greece): A reappraisal of natural gas occurrence and origin. Geology, Volume 34,

Fischer, T., 2017. Terabytes, Gigabytes, & Petabytes: How Big are They? Lifewire. https://www.lifewire.com/terabytes-gigabytes-amp-petabytes-how-big-are-they-4125169 (accessed 30.08.18)

Foucault, M., 1988. The History of Sexuality Vol. 1: The Will to Knowledge. London: Penguin.

Forsythe, D., & Hess, D., 2001. Studying those who study us: an anthropologist in the world of artificial intelligence, Stanford, Calif.: Stanford University Press.

Gitelman, L.M.N., 2013. “Raw data” is an oxymoron, (Infrastructures series Y). Cambridge, Massachusetts : MIT Press

Hacking, I., 1996. The looping effects of human kinds. In Causal Cognition: A Multidisciplinary Debate. Oxford: Oxford University Press, Chapter 12.

Hacking, I., 2006. Making up people. London Review of Books, 28(16), pp.23–26.

Hayles, N., 1999. How we became posthuman : virtual bodies in cybernetics, literature, and informatics, Chicago ; London: University of Chicago P.

Helmreich, S., 2007. “Life is a verb”: inflections of artificial life in cultural context. Artificial life, 13(2), pp.189–201.

Johnson, D. 2008. How do you know unless you look?: brain imaging, biopower and practical neuro-science, Journal of Medical Humanities, 29, 3, 147–61.

Joyce, K., 2008. Magnetic appeal : MRI and the myth of transparency, Ithaca: Cornell University Press.

Kelty, C., 2008. Two bits : the cultural significance of free software, Durham [N.C.] ; London: Duke University Press.

Lupton, D., 2013. Quantifying the body: monitoring and measuring health in the age of mHealth technologies. Critical Public Health, 23(4), pp.393–403.

Mackenzie, A., 2003. Bringing sequences to life: how bioinformatics corporealizes sequence data. New Genetics and Society, 22(3), 315–332.

Mackenzie, A., 2006. Cutting code : software and sociality, (Digital formations ; v. 30).

Mackenzie, A., 2015. The production of prediction: what does machine learning want? European Journal of Cultural Studies, 18(4–5), 429–445.

Mol, A., 2002. The Body Multiple: Ontology in Medical Practice, Durham: Duke University Press.

OED Online. June 2018. “intuition, n.” Web.

Pfaffenberger, B., 1988. Fetishised Objects and Humanised Nature: Towards an Anthropology of Technology. Man, 23(2), new series, 236–252. doi:10.2307/2802804

Rabinow, P., 1996. Essays on the Anthropology of Reason. Princeton, NJ: Princeton University Press.

Richardson, K., 2015. An anthropology of robots and AI : annihilation anxiety and machines, (Routledge studies in anthropology ; 20).

Royal College of Radiologists, https://www.rcr.ac.uk/clinical-radiology/careers-and-recruitment, (accessed 30.08.18)

Seaver, N. 2014. On Reverse Engineering. Looking for the cultural work of engineers. https://medium.com/anthropology-and-algorithms/on-reverse-engineering-d9f5bae87812 (accessed 30.08.17)

Seaver, N., 2016. Parks and Recommendation: Spatial Imaginaries in Algorithmic Systems. AoIR Selected Papers of Internet Research, ir16. Association of Internet Researchers. University of California.

Sulivan, J., 2013. Forget the needle; consider the haystack. ScienceDaily: Princeton University, Engineering School.

Wilf, E., 2013. Toward an Anthropology of Computer-Mediated, Algorithmic Forms of Sociality. Current Anthropology, 54(6), pp.716–739.

Wood, L.A., 2016. Con‐forming bodies: the interplay of machines and bodies and the implications of agency in medical imaging. Sociology of Health & Illness, 38(5), pp.768–781.
