
6. Bias in Practices with Digital Documents

6.3 Classifications with Digital Technology

viewing cultural memory - or the documentary heritage of humanity - it becomes important to pay attention to a different type of documentary practice that precedes the database (understood as final product of a practice), namely the classification and categorisation of information, which requires naming information entities and establishing relationships between them.

manually by librarians, archivists and other “information professionals”, the development of digital technology has triggered two new metadata practices. The first is the automatic creation of metadata by the computer. Metadata used to be created manually by people and was, so to speak, external to the document. With automation, metadata is now also created by computers, which register information such as the date, title and author of the document, when it was modified, which fonts have been used and similar details. Metadata has thereby become part of the document rather than remaining external to it, as it is in library catalogues and indexes. For certain types of digital documents, it is even recommended that the metadata be embedded in the document rather than stored elsewhere.⁸⁴³ The second new practice is the creation of metadata by non-professionals or end-users.
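As a minimal sketch of metadata that is registered without a human cataloguer, the following Python fragment reads what the file system records about a document. It is only an illustration: the function name `automatic_metadata` and the throwaway sample file are assumptions, and embedded metadata such as title, author or fonts lives inside specific file formats, which this sketch does not attempt to parse.

```python
import os
import tempfile
from datetime import datetime, timezone

def automatic_metadata(path):
    """Collect metadata that the file system records automatically."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # Last modification time, registered by the system, not by a person.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    }

# Create a small throwaway document so that the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Memory of the World")
    sample_path = handle.name

record = automatic_metadata(sample_path)
os.remove(sample_path)
print(record)
```

Nobody typed in the size or the modification date; both were registered as a side effect of writing the file.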

If metadata was previously a matter of professional practice, it is now additionally created by end-users, an example being the so-called folksonomy, which is analysed below. The term “folksonomy” was coined by Thomas Vander Wal in 2004 by combining the words folk and taxonomy; in order to understand it, it is useful to briefly explain what tag and tagging mean.⁸⁴⁴ Tags are a type of metadata in the form of keywords, and the meaning of the term in relation to digital technology is similar to its common usage, namely “a label attached to someone or something for the purpose of identification or to give other information.”⁸⁴⁵ According to Nicholas Gane, “tagging promotes the connectivity of information within and in some cases between new media archives. It is one practice among many that transforms the archive into a networked storage medium by making connections between vast amounts of data at unprecedented speeds.”⁸⁴⁶ While tags used to be created either by professional institutions or by the creators of documents, digital technology has enabled practices in which tags are attached by users, leading to the emergence of folksonomies. As explained by Vander Wal: “Folksonomy is the result of personal free tagging of information and objects (anything with a URL) for one's own retrieval. The tagging is done in a social environment (usually shared and open to others). Folksonomy is created from the act of tagging […].”⁸⁴⁷ In order to visualise the tags, websites with this feature display them in

843 Gilliland, “Setting the Stage”.

844 Thomas Vander Wal, Weblog – Folksonomy: Coinage and Definition, February 2, 2007, http://vanderwal.net/folksonomy.html (accessed March 18, 2012).

845 See entry on “tag” in Oxford Dictionaries Online http://oxforddictionaries.com/definition/english/tag?q=tag (accessed 19 November 2012).

846 Gane and Beer, New Media, 82.

847 Vander Wal, Weblog – Folksonomy.

so-called “tag clouds” that can take various forms. The keywords can be organised alphabetically, with the keywords that are used most often or are most important shown in a larger font size, as exemplified by the image on the left-hand side below (Figure 3).⁸⁴⁸ Tags can also be visualised as an index, as in the image on the right-hand side (Figure 4), an alternative and emerging form of tag visualisation.⁸⁴⁹

Figure 3: Tag Cloud © Umair Hague

Figure 4: Index Tag Cloud © Vitaly Friedman

Regardless of the form that they take, all folksonomies share the “user added keywords as a fundamental organizational construct.”⁸⁵⁰ As Mathes explains in comparison with other classification systems, an important aspect of a folksonomy is that the terms are placed in a flat namespace, without hierarchy or directly specified relationships between the terms.⁸⁵¹ Accordingly, Mathes argues that this “is unlike formal taxonomies and classification schemes where there are multiple kind of explicit relationships between terms…folksonomies are simply the set of terms that a group of users tagged content with, they are not a predetermined set of classification terms or labels.”⁸⁵² Folksonomies therefore represent informal classification systems that are dynamic, changing along with the popular interests of website visitors. They are comparable with and complementary to formal classifications such as bibliographies and library catalogues, reflecting a new informal method of classification that has been made possible by the bias of digital technology.
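The flat namespace and the tag cloud described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the users, URLs and tags are invented, the function names `folksonomy` and `tag_cloud` are hypothetical, and the linear scaling of font sizes is only one of many possible weighting schemes.

```python
from collections import Counter

# Hypothetical tagging events: (user, resource URL, free-form tag).
taggings = [
    ("ana", "http://example.org/a", "archive"),
    ("ben", "http://example.org/a", "memory"),
    ("ana", "http://example.org/b", "archive"),
    ("cai", "http://example.org/b", "heritage"),
    ("ben", "http://example.org/c", "archive"),
    ("cai", "http://example.org/c", "memory"),
]

def folksonomy(events):
    """Count how often each tag was used; the result is a flat set of terms."""
    return Counter(tag for _user, _url, tag in events)

def tag_cloud(counts, min_size=10, max_size=30):
    """Map each tag to a font size scaled linearly by usage, in alphabetical order."""
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid division by zero when all counts are equal
    return [
        (tag, min_size + (counts[tag] - low) * (max_size - min_size) // spread)
        for tag in sorted(counts)
    ]

counts = folksonomy(taggings)
cloud = tag_cloud(counts)
print(cloud)  # [('archive', 30), ('heritage', 10), ('memory', 20)]
```

No term is declared broader or narrower than another: the only structure is the frequency with which users chose each tag, which is what distinguishes this from a formal taxonomy.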

848 Umair Hague, Blog of Umair Hague, http://www.bubblegeneration.com/2010/03/about-me.html (accessed 29 April 2013).

849 Vitaly Friedman, “Tag Clouds Gallery: Examples and Good Practices,” Smashing Magazine (7 November 2007) http://www.smashingmagazine.com/2007/11/07/tag-clouds-gallery-examples-and-good-practices/

850 Mathes, “Folksonomies”.

851 Ibid.

852 Ibid.

Furthermore, apart from metadata, it is important to draw attention to a different automated process. Although not considered part of classification practice, search engines can also be said to carry out an automated form of classification. They do so not in terms of descriptions, keywords, titles or authors, as metadata does, but rather by indexing digital documents, and more precisely websites, which can be considered examples of digital documents, as shown in the previous subchapter. Comparing metadata with search engines is possible because they fulfil a similar function: both are tools intended to facilitate the finding of information.

While metadata achieves this by creating additional information about a document, search engines do so through a different automated process that requires brief clarification. The previous chapter explained how search engines are influenced by specific political and commercial interests; however, in order to understand the changes that digital technology has triggered in documentary practices, it is useful to note additionally how search engines function from a technical perspective. Abelson et al. divide the process of searching for information into seven steps: the first three occur independently of users, while the other four follow the use of the search engine and are reflected in how the search results are displayed, as discussed in the previous chapter.⁸⁵³ In this subchapter, the focus lies only on the first three steps, which take place as follows: search engines gather information by exploring the web, visiting websites on a regular basis to ascertain what they contain and revisiting old sites for content that has been updated; they subsequently make copies of the sites visited; and they build an index showing which words appear on which websites.⁸⁵⁴ This gathering of information is conducted by specific software. Officially called a web crawler and unofficially a spider, it is a kind of program known as a web robot, or simply bot, created to perform a repetitive task such as information gathering.⁸⁵⁵ However, in line with Abelson et al., it is important to point out that search engines do not index everything but rather select, and what is selected and indexed represents less than 5% of what could potentially be collected.⁸⁵⁶ In this regard, Abelson et al. point to the importance of having a website indexed: a site that is not indexed will never be found by a search engine, and users will consequently assume that it does not exist, given that users rarely know that only a small

853 Abelson, Ledeen and Lewis, Blown to Bits, 120.

854 Abelson, Ledeen and Lewis, Blown to Bits, 121. The remaining four steps follow after a person makes a query: as a fourth step, the search engine needs to understand the query; as a fifth step, to determine the relevance of each possible result to the query; as a sixth step, to determine the ranking of relevant results; and finally, as a seventh step, to present the results.

855 Abelson, Ledeen and Lewis, Blown to Bits, 123.

856 Abelson, Ledeen and Lewis, Blown to Bits, 122.

portion of what exists becomes indexed. Abelson et al. remark that this is similar to removing the entry for a book from a library’s catalogue: if it is not in the catalogue, it is assumed that it is not in the library. Search engines make invisible the information that they do not index, and thus “removing information in the digital world does not require removing the documents themselves. You can make things disappear by banishing them into the un-indexed darkness.”⁸⁵⁷ Therefore, digital technology has not only changed the creation of metadata by automating the process, but has also influenced the process of finding information.
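The three user-independent steps (gathering, copying, indexing) and the invisibility of what is not indexed can be sketched as follows. This is a toy illustration rather than a description of how any actual search engine is built: the miniature `WEB` dictionary, the page addresses and the function names `crawl` and `build_index` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical miniature "web": page address -> (text, outgoing links).
WEB = {
    "site-a": ("digital documents and metadata", ["site-b"]),
    "site-b": ("metadata in library catalogues", ["site-a"]),
    "site-c": ("unlinked documentary heritage", []),
}

def crawl(start):
    """Steps 1 and 2: follow links from a start page and keep a copy of each page visited."""
    copies, frontier = {}, [start]
    while frontier:
        address = frontier.pop()
        if address in copies or address not in WEB:
            continue  # already copied, or the link leads nowhere
        text, links = WEB[address]
        copies[address] = text
        frontier.extend(links)
    return copies

def build_index(copies):
    """Step 3: record which words appear on which pages."""
    index = defaultdict(set)
    for address, text in copies.items():
        for word in text.split():
            index[word].add(address)
    return index

index = build_index(crawl("site-a"))
print(sorted(index["metadata"]))              # ['site-a', 'site-b']
print(sorted(index.get("heritage", set())))   # []
```

The page `site-c` still exists in `WEB`, but because the crawler never reaches it, a query for "heritage" returns nothing: from the user's point of view the document has disappeared without being deleted.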

Classifications, albeit not in the form described above, are also important at the software level. Alison Adam has written about the relevance of the list, which she defines as a “fundamental way for organizing and classifying information.”⁸⁵⁸ Given that her argument partly relates to the notion of the list in computer science, it is first necessary to clarify another notion, namely the data structure, which refers to the conceptual shape and arrangement of data in a program.⁸⁵⁹ Indeed, this aspect could have been analysed in the previous chapter, given that it concerns the structuring of documents; however, as already mentioned, it is impossible to analyse all aspects of digital technology in the space available. Therefore, for the purpose of this dissertation, only the list is analysed, which is an example of a data structure yet also a tool for classifying information, as discussed below. There are several types of data structures, such as rectangular blocks known as arrays, in which data is arranged in rows and columns;⁸⁶⁰ the list, in which data is arranged sequentially, with the list further differentiated into stacks and queues depending on how data can be accessed;⁸⁶¹ and the tree, in which data is arranged hierarchically.⁸⁶² Brookshear specifies that a computer’s main memory is not organised in lists, stacks, queues and trees, but rather as a sequence of memory cells, so that all other structures must be simulated, which recalls Kirschenbaum’s and Kittler’s explanations that software can be broken down to hardware. Nevertheless, data structures are relevant as “abstract tools that are created so that users of the data can be shielded from the details of actual data storage (memory cells and addresses) and can be allowed to access information as though it were

857 Abelson, Ledeen and Lewis, Blown to Bits.

858 Alison Adam, “Lists,” in Software Studies: A Lexicon, ed. Matthew Fuller (Cambridge, Massachusetts & London: MIT Press, 2008), 174.

859 Brookshear, Computer Science.

860 For a comprehensive technical description see especially chapters 6.2 and 8.1 in Brookshear, Computer Science.

861 In a stack, data is inserted and removed at the top of the stack; in a queue, data is inserted at the end of the queue and removed at its head. See Brookshear, Computer Science, 366-367.

862 Brookshear, Computer Science, 367.

stored in a more convenient form.”⁸⁶³ To return to Adam’s argument that the list reflects a fundamental way of classifying information: although she refers to the list as a data structure in programming, she places the discussion in a broader cultural context, arguing that it continues the relevance that the list has had since the emergence of literate societies, with some of the earliest evidence of written language taking the form of lists.⁸⁶⁴ Adam exemplifies this with cuneiform tablets containing accounting lists, lists of objects and vocabularies, lists for religious rituals and other types such as lists of instructions, which recalls Innis’ analysis of the bias of a medium in the development of empires. It is possible to infer from Adam’s argument that she actually approaches the list as a specific type of document, or what would be called a “genre” in the field of library and archival sciences. Accordingly, for Adam the list is a form of knowledge representation; however, she also speaks about recipe or instruction lists, which “detail a list of steps needed to complete a task but contain no generality nor the idea of proof; rather they contain ‘hard coded’ steps of sequences of instructions.”⁸⁶⁵ This does not make them less important, because lists supply knowledge or information about what exists and how to behave in the world, with Adam concluding that lists, whether inscribed on clay or on silicon chips, represent not only things that help people to classify information, but also knowledge and how we reason about knowledge.⁸⁶⁶ From this perspective, the relevance of creating lists at the software level is comparable with the relevance of software documentation, and it can thus be considered a classification practice enabled by digital technology, alongside the different forms of metadata and search engine-based indexing practices.
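The data structures named in this subchapter (stack, queue and tree) and the point that they are simulated on a flat sequence of memory cells can be illustrated with a short Python sketch. The labels in the tree and the class name `CellStack` are hypothetical and chosen only for the example; Python's own lists and `deque` are used as convenient stand-ins, not as a claim about how Brookshear implements these structures.

```python
from collections import deque

# Stack: data is inserted and removed at the top.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()          # "b", the item inserted last

# Queue: data is inserted at the end and removed at the head.
queue = deque()
queue.append("a")
queue.append("b")
head = queue.popleft()     # "a", the item inserted first

# Tree: data arranged hierarchically, here as nested (label, children) pairs.
tree = ("documents", [("lists", []), ("catalogues", [("indexes", [])])])

def depth(node):
    """Number of levels in the tree, counting this node as the first level."""
    _label, children = node
    return 1 + max((depth(child) for child in children), default=0)

class CellStack:
    """A stack simulated on a fixed sequence of 'memory cells'."""

    def __init__(self, size):
        self.cells = [None] * size  # flat sequence standing in for main memory
        self.pointer = 0            # address of the next free cell

    def push(self, value):
        if self.pointer == len(self.cells):
            raise OverflowError("stack is full")
        self.cells[self.pointer] = value
        self.pointer += 1

    def pop(self):
        if self.pointer == 0:
            raise IndexError("stack is empty")
        self.pointer -= 1
        return self.cells[self.pointer]

simulated = CellStack(4)
simulated.push("x")
simulated.push("y")
last = simulated.pop()
print(top, head, depth(tree), last)  # b a 3 y
```

A user of `CellStack` only calls `push` and `pop`; the cells and the pointer stay hidden, which is the shielding from storage details that the quotation describes.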

In document: OPUS 4 | The digital "Memory of the World": an exploration of documentary practices in the age of digital technology (pages 163-168)