Glossary
Overview
AI
Short for »artificial intelligence.« The term should be viewed critically, as it represents a humanisation (Anthropomorphism). In actuality, it currently refers exclusively to »Machine Learning.«
Algorithm
A sequence of instructions for solving a problem. Algorithms follow defined individual steps, which are executed in their specified order (input, processing, output).
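As a minimal illustration of the three steps (input, processing, output), the following Python sketch finds the largest number in a list. The function name `find_largest` and the sample values are chosen purely for illustration.

```python
def find_largest(numbers):
    """Return the largest value in a non-empty list of numbers."""
    # Input: a list of numbers handed to the function.
    largest = numbers[0]
    # Processing: compare each remaining value with the current candidate, in order.
    for value in numbers[1:]:
        if value > largest:
            largest = value
    # Output: the result of the comparison steps.
    return largest


result = find_largest([3, 17, 8, 12])
print(result)  # 17
```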
ANN
Short for »artificial neural network.« Just like AI, the term is modelled on the biological example of the human brain. The artificial networks consist of modelled neurons whose purpose is to process information. This designation encourages a humanisation (Anthropomorphism).
Annotation
The act of providing an object with a note. For example, knowledge in the form of Metadata and Tags can be added to specific images or digital objects in order to better classify or filter them.
Anthropomorphism
Humanisation, i.e. human characteristics are attributed to the non-human. The machine is supposed to be intelligent like humans (or surpass them) and in the process obtain a circuitry that resembles the human brain.
API
Short for »Application Programming Interface.« Stands for a programming interface that enables the connection of a piece of software to another program, e.g. for Scraping data sets of museum collections.
Bias
Describes a disproportionate weight, e.g. in the training of AI, in favour of or against information contained in the data. As a consequence, this can result in disadvantages for those affected or perpetuate unfair prejudices, which is particularly critical when the data is used as the basis for making decisions with (in)direct impact on daily life.
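One simple way to spot a disproportionate weight before training is to count how often each label occurs in the data. The sketch below assumes a small, invented set of labels; the names `training_labels` and `label_shares` are hypothetical, and a real check would also have to look at what the images themselves depict.

```python
from collections import Counter

# Hypothetical labels of a small training set; the values are invented for illustration.
training_labels = ["portrait"] * 90 + ["landscape"] * 8 + ["still life"] * 2


def label_shares(labels):
    """Return the share of each label in the data set as a fraction of the total."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


shares = label_shares(training_labels)
# Print the labels from most to least frequent.
for label, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label}: {share:.0%}")
```

In this invented example, nine out of ten images carry the same label, so a model trained on the data would see very few examples of the other two groups.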
Clustering
Means the classification of various objects in a data set into different groups. The classification is carried out automatically on the basis of detected similarities, e.g. in an image corpus for groups such as »dogs« and »cats.«
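The grouping by similarity can be sketched with k-means, one common clustering procedure (named here as an illustration, not as the method used in the project). The two-dimensional points are invented stand-ins for image features, and the starting centroids are passed in by hand to keep the example deterministic; libraries usually choose them automatically.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Group points around centroids with a plain k-means loop.

    Returns one cluster index per point.
    """
    centroids = list(initial_centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignments


# Invented feature values: three points near (1, 1) and three near (5, 5).
features = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = k_means(features, initial_centroids=[features[0], features[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Note that the procedure only returns group numbers; naming a group »dogs« or »cats« remains a human interpretation.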
Digital Humanities
Discusses the use of computer-based methods and digital object resources in an interdisciplinary manner and reflects on their application and impact in the humanities and cultural studies.
GitHub
Is an American online service for managing software development projects and has been part of Microsoft since 2018. The service is based on the version control system Git, which tracks changes to files. »Training the Archive« also maintains a Repository there.
IIIF
Short for »International Image Interoperability Framework.« A standardised interface, e.g. for the inter-institutional exchange of image data and other digital objects.
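To give an impression of how such an interface is addressed, the sketch below builds a request URL following the path pattern of the IIIF Image API (identifier, region, size, rotation, quality and format). The server address and the identifier are invented placeholders, and the helper name `iiif_image_url` is hypothetical; the default size value `max` follows version 3 of the Image API, so older servers may expect different values.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", image_format="jpg"):
    """Build an image request URL following the IIIF Image API path pattern."""
    # Special characters in the identifier (e.g. "/") must be percent-encoded.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{image_format}")


# Placeholder server and identifier, not a real collection.
url = iiif_image_url("https://iiif.example.org/image", "collection/object-001")
print(url)
# https://iiif.example.org/image/collection%2Fobject-001/full/max/0/default.jpg
```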
ImageNet
An image database composed of more than 14 million images from the Internet. Various ANNs that already contain mathematical weights can be used via public libraries such as Keras or TensorFlow; the complex process of training these weights is based on ImageNet. This has led to questionable or even biased categorisations.
Keras
Is an Open Source deep-learning library written in Python, similar to TensorFlow. The library is particularly useful when ANNs that have already been trained are applied to one’s own tasks by means of Transfer Learning. This, however, creates a dependence on external training.
Machine Learning
The term describes the development of a model using special learning algorithms that draw on a large amount of training data. The ‘knowledge’ generated can be used for predictions or recommendations.
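The step from training data to prediction can be sketched with a nearest-centroid classifier, one of the simplest learning algorithms (chosen here for brevity, not because the project uses it). The feature values, the labels and the function names `train` and `predict` are invented for illustration.

```python
import math
from collections import defaultdict


def train(samples):
    """Learn one mean feature vector (centroid) per label from labelled samples."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(vectors) for axis in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(model, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(model, key=lambda label: math.dist(features, model[label]))


# Invented training data: pairs of feature vector and label.
training_data = [
    ((1.0, 1.0), "cat"), ((1.2, 0.8), "cat"),
    ((5.0, 5.2), "dog"), ((4.8, 5.0), "dog"),
]
model = train(training_data)
prediction = predict(model, (4.5, 4.7))
print(prediction)  # dog
```

The ‘knowledge’ of this model consists of nothing more than two averaged points, which illustrates why the quotation marks around the word are appropriate.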
Metadata
Also called metainformation. Structured data that describes the characteristics of other data, e.g. the properties of an object such as the medium of an artwork.
Open Source
Is applicable as soon as the source code for a software is available and can thus be viewed, changed and used (free of charge) by the public. In some cases, licences of use must be observed. »Training the Archive« wants to publish as much code as possible, e.g. on GitHub.
Pattern Recognition
Describes the recognition of regularities, repetitions and similarities in a large amount of data to facilitate facial, speech or text recognition, for instance.
Proof of Concept
PoC for short. A term from project management: proof that a project is feasible in principle, e.g. by means of a Prototype. Starting from this milestone, further work on the project can proceed.
Prototype
Describes a sample design of the end product to be developed. In software development, a Prototype template is adapted to the needs of the user and thus continually developed in iterative cycles.
Python
A high-level programming language with which, among other things, Machine Learning can be programmed. It is characterised by an easy-to-read, concise programming style. It is often used in science because it is comparatively easy to learn and offers good integration of scientific libraries. The name is derived from the British comedy group Monty Python.
Repository
Describes a digital archive by means of a directory for storing and describing digital objects. For example, »Training the Archive« manages a Repository on GitHub as a freely accessible source of code for the first Prototype.
Robotics
Robotics as a discipline, and the robot as an entity, deal with interaction with the physical world by combining sensors, actuators, information processing and technically feasible kinetics. In this context, AI is often mistakenly illustrated or symbolised by means of humanoid robots.
Scraping
Targeted extraction of information from the source code of websites to make the desired content available locally for further use. To scrape the image files of a museum using an API is one example of its application.
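Extraction from the source code of a page can be sketched with Python's built-in HTML parser. The page source below is invented and stands in for a downloaded collection page; the class name `ImageSourceParser` is hypothetical. In practice, the terms of use of the website in question would have to be checked first.

```python
from html.parser import HTMLParser


class ImageSourceParser(HTMLParser):
    """Collect the `src` attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Called by the parser for every opening tag it encounters.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


# Invented page source standing in for a downloaded collection page.
page_source = """
<html><body>
  <h1>Collection</h1>
  <img src="/images/object-001.jpg" alt="Object 1">
  <img src="/images/object-002.jpg" alt="Object 2">
</body></html>
"""

parser = ImageSourceParser()
parser.feed(page_source)
print(parser.sources)  # ['/images/object-001.jpg', '/images/object-002.jpg']
```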
Tags
A tag labels a data set with additional information.
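How Tags and Metadata support filtering can be sketched with a few invented records. The field names (`title`, `medium`, `tags`) and the helper `filter_by_tag` are assumptions made for this example, not a prescribed schema.

```python
# Hypothetical records: each object carries Metadata (title, medium) and a list of Tags.
objects = [
    {"title": "Untitled I", "medium": "oil on canvas", "tags": ["portrait", "blue"]},
    {"title": "Untitled II", "medium": "photograph", "tags": ["landscape"]},
    {"title": "Untitled III", "medium": "oil on canvas", "tags": ["landscape", "blue"]},
]


def filter_by_tag(records, tag):
    """Return all records that carry the given tag."""
    return [record for record in records if tag in record["tags"]]


blue_titles = [record["title"] for record in filter_by_tag(objects, "blue")]
print(blue_titles)  # ['Untitled I', 'Untitled III']
```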
TensorFlow
Typically imported in Python with »import tensorflow as tf«. A framework for Machine Learning that carries out the computational operations of the ANN. Keras, for example, is an integral part of the TensorFlow API (»tf.keras«).
Transfer Learning
Procedure of instantiating a fully trained ANN from Keras or TensorFlow and passing a compiled set of image data through it as input. The characteristics learned on one problem are thereby applied to a new, similar problem. This is advantageous for research because the models have already acquired a fundamental ‘understanding’ of the human world in terms of the general structure and content of images, so this knowledge does not need to be taught from scratch.
Working Paper
A publication format that reflects the current state of work and discussion within the research group, makes new knowledge available and shares it with the outside world.