Ignorant__visions and LAION-5B, LAION-Aesthetics and LAION Art
LAION-5B is one of the most important components of generative AI built on neural-network machine learning. It is a dataset of more than five billion image-text pairs: image links and their accompanying alt text were extracted from web pages archived by Common Crawl, and OpenAI’s machine vision model CLIP was used to filter the pairs, keeping only those whose image and text it judged to match. It is the largest openly available dataset of its kind and is widely used in AI research. It is notably used in the training of Stable Diffusion and Midjourney, two major generative AI systems based on diffusion models that produce images from text prompts.
LAION-Aesthetics is a subset of LAION-5B curated through aesthetic scoring. It contains 1.2 billion images with a score of 4.5 or higher; images scoring lower were excluded as insufficiently visually appealing or unsuitable. LAION Art can be considered a subset of LAION-Aesthetics and contains 8.1 million images scored 8.0 or higher. LAION-Aesthetics and LAION Art formed the basis for training Stable Diffusion’s image generators and are thus essential to any discussion of generative AI and its visuality.
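The nested subsets come down to a threshold on a precomputed score. As a minimal sketch, assuming each record simply carries its predicted score (the records, URLs and scores below are invented for illustration, not taken from LAION), the stricter 8.0 cut can only ever keep images that the 4.5 cut also keeps, which is why LAION Art can be read as a subset of LAION-Aesthetics:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One image-text pair with a predicted aesthetic score (hypothetical data)."""
    url: str
    caption: str
    aesthetic: float


def subset(records: list[Record], threshold: float) -> list[Record]:
    """Keep only the records whose score is at or above the threshold."""
    return [r for r in records if r.aesthetic >= threshold]


# Invented records: the URLs, captions and scores are illustrative only.
records = [
    Record("https://example.com/car.jpg", "red sports car", 5.1),
    Record("https://example.com/cake.jpg", "birthday cake", 8.3),
    Record("https://example.com/meme.jpg", "funny meme", 3.9),
    Record("https://example.com/titian.jpg", "Venus and Adonis", 8.7),
]

aesthetics_like = subset(records, 4.5)  # analogue of the 4.5+ cut
art_like = subset(records, 8.0)         # analogue of the 8.0+ cut

# Every record passing the stricter cut also passes the looser one.
assert all(r in aesthetics_like for r in art_like)
print([r.caption for r in aesthetics_like])
print([r.caption for r in art_like])
```

Nothing in the threshold itself says what makes a cake at 8.3 more “appealing” than a car at 5.1; that judgement lives entirely in the score.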
Aesthetic scoring would require a separate article to do it justice, but in short it is the process by which a machine vision system is taught to recognise a “visually appealing” image in order to raise the “visual quality” of a dataset. The basis of aesthetic rating can be traced back to the photographic competition website DPChallenge, where users would upload photographs for specific challenges. Other users would in turn rate the images from 1 to 10 and leave text comments about them. The Aesthetic Visual Analysis (AVA) dataset was built by scraping more than 250,000 of these images, together with their ratings, from the website. AVA was used to train early machine vision systems, but the quality of the dataset was poor and its text-image content required further curation (many of the comments attached to images were irrelevant for machine learning purposes). Human users would filter small samples of the dataset and correct the text pairings or scores, providing information for training a machine learning system that automates aesthetic scoring for large datasets. This scoring model is built on CLIP (Contrastive Language-Image Pre-training): CLIP converts each image into a numerical representation, or embedding, and the model predicts a score from that embedding. AVA and CLIP were thus essential to select, rate and curate the wealth of images provided by Common Crawl to form LAION’s subsets.
Yet the process and assumptions behind aesthetic scoring, and the ground truth of what constitutes “visual quality” or “appeal”, remain under-investigated. This is particularly interesting because the scoring was also trained on ratings of synthetic images: the Simulacra Aesthetic Captions (SAC) dataset is made of images generated with diffusion models, Stable Diffusion among them, and rated by human users. The resulting scores are in turn used to rate and discriminate between images, including those produced in the image-generation process. The machine, in a sense, assesses its own images in terms of what it has been taught to consider a visually appealing synthetic image.
In a pipeline that uses such a score as a filter, a generated image that does not score highly enough can simply be discarded and another one generated in its place, in the hope of higher quality.
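The scoring and the retry can be illustrated with a toy sketch. It is not LAION’s or Stable Diffusion’s code: it assumes, as LAION’s aesthetic predictors do, that the score is predicted from an image’s CLIP embedding, but it uses a plain linear head with invented weights, four-dimensional vectors in place of real CLIP embeddings, and a hypothetical `fake_generate` function in place of an image generator. The names `aesthetic_score` and `generate_until_appealing` are likewise hypothetical.

```python
import math
import random
from typing import Optional


def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length, as is common practice with CLIP embeddings."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def aesthetic_score(embedding: list[float], weights: list[float], bias: float) -> float:
    """Hypothetical linear head: score = w . e + b on the unit-length embedding."""
    e = normalize(embedding)
    return sum(w * x for w, x in zip(weights, e)) + bias


def fake_generate(rng: random.Random, dim: int) -> list[float]:
    """Stand-in for an image generator: returns a random 'embedding'."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def generate_until_appealing(
    weights: list[float], bias: float, threshold: float, max_tries: int = 50, seed: int = 0
) -> tuple[Optional[list[float]], Optional[float], int]:
    """Discard outputs scoring below the threshold and generate again."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        candidate = fake_generate(rng, len(weights))
        score = aesthetic_score(candidate, weights, bias)
        if score >= threshold:
            return candidate, score, attempt
    return None, None, max_tries


# Usage with invented weights and threshold.
weights = [1.5, -0.5, 0.8, 0.2]
image, score, tries = generate_until_appealing(weights, bias=5.0, threshold=6.0)
if image is None:
    print("no candidate passed the threshold")
else:
    print(f"accepted after {tries} tries with score {score:.2f}")
```

With unit-length embeddings, the linear score equals the length of the weight vector times the cosine of the angle between embedding and weights, plus the bias, so the loop only keeps outputs that point in the direction the weights reward. Whatever the human raters rewarded is what gets selected for.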
What this short overview, far too superficial but covering the essentials, leads us to is the unstable notion of “visual quality” and “appeal” embodied by the images rated 8 and higher in the LAION Art subset. In this dataset of 8.1 million images we encounter stock images of cars and cakes, snapshots of dolls and congregations of parishioners, old master paintings by the likes of Titian or Rubens, memes and, to put it bluntly, kitsch and poor-quality images en masse. We are at the early stages of our investigation into how museums feed into these datasets, what kinds of works from museums end up in machine learning systems and by what avenues. But I cannot ignore my experience of browsing the LAION Art dataset and being captivated by the content I saw. Maybe it awakened nostalgia for the blogging culture of the early 2000s, when endless image galleries documented the strangest images from all corners of the internet. LAION Art is just like this. So, to keep looking at LAION Art for my enjoyment but also for research purposes, I have set up the Instagram account Ignorant__visions, where an image from LAION Art is posted once a day with its URL, text caption and aesthetic score. I will be posting until I finish my PhD in August 2027, and hopefully it will provide a useful gallery to provoke and illustrate my research into the visuality of generative AI. The account has just been launched, but I am already receiving a few messages asking about the nature of these images and their relation to generative AI. We hope that this modest gimmick will also serve a public purpose for research dissemination.
If you want to explore LAION Art further for yourself, follow this link >> https://huggingface.co/datasets/laion/laion-art/viewer/default/train?p=45
To test CLIP on a small set of images and get your own rankings of “best images”, you can use either:
2D Clip by Leonardo Impett >> https://leoimpett.github.io/2dclip/
Aesthetic Selector by Fernando Rubio Perona >> https://github.com/ferrubio/AestheticSelector?tab=readme-ov-file
LAION-5B is currently at the centre of controversy and legal trouble after child sexual abuse material was found in the dataset. Concerns about traumatic, abusive or non-consensual content being used to train generative systems make this an urgent policy and cultural conversation. It concerns the ownership of data on the one hand, but also the regulation of how this data is used and implemented by private corporations and researchers alike. Abeba Birhane, a researcher who has audited LAION’s datasets, has been vocal in her work about the risks and shortcomings of these datasets and of automated rating systems with regard to harmful content. I recommend the following reading and can suggest more:
Birhane, A., Prabhu, V. U., & Kahembwe, E. (2021). Multimodal datasets: Misogyny, pornography, and malignant stereotypes. arXiv preprint arXiv:2110.01963. https://doi.org/10.48550/arXiv.2110.01963
Below, follow me as I browse through a couple of pages of LAION Art :)