AI/ML News & Innovations Hub

Stability AI

5 articles tagged with this keyword, sorted by most recent first.

Chip Huyen Blog 2023-10-10 00:00 UTC Score 53.0 USR-0111-20231010-ai-specialis-f4a68771

Multimodality and Large Multimodal Models (LMMs)

For a long time, each ML model operated in one data mode – text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). However, natural intelligence is not limited to just a single modality. Humans can read, talk, and see. We listen to music to relax and watch out for strange noises to detect danger. Being able to work with multimodal data is essential for us or any AI to operate in the real world. OpenAI noted in their GPT-4V system card that “incorporating additional modalities (such as image inputs) into LLMs is viewed by some as a key frontier in AI research and development.” Incorporating additional modalities into LLMs (Large Language Models) creates LMMs (Large Multimodal Models). Not all multimodal systems are LMMs. For example, text-to-image models like Midjourney, Stable Diffusion, and Dall-E are multimodal but don’t have a language model component. Multimodal can mean one or more of the following: input and output are of different modalities (e.g. text-to-image, image-to-text); inputs are multimodal (e.g. a system that can process both text and images); outputs are multimodal (e.g. a system that can generate both text and images). This post covers multimodal systems in general, including LMMs. It consists of 3 parts. Part 1 covers the context for multimodality, including why multimodal, different data modalities, and types of multimodal tasks. Part 2 discusses the fundamentals of a multimodal system, using the…
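
The three senses of "multimodal" in the excerpt, plus the point that not every multimodal system is an LMM, can be made concrete with a small classifier. This is a minimal sketch, not code from the article: the `ModelSpec` type, the modality labels, and the example systems are hypothetical, and treating "input and output are of different modalities" as a comparison of the two modality sets is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical description of a system: which modalities go in and come out."""
    name: str
    inputs: frozenset[str]
    outputs: frozenset[str]
    has_language_model: bool = False


def multimodal_senses(spec: ModelSpec) -> list[str]:
    """Return which senses of 'multimodal' apply (an empty list means unimodal)."""
    senses = []
    # Sense 1: input and output modalities differ (text-to-image, image-to-text).
    # Simplifying assumption: compare the two modality sets as a whole.
    if spec.inputs != spec.outputs:
        senses.append("cross-modal")
    # Sense 2: the system accepts more than one input modality.
    if len(spec.inputs) > 1:
        senses.append("multimodal input")
    # Sense 3: the system produces more than one output modality.
    if len(spec.outputs) > 1:
        senses.append("multimodal output")
    return senses


def is_lmm(spec: ModelSpec) -> bool:
    """An LMM is a multimodal system built around a language model component."""
    return spec.has_language_model and bool(multimodal_senses(spec))


text_to_image = ModelSpec("text-to-image", frozenset({"text"}), frozenset({"image"}))
vision_chat = ModelSpec(
    "text+image chat", frozenset({"text", "image"}), frozenset({"text"}), has_language_model=True
)
translator = ModelSpec(
    "translation", frozenset({"text"}), frozenset({"text"}), has_language_model=True
)

for spec in (text_to_image, vision_chat, translator):
    print(spec.name, multimodal_senses(spec), is_lmm(spec))
# text-to-image ['cross-modal'] False
# text+image chat ['cross-modal', 'multimodal input'] True
# translation [] False
```

The text-to-image entry mirrors the excerpt's Midjourney/Stable Diffusion/Dall-E example: multimodal in the first sense, but not an LMM because it has no language model component.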

Read article →

Fast.ai 2023-04-03 14:00 UTC Score 25.0 AI-185-20230403-developer-an-66bd04e7

From Deep Learning Foundations to Stable Diffusion

We’ve released our new course with over 30 hours of video content.

Read article →

Jay Alammar Blog 2023-01-01 00:00 UTC Score 41.0 USR-0113-20230101-ai-specialis-60429c7c

Remaking Old Computer Graphics With AI Image Generation

Can AI image generation tools make re-imagined, higher-resolution versions of old video game graphics? Over the last few days, I used AI image generation to reproduce one of my childhood nightmares. I wrestled with Stable Diffusion, Dall-E and Midjourney to see how these commercial AI generation tools can help retell an old visual story: the intro cinematic to an old video game (Nemesis 2 on the MSX). This post describes the process and my experience in using these models/services to retell a story in higher-fidelity graphics.

Meet Dr. Venom

This fine-looking gentleman is the villain in a video game. Dr. Venom appears in the intro cinematic of Nemesis 2, a 1987 video game. This image, in particular, comes at a dramatic reveal in the cinematic. Let’s update these graphics with visual generative AI tools and see how they compare and where each succeeds and fails.

Here’s a side-by-side look at the panels from the original cinematic (left column) and the final ones generated by the AI tools (right column):

[Figure: Remaking Old Computer Graphics with AI Image Generation]

This figure does not show the final Dr. Venom graphic because I want you to witness it as I had, in the proper context and alongside the appropriate music. You can watch that here:

Read article →

Jay Alammar Blog 2022-10-04 00:00 UTC Score 33.0 USR-0113-20221004-ai-specialis-dafdda9c

The Illustrated Stable Diffusion

Translations: Chinese, Vietnamese. (V2 Nov 2022: Updated images for a more precise description of forward diffusion, plus a few more images in this version.)

AI image generation is the most recent AI capability blowing people’s minds (mine included). The ability to create striking visuals from text descriptions has a magical quality to it and points clearly to a shift in how humans create art. The release of Stable Diffusion is a clear milestone in this development because it made a high-performance model available to the masses (performance in terms of image quality, as well as speed and relatively low resource/memory requirements). After experimenting with AI image generation, you may start to wonder how it works. This is a gentle introduction to how Stable Diffusion works. Stable Diffusion is versatile in that it can be used in a number of different ways. Let’s focus at first on image generation from text only (text2img). The image above shows an example text input and the resulting generated image (the actual complete prompt is here). Aside from text-to-image, another main way of using it is to have it alter images (so the inputs are text + image).
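
The excerpt distinguishes text-only generation (text2img) from altering an existing image (text + image). In common img2img implementations (for example, the `strength` parameter of Hugging Face diffusers' `StableDiffusionImg2ImgPipeline`), a key difference is where denoising starts: text2img begins from pure Gaussian noise, while img2img noises the input image's latent part of the way and runs only the remaining denoising steps. The toy sketch below illustrates that idea with plain Python lists standing in for latent tensors. The linear beta schedule and its values, the 1000-step training range, the strength-to-timestep mapping, and the function names are all hypothetical choices for illustration, not Stable Diffusion's actual configuration.

```python
from __future__ import annotations

import math
import random

NUM_TRAIN_STEPS = 1000  # assumed length of the training noise schedule


def alpha_bar(t: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_i) for i <= t, using an assumed linear beta schedule."""
    prod = 1.0
    for i in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * i / (NUM_TRAIN_STEPS - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0: list[float], t: int, rng: random.Random) -> list[float]:
    """Closed-form forward diffusion: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bar(t)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


def starting_latent(
    num_inference_steps: int,
    strength: float,
    latent_size: int,
    init_latent: list[float] | None = None,
    seed: int = 0,
) -> tuple[list[float], int]:
    """Return (latent to start denoising from, number of denoising steps to run)."""
    rng = random.Random(seed)
    if init_latent is None:
        # text2img: no input image, so start from pure Gaussian noise and run every step.
        return [rng.gauss(0.0, 1.0) for _ in range(latent_size)], num_inference_steps
    # img2img: latent_size is ignored; the input latent sets the size.
    # Higher strength adds more noise to the input and runs more denoising steps.
    steps_to_run = min(int(num_inference_steps * strength), num_inference_steps)
    # Assumption: map strength linearly onto the training timestep range.
    t = int((NUM_TRAIN_STEPS - 1) * strength)
    return add_noise(init_latent, t, rng), steps_to_run


noise, steps = starting_latent(50, strength=1.0, latent_size=4)  # text2img: 50 steps
edited, edit_steps = starting_latent(
    50, strength=0.5, latent_size=4, init_latent=[0.5, -0.5, 0.25, 0.0]
)  # img2img: 25 of 50 steps
```

With `strength` near 0 the starting latent stays close to the input image and few steps run; near 1 the input is almost entirely drowned in noise, which approaches the text2img case.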

Read article →

TensorFlow Tutorials — Score 12.0 AI-192-nodate-developer-an-5d475d60

Stable Diffusion

Read article →