Santa Paws Is Coming to Town

Generating high-quality photos of our favorite fluffy friends.

Dec 24, 2024

The holidays are upon us! Wouldn’t it be nice to include a fun holiday photo of your dog (or cat!) in updates to your family?

It can be tough if your dog won’t sit still long with the Santa hat on for you to take the photo. What if we could snap our fingers and have a nice professional holiday photo of our best friend?

It even captures the slight expression of dread as he’s forced to wear the hat.

Or maybe you’d like to see what it’s like to have your dog enjoy a fireworks show on New Years Eve, which you’ll never get to see in real-life because every NYE you drive 200 miles away from the nearest city so your dog can avoid being terrified and shaking for the subsequent three days.

Or maybe you’d just like to design a custom holiday card featuring your best friend.

An AI Model Just For Your Pet

You can do all of this by fine-tuning your own custom generative AI model.

There are many guides online to do this, including a great one from Replicate, but many seem to fall short in replicating the little details that only we would recognize about our pet – the speckles on the paws, the orientation of the whiskers, the patterns on the chest, the angle of the ears.

The casual observer might not notice, but the casual observer doesn’t spend 12 hours a day staring at that fuzzy face.

So this guide walks you through the extra little tips and tricks and tweaks to get those details that only you would notice.

What Is Fine-Tuning?

Fine-tuning is the process of taking a pre-trained AI model—which already has a lot of general knowledge—and giving it additional, specialized training so it performs better on a specific task or domain. Essentially, you’re “tuning” the model’s existing parameters to your particular needs by feeding it new examples that reflect the type of output or behavior you want.

Overview

The process is going to build on top of this Replicate post. We’re going to make some tweaks and improvements to it, but 95% of that guide applies.

The tweaks, specifically, entail 1) improving how we generate training captions for our training photos and 2) make a few minor configuration tweaks to the training process to make sure we really capture the fine little details.

1. Gather Training Photos

The first thing you’ll need to do is gather 15 to 20 photos. Place all of these photos into a directory together. Make sure they capture your pet at more-or-less the same age, but in a variety of environments and poses. You don’t want some photos to be a puppy and others to be an adult.

Any aspect of your pet that isn’t captured in the photos won’t be learned by the model (e.g. if you don’t include any images that have their tail visible, the model can’t learn what their tail looks like).

Don’t select more than 20. You may find it challenging to choose only your favorite 20 because they’re all so cute – but be decisive.

Also, make sure your pet is the only primary subject of the photos! Here are some that I used in my training:

Note: If you’re in the Apple ecosystem and have a bunch of HEIC images and need to quickly convert them to JPG, here’s a little script I wrote to automate that for you. Just pass it a directory and it’ll take care of it.

2. Write Captions

The quality of our captions is essential to good results here. The training tool that Replicate provides has an "autocaption” option, but we are not going to use it. Instead we’re going to have OpenAI generate much better captions for us. They’ll also be more consistent with captions that we’ll be using later on to generate new images.

Two things are important to emphasize here:

Our captions are going to need a trigger word that you get to choose and should be unique enough that there’d never be any reason for it to occur in a prompt other than as a reference to your pet. For my training, I chose OUR_REBEL_DOG because our dog’s name is Rebel.
Every image in your directory of images is going to need a corresponding caption. That caption will live in a file with the same name as the corresponding image, but with a .txt extension instead of .jpg.

For example, if you’ve got an image called IMG_3058.jpg you’ll need a caption that lives in IMG_3058.txt.

I’ve written a script to automate this for you.

You just need to point it at a directory, give it an OpenAI API key, and tell it the TRIGGER_WORD you’re using.

./caption.sh -a TRIGGER_WORD -k "your-api-key" -d /path/to/images

When all is said and done, you’ll have a directory of captions like:

In this charming portrait, OUR_REBEL_DOG is seated gracefully in a lush garden setting. Their fur is a warm golden hue with creamy white patches on the chest, giving a soft contrast to their expressive eyes. Surrounding them, a vibrant backdrop of dense green foliage and flowering plants adds depth and color. A weathered tree trunk provides a rustic touch, framed by a border of purple blooms and neatly cut grass. The lighting is bright and natural, highlighting the textures of both the scene and OUR_REBEL_DOG’s coat, evoking a serene and picturesque atmosphere.

3. Train Your Model

This part is easy, but you need a Replicate account, so create one if you don’t have one.

Create a .zip file of your directory that contains the images and captions.
Hop over to Replicate’s training tool: https://replicate.com/ostris/flux-dev-lora-trainer/train
Fill out the form:
Destination: Choose “Create New Model” and give it a name.
input_images: Upload your .zip file
trigger_word: Write the trigger word you chose in your captions
autocaption: Uncheck autocaption. We generated our own
#steps: Bump it up to 2,000
lora_rank: Bump this up to 48. This is important for capturing little details.

That’s it! Click “Create training” and wait ~40 minutes. The whole training process should cost under ~$5 in my experience.

4. Generate New Captions for Pawliday Photos!

Open up ChatGPT to generate some captions for us. Try a prompt like:

We are generating photos of our dog using an AI image generator. Write ten detailed prompts that describe our dog n various fun Christmas scenarios that would make a great holiday card.
Make sure every prompt refers to our dog at least once. Refer to our dog as "OUR_REBEL_DOG, a yellow labrador retriever,". Include multiple references to this in the prompt to avoid misrepresenting OUR_REBEL_DOG.
Make the prompts highly detailed and interesting, and make them varied in subject matter and style. Make sure the prompts will generate images that include unobscured facial details. Pay attention to composition, colors, style, and other details that make an image great.

The emphasis on including the trigger word, and a brief descriptive phrase of your pet, is important for steering the image model.

5. Generate Photos!

You can now generate photos, illustrations, cartoons, or whatever you want.

In your Replicate models tab, you should see your custom model. Open it up.

Fill out the form (skip most of the fields). These are the notable fields:

prompt: Copy in the prompt from ChatGPT
model: Keep it on dev
#num_outputs: I like to change this to 4, because there are a lot of misses
#num_inference_steps: Increase this to 40
guidance_scale: Keep it small Try a few values like 2.5 and 3. This has a big impact on quality.
output_format: I always change this to .jpg, but it has no influence on the output quality
output_quality: Bump it up to 95.
go_fast: Uncheck this

Click “Run”, sit back, and watch as amazing photos start pouring in!

Examples

A Cookie Scene Painting

A detailed painting of OUR_REBEL_DOG, a yellow Labrador retriever, wearing a red ribbon and sitting beside a cookie plate left out for Santa. In the background, a dimly lit room reveals a Christmas tree sparkling with vintage ornaments, stockings hanging on the mantle, and a partially eaten cookie with crumbs scattered on the rug. OUR_REBEL_DOG’s expression is mischievously innocent, with their head slightly tilted as though caught in the act. The scene brims with holiday charm, humor, and warmth.

A Norman Rockwell Wintery Greeting Card

A Norman Rockwell illustration of a vibrant outdoor winter scene showing OUR_REBEL_DOG, a yellow Labrador retriever, frolicking through deep snow with children building a snowman nearby. The illustration is filled with movement and energy—snow flying from OUR_REBEL_DOG’s paws as they chase a red scarf caught in the wind, children laughing as they pack snowballs, and a backdrop of a rustic barn decorated with a simple evergreen wreath. The art style captures the crispness of a snowy day and the warmth of shared joy. Inside: 'Wishing You a Holiday Full of Laughter and Play!' in a warm, inviting font.

Giving You a Holiday Card

OUR_REBEL_DOG, a yellow Labrador retriever, sitting in front of a beautifully decorated Christmas tree, wearing a Santa hat slightly tilted over one ear. Surrounding them are a pile of wrapped presents in vibrant colors, some with their bows slightly untied as if OUR_REBEL_DOG has been curious. The room is softly lit by the glow of the tree and a fireplace in the background. Inside of the card: The phrase 'Wishing You a Pawsitively Merry Christmas!' written in a whimsical font, surrounded by small paw print illustrations."

A playful cartoon-style depiction of OUR_REBEL_DOG, a yellow Labrador retriever, dressed in reindeer antlers and a red nose. They are drawn next to a snowman with a Santa hat, holding a carrot in their mouth as if playfully teasing about adding it to the snowman. The snowy background is bright and cheerful, with children sketched in simple lines playing in the distance. Inside: 'Have a Howl-iday Full of Joy and Laughter!' in bold, comic-style letters with scattered paw prints and snowflakes around the text.

Pro-Tip

You can generate photos for every occasion. You don’t have to use it just for the holidays.

Learn more about LOGIC, Inc. at https://logic.inc

BITS of LOGIC, Inc.

Discussion about this post