ControlNet
ControlNet is a second model that rides alongside the main image model and tells it where things should go. You feed it a reference — a sketch, a photo of a person, a room, a pose — and a preprocessor extracts the structure (edges, body skeleton, face landmarks, depth). The text prompt still drives the look, but the layout, geometry, or identity comes from the reference.
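To make the preprocessing step concrete, here is a minimal sketch of an edge extractor. It is a deliberately simplified stand-in (a forward-difference threshold, not Canny) and not Pocket's actual preprocessor; the names `edge_map`, `photo`, and `control_image` are hypothetical.

```python
from typing import List

# Grayscale image as rows of brightness values in the range 0.0 to 1.0.
Image = List[List[float]]


def edge_map(image: Image, threshold: float = 0.25) -> List[List[int]]:
    """Mark pixels whose brightness differs sharply from the right or lower neighbor."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = image[y][x]
            # At the image border, compare the pixel with itself (no edge).
            right = image[y][x + 1] if x + 1 < width else here
            below = image[y + 1][x] if y + 1 < height else here
            gradient = abs(right - here) + abs(below - here)
            edges[y][x] = 1 if gradient > threshold else 0
    return edges


# A 4x4 test image: dark left half, bright right half.
photo = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
control_image = edge_map(photo)
```

The result keeps only the boundary between the dark and bright halves. Color and texture are discarded, which is why the prompt still has to supply the look.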
Pocket exposes ControlNet as Advanced Guidance, so you get this level of control directly on iPhone or iPad, with no desktop and no ComfyUI node graph. Pick a mode, drop in a reference, write a prompt, and generate. Four sub-modes cover most of what people actually need:
#Sub-modes
- Sketch a Masterpiece: turn a sketch, line drawing, or scribble into a finished image. Uses Scribble, LineArt, LineArt Anime, or Canny under the hood.
- Face and Pose Capture: extract a body pose, face landmarks, or both from a reference and rebuild the subject as anything you want. Uses the OpenPose model.
- Face Transfer: preserve a specific person's identity across new contexts and styles. Uses InstantID for SDXL.
- Depth Map: lock the spatial layout of a scene (foreground vs background, object placement) while rewriting everything inside it. Uses Depth, Normal Bae, and Segmentation.
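The list above can be read as a lookup table from sub-mode to the guidance models behind it. The sketch below only restates that list as data. `SUB_MODES` and `sub_modes_using` are hypothetical names for illustration, not part of Pocket.

```python
from typing import Dict, List

# Sub-mode name -> guidance models it uses, as listed above.
SUB_MODES: Dict[str, List[str]] = {
    "Sketch a Masterpiece": ["Scribble", "LineArt", "LineArt Anime", "Canny"],
    "Face and Pose Capture": ["OpenPose"],
    "Face Transfer": ["InstantID"],
    "Depth Map": ["Depth", "Normal Bae", "Segmentation"],
}


def sub_modes_using(model: str) -> List[str]:
    """Return every sub-mode whose model list contains the given model."""
    return [name for name, models in SUB_MODES.items() if model in models]
```

For example, `sub_modes_using("Canny")` returns the single sub-mode that lists Canny, and an unknown model name returns an empty list.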
#When to reach for ControlNet
A plain text prompt works fine when you don't care about exact composition. The moment you do — "a knight in this pose," "a room with this layout," "my line drawing, but as oil paint" — ControlNet is the right tool. It also lets you reproduce the same underlying structure across many style variations, which is hard to do with seeds alone.
#Workflows
Sketch to finished art. Draw a rough composition on the canvas, switch to Sketch a Masterpiece, write a prompt that describes style and lighting, and generate. Iterate by tweaking the line width and prompt — the geometry stays put.
Pose match. Find a reference photo with a strong pose, run Face and Pose Capture, then prompt for a totally different subject (alien, mech, dancer) that inherits the pose.
Identity across styles. Take a clean portrait, use Face Transfer, and run the same prompt through Anime, Cyberpunk, and Pixar styles. The face stays recognizable across all three.
Room redesign. Photograph a room, use Depth Map, and prompt for a different decor era. Furniture stays where it is; materials and lighting change.
#Tips
- One ControlNet at a time on Pocket. Switching modes turns the previous one off — that's intentional, since multiple guidance sources can fight each other on mobile.
- Strength matters. If the model is sticking too closely to the reference and ignoring your prompt, drop ControlNet strength to 60–75%. If it's drifting away from the reference, raise the strength.
- Prompt for what the reference can't carry. A skeleton map has no style; a depth map has no texture. Spend prompt tokens on lighting, materials, and color.
- Steps help. ControlNet runs gain more from 30 or more steps than plain text-to-image runs do.
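One way to picture the strength tip: in ControlNet-style guidance, the control model's output is added to the main model's internal features, scaled by a strength factor. The sketch below illustrates only that scaling on plain lists of numbers. It assumes Pocket's strength percentage maps to such a multiplier, which is not confirmed here, and `apply_control` is a hypothetical name.

```python
from typing import List


def apply_control(
    base_features: List[float],
    control_residual: List[float],
    strength_percent: float,
) -> List[float]:
    """Add the control signal to the base features, scaled by strength."""
    if strength_percent < 0:
        raise ValueError("strength_percent must be non-negative")
    if len(base_features) != len(control_residual):
        raise ValueError("feature lists must have the same length")
    strength = strength_percent / 100  # assumed mapping: 70% -> 0.7
    return [b + strength * c for b, c in zip(base_features, control_residual)]


base = [1.0, 2.0]
residual = [0.5, -1.0]

no_guidance = apply_control(base, residual, 0)      # reference is ignored
half_guidance = apply_control(base, residual, 50)   # reference nudges the result
full_guidance = apply_control(base, residual, 100)  # reference applied in full
```

At 0% the features are untouched, so only the prompt matters. As the percentage rises, the reference pulls the result further, which is why lowering strength gives the prompt more room.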