Preprocessors
A preprocessor takes the image (or video) you've loaded into the ControlNet panel and converts it into the kind of cue your chosen ControlNet model expects: a depth map, a pose skeleton, an outline drawing, a segmentation mask, or a set of face landmarks. The preprocessor row sits just to the left of your reference image in the ControlNet panel.
You can preview each preprocessor's output before generating — useful for confirming the cue is what you wanted before you spend Spark on a render.
#Available preprocessors
| Preprocessor | What it produces | Pair with |
|---|---|---|
| Face Capture | Facial landmark map | OpenPose CN model |
| Pose Capture | OpenPose-style joint skeleton | OpenPose CN model |
| Sketch / Outline | Anime-style outline drawing | Canny, LineArt, LineArt Anime, M-LSD, Scribble, SoftEdge, Segmentation, Depth, Normal Bae |
| Depth Map | MiDaS-based depth estimation | Depth CN model |
| Segmentation | Class-labeled region map (U2Net / IS-Net) | Segmentation CN model |
| RMBG (Background Removal) | Subject isolated from background | Segmented Subject / Segmented Background CN |
| Invert | Inverts colors (white↔black) | Use after Sketch / Outline for line drawings |
Multiple preprocessors can be chained — for example, Sketch + Invert turns a photo into white lines on a black background, which is what Canny / LineArt models expect.
#On-device preprocessor models
When you run on-device, the preprocessing step itself runs locally on your Mac. Sogni Studio lazy-loads these small models the first time you use a given preprocessor:
| Model | Size | Used for |
|---|---|---|
| anime2sketch | 200 MB | Sketch / Outline preprocessor |
| midas_small | 61 MB | Depth Map preprocessor |
| is_net | 163 MB | IS-Net segmentation |
| u2_net | 165 MB | U2Net segmentation |
| u2_netP | 4 MB | Lightweight segmentation variant |
| RMBG | 165 MB | Background removal |
If you only ever render on Supernet, you don't need any of these — preprocessing runs on the worker.
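The lazy-loading behavior described above can be pictured with a small sketch. This is an illustration only, not Sogni Studio's code: `LazyModelCache` and `fake_download` are hypothetical names, and the sizes are simply copied from the table above.

```python
from typing import Callable, Dict

# Approximate sizes in MB, copied from the table above.
MODEL_SIZES_MB: Dict[str, int] = {
    "anime2sketch": 200,
    "midas_small": 61,
    "is_net": 163,
    "u2_net": 165,
    "u2_netP": 4,
    "RMBG": 165,
}


class LazyModelCache:
    """Hypothetical cache that fetches a model only on first use."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader
        self._loaded: Dict[str, object] = {}

    def get(self, name: str) -> object:
        if name not in MODEL_SIZES_MB:
            raise KeyError(f"unknown preprocessor model: {name}")
        if name not in self._loaded:  # first use: load and remember it
            self._loaded[name] = self._loader(name)
        return self._loaded[name]

    def loaded_mb(self) -> int:
        """Total size of the models fetched so far."""
        return sum(MODEL_SIZES_MB[name] for name in self._loaded)


def fake_download(name: str) -> object:
    # Stand-in for a real download; returns a placeholder string.
    return f"<{name} weights>"


cache = LazyModelCache(fake_download)
cache.get("midas_small")
cache.get("midas_small")             # second call reuses the cached model
print(cache.loaded_mb())             # 61
print(sum(MODEL_SIZES_MB.values()))  # 758
```

Under these assumptions, someone who only ever uses Depth Map fetches 61 MB, while using every preprocessor at least once adds up to 758 MB.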
#Face Capture
Locates all faces in the reference, detects landmarks, and emits a facial landmark map. Pair with the OpenPose ControlNet model to transfer expressions and head orientation. Works with one or more faces and can run together with Pose Capture for full body + face control.
#Pose Capture
Detects human subjects and extracts an OpenPose bone-joint skeleton. Pair with the OpenPose ControlNet model to lock the pose of a generated subject. Combine with Face Capture to lock pose and expression in a single generation.
#Sketch / Outline
Converts a reference into a sketch-style line drawing. For best results, follow with Invert so you get white lines on a black background — what Canny, LineArt, and Scribble models are trained on.
Sketch works well with: Canny, Depth, LineArt, LineArt Anime, M-LSD, Normal Bae, Scribble, Segmentation, SoftEdge.
#Depth Map
Runs MiDaS depth estimation locally (or its equivalent on Supernet) and produces a grayscale depth image. Pair with the Depth ControlNet model. Best for compositions where 3D structure matters — interiors, landscapes, portraits with strong foreground/background separation.
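A grayscale depth image is, in essence, per-pixel depth values rescaled to the 0 to 255 range. The sketch below shows only that rescaling step, using a hypothetical `depth_to_grayscale` helper and made-up input values; it does not run MiDaS and is not Sogni Studio's implementation.

```python
from typing import List


def depth_to_grayscale(depth: List[List[float]]) -> List[List[int]]:
    """Rescale raw depth values so the smallest maps to 0 and the largest to 255."""
    flat = [value for row in depth for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Every pixel has the same depth: the map carries no structure.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (value - lo) / span) for value in row] for row in depth]


raw = [[0.0, 1.0], [3.0, 5.0]]   # made-up raw depth values
print(depth_to_grayscale(raw))   # [[0, 51], [153, 255]]
```

A reference with little depth variation produces a low-contrast map, which is why a flat-looking preview usually means the render will have weak 3D structure to work from.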
#Segmentation
Splits the reference into labeled regions — sky, person, vehicle, building, etc. — using U2Net or IS-Net under the hood. Pair with the Segmentation ControlNet model to lock spatial layout while changing the content of each region via prompt.
#RMBG (Background Removal)
Isolates the foreground subject from the background. Used to feed either:
- Segmented Subject ControlNet — render the subject as-is, regenerate the background.
- Segmented Background ControlNet — keep the background, regenerate the subject.
Useful for character-on-new-environment shots and background swaps.
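Conceptually, background removal yields a mask that says which pixels belong to the subject, and the two options above differ only in which side of that mask is kept. A minimal sketch of that idea, assuming single-channel images stored as nested lists and a hypothetical `composite` helper (not part of Sogni Studio):

```python
from typing import List

Image = List[List[int]]


def composite(subject: Image, background: Image, mask: Image) -> Image:
    """Take the subject pixel where mask is 1, the background pixel where it is 0."""
    if not (len(subject) == len(background) == len(mask)):
        raise ValueError("images and mask must have the same number of rows")
    out: Image = []
    for s_row, b_row, m_row in zip(subject, background, mask):
        if not (len(s_row) == len(b_row) == len(m_row)):
            raise ValueError("images and mask must have the same row length")
        out.append([s if m else b for s, b, m in zip(s_row, b_row, m_row)])
    return out


original = [[10, 20], [30, 40]]    # made-up pixel values
new_scene = [[90, 91], [92, 93]]   # stand-in for a regenerated background
subject_mask = [[1, 0], [0, 1]]    # 1 = subject, 0 = background

print(composite(original, new_scene, subject_mask))  # [[10, 91], [92, 40]]
```

Swapping the roles of the two images (or inverting the mask) gives the opposite case, where the background is kept and the subject is replaced.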
#Invert
Invert is a utility step rather than a preprocessor in its own right: it flips the colors (white↔black) of whatever is currently shown. Apply it after Sketch / Outline when working with Canny, LineArt, or Scribble models.
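For an 8-bit grayscale image, color inversion amounts to replacing each pixel value `v` with `255 - v`. A minimal sketch, assuming the image is a nested list of 0 to 255 values; the `invert` helper is hypothetical and shown only to illustrate the operation:

```python
from typing import List


def invert(image: List[List[int]]) -> List[List[int]]:
    """Flip an 8-bit grayscale image: black becomes white and vice versa."""
    return [[255 - value for value in row] for row in image]


# A dark vertical line on a white background, as a sketch might look.
sketch = [
    [255, 0, 255],
    [255, 0, 255],
]

print(invert(sketch))  # [[0, 255, 0], [0, 255, 0]]
```

Applying the operation twice returns the original image, so an accidental double invert is harmless.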
#Tips
- Preview before you render. The preprocessor preview is the cue that ControlNet actually sees. If the depth map looks flat or the pose skeleton missed limbs, the render will reflect that — fix at the preprocessing stage.
- Crop the reference first. If your reference has multiple subjects but you only want one, crop down so preprocessors lock onto the target.
- Lower frame rates for video. Video ControlNet preprocessing scales linearly with frame count — drop to 12 or 16 FPS if you only need motion direction, not every micro-movement.
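To put rough numbers on the frame-rate tip, the sketch below counts the frames that need preprocessing at different frame rates. The 5-second clip length and the 30 FPS baseline are made-up values for illustration, and `frame_count` is a hypothetical helper.

```python
def frame_count(duration_s: float, fps: int) -> int:
    """Number of frames to preprocess for a clip of the given length."""
    return round(duration_s * fps)


clip_seconds = 5.0  # assumed clip length, for illustration only

for fps in (30, 16, 12):
    print(fps, "FPS ->", frame_count(clip_seconds, fps), "frames")
# 30 FPS -> 150 frames
# 16 FPS -> 80 frames
# 12 FPS -> 60 frames
```

Because preprocessing scales linearly with frame count, the 12 FPS version of this clip needs 60 of the 150 frames, or 40% of the preprocessing work of the 30 FPS version.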