Preprocessors

A preprocessor takes the image (or video) you've loaded into the ControlNet panel and converts it into the kind of cue your chosen ControlNet model expects: a depth map, a pose skeleton, an outline drawing, a segmentation mask, or a set of face landmarks. The preprocessor row sits just to the left of your reference image in the ControlNet panel.

You can preview each preprocessor's output before generating — useful for confirming the cue is what you wanted before you spend Spark on a render.

#Available preprocessors

| Preprocessor | What it produces | Pair with |
| --- | --- | --- |
| Face Capture | Facial landmark map | OpenPose CN model |
| Pose Capture | OpenPose-style joint skeleton | OpenPose CN model |
| Sketch / Outline | Anime-style outline drawing | Canny, LineArt, LineArt Anime, M-LSD, Scribble, SoftEdge, Segmentation, Depth, Normal Bae |
| Depth Map | MiDaS-based depth estimation | Depth CN model |
| Segmentation | Class-labeled region map (U2Net / IS-Net) | Segmentation CN model |
| RMBG (Background Removal) | Subject isolated from background | Segmented Subject / Segmented Background CN |
| Invert | Inverts colors (white↔black) | Use after Sketch / Outline for line drawings |

Multiple preprocessors can be chained — for example, Sketch + Invert turns a photo into white lines on a black background, which is what Canny / LineArt models expect.
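Chaining can be pictured as plain function composition. The sketch below is illustrative only and is not Sogni Studio code: `toy_outline` is a hypothetical threshold standing in for the real Sketch / Outline model, images are tiny grayscale grids, and `chain` is an assumed helper name.

```python
from functools import reduce
from typing import Callable, List

Image = List[List[int]]          # grayscale rows, each pixel 0..255
Stage = Callable[[Image], Image]


def toy_outline(img: Image) -> Image:
    """Hypothetical stand-in for Sketch / Outline: dark pixels become black lines on white."""
    return [[0 if px < 128 else 255 for px in row] for row in img]


def invert(img: Image) -> Image:
    """Invert: swap white and black by mapping each value v to 255 - v."""
    return [[255 - px for px in row] for row in img]


def chain(*stages: Stage) -> Stage:
    """Compose stages left to right into a single preprocessor."""
    return lambda img: reduce(lambda acc, stage: stage(acc), stages, img)


sketch_then_invert = chain(toy_outline, invert)

photo = [[30, 200], [220, 40]]
print(sketch_then_invert(photo))  # [[255, 0], [0, 255]]
```

Order matters in a chain: Invert acts on whatever the previous stage produced, so it belongs after the line drawing has been made.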

#On-device preprocessor models

When you run on-device, the preprocessing step itself runs locally on your Mac. Sogni Studio lazy-loads these small models the first time you use a given preprocessor:

| Model | Size | Used for |
| --- | --- | --- |
| anime2sketch | 200 MB | Sketch / Outline preprocessor |
| midas_small | 61 MB | Depth Map preprocessor |
| is_net | 163 MB | IS-Net segmentation |
| u2_net | 165 MB | U2Net segmentation |
| u2_netP | 4 MB | Lightweight segmentation variant |
| RMBG | 165 MB | Background removal |

If you only ever render on Supernet, you don't need any of these — preprocessing runs on the worker.
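To see what lazy loading means for disk and bandwidth, here is a rough sketch. It is not how Sogni Studio is implemented: `load_model` is a hypothetical loader that only counts megabytes, and the sizes are copied from the table above.

```python
from functools import lru_cache

# Approximate download sizes in MB, copied from the table above.
MODEL_SIZES_MB = {
    "anime2sketch": 200,
    "midas_small": 61,
    "is_net": 163,
    "u2_net": 165,
    "u2_netP": 4,
    "RMBG": 165,
}

downloaded_mb = 0  # running total of what has been fetched so far


@lru_cache(maxsize=None)
def load_model(name: str) -> str:
    """Hypothetical loader: 'downloads' a model only the first time it is requested."""
    global downloaded_mb
    downloaded_mb += MODEL_SIZES_MB[name]
    return f"<{name} weights>"


load_model("midas_small")
load_model("midas_small")            # cached, so nothing is fetched again
load_model("u2_netP")
print(downloaded_mb)                 # 65
print(sum(MODEL_SIZES_MB.values()))  # 758
```

Under these assumptions you only pay for the preprocessors you actually use; fetching every model in the table would come to roughly 758 MB.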

#Face Capture

Locates all faces in the reference, detects landmarks, and emits a facial landmark map. Pair with the OpenPose ControlNet model to transfer expressions and head orientation. Works with one or more faces and can run together with Pose Capture for full body + face control.

#Pose Capture

Detects human subjects and extracts an OpenPose bone-joint skeleton. Pair with the OpenPose ControlNet model to lock the pose of a generated subject. Combine with Face Capture to lock pose and expression in a single generation.

#Sketch / Outline

Converts a reference into a sketch-style line drawing. For best results, follow with Invert so you get white lines on a black background — what Canny, LineArt, and Scribble models are trained on.

Sketch works well with: Canny, Depth, LineArt, LineArt Anime, M-LSD, Normal Bae, Scribble, Segmentation, SoftEdge.

#Depth Map

Runs MiDaS depth estimation locally (or its equivalent on Supernet) and produces a grayscale depth image. Pair with the Depth ControlNet model. Best for compositions where 3D structure matters — interiors, landscapes, portraits with strong foreground/background separation.
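A depth estimator outputs one relative depth value per pixel, and those values have to be mapped to gray levels to become an image. A minimal sketch of one common way to do that, min-max normalization, is shown below. Whether Sogni Studio normalizes exactly this way is an assumption, and `to_depth_image` is a hypothetical name.

```python
from typing import List


def to_depth_image(depth: List[List[float]]) -> List[List[int]]:
    """Min-max normalize raw depth predictions to 8-bit grayscale (0..255)."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # no depth variation at all: the map is completely flat
        return [[0 for _ in row] for row in depth]
    scale = 255.0 / (hi - lo)
    return [[round((v - lo) * scale) for v in row] for row in depth]


raw = [[0.5, 1.0], [2.0, 2.5]]
print(to_depth_image(raw))  # [[0, 64], [191, 255]]
```

A reference with little foreground/background separation yields a narrow range of raw values, which is one reason a previewed depth map can look flat.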

#Segmentation

Splits the reference into labeled regions — sky, person, vehicle, building, etc. — using U2Net or IS-Net under the hood. Pair with the Segmentation ControlNet model to lock spatial layout while changing the content of each region via prompt.

#RMBG (Background Removal)

Isolates the foreground subject from the background. The result feeds one of two ControlNet models:

  • Segmented Subject ControlNet — render the subject as-is, regenerate the background.
  • Segmented Background ControlNet — keep the background, regenerate the subject.

Useful for character-on-new-environment shots and background swaps.
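Conceptually, the two modes use the same subject mask in opposite ways. The sketch below is a simplified illustration under that assumption, not the actual ControlNet input format; `regenerate_region` and the mode names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # 1 = subject pixel, 0 = background pixel


def regenerate_region(subject_mask: Mask, mode: str) -> Mask:
    """Return a mask marking the pixels to regenerate (1) for the given mode.

    mode "subject":    keep the subject, regenerate the background.
    mode "background": keep the background, regenerate the subject.
    """
    if mode == "subject":
        return [[1 - px for px in row] for row in subject_mask]
    if mode == "background":
        return [row[:] for row in subject_mask]
    raise ValueError(f"unknown mode: {mode!r}")


mask = [[0, 1, 0],
        [0, 1, 1]]
print(regenerate_region(mask, "subject"))     # [[1, 0, 1], [1, 0, 0]]
print(regenerate_region(mask, "background"))  # [[0, 1, 0], [0, 1, 1]]
```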

#Invert

Not a model-based preprocessor: Invert simply flips the colors (white↔black) of whatever the preview currently shows. Apply it after Sketch / Outline when working with Canny, LineArt, or Scribble models.

#Tips

  • Preview before you render. The preprocessor preview is the cue that ControlNet actually sees. If the depth map looks flat or the pose skeleton missed limbs, the render will reflect that, so fix it at the preprocessing stage.
  • Crop the reference first. If your reference has multiple subjects but you only want one, crop down so preprocessors lock onto the target.
  • Lower frame rates for video. The cost of video ControlNet preprocessing scales linearly with frame count, so drop to 12 or 16 FPS if you only need motion direction, not every micro-movement.
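The frame-rate tip comes down to simple arithmetic: every frame is preprocessed once, so the frame count is duration times frame rate. In the sketch below, the 5-second clip and the 30 FPS starting point are assumed values for illustration.

```python
def frames_to_preprocess(duration_s: float, fps: int) -> int:
    """Number of frames the preprocessor has to handle for a clip."""
    return round(duration_s * fps)


clip_seconds = 5.0  # assumed clip length
for fps in (30, 16, 12):
    n = frames_to_preprocess(clip_seconds, fps)
    print(f"{fps} FPS -> {n} frames")
# 30 FPS -> 150 frames
# 16 FPS -> 80 frames
# 12 FPS -> 60 frames
```

For this example clip, going from 30 to 12 FPS cuts the preprocessing work from 150 frames to 60.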

#See also