How scammers create deepfake videos step by step

Behind the Digital Curtain: A Chilling Walkthrough of How Scammers Build Deepfake Videos

You see a video of a celebrity endorsing a shady crypto scheme. Or a news anchor reporting a fake crisis. The face, the voice, the subtle expressions: it all looks real. But in a quiet room, a scammer with a laptop is conducting a sinister symphony of code, using tools as accessible as a search engine. The way scammers create deepfake videos, step by step, is not a story of magical AI. It’s a methodical, often shockingly simple process of digital forgery that turns public data into private weapons. Understanding this pipeline isn’t about empowering criminals; it’s about demystifying their power. When you see the nuts and bolts (the data scraping, the training, the rendering), the deepfake loses its aura of invincibility. You realize it’s a constructed illusion, and like any illusion, knowing the mechanics is the first step to seeing through it.

At TrueKnowledge Zone, we believe that informed vigilance is the strongest defense. By tracing the scammer’s path from a public social media post to a finished fraudulent video, we can identify the weak points where awareness and skepticism must intervene. This is a look inside the digital workshop where trust is counterfeited.

Phase 1: The Digital Hunting Ground – Sourcing the Raw Materials

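Before a single line of code is run, the scammer needs data. This phase is passive, largely automated, and hard to detect: it harvests what’s already publicly available, although bulk scraping can still breach platform terms of service and, in some jurisdictions, privacy law.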

Step 1: Identifying the Target & Defining the Goal
The scammer starts with intent. Is this a financial scam targeting a business executive? A sextortion plot against an individual? A political disinformation campaign? The goal dictates the target. A CEO for a wire fraud scam. A public figure for a fake endorsement. An ordinary person with a clear social media presence for blackmail.

Step 2: Scraping the “Source” and “Destination” Data
For a face-swap deepfake, the scammer needs two sets of videos/images:

The Source (The Actor): This is the person performing the actions. The scammer needs a video where this person’s face is clearly visible, with a range of expressions and head angles. This could be a stock video, a movie clip, or even a video of themselves.
The Destination (The Target): This is the person whose face will be pasted onto the source. The scammer becomes a digital packrat, using automated tools to download every available photo and video of the target. They trawl:
- Social media profiles (Instagram, Facebook, LinkedIn).
- YouTube channels, TikTok accounts, webinar recordings.
- News interviews, public speaking engagements.
- Company “About Us” pages with headshots.
  The ideal is a “dataset” of hundreds or thousands of images from multiple angles and lighting conditions.

Step 3: Data Cleansing and Preparation
Raw scraped data is messy. Using simple software, the scammer:

Extracts Frames: Breaks target videos into thousands of individual image frames.
Face Cropping: Uses an automatic face-detection algorithm to isolate and crop out just the face in each frame.
Sorting and Filtering: Removes blurry images, images with obstructions (hands, microphones), and organizes the clean face shots.

This creates a tidy folder of “target face” images, the essential fuel for the AI model.

Phase 2: The Training Ground – Teaching the AI to Wear a Face

This is the core technical phase, but thanks to user-friendly apps, it often requires more patience than expertise.

Step 4: Choosing the Deepfake Software
The scammer selects their tool. Options range from:

Consumer Apps (e.g., Reface, Zao): Super simple, for quick, low-quality face-swaps on pre-set templates. Limited but fast.
Desktop Software (e.g., DeepFaceLab, FaceSwap): Open-source, free, and powerful. These are the tools of choice for serious scammers. They have a learning curve but offer high-quality, customizable results. They operate like a recipe: you feed in the data, configure settings, and run the scripts.
Cloud-Based Services: Some dark web forums offer “deepfake-as-a-service,” where you upload data and receive the finished video, requiring zero technical skill.

Step 5: The “Training” Process – Where the Forgery Learns
Using software like DeepFaceLab, the scammer sets up a “workspace.” They point the software to their folder of target face images and the source video.

The AI model (in tools like DeepFaceLab, typically an autoencoder-based neural network, sometimes combined with a Generative Adversarial Network, or GAN, component) begins its work. It tries to reduce the target’s face to a mathematical model, learning the geometry of their jawline, the curve of their lips, the pattern of their wrinkles.
It simultaneously analyzes the source video frame-by-frame.
In a process that can take hours to days on a good computer, the model practices. It tries to generate the target’s face in the pose and expression of the source actor. It fails, receives feedback, and tries again, millions of times over.
The scammer monitors the “preview” window, watching as a blurry, ghostly version of the target’s face slowly sharpens and aligns onto the source actor’s head. They wait until the preview looks convincing.

Phase 3: The Rendering & Polish – From Model to Master Forge

The trained model is just a set of instructions. Now it must create the final product.

Step 6: “Converting” the Source Video
The scammer initiates the “convert” or “merge” process. The software now processes the entire source video, frame-by-frame, applying the trained model. For each frame, it:

Detects the source actor’s face and its pose.
Uses its mathematical model to generate the target’s face in that exact pose and expression.
Blends the generated face onto the source video, matching skin tone and lighting.

This rendering process is computationally heavy but automatic. The output is a raw deepfake video: the target’s face moving and talking with the source actor’s body and voice.

Step 7: Post-Production and Audio Manipulation
The raw deepfake often has minor visual flaws. More importantly, it has the wrong voice.

Visual Cleanup: The scammer might use basic video editing software (like Adobe After Effects or DaVinci Resolve) to smooth out blending edges, add grain to match the source video, or correct color mismatches.
Voice Cloning (The Crucial Final Touch): This is where the scam becomes truly dangerous. Using a separate AI voice tool (like ElevenLabs or Murf.ai), the scammer clones the target’s voice. They feed it a sample of the target’s voice (scraped from the same public sources) and type the script they want the fake video to “say.” The tool generates a synthetic audio track in the target’s voice.
Lip-Sync Refinement: Advanced scammers might use an additional AI tool to slightly adjust the mouth movements of the deepfake to better match the new, cloned audio track, making the final product eerily synchronized.

Phase 4: The Deployment – Weaponizing the Illusion

The finished video is a digital bullet. Now it needs to be fired.

Step 8: Crafting the Narrative and Delivery Method
The video alone isn’t enough. It needs a story.

For Fraud: The video is sent via a spoofed email or message to a subordinate, perhaps with the text, “Please process this wire transfer immediately. I’m in a meeting, confirm via text.” The video provides “proof” the order is legitimate.
For Sextortion: The fake explicit video is sent to the target along with threats to release it unless a cryptocurrency payment is made.
For Disinformation: The video is uploaded to social media with a sensationalist caption, often on a fake news-looking page, and then boosted via bots to gain momentum.

Step 9: Exploiting the “Verification Window”
The scammer’s entire timeline is built around exploiting the gap between the victim seeing the video and being able to verify its authenticity. They rely on:

Urgency: “Act now or the deal fails.”
Emotion: “Your child is in jail!”
Secrecy: “Don’t tell anyone, it will ruin my reputation.”

This pressure prevents the victim from doing the one thing that breaks the scam: pausing to verify through another channel.
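
The three pressure levers above can be turned into a simple screening aid. The sketch below is a minimal, hypothetical example: the cue phrases, the `PRESSURE_CUES` table, and the `flag_pressure_cues` function are illustrative assumptions, not a vetted detection list, and a real filter would need far broader coverage.

```python
import re

# Hypothetical cue phrases, grouped by the pressure lever they signal.
# These lists are illustrative assumptions, not a vetted detection set.
PRESSURE_CUES = {
    "urgency": ["act now", "immediately", "right away", "before the deal fails"],
    "emotion": ["in jail", "in the hospital", "in trouble"],
    "secrecy": ["don't tell anyone", "keep this between us", "confidential"],
}


def flag_pressure_cues(message: str) -> dict[str, list[str]]:
    """Return the cue phrases found in the message, keyed by lever."""
    # Normalise case, curly apostrophes, and whitespace so that
    # plain substring checks are enough for this sketch.
    text = message.lower().replace("\u2019", "'")
    text = re.sub(r"\s+", " ", text)
    found = {}
    for lever, phrases in PRESSURE_CUES.items():
        hits = [phrase for phrase in phrases if phrase in text]
        if hits:
            found[lever] = hits
    return found


if __name__ == "__main__":
    sample = "Please process this wire transfer immediately. Don't tell anyone."
    print(flag_pressure_cues(sample))
```

A hit is not proof of fraud, and a clean result is not proof of safety. The point of a check like this is only to prompt the pause: any flagged message that asks for money or action should be verified through another channel.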

An Illustrative Walkthrough: The Fake CFO Urgent Wire

Target: CFO of a mid-sized company.
Data Scrape: Scraper collects all YouTube interviews, company webinar footage, and LinkedIn photos of the CFO.
Source Video: Scammer films themselves on a webcam, in a shirt and tie, reading a generic script with a “serious” expression.
Training: Using DeepFaceLab on a gaming laptop, they train the model for 48 hours.
Rendering & Audio: They render the fake video of the “CFO” giving the instructions. They clone the CFO’s voice from a webinar saying, “…and our quarterly results…” to say, “…I need you to wire $250,000 to account number…”
Deployment: Spoof the CFO’s email, attach the video file, and send to the accounts payable clerk at 8:05 AM on a Monday.

Your Defense: Knowing the Weak Points in Their Process

Understanding these steps reveals where you can fortify your position:

Limit Source Data: Be mindful of your public video/audio footprint. The scammer’s first step is harder if data is scarce.
Question Extraordinary Requests: Any urgent request for money or action based on a video or voice message should trigger a fixed rule: verify before acting, with no exceptions.
Verify Through a Known Channel: If your “boss” sends a video instruction, hang up or close the video and call them on a number you already know to be theirs. Break the scammer’s controlled channel.
Look for the “Tells”: In the video, look for unnatural eye blinking, strange skin textures, hair that doesn’t move quite right, or audio that doesn’t perfectly sync with lip movements. But never rely on this alone—always verify.
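
The verification rule above works best when it is written down as a fixed policy rather than left to judgment in the moment. The following is a minimal sketch of such a policy. The `Request` fields, the `may_proceed` function, and the `VERIFY_ABOVE_USD` threshold are all hypothetical names and values chosen for illustration; any real organisation would set its own.

```python
from dataclasses import dataclass

# Hypothetical policy threshold; each organisation would set its own.
VERIFY_ABOVE_USD = 1_000


@dataclass
class Request:
    channel: str                        # e.g. "video", "voice", "email", "in_person"
    asks_for_money: bool
    amount_usd: float = 0.0
    asks_for_secrecy: bool = False
    verified_out_of_band: bool = False  # confirmed by calling a known number?


def may_proceed(req: Request) -> bool:
    """Risky remote requests proceed only after out-of-band confirmation."""
    remote = req.channel != "in_person"
    risky = req.asks_for_secrecy or (
        req.asks_for_money and req.amount_usd >= VERIFY_ABOVE_USD
    )
    if remote and risky:
        # How convincing the video or voice seemed is deliberately not an input.
        return req.verified_out_of_band
    return True


if __name__ == "__main__":
    wire = Request(channel="video", asks_for_money=True, amount_usd=250_000)
    print(may_proceed(wire))  # blocked until someone calls the known number
```

Note the design choice: the rule never asks whether the video looked real. Visual tells can support suspicion, but only the call to a known number unlocks the request.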

A scammer’s deepfake is the product of patient digital craftsmanship applied to a malicious end. It is not magic, but a toolkit. By pulling back the curtain, we see the strings on the puppet. And when we see the strings, we are no longer fooled by the dance. Your greatest defense is the conscious, deliberate pause between seeing something astonishing and believing it to be true.

Frequently Asked Questions (FAQs)

1. How long does it actually take to make a convincing deepfake?
For a low-quality “cheapfake” using an app: minutes. For a high-quality, custom deepfake built for fraud with tools like DeepFaceLab: roughly 2 to 7 days. This includes data collection (hours), model training (12 to 72 hours of unattended computer time), rendering (hours), and audio syncing (minutes to hours). The time investment is dropping rapidly.

2. Do you need a supercomputer to do this?
No. A modern, high-end gaming laptop or desktop with a powerful graphics card (GPU) like an NVIDIA RTX 4070 or 4080 is more than sufficient. The GPU does the heavy lifting of training the AI model. Consumer-grade hardware has made this accessible.

3. Is the audio deepfake made separately?
Almost always, yes. Voice cloning is a separate AI process, often using different software (like ElevenLabs or Resemble.ai). The scammer creates the visual deepfake first, then generates the cloned audio track to match, and finally syncs them together in video editing software.

4. What’s the hardest part for the scammer?
Getting high-quality, varied footage of the target’s face. If the target only has a few selfies from the same angle, the resulting deepfake will be stiff and uncanny. The more video footage with different expressions, angles, and lighting, the better and more versatile the fake.

5. Can they make a deepfake from just a single photo?
From a single photo, apps can animate a face to make it smile or talk, but the result usually looks stiff or cartoonish. For a convincing video deepfake that can turn its head and show a range of expressions, multiple images or videos of the target are still the norm. One photo is rarely enough for a sophisticated scam, although single-image methods keep improving.

6. Do scammers use the same deepfake tools as legitimate users?
Yes, and this is a major issue. Powerful, open-source tools like DeepFaceLab are ethically neutral. They are used by visual effects artists and researchers for legitimate work. Scammers simply co-opt these same widely available tools for fraud. The technology itself is not illegal; its malicious use is.

7. How do they get the mouth to move correctly with the new audio?
Basic scammers just rely on the mouth movements from their source actor. More advanced ones use lip-sync AI models. These tools can take an audio track and generate corresponding mouth shapes (visemes), which can then be applied to the deepfake face, making it match the cloned voice more precisely.

8. What’s the difference between a “face-swap” and a full “synthetic” deepfake?
The step-by-step process above describes a face-swap, the most common scammer method. A fully synthetic video (such as one produced by a text-to-video model like OpenAI’s Sora) is generated from a text prompt, with an entirely new person and scene. This is far more complex and computationally expensive, and not yet the go-to for most targeted financial or sextortion scams.

9. If the technology is so accessible, why aren’t we flooded with deepfakes?
We are seeing a flood, but it’s targeted. High-quality fakes require focus and intent. They are used for high-value scams (corporate fraud, sextortion) where the payoff justifies the effort. Low-quality, mass-produced fakes for disinformation are indeed proliferating on social media.

10. Knowing all this, what’s the #1 thing I should do?
Institute a “Trust but Verify” protocol for all high-stakes digital communication. If a family member, your boss, or a public figure on video asks for something extraordinary, your immediate reaction should be: “I need to verify this independently.” Hang up. Close the tab. Make your own phone call to a known number. That single habit breaks the scammer’s entire workflow.