
From Flask Form to Production Upload Pipeline

What started as a simple web form to upload images to ATEM switchers turned into a multi-service pipeline with tally-aware scheduling, a custom-modified C# exe, and a cross-platform architecture spanning Linux and Windows.

Where It Started

November 2024. I’d been at this job for about a year and a half, working with Blackmagic ATEM video switchers every day. One of the routine tasks was uploading images to the ATEMs’ media pools. The process was manual: open ATEM Software Control, navigate to the media pool, drag in an image, select the right slot. Multiply that across dozens of ATEMs and different production types and it gets old fast.

On top of that, we’d regularly get requests to change images by a specific date and time. That meant someone had to be available at that exact moment to manually run the uploads. Not ideal.

While looking for a better way, I found atemlib — an open-source C# project on GitHub that uses the Blackmagic SDK to upload images to ATEM media pool slots via command line. Someone had already set it up on a Windows machine. You’d run something like:

text
mediaupload.exe 192.168.81.85 1 "C:\path\to\image.png"

And it would upload the image to slot 1 on that ATEM. Worked fine, but still required someone to manually run commands. I wanted a web interface where people could just pick a device, pick a game type, upload an image, and go.

I had barely written any code before this. But I figured — how hard can a file upload form be?

The First Version: A Flask App

The very first version was embarrassingly simple. A Flask app with Bootstrap, a dropdown for production type, some input fields for the ATEM IP and media pool slot numbers, and a file upload button. When you hit submit, Flask saved the image, validated it was 1920x1080, and called MediaUpload.exe via subprocess.check_output().

app.py python
check_output(f'"{MEDIA_EXE}" {ip} {slot_a} "{file_path}"', shell=True)

It took me days to get this working. I kept getting “No selected file” errors and couldn’t figure out why. My image dimension validation checked width != 1920 and height != 1080 instead of or, so an image was only rejected when both dimensions were wrong — something like 1920x1200 slipped right through. At one point my submit button just stopped working entirely and I had to debug that. Classic beginner stuff.
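The fix was a one-character change. A minimal sketch of the corrected check (the function name is illustrative, not the actual app code):

```python
# Hypothetical sketch of the fixed dimension check (names are illustrative).
def is_valid_resolution(width, height):
    # Buggy version: `width != 1920 and height != 1080` only rejected
    # an image when BOTH dimensions were wrong.
    return not (width != 1920 or height != 1080)

print(is_valid_resolution(1920, 1080))  # True
print(is_valid_resolution(1280, 720))   # False
print(is_valid_resolution(1920, 1200))  # False — the buggy `and` let this through
```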

But eventually it worked. You could pick a production type, enter an IP, choose an image, and it would upload to the ATEM. I also pulled our equipment spreadsheet into the app so you could see the list of ATEMs and click one to pre-fill the IP on the upload page. That felt like magic at the time.

The Problem: MediaUpload.exe Didn’t Work on Newer ATEMs

Shortly after getting the web app working, someone told me the upload was failing on some of our newer ATEM models. The older ATEMs worked fine but the newer firmware ones just errored out. MediaUpload.exe would crash with a COM exception.

The atemlib project hadn’t been updated in about 7 years. Nobody was maintaining it. I’d never opened Visual Studio in my life, never written C#, had no idea what COM components were. But the tool was essential to the whole system, so I had to figure it out.

I cloned the repo, opened the .sln file in Visual Studio, and just started poking around. After a lot of googling and trial and error, I discovered the issue: the project was building as “Any CPU” which on modern systems defaulted to 32-bit. The Blackmagic SDK’s COM components on our machines were registered as 64-bit. The architecture mismatch caused the COM class registration to fail entirely on newer ATEM firmware.

The fix was almost anticlimactic: Build → Configuration Manager → change platform to x64 → Rebuild. That was it. The 7-year-old abandoned project worked perfectly on every ATEM model once it was compiled for the right architecture.

text
Rebuild All: 3 succeeded, 0 failed, 0 skipped

That was my first time building a C# project. I didn’t understand half of what I was looking at in the codebase, but I got it working.

Moving to Django and Adding Scheduling

The Flask app was fine for one-off uploads, but we needed scheduling. We’d get requests like “change these images on this date at this time.” I wanted people to be able to set up the upload whenever they received the request and have it execute automatically at the right moment. No more watching the clock. Flask wasn’t going to cut it for background job scheduling, and I was already outgrowing it in other ways. I migrated the whole app to Django.

The scheduling system had a platform problem though. My Django AV Site runs on a Linux server — that’s where the web app, the database, and the scheduler live. But MediaUpload.exe only runs on Windows because it depends on the Blackmagic SDK COM components. There’s no Linux equivalent.

So I came up with a two-machine approach: keep the AV Site on Linux where it belongs, but set up a second machine running Windows with a lightweight Go server. The Go server’s only job is to receive images and upload parameters from Django’s scheduler, then call MediaUpload.exe locally and report back the results.

text
Django AV Site (Linux) → HTTP POST with image → Go Server (Windows) → MediaUpload.exe → ATEM

The Django side has a background scheduler (run_scheduler.py) that runs as a management command, polling the database for pending jobs. When a job’s scheduled time arrives, it sends the image and parameters to the Go server, which handles the actual upload.
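The scheduler loop is simple in spirit: poll, find due jobs, dispatch. A minimal sketch of the job-selection step (the data shapes and names here are my own, not the actual AV Site models):

```python
from datetime import datetime

def due_jobs(jobs, now):
    """Return pending jobs whose scheduled time has arrived."""
    return [j for j in jobs if j["status"] == "pending" and j["run_at"] <= now]

# Hypothetical job records; the real scheduler reads these from the Django DB.
jobs = [
    {"id": 1, "status": "pending", "run_at": datetime(2025, 1, 1, 9, 0)},
    {"id": 2, "status": "pending", "run_at": datetime(2025, 1, 1, 18, 0)},
    {"id": 3, "status": "done",    "run_at": datetime(2025, 1, 1, 8, 0)},
]
ready = due_jobs(jobs, datetime(2025, 1, 1, 12, 0))
print([j["id"] for j in ready])  # [1]
```

Each due job is then POSTed to the Go server with its image and parameters; jobs not yet due just wait for the next poll.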

The Tally Problem: Don’t Upload While On Air

Here’s where it got interesting. These ATEMs are running live productions. The media pool slots hold images that get displayed on screen during broadcasts. Our switchers have macros that are triggered automatically — cycling through states, switching sources, keying images on and off program in a repeating pattern. If you upload a new image to a slot that’s currently being used on the live output, viewers see a flash of corrupted garbage while the image data gets overwritten. Not acceptable.

So I needed logic to check: is the slot I’m about to upload to currently loaded in a media player, and is that media player on program (live)?

First attempt was in run_scheduler.py using PyATEMMax (a Python ATEM library). Before each upload, connect to the ATEM, check media player sources, check tally status, decide if it’s safe:

run_scheduler.py python
# Check Media Player 1 ("flags" holds PyATEMMax's per-source tally state)
mp1_index = atem.mediaPlayer.source[0].stillIndex
if mp1_index is not None:
    mp1_slot = mp1_index + 1  # Convert 0-based to 1-based
    # Check if this media player is on program
    for source_id in [1000, 3010]:
        flag = flags[source_id]
        if flag.program:
            slots_blocked.append(mp1_slot)

If a slot was blocked, the scheduler would skip it and upload the other slots first, then loop back and wait for the blocked ones to become available. Different production types had different rules — some could upload immediately because their images were never on air during the upload window, others needed the full tally check.
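The skip-and-retry ordering boils down to a partition-then-poll loop. A simplified sketch, where is_blocked and upload stand in for the real tally check and exe call:

```python
import time

def upload_with_skip(slots, is_blocked, upload, poll_interval=1.0, timeout=300):
    """Upload free slots immediately, then poll blocked ones until they clear."""
    remaining = list(slots)
    deadline = time.monotonic() + timeout
    while remaining and time.monotonic() < deadline:
        still_blocked = []
        for slot in remaining:
            if is_blocked(slot):
                still_blocked.append(slot)   # try again next pass
            else:
                upload(slot)                 # safe right now — upload it
        if still_blocked == remaining:
            time.sleep(poll_interval)        # nothing freed up; wait a beat
        remaining = still_blocked
    return remaining  # whatever never became safe

# Slot 2 is "on air": blocked for the first two checks, then frees up.
state = {"checks": 0}
def fake_blocked(slot):
    if slot != 2:
        return False
    state["checks"] += 1
    return state["checks"] <= 2

uploaded = []
leftover = upload_with_skip([1, 2, 3, 4], fake_blocked, uploaded.append,
                            poll_interval=0.01)
print(uploaded)  # [1, 3, 4, 2]
print(leftover)  # []
```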

This worked… most of the time.

COM Errors and Connection Limits

Then the failures started. Random COM exceptions during upload, especially when multiple uploads were queued close together. Sometimes slots 2 and 4 would upload fine but slots 1 and 3 would crash.

I spent a long time debugging this. Eventually I found the root cause: ATEM switchers have a limited number of simultaneous connections (typically 5-9 depending on the model). My system was making a new connection for every tally check AND every upload. Check tally → disconnect → upload → disconnect → check tally → disconnect → upload. Each one of those is a full COM connection cycle. Hit that too fast and the ATEM runs out of connection slots, or doesn’t release the previous one in time.

I tried connection pooling in Python — keep one persistent connection and reuse it. That helped with the tally checks. But the actual upload still went through MediaUpload.exe which created its own separate connection. So for a 4-slot upload, you’d have: one Python connection for tally checking, plus four separate MediaUpload.exe connections (one per image, since each call was a separate process).
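The pooling idea is just: connect lazily, once, and hand out the same connection for every check. A sketch with a counting stand-in for the switcher (DummyATEM and ATEMPool are hypothetical — the real code wrapped a PyATEMMax switcher object):

```python
class DummyATEM:
    """Counting stand-in for a real switcher connection."""
    connects = 0
    def connect(self):
        DummyATEM.connects += 1

class ATEMPool:
    """Hand out one persistent connection instead of reconnecting per check."""
    def __init__(self, factory):
        self.factory = factory
        self._conn = None
    def get(self):
        if self._conn is None:          # connect lazily, exactly once
            self._conn = self.factory()
            self._conn.connect()
        return self._conn

pool = ATEMPool(DummyATEM)
for _ in range(4):                       # four tally checks...
    pool.get()
print(DummyATEM.connects)  # 1 — ...but only one connection
```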

That’s potentially 5 connections hitting the ATEM in rapid succession. On a switcher that maxes out at 5-9 connections, with other things like ATEM Software Control and Companion already connected, we were right at the limit.

Going Back Into the C# Code

This is when I realized the tally checking and the uploading needed to happen in the same connection. Not Python checking tally and then a separate exe uploading — one process, one connection, doing both.

So I went back into the MediaUpload.exe Visual Studio project I’d fixed months earlier. This time I wasn’t just changing a build target — I was actually modifying C# code.

First change: batch mode. Instead of uploading one image per execution, accept multiple slot:filename pairs:

text
mediaupload.exe 192.168.81.85 "1:image1.png" "2:image2.png" "3:image3.png" "4:image4.png"

One connection, all images uploaded sequentially. This alone eliminated 75% of the connection churn.
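The slot:filename scheme is easy to parse. A Python sketch of the idea (the real parsing lives in the C# exe; this is just an illustration):

```python
def parse_pairs(args):
    """Split "slot:filename" arguments into (slot, path) tuples."""
    pairs = []
    for arg in args:
        # partition on the FIRST colon, so Windows paths like C:\img.png survive
        slot, _, path = arg.partition(":")
        if not path or not slot.isdigit():
            raise ValueError(f"expected slot:filename, got {arg!r}")
        pairs.append((int(slot), path))
    return pairs

print(parse_pairs(["1:image1.png", "2:image2.png"]))
# [(1, 'image1.png'), (2, 'image2.png')]
```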

Second change: built-in tally checking with a cycle-wait strategy. This was the clever part. Because the macros are triggered automatically, the ATEM cycles through states on its own — a slot goes live (on program), then goes cold, then live again, in a repeating pattern. You can’t just check “is it cold right now?” and upload, because a macro might bring it back live mid-upload. You need to catch the rhythm.

The logic I implemented works like this:

  1. If the slot is currently live (sourced to a media player that’s on program), wait for it to go cold first
  2. Then wait up to 40 seconds for it to go live again — this is the observation window to detect if the slot is in a macro rotation
  3. If it goes live, wait for cold again — now you’ve seen a full cycle and know you have the longest possible safe window
  4. If it never goes live in 40 seconds, assume it’s not in rotation and just upload
MediaUpload.cs csharp
// The core cycle-wait logic (simplified)
bool wentLive = false;

// Step 1: If currently live, wait for cold
while (!switcher.IsSlotSafeToUpload(slot))
    Thread.Sleep(50);

// Step 2: Wait up to 40s for it to go live again (detect rotation)
var timer = Stopwatch.StartNew();
while (timer.Elapsed.TotalSeconds < 40)
{
    if (!switcher.IsSlotSafeToUpload(slot))
    {
        wentLive = true;  // It's in rotation
        break;
    }
    Thread.Sleep(50);
}

// Step 3: If it went live, wait for cold again — now upload
if (wentLive)
    while (!switcher.IsSlotSafeToUpload(slot))
        Thread.Sleep(50);

“Safe to upload” means: the slot is either not sourced to any media player, or the media player it’s on isn’t currently on program. The whole thing has a 5-minute maximum timeout so it doesn’t hang forever.
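That safety rule is a small predicate. In Python terms (the names and data shapes are mine, not the actual C# switcher wrapper):

```python
def is_slot_safe(slot, mp_sources, program_players):
    """Safe if no media player sources this slot, or its player isn't on program.

    mp_sources: {player_id: slot currently loaded}
    program_players: set of player ids currently on the live program output
    """
    for player, loaded_slot in mp_sources.items():
        if loaded_slot == slot and player in program_players:
            return False  # the slot is on air right now
    return True

mp_sources = {1: 3, 2: 7}                # MP1 shows slot 3, MP2 shows slot 7
print(is_slot_safe(3, mp_sources, {1}))  # False — MP1 is live with slot 3
print(is_slot_safe(7, mp_sources, {1}))  # True  — MP2 isn't on program
print(is_slot_safe(5, mp_sources, {1}))  # True  — slot 5 isn't loaded anywhere
```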

Third change: smart skip ordering. Upload available slots first, then wait and retry blocked ones. This way if 3 out of 4 slots are free, they upload immediately while the system waits for the 4th.

I also added a -s flag to skip tally checking entirely for production types that don’t need it, cleaned up the progress logging (the original spammed Progress: 0% hundreds of times between images), and made sure the progress output was parseable so the Go server could track upload percentage.

Rebuilding in Visual Studio was still a bit of an adventure — I kept forgetting to set x64, kept opening folder view instead of the solution, kept hitting “Build” instead of “Rebuild” after changes. But the result was solid.

The Final Pipeline

Here’s how it all works now:

  1. User schedules upload via the Django web interface — picks the ATEM, the production type, the slots, the images, the time
  2. Django scheduler picks up the job when it’s time, determines the upload rules for that production type
  3. Scheduler sends the images and parameters to the Go server on Windows
  4. Go server calls MediaUpload.exe with all slot:image pairs in a single batch command
  5. MediaUpload.exe connects to the ATEM once, and for each slot runs the cycle-wait strategy — observing the macro rotation pattern to find the safest upload window, then uploading during the cold period
  6. Go server monitors the exe output, tracks progress, reports back to Django
  7. Django updates the job status with detailed logs — which slots uploaded, which were blocked and when they freed up, timestamps for everything

If anything fails, there are retries at multiple levels. The exe retries blocked slots. The Go server retries failed executions. Django retries failed jobs. And users can see all of this happening in real time on the scheduling dashboard.
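Each retry layer is conceptually the same small wrapper. A minimal sketch (the real system has different attempt limits and logging at each level):

```python
def with_retries(fn, attempts=3):
    """Run fn, retrying on failure; re-raise after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries — let the layer above decide

# Hypothetical upload that fails twice, then succeeds.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("COM hiccup")
    return "ok"

result = with_retries(flaky_upload)
print(result)  # ok — succeeded on the third attempt
```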

The Linux Dream (And Why Windows Still Wins… For Now)

Having a dedicated Windows machine just to run one exe always bugged me. I explored every option to get media uploads working natively on Linux.

First I tried Wine — running the .exe directly on Linux. It crashed immediately with could not load kernel32.dll. The exe is a .NET application that depends on COM components from the Blackmagic SDK, which are deeply Windows-specific. Wine couldn’t handle it.

Then I tried Mono — the cross-platform .NET runtime. Got further, but hit System.PlatformNotSupportedException. The Blackmagic SDK COM interop just doesn’t exist outside Windows. Dead end.

Then I found OpenSwitcher (pyatem) — an open-source project that reverse-engineered the ATEM protocol and implemented it in Python, completely independent of the Blackmagic SDK. It has media upload functionality built in. I built a script using pyatem that replicated the upload workflow: connect to the ATEM, grab the video mode to get the target resolution, load and resize the image, convert pixel data from RGBA to the ATEM’s native format using pyatem’s rgb_to_atem converter, then call protocol.upload() which handles locking the media pool, sending the image data in compressed chunks, and unlocking when done. I even tried tuning the transport batch settings — increasing batch size and reducing delay between packets to speed up transfers.

It worked… sometimes. The upload mechanism in pyatem involves a multi-step handshake — lock the media pool store, transfer data in chunks with the ATEM sending back acknowledgments and transfer budget updates, then release the lock. If any step hiccups — a dropped packet, a timing issue, the ATEM not responding to the lock request fast enough — the whole thing breaks. Uploads would succeed on one ATEM and silently fail on another, or work three times in a row and then hang on the fourth.

I didn’t have time to dig deeper into the reliability issues, and the Windows Go server setup was already rock solid in production. So for now, Windows stays. But I’m sure I’ll revisit the pyatem approach in the future — getting rid of that Windows dependency entirely would be the ideal end state.

What I Learned

This project spans basically my entire journey of teaching myself to code. The first version was a Flask form where I couldn’t figure out why file uploads weren’t working. The latest version involves a multi-service architecture across Linux and Windows with a custom-modified C# executable doing real-time tally checking against live broadcast equipment.

A few specific takeaways:

Don’t be afraid of old code. MediaUpload.exe was abandoned for 7 years and broken on modern hardware. The fix was a one-line config change (build target to x64). Later, modifying the actual C# to add batch mode and tally checking wasn’t as scary as I thought — the codebase was well-structured and the Blackmagic SDK did most of the heavy lifting.

Connection limits matter. The COM errors that plagued us for weeks were ultimately about making too many connections too fast. The fix wasn’t more retries or longer delays — it was reducing the number of connections by doing more work per connection. Batch mode with single-connection tally checking eliminated the problem entirely.

Build the right thing at the right layer. We went through several iterations of where to put the tally logic — Python scheduler, Go server, the exe itself. It belonged in the exe because that’s where the ATEM connection already exists. Putting it there meant one connection instead of two separate systems both talking to the ATEM.

What started as “I just want a web form to upload images” turned into a production-grade automated content delivery system. The scope creep was real, but each addition solved a real problem that people actually ran into. And I learned a massive amount along the way.
