
Why Core Audio Taps Silently Failed (And What Actually Works)

Published at 06:00 PM

I’m building Debrief — a macOS-native call recording tool that captures both your microphone and system audio, transcribes them with speaker separation, and gives you a terminal workspace to review meetings and query the transcript with an LLM.

The first thing I had to solve was system audio capture. macOS doesn’t make this easy. Here’s what I tried, what went wrong, and what actually worked.

The Plan: Core Audio Process Taps

macOS 14.2 introduced AudioHardwareCreateProcessTap — a new API that lets you tap into all system audio output without needing a virtual audio driver like BlackHole. For a native app like Debrief, this seemed perfect.

The implementation looked straightforward:

let tapDescription = CATapDescription()
tapDescription.processes = [] // empty list + mixdown = global tap across all processes
tapDescription.isMixdown = true
tapDescription.isPrivate = true

var tapID: AudioObjectID = kAudioObjectUnknown
let tapStatus = AudioHardwareCreateProcessTap(tapDescription, &tapID)
// tapStatus must be noErr before tapID is usable

// Attach to an aggregate device
var aggDeviceID: AudioObjectID = kAudioObjectUnknown
let description: [String: Any] = [
    kAudioAggregateDeviceNameKey: "Debrief-Tap",
    kAudioAggregateDeviceUIDKey: UUID().uuidString,
    kAudioAggregateDeviceIsPrivateKey: true,
    kAudioAggregateDeviceTapAutoStartKey: true,
    kAudioAggregateDeviceTapListKey: [[
        kAudioSubTapUIDKey: tapDescription.uuid.uuidString,
        kAudioSubTapDriftCompensationKey: true
    ]]
]

AudioHardwareCreateAggregateDevice(description as CFDictionary, &aggDeviceID)
// deviceProcID comes from AudioDeviceCreateIOProcIDWithBlock
AudioDeviceStart(aggDeviceID, deviceProcID)

All the right steps. Runs without errors. Logs show process tap started successfully. Then… nothing.

The First Wall: OSStatus 1852797029

Early runs returned OSStatus 1852797029 — the FourCC value 'nope', officially kAudioHardwareIllegalOperationError. Apple’s HAL deliberately returns this one generic code for every kind of refusal, so the error alone can’t tell you what went wrong.
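Since the HAL packs these errors as four ASCII bytes in an Int32, a tiny decoder makes logs much more readable. This is a standalone helper, not Debrief's actual code:

```swift
// Decode an OSStatus (Int32) into its FourCC string, e.g. 1852797029 → "nope".
func fourCC(_ status: Int32) -> String {
    let n = UInt32(bitPattern: status)
    let bytes = [24, 16, 8, 0].map { UInt8((n >> $0) & 0xFF) }
    // Only printable ASCII counts as a FourCC; otherwise show the raw number.
    guard bytes.allSatisfy({ (0x20...0x7E).contains($0) }) else { return "\(status)" }
    return String(decoding: bytes, as: UTF8.self)
}
```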

After research (including a 9-agent parallel deep-dive across Apple docs, GitHub, and developer forums), the root causes are:

  1. Wrong TCC permission tier — macOS has three separate audio permissions. “Screen & System Audio Recording” is for ScreenCaptureKit. Core Audio taps require a different permission: kTCCServiceAudioCapture, which appears separately as “System Audio Recording Only” in System Settings. Granting the wrong one still gives you 'nope'.

  2. Missing NSAudioCaptureUsageDescription — without this key in your Info.plist, the permission dialog never appears and the system silently fails.

  3. Ad-hoc code signing — every swift build with codesign --sign - produces a different signature. TCC treats each rebuild as a new app and forgets prior grants. The fix is signing with an Apple Development certificate (free with any Apple ID).

  4. macOS 26 Tahoe regression — plain CLI executables no longer appear in Privacy settings. The binary must run from an .app bundle with a valid Info.plist.
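Concretely, two artifacts carry these fixes: the Info.plist inside the bundle and the entitlements file used at signing time. A minimal sketch, with an illustrative usage string:

```xml
<!-- Info.plist fragment (usage string text is illustrative) -->
<key>NSAudioCaptureUsageDescription</key>
<string>Debrief records system audio during call sessions you start.</string>

<!-- entitlements fragment -->
<key>com.apple.security.device.audio-input</key>
<true/>
```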

We fixed all of these: bundled the daemon as Debrief.app, switched to Apple Development cert signing, added NSAudioCaptureUsageDescription to the Info.plist, and added com.apple.security.device.audio-input to the entitlements. The 'nope' error went away.

The Second Wall: The IOProc That Never Fires

After fixing permissions, the tap created successfully — AudioHardwareCreateProcessTap returned noErr, the aggregate device came up, AudioDeviceStart succeeded. But system_chunks stayed permanently at zero across every test, with audio actively playing.

The logs were clean. No errors. Just silence:

{"system_chunks": 0, "mic_chunks": 99, "system_buffer_samples": 0}

We tried every configuration tweak we could find. Nothing helped.

To isolate whether the problem was in our code or in Core Audio itself, we tested systemAudioDump — a minimal open-source implementation.

It worked. But then we read its source.

The Reveal: It Was Using ScreenCaptureKit All Along

The repo is named systemAudioDump and it “captures system audio”. But its implementation uses SCStream from ScreenCaptureKit — not Core Audio taps at all.

let stream = SCStream(filter: filter, configuration: cfg, delegate: self)
try stream.addStreamOutput(self, type: .audio, sampleHandlerQueue: ...)
try await stream.startCapture()

This was the inflection point. Core Audio process taps are a new, poorly-documented API with known HAL bugs (Apple’s own sample code has a confirmed bug: FB17411663 targets the wrong AudioObjectID). The IOProc not firing with a tap-only aggregate is likely a HAL clock source issue — without a real output device as a sub-device, the HAL has no clock to drive callbacks. But adding a sub-device creates other problems.
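For completeness, here is the shape of the fix we suspected but never verified: give the aggregate a real output device as a sub-device so the HAL has a clock. kAudioAggregateDeviceSubDeviceListKey and kAudioAggregateDeviceMainSubDeviceKey are real HAL keys; defaultOutputUID (the UID string of the current default output device) is a placeholder:

```swift
// Untested sketch: a tap-plus-real-device aggregate, so the HAL has a clock.
// defaultOutputUID is hypothetical; fetch it via kAudioDevicePropertyDeviceUID.
let description: [String: Any] = [
    kAudioAggregateDeviceNameKey: "Debrief-Tap",
    kAudioAggregateDeviceUIDKey: UUID().uuidString,
    kAudioAggregateDeviceIsPrivateKey: true,
    kAudioAggregateDeviceMainSubDeviceKey: defaultOutputUID, // clock source
    kAudioAggregateDeviceSubDeviceListKey: [
        [kAudioSubDeviceUIDKey: defaultOutputUID]
    ],
    kAudioAggregateDeviceTapListKey: [[
        kAudioSubTapUIDKey: tapDescription.uuid.uuidString,
        kAudioSubTapDriftCompensationKey: true
    ]]
]
```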

ScreenCaptureKit, by contrast, is battle-tested. OBS uses it. Granola (the meeting transcription app) uses it. It works.

The Switch: ScreenCaptureKit for System Audio

Rewriting SystemAudioCapture.swift to use SCStream took about 150 lines:

let config = SCStreamConfiguration()
config.capturesAudio = true
config.captureMicrophone = false
config.excludesCurrentProcessAudio = true
config.minimumFrameInterval = CMTime(value: 10, timescale: 1) // 0.1 fps — discard video

let filter = SCContentFilter(display: display,
    excludingApplications: [], exceptingWindows: [])

let scStream = SCStream(filter: filter, configuration: config, delegate: self)
try scStream.addStreamOutput(self, type: .audio, sampleHandlerQueue: processingQueue)
try scStream.addStreamOutput(self, type: .screen, sampleHandlerQueue: processingQueue)
try await scStream.startCapture()

You have to add both .audio and .screen output handlers — SCStream will log errors and behave erratically if you only add audio. But in the .screen handler, you just return immediately and discard the frame.

The audio handler receives CMSampleBuffer objects. We convert them to our internal AudioChunk format via AVAudioConverter, resampling to 48kHz stereo:

func stream(_ stream: SCStream, didOutputSampleBuffer buffer: CMSampleBuffer,
            of type: SCStreamOutputType) {
    guard type == .audio else { return } // discard video frames
    // convert CMSampleBuffer → [Float] via AVAudioConverter
    // yield to AsyncStream<AudioChunk>
}
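The elided conversion looks roughly like this. withAudioBufferList, AVAudioPCMBuffer(pcmFormat:bufferListNoCopy:), and AVAudioConverter are real APIs, but treat this as an untested outline rather than Debrief's actual code:

```swift
import AVFoundation
import CoreMedia

// Sketch: wrap the SCStream sample buffer without copying, then resample
// with AVAudioConverter. Real code must mind the block buffer's lifetime
// and should reuse one converter per input format rather than per buffer.
func convert(_ sampleBuffer: CMSampleBuffer,
             to outFormat: AVAudioFormat) -> AVAudioPCMBuffer? {
    guard var asbd = sampleBuffer.formatDescription?.audioStreamBasicDescription,
          let inFormat = AVAudioFormat(streamDescription: &asbd),
          let converter = AVAudioConverter(from: inFormat, to: outFormat)
    else { return nil }

    var input: AVAudioPCMBuffer?
    try? sampleBuffer.withAudioBufferList { list, _ in
        input = AVAudioPCMBuffer(pcmFormat: inFormat,
                                 bufferListNoCopy: list.unsafePointer)
    }
    guard let input else { return nil }

    let ratio = outFormat.sampleRate / inFormat.sampleRate
    let capacity = AVAudioFrameCount(ratio * Double(input.frameLength)) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: outFormat,
                                        frameCapacity: capacity)
    else { return nil }

    var fed = false
    converter.convert(to: output, error: nil) { _, outStatus in
        if fed { outStatus.pointee = .noDataNow; return nil }
        fed = true
        outStatus.pointee = .haveData
        return input
    }
    return output
}
```

The output buffer's frames then get appended to the internal AudioChunk stream.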

First test after the switch:

{"system_chunks": 761, "mic_chunks": 149,
 "system_buffer_samples": 981120}

system.pcm — 5.5MB. Working.

What I’d Tell Myself at the Start

Core Audio process taps are real, but rough. The API exists, the permission model works, and commercial apps use it. But on macOS 26 Tahoe with a CLI/daemon architecture, getting the HAL to drive the IOProc requires specific aggregate device configuration that isn’t documented anywhere obvious. If you’re building something quickly and you already have Screen Recording permission, skip the pain.

The TCC permission split matters. “Screen & System Audio Recording” and “System Audio Recording Only” are genuinely different TCC services. Granting the wrong one gives you a clean startup and zero audio. Always reset with tccutil reset AudioCapture <bundle-id> before testing.

App bundles are mandatory on macOS 26. Plain CLI binaries don’t appear in Privacy settings and can’t present permission dialogs. Bundle your daemon as .app, even if it only runs headlessly.

Stable code signing is essential. Ad-hoc signing (codesign --sign -) breaks TCC grants on every rebuild. Use an Apple Development certificate — it’s free.

ScreenCaptureKit requires Screen Recording permission, not just audio. That’s a broader permission scope than Core Audio taps. Weigh this against your privacy story. For Debrief, the tradeoff is acceptable — the user explicitly starts a recording session, and the purple indicator only appears for screen captures, not audio-only SCStream.

Current State

Phase 1 of Debrief is complete: both channels captured, separated, written to disk as PCM. Mic via AVAudioEngine, system audio via ScreenCaptureKit. Device switching (AirPods → speakers mid-call) detected and handled automatically.
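The mic-side device detection hangs off a real AVFoundation hook: AVAudioEngine posts a configuration-change notification when its underlying device changes. A minimal sketch, where the restart logic is hypothetical:

```swift
import AVFoundation

final class MicCapture {
    private let engine = AVAudioEngine()
    private var observer: NSObjectProtocol?

    func watchForDeviceChanges() {
        // Posted when the input device changes mid-session (e.g. AirPods
        // connect); the engine stops itself and must be restarted.
        observer = NotificationCenter.default.addObserver(
            forName: .AVAudioEngineConfigurationChange,
            object: engine,
            queue: .main
        ) { [weak self] _ in
            self?.restart()
        }
    }

    private func restart() {
        engine.stop()
        // Reinstall the input tap and call engine.start() here; details
        // depend on the capture pipeline (hypothetical in this sketch).
    }
}
```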

Next up: transcription model evaluation. WhisperKit runs fully on-device — the question is which model size gives the best quality/latency tradeoff on Apple Silicon before we commit to it for real-time use.
