Designing a Swift library with data-race safety

I cut an initial release (0.1.0-alpha) of the library automerge-repo-swift. A supplemental library to Automerge-swift, it adds background networking for sync and storage capabilities. The library extends code I initially created in the Automerge demo app (MeetingNotes) that was common enough to warrant its own library. While I was extracting those pieces, I leaned into the same general pattern used in automerge-repo, the library that provides largely the same functionality for Automerge in JavaScript. I borrowed its public API structure, as well as compatibility and implementation details for the Automerge sync protocol. One of my goals while assembling this new library was to build it fully compliant with Swift's data-race safety, meaning that it compiles without warnings when I use the Swift compiler's strict-concurrency mode.

There were some notable challenges in coming up to speed with the concepts of isolation and sendability. Beyond learning the concepts themselves, how to apply them is still an open question. Not many Swift developers have embraced strict concurrency yet, or talked about the trade-offs and implications of their choices. Because of that, I feel there's relatively little available knowledge about the trade-offs you make when you protect mutable state. This post shares some of the stumbling blocks I hit, choices I made, and lessons I've learned. My hope is that it helps other developers facing a similar challenge.

Framing the problem

The way I try to learn and apply new knowledge to solve these kinds of "new-fangled" problems is to first work out how to think about the problem. I've not come up with a good way to ask other people how to do that. I think when I frame the problem with good first principles in mind, the trade-offs in solutions become easier to understand. Sometimes the answers are even self-evident.

The foremost principle in strict concurrency is "protect your mutable state". The compiler warnings give you feedback about potential hazards and data races. In Swift, protecting that state uses the concept of an "isolation domain". My layman's take on isolation is "how can the compiler verify that only one thread is accessing this bit of data at a time?". There are some places where the compiler infers isolation, and some of those rules are still changing as we progress towards Swift 6. When you're writing code, the compiler knows what is isolated (and non-isolated) – either by inference or based on what you annotated. When the compiler infers an isolation domain, that detail is not (yet?) easily exposed to developers. It really only shows up when there's a mismatch between your assumptions and what the compiler thinks, and it issues a strict-concurrency warning.

Sendability is the second key concept. In my layman’s terms again, something that is sendable is safe to cross over thread boundaries. With Swift 5.10, the compiler has enough knowledge of types to be able to make guarantees about what is safe, and what isn’t.

The first thing I did was lean heavily into making anything and everything Sendable. In hindsight, that was a bit of a mistake. Not disastrous, but I made a lot more work for myself. Not everything needs to be sendable. Taking advantage of isolation, it is fine – sometimes notably more efficient and easier to reason about – to have and use non-sendable types within an isolation domain. More on that in a bit.

My key to framing up the problem was to make explicit choices about what data should live in an isolation region, along with how I want to pass information from one isolation domain to another. Any types I pass (generally) need to be Sendable, and anything that stays within an isolation domain doesn't. For this library, I have a lot of mutable state: networking connections, updates from users, and a state machine coordinating it all. All of it is needed so that a repository can store and synchronize Automerge documents. Automerge documents themselves are Sendable (I had that in place well before starting this work). I made the Automerge documents Sendable by wrapping access and updates to anything mutable within a serial dispatch queue. (This was also needed because the core Automerge library – a Rust library accessed through FFI – was not safe for multi-threaded use.)
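
To illustrate that last point, here's a minimal sketch of the serial-queue pattern – not Automerge's actual implementation, just the shape of it: every read and write of the mutable state funnels through one serial queue, and the type is marked @unchecked Sendable because the queue, rather than the compiler, provides the guarantee.

import Foundation

// A minimal sketch: the serial queue is the only path to the mutable state,
// so the class can promise thread safety manually via @unchecked Sendable.
final class QueueProtectedDocument: @unchecked Sendable {
    private let queue = DispatchQueue(label: "example.queue-protected-document")
    private var values: [String: String] = [:]

    func update(_ value: String, forKey key: String) {
        queue.sync { values[key] = value }
    }

    func value(forKey key: String) -> String? {
        queue.sync { values[key] }
    }
}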

Choosing Isolation

I knew I wanted to make at least one explicit isolation domain, so the first question was "actor or isolated class?" Honestly, I'm still not sure I understand all the trade-offs. Without knowing what the effect would be, I decided to start with "let's use actors everywhere" and see how it went. Some of the method calls in the design of the Automerge repository were easily and obviously async, so that seemed like a good first cut. I made the top-level repo an actor, and then I kept making any internal type that had mutable state its own actor as well. That included a storage subsystem and a network subsystem, both of which I built to let someone else provide a network or storage provider external to this project. To support external plugins that work with this library, I created protocols for the storage and network providers, as well as one that the network providers use to talk back to the repository.

The downside of that choice was two-fold – first setting things up, then interacting with it from within a SwiftUI app. Because I made every-darn-thing an actor, I had to await a response, which meant a lot of potential suspension points in my code. That also propagated to imply that even setup needed to be done within an async context. Sometimes that's easy to arrange, but other times it ends up being a complete pain in the butt. More specifically, quite a few of the current Apple-provided frameworks don't provide a clear path to integrate async setup hooks. The server-side Swift world has a lovely "set up and run" mechanism (swift-service-lifecycle) it is adopting, but Apple hasn't provided a similar concept for the frameworks it ships. The one that bites me most frequently is the SwiftUI app and document-based app lifecycle, which are all synchronous.

Initialization Challenges

Making the individual actors – Repo and the two network providers I created – initializable with synchronous calls wasn't too bad. The stumbling block I hit (that I still don't have a great solution for) was when I wanted to add and activate the network providers on a repository. To arrange that, I'm currently using an unstructured Task that I kick off in the SwiftUI App's initializer:

public let repo = Repo(sharePolicy: .agreeable)
public let websocket = WebSocketProvider()
public let peerToPeer = PeerToPeerProvider(
    PeerToPeerProviderConfiguration(
        passcode: "AutomergeMeetingNotes",
        reconnectOnError: true,
        autoconnect: false
    )
)

@main
struct MeetingNotesApp: App {
    var body: some Scene {
        DocumentGroup {
            MeetingNotesDocument()
        } editor: { file in
            MeetingNotesDocumentView(document: file.document)
        }
        .commands {
            CommandGroup(replacing: CommandGroupPlacement.toolbar) {
            }
        }
    }

    init() {
        Task {
            await repo.addNetworkAdapter(adapter: websocket)
            await repo.addNetworkAdapter(adapter: peerToPeer)
        }
    }
}

Swift Async Algorithms

One of the lessons I've learned is that if you find yourself stashing a number of actors into an array, and you're used to interacting with them using functional methods (filter, compactMap, etc.), you need to deal with the asynchronous access. The standard library's built-in functional methods are all synchronous. Because of that, you can only access non-isolated properties on the actors. For me, that meant working with immutable state that I set up during actor initialization.

The second path (and I went there) was to take on a dependency on swift-async-algorithms and use its async variations of the functional methods. They let you "await" results for anything that needs to cross isolation boundaries. And because it took me an embarrassingly long time to figure it out: if you have an array of actors, the way to get an AsyncSequence of them is to use the async property on the array after you've imported swift-async-algorithms. For example, something like the following snippet (the isActive property is purely illustrative – the point is awaiting actor-isolated state inside the predicate):

let arrayOfActors: [YourActorType] = []
// `isActive` is a hypothetical actor-isolated property; the async predicate can
// await it, and the result is an AsyncSequence you iterate with `for await`.
let filteredResults = arrayOfActors.async.filter { await $0.isActive }

Rethinking the isolation choice

That was my first version of this library. I got it functional, then turned around and tore it apart again. In making everything an actor, I was making LOTS of little isolation regions that the code had to hop between. With all the suspension points, that meant a lot of possible re-ordering of what was running. I had to be extraordinarily careful not to assume that a copy of some state I'd nabbed earlier was still the same after an await. (I still have to be, but it was a more prominent issue with lots of actors.) All of this boils down to being aware of actor re-entrancy, and when it might invalidate something.

I knew that I wanted at least one isolation region (the repository). I also wanted to keep mutable state in separate types to preserve a separation of duties. One particular class highlighted my problems – a wrapper around NWConnection that tracks additional state and handles the Automerge sync protocol. It was getting really darned inconvenient with the large number of await suspension points.

I slowly clued in that it would be a lot easier if that were all synchronous – and there was no reason it couldn't be. In my ideal world, I'd have the type Repo (my top-level repository) be a non-global actor, and isolate any classes it used to the same isolation zone as that one, non-global actor. I think that's a capability that's coming, or at least I'm not sure how to arrange it today with Swift 5.10. Instead I opted to make a single global actor for the library and switch what I'd previously set up as actors to classes isolated to that global actor.
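
The shape of that change looks roughly like the sketch below – the pattern rather than the library's actual declarations, although the global actor in the real library is also named AutomergeRepo (as the compiler warning later in this post shows).

// A simplified stand-in for the library's global actor.
@globalActor
public actor AutomergeRepo {
    public static let shared = AutomergeRepo()
}

// Classes isolated to the same global actor share one isolation domain, so
// they can call each other synchronously - no await, no suspension point.
@AutomergeRepo
final class SyncConnectionState {
    private var activePeers: [String] = []

    func register(peer: String) {
        activePeers.append(peer)
    }
}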

That let me simplify quite a bit, notably when dealing with the state of connections within a network adapter. What surprised me was how few warnings the switch from actors to isolated classes generated; the ones that did appear mostly flagged calls that had dropped back to being synchronous and no longer needed await. That was quick to fix up; the change to isolated classes was much faster and easier than I anticipated. After I made the initial changes, I went through the various initializers and associated configuration calls to make more of them explicitly synchronous. The end result was more code that could be set up (initialized) without an async context. And finally, I updated how I handled the networking so that as I needed to track state, I didn't absolutely have to use the async algorithms library.

A single global actor?

A bit of a side note: I thought about making Repo a global actor, but I prefer not to demand a singleton-style library for its usage. That choice made it much easier to host multiple repositories when it came time to run functional tests with a mock in-memory network, or integration tests with the actual providers. I'm still slightly concerned that I might be adding to a long-term proliferation of global actors from libraries – but it seems like the best solution at the moment. I'd love it if I could do something that indicated "all these things need a single isolation domain, and you – developer – are responsible for providing one that fits your needs". I'm not sure that kind of concept is even on the table for future work.

Recipes for solving these problems

If you weren't already aware of it, Matt Massicotte created a GitHub repository called ConcurrencyRecipes. This is a gemstone of knowledge, hints, and possible solutions. I leaned into it again and again while building (and rebuilding) this library. One of the "convert it to async" challenges I encountered was providing an async interface to my own peer-to-peer network protocol. I built the protocol using the Network framework (based partially on Apple's sample code), which is all synchronous code and callbacks. At a high level, I wanted it to act similarly to URLSessionWebSocketTask – the gist being that a connection has an async send() and an async receive() for sending and receiving messages on the connection. With an async send and receive, you can readily assemble several different patterns of access.

To get there, I used a combination of CheckedContinuation (both the throwing and non-throwing variations) to work with what NWConnection provided. I wish that was better documented – how to properly use those APIs is opaque, but that is a digression for another time. I'm particularly happy with how my code worked out, including adding a method on the PeerConnection class that used structured concurrency to handle a timeout mechanism.
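
To give a flavor of the shape of that code, here's a simplified sketch – not the library's exact implementation – of wrapping NWConnection's callback-based receive in a CheckedContinuation so it can be called with await:

import Foundation
import Network

// A simplified sketch: expose NWConnection's callback-based receiveMessage
// as an async call by resuming a CheckedContinuation exactly once.
func receiveRawMessage(from connection: NWConnection) async throws -> Data {
    try await withCheckedThrowingContinuation { (continuation: CheckedContinuation<Data, Error>) in
        connection.receiveMessage { content, _, _, error in
            if let error {
                continuation.resume(throwing: error)
            } else if let content {
                continuation.resume(returning: content)
            } else {
                // The connection closed without delivering content or an error.
                continuation.resume(throwing: CancellationError())
            }
        }
    }
}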

Racing tasks with structured concurrency

One of the harder warnings for me to understand was related to racing concurrent tasks in order to create an async method with a “timeout”. I stashed a pattern for how to do this in my notebook with references to Beyond the basics of structured concurrency from WWDC23.

If the async task returns a value, you can set it up something like this (this is from PeerToPeerConnection.swift):

let msg = try await withThrowingTaskGroup(of: SyncV1Msg.self) { group in
    group.addTask {
        // retrieve the next message
        try await self.receiveSingleMessage()
    }

    group.addTask {
        // Race against the receive call with a continuous timer
        try await Task.sleep(for: explicitTimeout)
        throw SyncV1Msg.Errors.Timeout()
    }

    guard let msg = try await group.next() else {
        throw CancellationError()
    }
    // cancel all ongoing tasks (the websocket receive request, in this case)
    group.cancelAll()
    return msg
}

There’s a niftier version available in Swift 5.9 (which I didn’t use) for when you don’t care about the return value:

func run() async throws {
    try await withThrowingDiscardingTaskGroup { group in
        for cook in staff.keys {
            group.addTask { try await cook.handleShift() }
        }

        group.addTask { // keep the restaurant going until closing time
            try await Task.sleep(for: shiftDuration)
            throw TimeToCloseError()
        }
    }
}

With the Swift 5.10 compiler, my direct use of this displayed a warning:

warning: passing argument of non-sendable type 'inout ThrowingTaskGroup<SyncV1Msg, any Error>' outside of global actor 'AutomergeRepo'-isolated context may introduce data races

guard let msg = try await group.next() else {
                          ^

I didn’t really understand the core of this warning, so I asked on the Swift forums. VNS (on the forums) had run into the same issue and helped explain it:

It’s because withTaskGroup accepts a non-Sendable closure, which means the closure has to be isolated to whatever context it was formed in. If your test() function is nonisolated, it means the closure is nonisolated, so calling group.waitForAll() doesn’t cross an isolation boundary.

The workaround to handle the combination of non-sendable closures and TaskGroup is to make the async method that runs this code nonisolated. In the context I was using it, the class that contains this method is isolated to a global actor, so it’s inheriting that context. By switching the method to be explicitly non-isolated, the compiler doesn’t complain about group being isolated to that global actor.
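
In code, the workaround looks something like the sketch below – a simplified stand-in for the real method, sitting inside a class isolated to the library's global actor:

// Inside a class isolated to the AutomergeRepo global actor: marking the method
// nonisolated keeps the non-Sendable ThrowingTaskGroup in a nonisolated context,
// which avoids the Swift 5.10 warning shown above.
nonisolated func nextMessage(timeout: Duration) async throws -> SyncV1Msg {
    try await withThrowingTaskGroup(of: SyncV1Msg.self) { group in
        group.addTask {
            try await self.receiveSingleMessage()
        }
        group.addTask {
            try await Task.sleep(for: timeout)
            throw SyncV1Msg.Errors.Timeout()
        }
        guard let msg = try await group.next() else {
            throw CancellationError()
        }
        group.cancelAll()
        return msg
    }
}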

Sharing information back to SwiftUI

These components have all sorts of interesting internal state, some of which I wanted to export – for example, providing information from the network providers to build a user interface in SwiftUI. I want to be able to choose to connect to endpoints, to share what endpoints might be available (from the NWBrowser embedded in the peer-to-peer network provider), and so forth.

I first tried to lean into AsyncStreams. While they make a great local queue for a single point-to-point connection, I found them far less useful for generally making a firehose of data that SwiftUI knows how to read and react to. While I tried to use all the latest techniques, to handle this part I went to my old friend Combine. Some people insist that Combine is dead and dying – but boy, it works. And most delightfully, you can have any number of endpoints pick up and subscribe to a shared publisher, which was perfect for my use case. Top that off with SwiftUI having great support for receiving streams of data from Combine, and it was an easy choice.

I ended up using Combine publishers to make a few feeds of data from the PeerToPeerProvider. They share information about what other peers are available, the current state of the listener (which accepts connections) and the browser (which looks for peers), and lastly a publisher that provides information about active peer-to-peer connections. I feel that worked out extremely well. It worked so well that I made an internal publisher (not exposed via the public API) for tests to get events and state updates from within a repository.
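
The pattern, roughly – the names here are illustrative, not the library's public API:

import Combine
import Foundation

// An illustrative summary type that is safe to hand across isolation domains.
struct PeerSummary: Sendable {
    let peerId: String
    let connected: Bool
}

// A sketch of the provider side: keep a private subject, expose a shared
// publisher, and deliver values on the main queue so SwiftUI can consume them.
final class PeerEventPublisher {
    private let subject = PassthroughSubject<[PeerSummary], Never>()

    var peerPublisher: AnyPublisher<[PeerSummary], Never> {
        subject.receive(on: DispatchQueue.main).eraseToAnyPublisher()
    }

    func report(_ peers: [PeerSummary]) {
        subject.send(peers)
    }
}

On the SwiftUI side, a view can pick that up with .onReceive(provider.peerPublisher) and stash the latest snapshot in local @State for display.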

Integration Testing

It's remarkably hard to usefully unit test network providers. Instead of unit testing, I made a separate Swift project for the purpose of running integration tests. It sits in its own directory in the git repository and references automerge-repo-swift as a local dependency. A side effect is that it let me add in all sorts of wacky dependencies that were handy for the integration testing, but that I really didn't want exposed and transitive for the main package. I wish that Swift packages had a means to identify test-only dependencies that didn't propagate to other packages for situations like this. Ah well, my solution was a separate sub-project.

Testing using the Combine publisher worked well, although it took a little digging to figure out the correct way to set up and use expectations with async XCTests. It feels a bit exhausting to assemble the expectations and fulfillment calls, but it's quite possible to get working. If you want to see this in operation, take a look at P2P+explicitConnect.swift. I started to look at potentially using the upcoming swift-testing, but with limited Swift 5.10 support, I decided to hold off for now. If it makes asynchronous testing easier down the road, I may well adopt it quickly after its initial release.

The one quirky place I ran into with that API setup was that expectation.fulfill() gets cranky with you if you call it more than once. My publisher wasn't quite so constrained with state updates, so I ended up cobbling together a boolean latch variable in a sink when I didn't have a sufficiently constrained closure.
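
The latch ends up looking something like this sketch (reusing the illustrative PeerEventPublisher from the earlier snippet; the real test is structured a bit differently):

import Combine
import XCTest

final class PeerConnectIntegrationTests: XCTestCase {
    func testSeesConnectedPeer() async throws {
        let provider = PeerEventPublisher() // illustrative stand-in for the real provider
        let expectation = XCTestExpectation(description: "saw a connected peer")
        var fulfilled = false // latch: fulfill() must only ever be called once

        let cancellable = provider.peerPublisher
            .sink { peers in
                if !fulfilled, peers.contains(where: { $0.connected }) {
                    fulfilled = true
                    expectation.fulfill()
                }
            }

        // ... trigger the connection under test here ...

        await fulfillment(of: [expectation], timeout: 10)
        cancellable.cancel()
    }
}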

The other quirk in integration testing is that while it works beautifully on a local machine, I had trouble getting it to work in CI (using GitHub Actions). Part of the issue is that swift test currently defaults to running all possible tests at once, in parallel. Especially for integration testing of peer-to-peer networking, that meant a lot of network listeners and browsers getting shoved together at once on the local network. I wrote a script to list out the tests and run them one at a time. Even breaking it down like that didn't consistently get through CI. I also tried higher wait times (120 seconds) on the expectations. When I run them locally, most of those tests take about 5 seconds each.

The test that was a real challenge was the cross-platform one. Automerge-repo has a sample sync server (NodeJS, using Automerge through WASM). I created a docker container for it, and my cross-platform integration test pushes and pulls documents to an instance that I can run in Docker. Well… Docker isn’t available for macOS runners, so that’s out for GitHub Actions. I have a script that spins up a local docker instance, and I added a check into the WebSocket network provider test – if it couldn’t find a local instance to work against, it skips the test.

Final Takeaways

Starting with a plan for isolating state made the choices of how and what I used a bit easier, and reaching for global-actor-constrained classes made synchronous use of those classes much easier. For me, this mostly played out in better (synchronous) initializers and in dealing with collections using functional programming patterns.

I hope there's some planning/thinking in SwiftUI to update or extend the app structure to accommodate async hooks for things like setup and initialization (FB9221398). That should make it easier for a developer to run an async initializer and verify that it didn't fail before continuing into the normal app lifecycle. Likewise, I hope that the document-based APIs gain an async context for working with documents to handle asynchronous tasks (FB12243722). Both of these spots are very awkward places for me.

Once you shift to using asynchronous calls, it can have a ripple effect in your code. If you're looking at converting existing code, start at the "top" and work down. That helped me make sure there weren't secondary complications from that choice (such as a need for an async initializer).

Better yet, step back and take the time to identify where mutable state exists. Group it together as best you can, and review how you're interacting with it, and in what isolation region. For things that need to be available to SwiftUI, you can likely isolate methods appropriately (*cough* MainActor *cough*). Then make the parts you need to pass between isolation domains Sendable. Recognize that in some cases, it may be fine to do the equivalent of "here was the state at some recent moment, in case you want to react to it". There are several places where I pass back a summary snapshot of mutable state to SwiftUI to use in UI elements.

And do yourself a favor and keep Matt’s Concurrency Recipes on speed-dial.

Before I finished this post, I listened to episode 43 of the Swift Package Index podcast. It's a great episode, with Holly Borla, compiler geek and manager of the Swift language team, on as a guest to talk about Swift 6. A tidbit she shared was that they are creating a Swift 6 migration guide, to be published on the swift.org website. Something to look forward to, in addition to Matt's collection of recipes!

Distributed Tracing with Testing on iOS and macOS

This weekend I was frustrated with my debugging, and just not up to digging in and carefully, meticulously analyzing what was happening. So … I took a left turn (at Albuquerque) and decided to explore an older idea to see if it was interesting and/or useful. My challenging debugging was all about network code, for a collaborative, peer-to-peer sharing thing; more about that effort some other time.

A bit of back story

A number of years ago, when I was working with a solar energy manufacturer, I was living and breathing events, APIs, and running very distributed systems, sometimes over crap network connections. One of the experiments I did (that worked out extremely well) was to enable distributed tracing across all the software components, collecting and analyzing traces to support integration testing. Distributed tracing, and the now-popular CNCF OpenTelemetry project, weren't a big thing then, but they were around – kind of getting started. The folks at Uber (Yuri Shkuro, at least) had released Jaeger, an open-source trace collector with web-based visualization, which was enough to get started. I wrote about that work back in 2019 (that post still gets some recurring traffic from search engines, although it's pretty dated now and not entirely useful).

We spun up our services, enabled tracing, and ran integration tests on the whole system, after which we had the traces available for visual review. It was useful enough that we ended up evolving it so that a single developer could stand up most of their pieces locally (with a sufficiently beefy machine), and capture and view the traces locally. That provided a great feedback loop, as they could see performance and flows in the system while they were developing fixes, updates, and features. I wanted to see, this time with an iOS/macOS-focused library, how far I could get trying to replicate that idea (time-boxed to the weekend).

The Experiment!

I've been loosely following the server-side Swift distributed-tracing effort since it started, and it looked pretty clear that I could use it directly. Moritz Lang publishes swift-otel, a Swift-native library with concurrency support. With his examples, it was super quick to hack into my test setup. The library is set up to run with service-lifecycle pieces over SwiftNIO, so there's a pile of dependencies that come in with it. I'd be a little hesitant to add that to my library itself, but for an integration-test harness I'm totally good with it. There were some quirks to using it with XCTest, most of which I hacked around by shoving the tracer setup into a global actor and exposing an idempotent bootstrap call. With that in place, I added explicit traces into my tests, and then started adding more and more, including into my library, and could see the results in a locally running instance of Jaeger (running Jaeger using Docker).
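
The bootstrap wrapper is essentially this shape – a sketch, not the actual package, with InstrumentationSystem.bootstrap coming from swift-distributed-tracing:

import Tracing

// A sketch of an idempotent bootstrap: swift-distributed-tracing's
// InstrumentationSystem.bootstrap may only be called once per process, and
// XCTest setup can run many times, so a global actor guards a one-shot flag.
@globalActor
actor TestTracerBootstrap {
    static let shared = TestTracerBootstrap()
    private var bootstrapped = false

    func bootstrapOnce(_ instrument: any Instrument) {
        guard !bootstrapped else { return }
        InstrumentationSystem.bootstrap(instrument)
        bootstrapped = true
    }
}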

Some Results

The following image is an overview of the traces generated by a single test (testCreate):

The code I’m working with is all pushing events over web sockets, so inside of the individual spans (which are async closures in my test) I’ve dropped in some span events, one of which is shown in detail below:

In a lot of respects, this is akin to dropping in os_signposts that you might view in Instruments, but it’s external to Xcode infrastructure. Don’t get me wrong, I love Instruments and what it does – it’s been amazing and really the gold standard in tooling for me for years – but I was curious how far this approach would get me.
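
For a sense of what that instrumentation looks like in the test code, here's a rough sketch against the swift-distributed-tracing API – the receiveMessage closure is a stand-in for my websocket wrapper, not a real call from the library:

import Foundation
import Tracing

// A sketch: wrap one step of the test in a span, and record a span event
// when a message arrives over the (stand-in) websocket receive call.
func traceOneSync(receiveMessage: () async throws -> Data) async throws -> Data {
    try await withSpan("sync document") { span in
        let message = try await receiveMessage()
        span.addEvent(SpanEvent(name: "received sync message"))
        return message
    }
}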

Choices and Challenges

Using something like this in production – with live-running iOS or macOS apps – would be another great end-to-end scenario. More so if the infrastructure your app was working from also used tracing. There’s a separate tracing project at CNCF – OpenTelemetry Swift – that looks oriented towards doing just that. I seriously considered using it, but I didn’t see a way to use that package to instrument my library and not bring in the whole pile of dependencies. With the swift-distributed-tracing library, it’s an easy (and small) dependency add – and you only need to take the hit of the extra dependencies when you want to use the tracing.

And I’ll just “casually” mention that if you pair this with server-side swift efforts, the Hummingbird project has support for distributed tracing currently built in. I expect Vapor support isn’t too far off, and it’s a continued focus to add more distributed tracing support for a number of prevalent server-side swift libraries over this coming summer.

See for Yourself (under construction/YMMV/etc)

I’ve tossed up my hack-job of a wrapper for tracing during testing with iOS and macOS – DistributedTracer, if you want to experiment with this kind of thing yourself. Feel free to use it, although if you’re amazed with the results – ALL credit should go to Moritz, the contributors to his package and the contributors to swift-distributed-tracing, since they did the heavy lifting. The swift-otel library itself is undergoing some major API surface changes – so if you go looking, I worked from the current main branch rather than the latest release. Moritz shared with me that while the API was not completely solid yet, this is more of the pattern he wants to expose for an upcoming 1.0 release.

Onward from here

I might push the DistributedTracer package further in the future. I think there’s real potential there, but it is not without pitfalls. Some of the challenges stem from constantly exporting data from an iOS app, so there’s a privacy (and privacy manifest) bit that needs to be seriously considered. There are also challenges with collecting enough data (but not too much), related choices in sampling so that it aligns with traces generated from infrastructure, as well as how to reliably transfer it from device to an endpoint. Nothing that can’t be overcome, but it’s not a small amount of work either.

Weekend hacking complete, I’m calling this a successful experiment. Okay, now back to actually debugging my library…

Embedding a privacy manifest into an XCFramework

During WWDC 2023, Apple presented a number of developer-impacting privacy updates. One of the updates, introducing the concept of a privacy manifest, has a direct impact on the work I’ve been doing making the CRDT library Automerge available on Apple platforms. The two relevant sessions from WWDC 2023:

  • Get Started with Privacy Manifests (video) (notes)
  • Verify app dependencies with digital signatures (video) (notes)

During the sessions, the presenter shared that somewhere in the coming year (2024) Apple would start requiring privacy manifests in signed XCFrameworks. There was little concrete detail available then, and I’ve been waiting since for more information on how to comply. I expected documentation at least, and was hoping for an update in Xcode – specifically the xcodebuild command – to add an option that accepted a path to a manifest and included it appropriately. So far, nothing from Apple on that front.

About a week ago I decided to use a DTS ticket to get assistance on how to (properly) add a privacy manifest to an XCFramework (and filed feedback: FB13626419). I hope that something is planned to make this easier, or at the minimum document a process, since it now appears to be an active requirement for new apps submitted to the App Store. I highly doubt we'll see anything between now and WWDC at this point. With any luck, we'll see something this June (WWDC 24).

I have a hypothesis that, with the updates to enable signed binary dependencies, there could be "something coming" about a software bill-of-materials manifest. My overactive imagination sees hints of that in what Swift records in Package.resolved, which the proposed new approach to Swift testing seems to be starting to take advantage of. It would make a lot of sense to support better verification and clear knowledge of what you're including in your apps, or depending on for your libraries (and it would be extremely useful metadata for testing validation).

In the meantime, if you're Creating an XCFramework and trying to figure out how to comply with Apple's requests for embedded privacy manifests, hopefully this article helps you get there. As I mentioned at the top of this post, this is based on my open-source work in Automerge-swift. I'm including the library and XCFramework (and showing it off) in a demo application. I just finished working through the process of getting the archives validated and pushed to App Store Connect (with macOS and iOS deliverables). To be very clear, the person I worked with at DTS was both critical to this effort and super-helpful. Without this information I would have been wandering blindly for months trying to get this sorted. All credit to them for the assistance.

The gist of what needs to be done lines up with Apple’s general platform conventions for placing resources into bundles (detailed at Placing Content in a Bundle). The resource in this case is the file PrivacyInfo.xcprivacy, and the general pattern plays out as:

  • iOS and iOS simulator: place the resource at the root for that platform
  • macOS and Mac Catalyst: place the resource in a directory structure /Versions/A/Resources/

The additional quirk in this case is that with an XCFramework created from platform-specific static libraries, you also need to put that directory structure underneath the directory that is the platform signifier. (An example is shown below, illustrating this. I know it's not super clear; I either don't know, or don't have, the words to correctly describe these layers in the directory structure.)

I do this with a bash script that copies the privacy manifest into the place relevant for each platform target. In the case of automerge-swift, we compile to support iOS, the iOS simulators (on x86 and arm architectures), macOS (on x86 and arm architectures), and Mac Catalyst (on x86 and arm architectures).

Once the files are copied into place, I code sign the bundle:

codesign --timestamp -v --sign "...my developer id..." ${FRAMEWORK_NAME}.xcframework

After that, I compress it down using ditto and compute the SHA256 checksum. That checksum is used to create a validation hash for a URL reference in a Package.swift. (If you want to see the scripts, have at it – they're on GitHub. The scripts are split at the end – one for CI that doesn't sign, and one for release that does.)
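
In Package.swift, that checksum ends up on the binary target entry, something like the snippet below – the URL and checksum here are placeholders rather than the real release values:

// A placeholder binaryTarget entry: url points at the zip created with ditto,
// and checksum is the SHA256 value computed above.
.binaryTarget(
    name: "automergeFFI",
    url: "https://example.com/releases/automergeFFI.xcframework.zip",
    checksum: "sha256-checksum-computed-from-the-zip"
),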

Seeing the layout of the relevant files in an XCFramework was the most helpful piece for me in assembling this, so let me share the directory structure of my XCFramework. The example below, called automergeFFI.xcframework, hopefully shows you the details without flooding you with extraneous files; it skips the header and code-signature-specific files:

automergeFFI.xcframework/
    Info.plist
    _CodeSignature/
    macos-arm64_x86_64/
        Headers/
        libuniffi_automerge.a
        Versions/A/Resources/
            PrivacyInfo.xcprivacy
    ios-arm64_x86_64-simulator/
        Headers/
        libuniffi_automerge.a
        PrivacyInfo.xcprivacy
    ios-arm64_x86_64-maccatalyst/
        Headers/
        libuniffi_automerge.a
        Versions/A/Resources/
            PrivacyInfo.xcprivacy
    ios-arm64/
        Headers/
        libuniffi_automerge.a
        PrivacyInfo.xcprivacy

With this in place, signed and embedded as a normal dependency through Xcode, both the iOS demo app and the macOS demo app passed the pre-flight validation and moved on through to TestFlight.

A week on with a Vision Pro

There are excellent reviews of the VisionPro “out there”, this post isn’t meant as another. It’s a record of my first experiences, thoughts, and scribbled notes for future me to look back on after a few iterations of the product.

I had been planning on getting a Vision Pro when it was first rumored. I put away funds from contracts and gigs, and when the time came and it was available for order, I still had sticker shock. When I bought one, I didn’t skimp, but I didn’t blow it out either. My goal is to learn this product – how it works and how to work with it, and to write apps that work beautifully on it. When the available-to-developers-only head-strap extension was announced, I grabbed it too. My prior experience with any headset is using an Oculus (now Meta) Quest 2, which was fun and illustrative – but I couldn’t use it more than a few hours before nausea would start to catch up with me.

Right off, the visual clarity of the Vision Pro blew me away. The displays are mind-bogglingly good, and the 3D effect is instantly crisp and clear. I found myself exploring the nooks and corners of the product that first evening, without a hint of nausea that I’d feared might happen. The two and a half hours of battery life came quickly.

Beyond the stunning visuals, I wanted to really understand and use the interaction model. From the API, I know it supports both indirect and direct interaction using hand-tracking. Most of the examples and interactions I had at the start were “indirect” – meaning that where I looked is where actions would trigger (or not) when I tapped my fingers together. It’s intuitive, easy to get started with very quickly, and (sometimes too) easy to forget it’s a control and accidentally invoke it.

In early window managers on desktop computers, there was a pattern of usage called “focus follows mouse” (which Apple pushed hard to move away from). The idea was that whichever window your mouse cursor was over is where keyboard input would be directed. The indirect interaction mode on Vision Pro is that on steroids, and it takes some getting used to. In several cases, I found myself looking away from the control while wanting to continue using it, with results that were messy – activating other buttons, etc.

Most of the apps (even iOS apps "just" running on Vision Pro) worked flawlessly and easily, and refreshingly didn't feel as out of place as iOS-designed apps feel on an iPad (looking at you, Instagram). One of the most useful visual affordances is a slight sheen that the OS plays over areas that are clearly buttons or targeted controls, which makes a wonderful feedback loop so that you know you're looking at the right control. The gaze tracking is astoundingly good – so much better than I thought it would be – but it still needs some space for grace. iOS default distances mostly work, although in a densely packed field of controls I'd want just a touch more space between them myself. After wearing the device for a couple of hours, I'd find the tracking not as crisp and I'd have a bit more error. Apps that eschewed accessible buttons for random visuals and tap targets are deeply annoying in Vision Pro. You get no feedback affordances to let you know if you're on target or not. (D&D Beyond… I've got to say, you've got some WORK to do.)

Targeting actions (or not) gets even more complicated when you're looking at touchable targets in a web browser. Video players in general are a bit of a tar pit in terms of useful controls and feedback. YouTube's video player was better than some of the others, but web pages in general were a notable challenge – especially the ones flooded with ads, pop-overs, and shit moving around and "catching your eye". That term becomes far more literal and relevant when some side movement shifts your gaze and triggers an errant click, and now I'm looking at some *%&$!!# video ad that I want nothing to do with.

In a win for potential productivity for me, you can have windows everywhere. The currently-narrowish field of vision constrains it: you have to move your head – instead of just glancing – to see some side windows. It's a huge break from the "do one thing at a time" metaphor that didn't exist on macOS, pervades iOS, and lives in some level of Dante's inferno on iPadOS. I can see a path to being more productive with the visionOS "spatial computer" than I ever would be with an iPad. The real kicker for me (not yet explored) will be text selection – and specifically selecting a subrange of a bit of text. That use case is absolutely dreadful in Safari on iOS. For example, try to select the portion of the URL after the host name in the Safari address bar. That seemingly simple task is a huge linchpin to my ability to work productively.

The weight and battery life of this first product release are definitely suboptimal. Easily survivable for me, but sometimes annoying. Given the outstanding technology that’s packed into this device, it’s not surprising. The headset sometimes feels like it’s slipping down my face, or I need to lift and reset it a bit to make it comfortable. For wearing the device over an hour or so while sitting upright, I definitely prefer to use the over-the-head strap – and I don’t give a shit what my hair looks like.

Speaking of caring what I look like – I despise the “persona” feature and won’t be using it. It’s straight into the gaping canyon of uncanny valley. I went through the process to set one up and took a look at it. I tried to be dispassionate about it, but ultimately fled in horror and don’t want a damn thing to do with it. I don’t even want to deal with FaceTime if that’s the only option. I’d far prefer to use one of those stylized Memoji, or be able to provide my own 3D animation puppet that was mapped to my facial expressions. I can make a more meaningful connection to a stylized image or puppet than I can to the necrotic apparition of the current Persona.

And a weird quirk: I have a very mobile and expressive face, and can raise and lower either eyebrow easily. I use that a lot in my facial expressions. The FaceTime facial expression tracking can’t clue in to that – it’s either both or not at all. While I’m impressed it can read anything about my eyebrows while wearing the Vision Pro, that’s a deal killer for representing my facial expressions.

Jumping back to something more positive – in terms of consuming media, the Vision Pro is a killer device right where it is now. The whole space of viewing and watching photos and video is amazing. The panoramas I've collected while traveling are everything I hoped for. The immersive 180° videos made me want to learn how to make some of those, and the stereoscopic images and video (smaller field of view, but same gist) are wonderful. It's a potent upgrade to the clicking wheels of the 3D View-Master from my childhood. Just watching a movie was amazing – either small and convenient to the side, or huge in the field of view – at my control – with a truly impressive "immersive theater" mode that's really effective. It's definitely a solo experience in that respect – I can't share watching a movie cuddled up on the couch, but even with the high price point, the video (and audio) quality of Vision Pro makes a massive theater out of the tightest cubby. In that respect, the current Vision Pro is a very comparable value to a large home theater.

Add on the environments (I’m digging Mt Hood a lot) – with slightly variable weather and environmental acoustics, day and night transitions – it’s a tremendous break. I’d love to author a few of those. A sort of crazy, dynamic stage/set design problem with a mix of lighting, sounds, supportive visual effects, and the high definition photography to backdrop it all. I was familiar with the concept from the Quest, but the production quality in the Vision Pro is miles ahead, so much more inviting because of that.

I looked at my M1 MacBook Pro, tapped on the connect button, and instantly loved it. The screen on the laptop blanked out, replaced by a much larger, high-resolution floating display above it. I need to transition my workspace to really work this angle, as it's a bit tight for a Vision Pro. Where I work currently, there are overhead pieces nearby that impinge on the upper visual space, prompting warnings and visual intrusions when I'm looking around to keep me from hitting anything. Using the trackpad on the Mac as a pointer within Vision Pro is effective, and the keyboard is amazing. Without a laptop nearby, I'd need (or want) at least a keyboard connected – the pop-up keyboard can get the job done (using either direct or indirect interaction), but it's horrible for anything beyond a few words.

I have a PS5 controller that I paired with my iPad for playing games, and later paired with the Mac to navigate in the Vision Pro simulator in Xcode. I haven't paired it with the Vision Pro, but that's something I'd really like to try – especially for a game. For the "immerse you in an amazing world" games that I enjoy, I can imagine the result. With the impressive results of the immersive environments, there's a "something" there that I'd like to see. Something from Rockstar, Ubisoft, Hello Games, or one of the Sony or Microsoft studios. No idea if that'll appear as something streamed from a console, or running locally – but the possibilities are huge by leveraging the high visual production values that Vision Pro provides. I'm especially curious what Disney and Epic Games might do together – an expansion or side-track from their virtual sets, creating environments and scenes that physically couldn't otherwise exist – and then interacting within them. I'm sure they're thinking about the same. (Hey, No Man's Sky – I'm ready over here!)

As a wrap up, my head’s been flooded with ideas for apps that lean into the capabilities of Vision Pro. Most are of the “wouldn’t it be cool!” variety, a few are insanely outlandish and would take a huge team of both artists and developers to assemble. Of the ones that aren’t so completely insane, the common theme is the visualization and presentation of information. A large part of my earlier career was more operationally focused: understanding large, distributed systems, managing services running on them, debugging things when “shit went wrong” (such as a DC bus bar in a data center exploding when a water leak dripped on it and shorted it out, scattering copper droplets everywhere). I believe there’s a real potential benefit to seeing information with another dimension added to it, especially when you want to look at what would classically be exposed as a chart, but with values that change over time. There’s a whole crazy world of software debugging and performance analysis, distributed tracing, and correlation with logging and metrics. All of which benefit from making it easier to quickly identify failures and resolve them.

I really want to push what’s available now in a volume 3D view. That’s the most heavily constrained 3D representation in visionOS today, primarily to keep anyone from knowing where you’re gazing as a matter of privacy. Rendering and updating 3D visualizations in a volume lets you “place” it anywhere nearby, change your position around it, and ideally interact with it to explore the information. I think that’s my first real target to explore.

I am curious where the overlap will appear with WebGL and how that presents into the visionOS spatial repertoire. I haven't yet explored that avenue, but it's intriguing, especially for the data visualization use case.

Unicode strings are always harder than you think

I recently released an update to the Swift language bindings for Automerge (0.5.7), which has a couple of great updates. My favorite part of that update was work to enable WebAssembly compilation support, mostly because I learned an incredible amount about swift-wasm and fixed a few misconceptions that I'd held for unfortunately too long. The other big thing was a fix to how I'd overlaid Swift strings onto Automerge text – fixing an issue that had been around longer than I realized in how it deals with Unicode strings.

To that note, let me introduce you to my new best friend for testing this sort of thing:

🇬🇧👨‍👨‍👧‍👦😀

This little string is a gem, in that the character glyphs are varying lengths of composed Unicode scalars: 2, 7, and 1, respectively. The real issue is that I mistook Automerge's integer indexing. I originally thought it counted UTF-8 characters, when in fact it counts Unicode scalars – and there's a BIG difference in results when you start trying to delete pieces of a string thinking one is the other.
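
A few lines of Swift make the difference concrete – an index computed against one of these views lands somewhere completely different in the others:

let tester = "🇬🇧👨‍👨‍👧‍👦😀"
print(tester.count)                // 3  - Characters (grapheme clusters)
print(tester.unicodeScalars.count) // 10 - Unicode scalars (2 + 7 + 1)
print(tester.utf8.count)           // 37 - UTF-8 code units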

When the core library updated to 0.5.6, one of the pieces added was an updateText() method, which takes an updated value, computes the differences between the strings, and applies the relevant changes for you – all of which you previously had to compute yourself. I'd been using CollectionDifference in Swift, a lovely part of the built-in standard library, to compute the diffs – but as soon as you hit some of those composed Unicode scalar characters, it all fell apart. The worst part was that I thought it was working – I even had tests with emoji, I just didn't know enough (or dig far enough) to pick the right ones to verify things earlier.

Fortunately, Miguel (an iOS developer in Spain) pointed out the mistake in a pull request that added tests illustrating the problem, including that lovely little string combination above. Thanks to Miguel’s patience – and pull request – the bugs were highlighted, and I’m very pleased to have them fixed, released, and best of all – a better understanding of unicode scalars and Swift strings.

When you're working in Swift alone, it's never been a topic I really needed to know – the APIs do a beautiful job of hiding the details. While it's sometimes a real pain to manage String.Index and distances between them for selections, the "String as a sequence of Characters" model has really served me well. It becomes an issue when you're jumping into foreign libraries that don't have full Unicode string support built in from the ground up, so I'm glad I've learned the detail.

Thanks Miguel!

Questions about the data to create LLMs for embeddings

Simon Willison has a fantastic article about using LLM embeddings in his October blog post: Embeddings: What they are and why they matter. The article is great, a perfect introduction, but I've been struggling to find the next steps. I've been aware of embeddings for a while, and there's a specific use case I have: full-text, multi-lingual search. Most full-text search (FTS) algorithms in use today are highly language-specific, and pre-date the rise of these transformer large-language models built from massive data collections.

There are a few out there, a number easily available, and from my ad-hoc experiments they're darned good and could likely fit the bill. Except… for so many of them, there are two key problems I haven't been able to sort out. The first is: where does the data that trained the model come from? This is the biggest problem child for me – not because the results aren't amazing and effective, but because I couldn't put together even a rudimentary open-source project without having some confidence about the provenance of how everything came together. The last thing I want is for an open-source project to run deeply afoul of copyright claims. From what I've found in my research so far, this problem is endemic to LLMs – with OpenAI and others carefully keeping "their data" close to their chest, both because the size is outrageous to catalog and because they want to keep their proprietary secrets. Well, and there's a ton of pending lawsuits and legal arguments about whether training an LLM can be considered fair use of even clearly copyrighted content.

The second problem is the size of the model and its performance. I can subjectively tell that smaller models perform "less effectively" than larger models, but I've yet to come up with any reasonable way to quantify that loss. It's hard enough to quantify search relevance and rankings – it's SO subjective that it effectively becomes data intensive to get a reasonable statistical sample for evaluation. Adding in the variable of differently sized LLMs to use for search embeddings just adds to it.

With the monster models hosted by OpenAI, I kind of suspect that data management for training, and updating, those models will be key going forward. It's clear enough they're being trained off "content on the Internet" – but more and more content is now being generated by LLMs – both images and text. The _very_ last thing you'd want to do is train an LLM on another LLM's generated (hallucinated) content. It would, I suspect, seriously dilute the encoded knowledge and data. Seems like carefully curated data sets are the key going forward.

If anyone reading my blog is aware of a “clean data sourced LLM”, even just based on the English language, I’d love to know about it. Ideally I’d find something that was multi-lingual, but I think that data collection and curation would be as much (or more) work than any consumption of the data itself. Something that required the resources of an academic institution or corporation, rather than what an individual could pull together. Or at least it feels pretty damn overwhelming to me.

Automerge for Swift

I've been interested in the idea of CRDTs, and the use cases they enable, for a number of years. The core ideas are pretty straightforward to understand and implement, but when you start applying them to something like collaborative text editing, or more efficient size encoding, there's a lot of additional complexity. There are existing, popular CRDT libraries (Yjs, Automerge, and others) that have worked out algorithms and at least some of the performance trade-offs for these more complex scenarios. So while I have the basics of CRDTs available as a Swift-native package, I've been interested in seeing those more popular libraries become equally available on macOS and iOS platforms.

Over the summer, I started working with the Automerge team to bring its Rust-language core to Swift. While it’s evolving, and hasn’t hit what I’d consider a 1.0 milestone, the functionality is sufficiently complete to use effectively in iOS or macOS apps to enable interactive collaboration, or local-first documents with seamless offline sync. So today I introduced Automerge on the Swift Forums, highlighting the latest 0.5.2 release of Automerge-swift. Along with this release, and its API documentation, I’ve also published an open-source, cross-platform (macOS and iOS) document-based SwiftUI app that showcases using Automerge: MeetingNotes.

MeetingNotes has its own app-level documentation providing a walk-through of how it uses Automerge alongside SwiftUI to create iOS and macOS apps that can interactively collaborate with editing.

There's still quite a bit of future development planned for this project: improvements in the core library, as well as more effort to make seamless synchronization between various languages and platforms easier to implement. While everything is open source, if you're interested in more commercial-style support of this library for your apps, the research lab Ink and Switch can help provide some of that support, and that support in turn funds further development of Automerge and related projects. (Full disclosure: some of this summer's work on the Automerge-swift library was funded by Ink and Switch.)

SwiftUI Field Notes: Document instance lifetimes are shorter than they appear

I have been working on a demonstration app using the SwiftUI document-based lifecycle. I learned, in the hard way that hopefully sticks with you, that instances of ReferenceFileDocument or FileDocument don't last as long as you might think.

This has been “a thing” for a long while, and may be well known to anyone who’s done UIDocument-based apps with UIKit, but it sure as hell caught me by surprise. From what I’ve learned, if the system (not sure where or what is arranging this) sees an updated file that you have open (using either ReferenceFileDocument or FileDocument), the SwiftUI app can initialize a new instance of it, with data loaded from that updated file, and slip it into place, removing the existing instance.

As far as I can tell, the old instance doesn't (immediately) get deallocated (although that could easily be my own fault through some memory reference loop), and there are no hooks from the application to let you know "hey, this was ripped out and replaced", or even "this instance is terminal, clean up your crap!"

In my case, I was hanging a network coordinator class off that document instance, one that was doing peer-to-peer (Bonjour/zero-conf) browsing, listening, and connection management. In my testing, the networking was all working beautifully, and then it would "just disappear" as though it had never existed. The rest of the document content looked exactly as you'd expect. EXCEPT – here's the good bit – if that previous instance had been listening, then I'd see that listener when I re-activated the networking. It continued to exist in the background until the termination of the app. Tracking down that surprising additional listener is what led me to realize the instance was getting replaced within the app, and in turn led to this PSA for future me (and anyone else building document-based apps).

What I need to do is re-architect the demonstration app to treat the document type as ephemeral, not stable for the duration of the document. I should probably move the network browser and listener setup to the App level, and deal with the idea that documents can come and go, and that the system manages that – to at least some extent. Hell of an undocumented surprise though (yes, I filed it: FB12821731).

So if you're working on a document-based app, I highly recommend you treat the document instances as ephemeral. It might save you some hair-pulling and debugging.

Getting your custom file type recognized by iOS and macOS

When you're making an app for iOS or macOS, there's a file – Info.plist – that provides key information between your app and the operating system it's running on. (If you're unfamiliar with it, Info.plist is a property list – a mix of dictionaries and arrays with values – that stretches way back in Apple development history. It's roughly akin to a JSON or YAML file, but an older format, formally encoded in XML.) Given that it goes way back, what's expected in that file, and the format of how it's structured, are quirky and organic. But the file (and what's in it) is critical – when you need to provide details to operating system features about your app, they go into the Info.plist file. Examples include permission strings, expected services, and supported types.

In large part, it is a declaration to the operating system of how to work with your app, or what to allow (or expect) it to do. When you make a document-based app, you define the type(s) it can open, such as public.png. You can also define a custom type that represents your app’s file format – for example, your app’s specific model, using Codable to serialize it out into a JSON-based file. For more detail (in video format), watch Uniform Type Identifiers – a reintroduction.
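As a rough sketch of what that looks like on the Swift side (with a made-up identifier and model type, so adjust for your own app), the custom type is typically exposed with UTType(exportedAs:), and that identifier string has to match the exported type declaration in Info.plist exactly:

```swift
import SwiftUI
import UniformTypeIdentifiers

// Hypothetical model that gets serialized to JSON with Codable.
struct NotesModel: Codable {
    var title: String = ""
    var body: String = ""
}

extension UTType {
    // The string here must match the identifier of the exported type
    // declaration in Info.plist, character for character.
    static let demoNotes = UTType(exportedAs: "com.example.demoapp.notes")
}

// A minimal FileDocument that reads and writes that custom type.
struct NotesDocument: FileDocument {
    static var readableContentTypes: [UTType] { [.demoNotes] }

    var model: NotesModel

    init(model: NotesModel = NotesModel()) {
        self.model = model
    }

    init(configuration: ReadConfiguration) throws {
        guard let data = configuration.file.regularFileContents else {
            throw CocoaError(.fileReadCorruptFile)
        }
        model = try JSONDecoder().decode(NotesModel.self, from: data)
    }

    func fileWrapper(configuration: WriteConfiguration) throws -> FileWrapper {
        FileWrapper(regularFileWithContents: try JSONEncoder().encode(model))
    }
}
```

Nothing enforces that the identifier string in code and the declaration in Info.plist stay in sync; if they drift apart, things tend to fail quietly.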

When I was recently making a demo app, I ran into an issue where the app could read and write files without any trouble – but when I double-clicked the document icon on macOS (or tapped it to open while browsing Files on iOS), nothing would happen. Likewise, AirDropping a file would move it, but not open the app associated with it. It was driving me nuts, but there were other parts of the code I wanted to get working, so I stuck it on the back burner to solve later.

Yesterday, I finally dug up the answer. There’s an Info.plist key that has to be set just so: NSUbiquitousDocumentUserActivityType. This is a key that Xcode tends to create automatically for you under “Document Types”, and its default value is listed as ${bundle identifier}.something. I don’t recall if I manually set that “something” to my app document type’s file extension, or if Xcode did that for me, but here’s the catch:

If you want the operating system to associate your app with this file type, it’s critical that this value be exactly the same as the identifier of the document type. In my demo app’s case, the identifier for the file type is com.github.automerge.meetingnotes, using a file extension of meetingnotes. My app’s bundle identifier is also com.github.automerge.meetingnotes. And because the key listed the bundle identifier and then the extension, the two were appended together, making the value com.github.automerge.meetingnotes.meetingnotes. No bueno: because they didn’t match, iOS and macOS didn’t act as I thought they should.

Once I set it manually to the same value as the type identifier, things just “started working”.

There’s a lot more to type identifiers (which have perhaps the worst acronym ever among the various Apple frameworks – Uniform Type Identifiers), and the Wikipedia page is pretty useful as well, with some hard-to-know but useful common identifiers for types. Apple does a reasonable job of keeping an archive of all the types they publicly support, but there are a lot more that they – or others – support that don’t get much of a mention. And it’s a key link for interactions between apps, like drag and drop, copy/paste, AirDrop, and the more recent Transferable.

If you can possibly get away with it, don’t edit the Info.plist file directly. There’s an expected – but mostly undocumented – structure to it, and no linter to let you know when you put something in the wrong place, or made a typo in a dictionary key. In Xcode, there’s an Info panel associated with the target that makes your app, and the Xcode development team has done a pretty good job of making that a much easier path to set those values. It doesn’t stop you from making typos (or in my case, bad assumptions) for the values, but at least it manages the structures and names of the keys for you.

So, future me: when you’re trying to figure out why the OS isn’t respecting that awesome new type, check the Info.plist values. A surprising amount needs to line up there, and there’s nothing but your noggin to validate the results.

What you need to know when learning to use Blender on a Mac laptop

My nephew was visiting last week, and we started an online tutorial course: The Ultimate Blender Low Poly Course 2 (kind of a click-bait title, but it’s been a decent course so far). For me, it was an excuse to finally get past a long-deferred desire of “I want to learn how to use Blender”. My nephew is into 3D modeling and animation, so I thought he might enjoy it too. In hindsight, we didn’t get very far through the course, and he knew 99% of what it was covering, but I learned a hell of a lot. I opted to do it all from my M1 MacBook Pro, which came with its own set of challenges. Sharing how to work around those limits is why this post exists.

If you’re thinking about learning Blender, maybe to make 3D content to incorporate into apps for visionOS, I’ve got a few suggestions. These are things to pay attention to that none of the tutorials or YouTube videos really covered.

First thing: while you’re likely going to find more convenient shortcuts with an extended keyboard (one with a number pad) and a 3-button mouse, it is nowhere near as critical as most of the tutorial instructors, Reddit comments, and YouTube videos make it out to be. That said, the user interface of Blender is very much oriented around expecting that you have a mouse-and-keyboard setup – so if you’re thinking of doing “serious work” with Blender, it might be best to learn with that setup up front. If you’re working from a stock Apple laptop and just want to do something in Blender, it’s very doable, and you won’t be seriously limited.

Focus Follows Mouse

Where your mouse cursor resides on the screen is HUGELY important in Blender! When you invoke commands, whether with the mouse buttons (in our case, multi-touch gestures) or with a keyboard shortcut, where the pointer sits within the Blender app window affects what commands are available and how they work. This user-interface pattern was more frequently seen back in the early days of graphical Unix workstations, and I’ve known it by the phrase “focus follows mouse”. It isn’t, however, how most Mac apps (or iOS apps) work. Even knowing it’s relevant, this was one of the trickiest “muscle memory” habits to overcome.

A few examples to illustrate what I mean:

To move a 3D mesh (or object) within the 3D Viewport in Blender, the natural instinct for most people using a Mac is to click and drag it – which doesn’t work. Instead, you need to click once on the object to select it, and then use the keyboard shortcut g to “grab” the object. Then, as you move the mouse, the object moves with it, and you click once more (when you’re done moving it) to lock the object into its new location. However, if your mouse happens to be over a different part of the Blender app than the 3D viewport (or outside the Blender app’s window entirely), the g command never engages. A huge amount of interaction with Blender happens through keyboard shortcuts, and all of them are affected by where the pointer resides on the screen.

The other place this makes a difference, which experienced Blender users seem to know implicitly (and which I’m slowly learning), is adjustment commands. There’s an origin point associated with every object, and the on-screen distance from that origin point to where your pointer resides is the starting point for making adjustments. An example is scaling an object. With the object selected (click once), tap the s key to enable scaling, and then move the pointer around with the trackpad. The distance between where your pointer started and the origin point of the mesh dictates how fast something scales, which can lead to some very surprising results. I’ve made the mistake more than once of having the mouse very close to that origin point, where a small shift of the mouse suddenly becomes a massive scaling change.

Navigating 3D viewports with a trackpad on macOS

The fundamentals of moving around in 3D space are critical for doing modeling and/or animation in Blender. Fortunately, if you’re working with “just a trackpad”, there are a number of default multi-touch gestures that get the job done. I believe they’re on by default, but just to be safe, check the preferences in Blender to make sure multi-touch gestures are enabled:

Choose the menu Edit > Preferences… (or use the command-, key combination) and confirm that multi-touch gestures are turned on.
Trackpad Movement Gestures

  • Pan (sweep the view horizontally and vertically, but not closer or farther): hold down shift while doing a two-finger drag on the trackpad.
  • Zoom (move the view closer or farther): hold down either the control or command key while doing a two-finger drag on the trackpad.
  • Rotate (circle around a point in space): two-finger drag on the trackpad.
  • Emulate a right-click (often used to invoke a context menu): tap the trackpad with two fingers.

If you get all out of whack and the trackpad doesn’t appear to be responding, a good fail-safe is the 3D Viewport’s menu View > Frame Selected, which resets the view to frame whatever object you have selected, undoing any zooming, panning, and other effects you made with the trackpad and letting you “start over” from a fresh vantage point.

Widget Navigation

The 3D viewport has a “widget controller” that you can also use to change the angle of the view.

The “Widget” control

Most of the time, the 3D viewport (as in the example above) shows your scene in perspective mode. By grabbing this widget and dragging it around, you can change the viewing angle, similar to rotating the view.

An example of the perspective view

I find it just as easy to use the multi-touch gestures, but the widget also has another purpose that’s super useful (especially when you don’t have a number keypad): you can select an orthographic view. If you click once on X, Y, or Z, the widget will jump to a flat (“orthographic”) straight-on view along that axis. For example, if you tap Y, the view jumps to a front-on view. Tap that same Y again, and it jumps to a back-on view, still orthographic.

An example of the orthographic view

I’ve seen a number of instances in tutorials where the instructor will say “hit 7 on your keypad” (the shortcut for the top-down orthographic view). When you’re working from a standard laptop keyboard, the 7 on the top row of the keyboard doesn’t cut it, so tapping the Z on the widget is the fastest way I’ve found to get there.

The colors on the widget are also super relevant. They’re used consistently in Blender to indicate specific axes:

  • Z – blue – vertical up and down
  • Y – green – front to back
  • X – red – side to side

This isn’t the same coordinate system as most Mac APIs (RealityKit and SceneKit swap Y and Z; there’s a small remap sketch at the end of this section), but when you’re modeling or moving objects and use a constraint modifier, Blender provides some visual cues to let you know which axes are allowed.

As an example, selecting the default box turns that box orange on the outside edges, letting you know it’s selected.

With it selected (and your mouse/trackpad pointer in the 3D viewport!), tap the s key to scale the object. The object gets a white outline, and you can see “Scale” listed in the upper-left corner of the viewport, along with the current scale values.

If you then press shift-Z, that activates a constraint on the scaling: any axis other than Z, so scaling is allowed along the X and Y axes. That shows up as red and green axis lines drawn from the object’s origin.

Now as you move the mouse/trackpad pointer, you can see the box scale, and the upper-left shows the constraints enabled.

When you’re done scaling, click on the trackpad, and it locks in that scale value.
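One aside on the axis colors above: if you ever find yourself mapping Blender coordinates into RealityKit or SceneKit by hand (exporters such as glTF or USD normally take care of this for you), the usual Z-up to Y-up remap looks roughly like the hypothetical helper below; it isn’t anything Blender or Apple ships.

```swift
import simd

// Hypothetical helper: map a Blender-style Z-up coordinate into the Y-up
// convention that RealityKit and SceneKit use. Both systems are right-handed,
// so only the axes get shuffled.
func yUpFromBlender(_ p: SIMD3<Float>) -> SIMD3<Float> {
    // Blender:             X = side to side, Y = front to back, Z = up
    // RealityKit/SceneKit:  X = side to side, Y = up, Z = toward the viewer
    SIMD3<Float>(p.x, p.z, -p.y)
}

// A point one unit "up" in Blender stays "up" after the remap.
let up = yUpFromBlender(SIMD3<Float>(0, 0, 1))  // (0.0, 1.0, 0.0)
```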

Reconfiguring the Blender App Window

The interior of the Blender app window is very configurable. So much so that you can reconfigure it accidentally and (in my case) be completely lost as to what just happened. It’s best to know about this up front.

The default Blender app window on startup has 4 active panels within it.

The upper left is a 3D viewport, the panel directly beneath it is a Timeline view, and the right side is split into two panels: the upper is an Outliner, and below it is a Properties panel. If you move your cursor over the edge of one of the panels, the cursor will change to show that you can move that edge.

A panel resize cursor

You’ll probably be pretty familiar with that mechanic – it’s effectively the same as in a number of Mac apps.

However, if you move the cursor to an intersection of those panels, either interior to the window or along the outside edges, you’ll see a new cross-hair cursor.

A panel move cursor

I don’t know what this cursor is formally called, but I’m calling it the “panel move cursor”, and it has different effects depending on which intersection you selected.

If you selected an interior corner – as in the screenshot example above – then clicking and dragging will change the cursor into a chevron pointing in a direction. That chevron indicates the panel that will be expanded to cover the region in the direction the chevron points, wiping out the existing panel.

If you select a corner on the outside of the window and drag inward, it creates a new panel (I think it defaults to always creating a 3D viewport panel), pushing any existing panels around to make space.

Unfortunately, once you’ve done any of this moving, there isn’t an “undo” you can invoke to reset the panels to their earlier configuration. It would be nice if there were undo support here, but it’s probably more important to simply know what happened, so you can work your way back to whatever configuration you like.

So much more…

There’s so much more, but I’m only at the start of my own process of learning: editing meshes, rigging them, doing basic animation. And that’s only a small part of what Blender is capable of. I’m so impressed with the Blender Foundation and Studio, pushing the tooling further while making it available to everyone at the same time.

All the above is just the start of a journey of learning and exploration.