Atomics in Objective-C

(biasedbit.com)

41 points · ovokinder · 11y ago · 20 comments

asveikau · 11y ago

> This post talks about the use of OS low level atomic functions

This is a pet peeve of mine, to call that an "OS" feature. In all recent CPUs I know of, atomic ops are not a privileged operation, and there is absolutely nothing for the operating system to manage in a traditional sense. You don't trap into the kernel and have it compare-and-swap, you just, um, compare and swap.

Maybe your OS provides a convenient C API, but it is not "OS" functionality. It's just instructions on your CPU. You could just as well write them inline. In many common uses, that's what ends up happening - the atomic ops are put inline with the rest of your code.
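
To make the "just instructions" point concrete, here is a minimal sketch using C11 `<stdatomic.h>` rather than Apple's OSAtomic API (that substitution is mine, not the article's). `atomic_store_max` is a made-up helper name. On mainstream targets such as x86-64 or ARMv6+, compilers typically lower this to inline compare-and-swap or load-link/store-conditional instructions, with no system call involved.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical helper: atomically raise *target to at least `candidate`
// using a compare-and-swap loop.
// Returns true if this call changed the stored value.
static bool atomic_store_max(atomic_int *target, int candidate) {
    int observed = atomic_load(target);
    while (observed < candidate) {
        // On failure, `observed` is refreshed with the current value and we retry.
        if (atomic_compare_exchange_weak(target, &observed, candidate)) {
            return true;
        }
    }
    return false;
}
```

Calling `atomic_store_max(&v, 5)` on a value of 3 stores 5 and returns true; a later call with 4 leaves the 5 in place and returns false.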

mikeash · 11y ago

If your use of atomics consists entirely of calling functions provided in a library as part of the OS, what's wrong with calling them "OS atomic functions"? That is what they are. The fact that you can accomplish the "atomic" part without the "OS functions" part doesn't change the fact that, as written, the article discusses the "OS functions" part.

asveikau · 11y ago

Well, it looks like Apple named this particular API "OS atomics", a name which makes me cringe a bit.

But, a few points:

1. There is historically a distinction between "operating system" and "shared library shipping with the operating system". I think I am losing this battle though.

2. On a number of platforms (not sure if Apple's "OS atomics" concept counts here), the atomic wrappers are not even functions in a shared library. They may be declared inline in a header file. Or they may be compiler intrinsics, where the compiler doesn't generate any function call in any circumstances. Is that an "OS function"? Not really.

In either case #1 or case #2, I think "OS atomics" is a dumb name. We are really talking about CPU features, not OS features. If it doesn't generate a trap into kernel mode, that doesn't sound very much like the OS doing the work.

Calling it "the OS" sounds more like a fundamental misunderstanding of dynamic linking and what it is. I hear so many variants of this core misunderstanding all over the place. Thinking that spinlocks need kernel help is one such manifestation.

ovokinder (OP) · 11y ago

Fair. I just didn't know how to rephrase that small sentence without unfolding into the two paragraphs you just wrote.

How would you rephrase that? Just "low level atomic", "atomic"?

stcredzero · 11y ago

If atomic operations were compatible across processor "families" then you'd have "the family atomics." (Obscure Dune reference.)

pdpi · 11y ago

hardware-provided atomic?

jevinskie · 11y ago

At least for ARM on Linux, the kernel does provide cmpxchg in the VDSO, but I think that is for support on older ARM architectures. IIRC, the ARMv7 does not need to use the kernel helpers.

https://www.kernel.org/doc/Documentation/arm/kernel_user_hel...

asveikau · 11y ago

Yes, pre-ARMv6 does not have the load-link/store-conditional instructions.

The Linux kernel hack for that is actually kind of awesome. Notably, it's not a syscall and you don't enter the kernel to do them. Since pre-ARMv6 is always single core, it simply becomes a matter of detecting you are in the middle of an atomic op at interrupt time, and patching the result. This means the atomic op has to happen at a well-known (kernel-provided) address. Details here: http://lwn.net/Articles/314561/

I'm also aware of some older systems that needed kernel intervention for atomic ops. But on x86, ARMv6+, even no longer relevant arches like SPARC, POWER, ... this is not the case. It really is rare that the kernel needs to do this job these days.

Edit: ARMv6, not ARMv7, per the link I provided...

richardwhiuk · 11y ago

This is all claimed to be done for 'performance', but there are no figures in this document showing whether incrementAndGet / decrementAndGet is any faster than @synchronized.

(I suspect it probably is, but fundamentally, @synchronized is implemented using compare-and-swap / other processor atomics, so it's probable that the difference is very slight - e.g. there's only a measurable difference if the thread is descheduled while holding a lock).

ovokinder (OP) · 11y ago

The goal of the article isn't about sheer performance — there are plenty of notes about it. If it was about pure performance, it'd be recommending moving away from objc classes and methods and using C functions or C++ classes instead, like std::atomic<>.

It's meant to be a somewhat-easy-to-digest introduction to lock-free design, where applicable.

What @synchronized ends up doing is far more complex, and it has to be, to stay correct in every case it covers: https://github.com/opensource-apple/objc4/blob/master/runtim...

ksherlock · 11y ago

@synchronized calls objc_sync_enter and objc_sync_exit. Source code is available[0]. Best case, the thread already has the object locked and only needs to increment a lock count. Worst case, it spin locks, searches through a linked list of existing locks, then needs to malloc a new entry and create a new mutex for it.

0: http://www.opensource.apple.com/source/objc4/objc4-646/runti...

azinman2 · 11y ago

It's actually more than you'd think -- @synchronized has to deal with re-entrant locks, try/catches that also release locks accordingly while bubbling exceptions, etc. There's a lot more to it than compare and swap.

liuliu · 11y ago

Or just use std::atomic and std::mutex in Objective-C++. In the Objective-C world, no memory semantics are well-defined, and all of these are hacks piled on other hacks.
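
As a rough illustration of what "well-defined memory semantics" buys you, here is a sketch in C11, whose memory model mirrors the C++11 one behind std::atomic (using C instead of Objective-C++ is a simplification on my part). The names `payload`, `ready`, `publish` and `try_consume` are hypothetical. In real use the two functions would run on different threads; the release store paired with the acquire load is what makes reading the plain `payload` variable safe.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        // plain, non-atomic data
static atomic_bool ready;  // publication flag (zero-initialized, i.e. false)

// Writer side: fill in the data, then publish it with release ordering.
static void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

// Reader side: returns true and copies the data only once it has been published.
static bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;  // safe: the acquire load synchronizes with the release store
    return true;
}
```

Before `publish` is called, `try_consume` returns false and leaves `*out` untouched; afterwards it returns true and hands back the published value.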

azinman2 · 11y ago

Can you explain more about hacks piled on other hacks?

richardwhiuk · 11y ago

@synchronized defines a memory barrier, as I understand it.

lyinsteve · 11y ago

Why no mention of GCD here? GCD is very, very good at synchronizing access to shared resources.

The most Cocoa-compatible way of handling background execution of expensive procedures is always going to be best executed, quickest, using Grand Central Dispatch.

For example:

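    @interface Foo ()
    // Assumed to be a serial queue, so blocks run one at a time.
    @property (nonatomic) dispatch_queue_t backgroundQueue;
    @end

    @implementation Foo {
        NSInteger _counter; // only touched on backgroundQueue
    }

    // The block runs asynchronously, so the result is delivered through the
    // completion handler rather than as a return value. N is a limit defined elsewhere.
    - (void) veryExpensiveMethod:(id)arg completion:(void (^)(BOOL success))completion {
        dispatch_async(self.backgroundQueue, ^{
            BOOL success = !(_counter++ > N);
            if (success) {
                // Critical section
            }
            _counter--;
            dispatch_async(dispatch_get_main_queue(), ^{
                completion(success);
            });
        });
    }

    @end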

That will ensure every call to -veryExpensiveMethod is run in sequence, and won't require waiting on your end.

These problems have been solved, better.

ovokinder (OP) · 11y ago

You're missing the most important point of the entire Throttler goal: gracefully returning fast, with success or failure. Nowhere is it stated that the goal is to enqueue tasks for execution.

If you had read 'til the end you would have found multiple statements that OSAtomic* is merely an alternative. Not a silver bullet. Not the fastest.

From the conclusion:

"It's very important to understand that every example in this article could have legitimately been solved with different concurrency primitives — like semaphores and locks — without any noticeable impact to a human playing around with your app."

Also, "(...) is always going to be best executed, quickest, using GCD." is kind of a blanket statement. I'd be careful around the use of "always".
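
For anyone who wants the "return fast, with success or failure" behaviour spelled out, here is a rough sketch in plain C11 atomics. It is not the article's code: `throttler_t` and its functions are made-up names, and C11 `<stdatomic.h>` stands in for OSAtomic*. The idea is the same increment-then-check pattern, where a caller over the limit undoes its increment and leaves immediately instead of waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical throttler: admits at most `limit` concurrent callers.
typedef struct {
    atomic_int in_flight;
    int limit;
} throttler_t;

static void throttler_init(throttler_t *t, int limit) {
    atomic_init(&t->in_flight, 0);
    t->limit = limit;
}

// Returns immediately: true if a slot was taken, false if the limit is reached.
static bool throttler_try_enter(throttler_t *t) {
    // fetch_add returns the previous value, so +1 is the count including us.
    if (atomic_fetch_add(&t->in_flight, 1) + 1 > t->limit) {
        atomic_fetch_sub(&t->in_flight, 1);  // over the limit: undo and fail fast
        return false;
    }
    return true;
}

// Must be called exactly once after each successful throttler_try_enter.
static void throttler_leave(throttler_t *t) {
    atomic_fetch_sub(&t->in_flight, 1);
}
```

With a limit of 2, the first two calls to `throttler_try_enter` succeed, the third fails without blocking, and a slot opens up again after one `throttler_leave`.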

fleitz · 11y ago

This is called a semaphore; it's already implemented in GCD.

Also, why talk about performance and then make obj-c method calls...?

It's quite easy using NSProxy to create a throttler that will wrap any object, then you can abstract throttling from the behavior of the underlying object.

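  @interface Throttler : NSProxy {
     dispatch_semaphore_t _semaphore;
     id _object;
  }
  - (id) initWithObject:(id)obj concurrentOperations:(int)ops;

  @end
  @implementation Throttler

  - (id) initWithObject:(id)obj concurrentOperations:(int)ops {
     // NSProxy is a root class with no -init, so there is no [super init] to call.
     _semaphore = dispatch_semaphore_create(ops);
     _object = obj;
     return self;
  }

  // Required for forwarding: the runtime needs a signature to build the NSInvocation.
  - (NSMethodSignature*) methodSignatureForSelector:(SEL)sel {
     return [_object methodSignatureForSelector:sel];
  }

  - (void) forwardInvocation:(NSInvocation*)invocation {
     // dispatch_semaphore_wait returns 0 on success and non-zero on timeout;
     // DISPATCH_TIME_NOW makes it fail immediately instead of blocking.
     if(dispatch_semaphore_wait(_semaphore, DISPATCH_TIME_NOW) == 0){
        @try {
          [invocation setTarget: _object];
          [invocation invoke];
        }
        @finally {
          // Runs whether or not the invocation throws; exceptions still propagate.
          dispatch_semaphore_signal(_semaphore);
        }
        return;
     }
     @throw [NSException
          exceptionWithName:@"InsufficientResourceException"
          reason:@"Insufficient Resource"
          userInfo:nil];
  }

  @end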

https://developer.apple.com/library/mac/documentation/Genera...
