forked from Mirror/GodMode9
fetching the thread ID requires coprocessor access, which means switching back and forth between Thumb and ARM mode - it's faster to just allocate a single pointer and do an indirect load when necessary
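
A minimal sketch of the single-pointer approach described above, written as portable C. The names `thread_ctx`, `current_thread`, `set_current_thread` and `current_thread_id` are hypothetical and only illustrate the idea; they are not taken from the actual code. The assumption is that the scheduler updates one global pointer on each context switch, so reading the thread ID becomes two ordinary loads that work the same from Thumb or ARM code, with no coprocessor instruction involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread record; the single field is illustrative only. */
typedef struct thread_ctx {
    uint32_t id;
} thread_ctx;

/* The single allocated pointer: one word in RAM that always points at the
 * record of the thread that is currently running. The pointer itself is
 * volatile because a context switch may change it behind the compiler's back. */
thread_ctx *volatile current_thread = NULL;

/* Hypothetical hook, assumed to be called by the scheduler on every switch. */
void set_current_thread(thread_ctx *t)
{
    assert(t != NULL);
    current_thread = t;
}

/* Indirect load: first load the pointer, then load the field through it.
 * Both are plain memory loads, so no mode switch is needed to run them. */
uint32_t current_thread_id(void)
{
    return current_thread->id;
}
```

The trade-off is one extra word of RAM and one store per context switch in exchange for a cheaper lookup on every read.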