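Project Ne10: https://github.com/projectNe10/Ne10
Ne10 is a library of optimized functions built on the NEON SIMD instruction set, created and maintained by ARM's own engineers. It can speed up multimedia, signal-processing and similar computation (NEON plays much the same role on ARM as Intel's MMX and AMD's 3DNow! do on x86). ARM started this open-source project because it realized that few developers on popular platforms such as iOS and Android still write assembly optimizations by hand. I remember studying digital image processing at school, discovering that MMX could speed up my algorithm, slogging through the MMX registers and instruction set for ages, and then being overjoyed when the run time dropped by a few dozen ms. Programmers today really have it good!
I git-cloned the project and tried it on a Mac with Xcode Version 5.0 (5A1413). Here are the test results on my iPhone 4S: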
———- test_abs_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4348 4657 -7.11% 0.93:1
2 5520 3222 0.00% 1.71:1
3 10289 6343 0.00% 1.62:1
4 14372 7611 0.00% 1.89:1
———- test_abs_case0 end
———- test_addc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 5020 1580 68.53% 3.18:1
2 7748 3684 52.45% 2.10:1
3 15100 4752 68.53% 3.18:1
4 19118 7298 61.83% 2.62:1
———- test_addc_case0 end
———- test_add_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4616 2309 49.98% 2.00:1
2 8028 4675 41.77% 1.72:1
3 15070 8562 43.19% 1.76:1
4 20951 9608 54.14% 2.18:1
———- test_add_case0 end
———- test_divc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 10578 1538 85.46% 6.88:1
2 28010 4235 84.88% 6.61:1
3 38125 5119 86.57% 7.45:1
4 51974 8861 82.95% 5.87:1
———- test_divc_case0 end
———- test_div_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 10606 6069 42.78% 1.75:1
2 25453 11234 55.86% 2.27:1
3 38300 17651 53.91% 2.17:1
4 56101 21420 61.82% 2.62:1
———- test_div_case0 end
———- test_dot_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 11768 3313 71.85% 3.55:1
3 16532 16516 0.10% 1.00:1
4 22882 9516 58.41% 2.40:1
———- test_dot_case0 end
———- test_len_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 21486 8246 0.00% 2.61:1
3 27567 9561 0.00% 2.88:1
4 24488 12465 0.00% 1.96:1
———- test_len_case0 end
———- test_mlac_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 10508 2050 80.49% 5.13:1
2 17911 4988 72.15% 3.59:1
3 30495 8349 72.62% 3.65:1
4 40371 11440 71.66% 3.53:1
———- test_mlac_case0 end
———- test_mla_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 7155 2170 69.67% 3.30:1
2 16950 5371 68.31% 3.16:1
3 28118 10698 61.95% 2.63:1
4 39532 16147 59.15% 2.45:1
———- test_mla_case0 end
———- test_mulc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4290 1533 64.27% 2.80:1
2 9580 3920 59.08% 2.44:1
3 15983 5068 68.29% 3.15:1
4 21423 7349 65.70% 2.92:1
———- test_mulc_case0 end
———- test_mul_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4395 1879 57.25% 2.34:1
2 9892 5185 -84965.60% 1.91:1
3 17533 7318 -240628334592.00% 2.40:1
4 23438 10409 -8780143.00% 2.25:1
———- test_mul_case0 end
———- test_normalize_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 39249 8217 -0.79% 4.78:1
3 56597 12083 0.00% 4.68:1
4 71775 14910 83754550155717447254016.00% 4.81:1
———- test_normalize_case0 end
———- test_rsbc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4830 1493 69.09% 3.24:1
2 8038 3685 54.16% 2.18:1
3 14810 5122 65.42% 2.89:1
4 31140 7790 74.98% 4.00:1
———- test_rsbc_case0 end
———- test_setc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 2192 1630 25.64% 1.34:1
2 2158 1478 31.51% 1.46:1
3 3406 2282 33.00% 1.49:1
4 4738 3744 20.98% 1.27:1
———- test_setc_case0 end
———- test_subc_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 3716 1643 55.79% 2.26:1
2 8186 3934 51.94% 2.08:1
3 14287 4744 66.79% 3.01:1
4 18993 7253 61.81% 2.62:1
———- test_subc_case0 end
———- test_sub_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
1 4559 1876 58.85% 2.43:1
2 7698 5110 33.62% 1.51:1
3 14911 7602 49.02% 1.96:1
4 21311 10519 50.64% 2.03:1
———- test_sub_case0 end
———- test_addmat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 27258 11065 59.41% 2.46:1
3 76646 41121 46.35% 1.86:1
4 139892 76687 45.18% 1.82:1
———- test_addmat_case0 end
———- test_detmat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 7518 3358 55.33% 2.24:1
3 26128 20230 22.57% 1.29:1
4 61125 63793 -4.36% 0.96:1
———- test_detmat_case0 end
———- test_identitymat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 6048 4154 31.32% 1.46:1
3 6285 10980 -74.70% 0.57:1
4 12034 18413 -53.01% 0.65:1
———- test_identitymat_case0 end
———- test_invmat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 49487 15768 68.14% 3.14:1
3 82152 77991 5.07% 1.05:1
4 290689 276519 4.87% 1.05:1
———- test_invmat_case0 end
———- test_mulmat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 51628 12736 75.33% 4.05:1
3 187783 50993 72.84% 3.68:1
4 460018 76611 83.35% 6.00:1
———- test_mulmat_case0 end
———- test_mulcmatvec_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 17157 5389 68.59% 3.18:1
3 37867 13746 63.70% 2.75:1
4 66345 28575 56.93% 2.32:1
———- test_mulcmatvec_case0 end
———- test_submat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 27445 11715 57.31% 2.34:1
3 76219 34167 55.17% 2.23:1
4 138741 72923 47.44% 1.90:1
———- test_submat_case0 end
———- test_transmat_case0 start
N-component Vector C Time in ms NEON Time in ms Time Savings Performance Ratio
2 5850 4910 16.07% 1.19:1
3 14634 22954 -56.85% 0.64:1
4 34708 49176 -41.68% 0.71:1
———- test_transmat_case0 end
———- test_cfft_case0 start
FFT Length C Time in ms NEON Time in ms Time Savings Performance Ratio
1024 567586 310824 45.24% 1.83:1
1024 526782 295161 43.97% 1.78:1
256 118986 61773 48.08% 1.93:1
256 113545 69541 38.75% 1.63:1
64 20285 10517 48.15% 1.93:1
64 19536 11901 39.08% 1.64:1
16 3851 2252 41.52% 1.71:1
16 3673 2429 33.87% 1.51:1
———- test_cfft_case0 end
———- test_rfft_case0 start
FFT Length C Time in ms NEON Time in ms Time Savings Performance Ratio
128 26018 21417 17.68% 1.21:1
128 28331 17209 39.26% 1.65:1
512 153267 90792 40.76% 1.69:1
512 155260 84182 45.78% 1.84:1
———- test_rfft_case0 end
———- test_fir_case0 start
FIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
32 202501 139033 31.34% 1.46:1
3 37981 31680 16.59% 1.20:1
7 63419 45430 28.37% 1.40:1
———- test_fir_case0 end
———- test_fir_decimate_case0 start
FIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
7 75817 7642 89.92% 9.92:1
32 136753 11993 91.23% 11.40:1
32 132916 16126 87.87% 8.24:1
———- test_fir_decimate_case0 end
———- test_fir_interpolate_case0 start
FIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
27 154269 97347 36.90% 1.58:1
32 197848 93542 52.72% 2.12:1
27 157669 88028 44.17% 1.79:1
32 171598 83136 51.55% 2.06:1
———- test_fir_interpolate_case0 end
———- test_fir_lattice_case0 start
FIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
3 46292 72688 -57.02% 0.64:1
1 22684 14614 35.58% 1.55:1
3 44353 77546 -74.84% 0.57:1
1 23664 16630 29.72% 1.42:1
———- test_fir_lattice_case0 end
———- test_fir_sparse_case0 start
FIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
5 333908 238858 28.47% 1.40:1
5 216119 84433 60.93% 2.56:1
5 248743 179371 27.89% 1.39:1
———- test_fir_sparse_case0 end
———- test_iir_lattice_case0 start
IIR Length&Taps C Time in ms NEON Time in ms Time Savings Performance Ratio
9 979040 579107 40.85% 1.69:1
9 549613 507107 7.73% 1.08:1
33 1845453 1207897 34.55% 1.53:1
———- test_iir_lattice_case0 end
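The two derived columns can be recomputed from the raw timings, which is handy for the rows whose Time Savings value printed as 0.00% or as nonsense. The following is a minimal sketch, assuming Time Savings = (C - NEON) / C and Performance Ratio = C / NEON. Those formulas are inferred from the well-formed rows above, not taken from the Ne10 test source, and the function names are made up for illustration.

```python
def time_savings_percent(c_time, neon_time):
    """Share of the C run time saved by the NEON version, in percent.

    Assumed formula: (C - NEON) / C * 100. Negative means NEON was slower.
    """
    return (c_time - neon_time) / c_time * 100.0


def performance_ratio(c_time, neon_time):
    """How many times faster the NEON version is (assumed formula: C / NEON)."""
    return c_time / neon_time


def format_row(label, c_time, neon_time):
    """Rebuild one table row in the same column order as the log above."""
    savings = time_savings_percent(c_time, neon_time)
    ratio = performance_ratio(c_time, neon_time)
    return f"{label} {c_time} {neon_time} {savings:.2f}% {ratio:.2f}:1"


if __name__ == "__main__":
    # A well-formed row from test_addc_case0, to check the assumed formulas.
    print(format_row(1, 5020, 1580))
    # A row from test_abs_case0 whose Time Savings printed as 0.00%.
    print(format_row(2, 5520, 3222))
```

The first call reproduces the addc row exactly (68.53%, 3.18:1), which supports the assumed formulas. The second gives 41.63% for the abs row that the log reports as 0.00%.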
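As you can see, the speedup is quite noticeable! Most tests run roughly 1.5x to 4x faster with NEON, divc reaches about 7x, and fir_decimate tops out at 11.40:1. A few cases are actually slower than plain C: 3x3 and 4x4 identitymat and transmat, two of the fir_lattice runs, 4x4 detmat, and 1-component abs. Note that the Time Savings column is corrupted in several tests (abs, len, mul and normalize print 0.00% or nonsense values); the Performance Ratio column agrees with the raw timings and is the one to trust there.
One more thing to watch out for: the CMake script currently on GitHub was written against the iOS 6 SDK. To build with the latest iOS 7 SDK, edit ios_config.cmake and change
- "$ENV{IOS_DEVELOPER_PATH}/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS6.1.sdk/")
to
+ "$ENV{IOS_DEVELOPER_PATH}/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/")
Also make sure the deployment target of the Xcode sample project is 6.0 or later. Otherwise Xcode complains that the storyboard uses Auto Layout, which is not supported on iOS 5 and earlier!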
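If you would rather script the SDK change than edit the file by hand, here is a minimal sketch. It assumes the only change needed is the SDK directory name and that you pass in the path to your own copy of ios_config.cmake; the helper names are hypothetical and are not part of Ne10.

```python
from pathlib import Path

# SDK directory names taken from the two lines quoted above.
OLD_SDK = "iPhoneOS6.1.sdk"
NEW_SDK = "iPhoneOS7.0.sdk"


def retarget_sdk(cmake_text, old=OLD_SDK, new=NEW_SDK):
    """Return the CMake text with every occurrence of the old SDK name replaced."""
    return cmake_text.replace(old, new)


def patch_file(path):
    """Rewrite the file in place; return True if anything changed.

    `path` is wherever ios_config.cmake lives in your checkout (an assumption,
    the location is not hard-coded here).
    """
    cmake_file = Path(path)
    original = cmake_file.read_text()
    updated = retarget_sdk(original)
    if updated != original:
        cmake_file.write_text(updated)
    return updated != original
```

Calling `patch_file` a second time on the same file returns False, because the old SDK name no longer appears in it.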