OSCAR Multicore Suite is a software suite that assists in parallelizing applications. It accepts sequential C code as input.
- OSCAR Multicore Estimator : Statically analyzes C code and estimates its execution time. It reports data dependencies and loop parallelism.
- OSCAR Multicore Profiler : Assists software profiling. It automatically inserts time-measurement code while keeping the measurement overhead low.
- OSCAR Parallel Compiler : Automatically generates parallel C code from sequential C code, exploiting both loop parallelism and task parallelism.
OSCAR Multicore Suite has been confirmed to run on the following systems.
- Ubuntu 18.04 LTS (x86_64)
- Ubuntu 20.04 LTS (x86_64)
- macOS Catalina (10.15) with Xcode 12.4 (x86_64)
- macOS Big Sur (11) with Xcode 12.5 (x86_64/arm64)
- WSL2 with Windows 10 (x64)
- To install OSCAR Multicore Suite, execute the following command.
- The installer automatically copies the binaries and their settings files to /opt/oscartech/v5.0beta1 .
- Before using OSCAR Multicore Suite, run the following command to set the environment variables.
Additional installation commands for macOS
First, install the Xcode command line tools with the following command if you have not already done so.
On macOS 10.15 or later, you need to remove the quarantine attribute. This shell script runs xattr -dr com.apple.quarantine on each binary.
On Apple Silicon devices, the files must be code-signed. You can ad-hoc sign them by running the following script, which runs xattr -cr and codesign -s on each binary.
OSCAR Multicore Estimator
- Analyze each C source file and check the cost and dependencies of its functions.
- Check whether the functions are suitable for parallelization.
Tools used for analysis: otc-est-1file, a makefile, etc.
With a makefile, set CC to otc-est-1file. All options (include paths and macros) must then be passed to otc-est-1file. Each function in the C files is analyzed and the results are printed to the console.
When running from the command line:
`$ OT_CC=gcc otc-est-1file -D_MACRO1 -c func1.c`
otc-est-1file analyzes all functions in the specified C file and prints the following information to the console.
test3.c is located in /opt/oscartech/v5.0beta1/share/sample/est .
test3.c contains two functions, func2() and func(), and the cost of func2() is larger than that of func().
Next, run otc-est with the target function specified. This outputs more detailed information about that function.
When running from the command line:
`$ OT_FUNC=func2 otc-est -D_MACRO1 -c func1.c`
otc-est analyzes the specified function and prints the following information to the console.
This analysis shows that func2() contains one loop and that the loop is parallelizable (DOALL). The estimated speedup from parallelizing this loop is 4.0x over sequential execution.
If you have multiple functions to analyze, change the target function name and re-run otc-est.
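A DOALL loop can approach a 4.0x speedup on four processors because no iteration depends on another, so the iteration space can be split into four equal chunks that run at the same time. The sketch below is a hand-written illustration, not OSCAR Parallel Compiler output. The function names are hypothetical, and OpenMP's `#pragma omp parallel for` is used only as a familiar stand-in for running the chunks on separate cores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DOALL loop: iteration i reads only b[i] and writes only a[i]. */
static void scale_seq(double *a, const double *b, size_t n, double k) {
    for (size_t i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Static block partitioning: chunk p of nproc covers [lo, hi).
 * The remainder n % nproc is spread over the first chunks. */
static void chunk_bounds(size_t n, size_t nproc, size_t p, size_t *lo, size_t *hi) {
    size_t base = n / nproc, rem = n % nproc;
    *lo = p * base + (p < rem ? p : rem);
    *hi = *lo + base + (p < rem ? 1 : 0);
}

/* Same loop, split into nproc independent chunks (4 matches the OT_CNUM default).
 * Compiled with -fopenmp, each chunk may run on its own thread; without it the
 * pragma is ignored and the chunks run one after another with the same result. */
static void scale_par(double *a, const double *b, size_t n, double k, size_t nproc) {
    assert(nproc > 0);
    #pragma omp parallel for
    for (long p = 0; p < (long)nproc; p++) {
        size_t lo, hi;
        chunk_bounds(n, nproc, (size_t)p, &lo, &hi);
        for (size_t i = lo; i < hi; i++)
            a[i] = k * b[i];
    }
}
```

With n = 10 and nproc = 4, the chunks are [0,3), [3,6), [6,8) and [8,10). Because every chunk writes a disjoint slice of `a`, the order in which chunks finish does not affect the result.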
How to read the information
***** ANASYSIS SUMMARY START *****
[Block name] (Execution Time[ms]) : Filename:line no. [Function name (if the block is a function call)]
Processor Num : Number of processors specified by the option
Estimated Speed Up : Estimated speedup of the code when using [Processor Num] processors
Maximum Speed Up : Theoretical maximum parallelism of the code
Total Time : Estimated execution time of the input code [ms]
Parallelized Time : Estimated parallel execution time of the input code using [Processor Num] processors [ms]
Critical Path Time : Estimated critical path execution time of the input code [ms]
Cost Model Arch : Name of the SHIM file used to estimate execution time
***** ANASYSIS SUMMARY END *****
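The summary fields are related by simple arithmetic under the usual definitions: speedup is sequential time divided by parallel time, and the critical path bounds how fast any schedule can finish. The sketch below assumes those definitions; this README does not state the exact formulas the estimator uses, and `summary_t` and the helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical container for three ANALYSIS SUMMARY fields, all in ms. */
typedef struct {
    double total_ms;     /* Total Time */
    double parallel_ms;  /* Parallelized Time with [Processor Num] processors */
    double critical_ms;  /* Critical Path Time */
} summary_t;

/* Assumed: Estimated Speed Up = Total Time / Parallelized Time. */
static double estimated_speedup(const summary_t *s) {
    assert(s->parallel_ms > 0.0);
    return s->total_ms / s->parallel_ms;
}

/* Assumed: Maximum Speed Up = Total Time / Critical Path Time,
 * i.e. the speedup reachable with unlimited processors. */
static double maximum_speedup(const summary_t *s) {
    assert(s->critical_ms > 0.0);
    return s->total_ms / s->critical_ms;
}
```

Since no schedule can finish before its critical path, Parallelized Time is at least Critical Path Time, so under these assumed definitions the estimated speedup never exceeds the maximum speedup.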
LOOP_PARALLEL_INFO: loop parallelism information
Format: (Filename:lineno) [loop parallelism info] (Function name)
[loop parallelism info] includes the following messages.
loop was recognized as DOALL : loop can be parallelized
loop was recognized as DOSUM : loop can be parallelized as a reduction loop
outer loop is DOALL or DOSUM : parallelize the outer loop instead
include OUTER_JUMP : loop cannot be parallelized (it includes outer jumps like break, return or goto)
has loop carried dependence : loop cannot be parallelized because of data dependencies
To parallelize this loop, private the variable(s) : if the variables can be made thread-private, the loop can be parallelized
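The loops below are textbook instances of the categories these messages describe. The function names are hypothetical, and the sketch does not reproduce actual estimator output for them; it only shows which code pattern each message refers to.

```c
#include <assert.h>
#include <stddef.h>

/* DOALL: each iteration touches its own elements, no cross-iteration dependence. */
static void doall(double *c, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* DOSUM: every iteration updates the same accumulator, so the loop can be
 * parallelized as a reduction (partial sums per core, combined at the end). */
static double dosum(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Loop-carried dependence: iteration i reads the value written by i - 1. */
static void prefix(double *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* OUTER_JUMP: the early return leaves the loop, so the number of
 * iterations actually executed is not known in advance. */
static long find_first(const int *a, size_t n, int key) {
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}

/* Privatization candidate: tmp is declared outside the loop and shared by all
 * iterations, but each iteration writes it before reading it, so giving each
 * thread its own copy of tmp removes the apparent dependence. */
static void needs_private(double *out, const double *in, size_t n) {
    double tmp;
    for (size_t i = 0; i < n; i++) {
        tmp = in[i] * in[i];
        out[i] = tmp + 1.0;
    }
}
```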
DATA_DEP_INFO: data dependencies for task-level parallelism
Format: [Block name] : -> [Block name for dependencies] : (Variable names for dependencies)
The file location of each block name is shown in the "ANALYSIS SUMMARY" section.
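Task-level parallelism depends on which blocks exchange data through variables. In the hypothetical code below, block2 reads x, which block1 writes, so block2 depends on block1 through x; block3 shares no variables with either, so it could run on another core at the same time. The block names and globals are illustrative only and do not show real DATA_DEP_INFO output.

```c
#include <assert.h>

/* Hypothetical shared state written and read by the blocks. */
static int x, y, z;

static void block1(int in) { x = in * 2; }  /* writes x */
static void block2(void)   { y = x + 1; }   /* reads x: must run after block1 */
static void block3(int in) { z = in - 3; }  /* touches neither x nor y: independent */

/* Sequential order as written; only block1 -> block2 is a required ordering. */
static void run(int in) {
    block1(in);
    block2();
    block3(in);
}
```

Running block3 before, between, or concurrently with the other two blocks gives the same result, which is what makes it a candidate for task-level parallel execution.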
- OT_ARCH : Specifies the architecture used for cost estimation. The following two architectures are currently supported:
  - raspi4 (default): Raspberry Pi 4
  - xeon: Intel Xeon Platinum 9282
- OT_CC : Specifies the compiler used for binary generation (e.g. gcc, clang). If this variable is left empty, binary generation is turned off.
- OT_CNUM : Specifies the number of cores to parallelize for (default: 4).
- OT_FUNC : Specifies the name of the function to analyze (default: main). For a function declared as "int func(void)", specify just "func".
OSCAR Multicore Profiler
Prepare Runtime Library
Copy the following files to the same directory as the target C file.
Compilation and Execution
Run otc-prof with the target function specified. This outputs the profiling code as a .prof.c file.
`$ OT_FUNC=func2 otc-prof -D_MACRO1 -c func1.c`
Compile the generated .prof.c file together with profile.c and run the program. The profiling result is then written to an XML file.
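Conceptually, the profiler places timing probes around blocks and accumulates a run count and a cost per block. The sketch below shows that idea with hand-inserted probes using POSIX clock_gettime; it is not the code otc-prof generates and not the profile.c API. The type `prof_block_t` and the helpers are hypothetical, and the units of profile_cost in the real XML are not stated here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-block counters, loosely mirroring profile_count and
 * profile_cost in the XML output. */
typedef struct {
    uint64_t count;    /* number of times the block ran */
    uint64_t cost_ns;  /* accumulated wall-clock time in nanoseconds */
} prof_block_t;

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

static double work(const double *a, int n, prof_block_t *pb) {
    double sum = 0.0;
    uint64_t t0 = now_ns();        /* probe before the loop */
    for (int i = 0; i < n; i++)    /* timing the whole loop once, rather than */
        sum += a[i];               /* every iteration, keeps overhead small   */
    pb->cost_ns += now_ns() - t0;  /* probe after the loop */
    pb->count++;
    return sum;
}
```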
Structure of the XML file
<module>: name = "Function name"
<block>: id = "Block name" kind = "Block kind" profile_count = "Number of executions" profile_cost = "Execution time" estimated_ratio = "Percentage of the total software execution time"
<module name="conj_grad">
<block id="1" kind="block" estimated_ratio="73.190346">
<block id="1" kind="bb" profile_id="120" profile_flg="1" profile_count="16" profile_cost="526" estimated_ratio="0.000451">
======================== chop ===========================
<block id="4" kind="loop" profile_id="123" profile_flg="1" profile_count="16" profile_cost="86053345" estimated_ratio="73.782555">
<block id="1" kind="bb" profile_id="129" profile_flg="1" profile_count="400" profile_cost="0" estimated_ratio="0.000000"> </block>
<block id="2" kind="loop" profile_id="130" profile_flg="1" profile_count="400" profile_cost="67814367" estimated_ratio="58.144367">
<block id="1" kind="bb" profile_id="137" profile_flg="1" profile_count="560000" profile_cost="0" estimated_ratio="0.000000">
======================== chop ===========================
This XML was generated for the conj_grad() function.
- loop4 in conj_grad() was profiled, and it accounts for 74% of the application's execution time.
- loop2, which is inside loop4, was profiled, and it accounts for 58% of the application's execution time.
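These percentages indicate how much parallelizing a loop can help the whole program. Amdahl's law (a standard model, not something the profiler reports) gives the overall speedup when a fraction p of the run time is spread evenly over n cores and the rest stays sequential. Treating loop4's estimated_ratio as p and assuming loop4 parallelizes perfectly are both simplifying assumptions; `amdahl` is a hypothetical helper name.

```c
#include <assert.h>

/* Amdahl's law: overall speedup = 1 / ((1 - p) + p / n),
 * where p is the parallelizable fraction of run time and n the number of cores. */
static double amdahl(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n > 0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

With p = 0.73782555 (loop4) and n = 4, the bound is about 2.24x, and even with unlimited cores it stays near 3.8x because the remaining ~26% runs sequentially. loop2 lies inside loop4, so its 58% is part of loop4's 74% and should not be added to it.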
Feedback and Contributions
If you want to file issues or make feature/enhancement requests, use the OSCAR-Multicore-Suite repository on GitHub.