OSCAR Multicore Suite (beta)

OSCAR Multicore Suite is a software suite that assists in parallelizing applications. It accepts sequential C code as input.

  • OSCAR Multicore Estimator : Statically analyzes C code and estimates its execution time. Reports data dependencies and loop parallelism.
  • OSCAR Multicore Profiler : Assists software profiling. Automatically inserts time-measurement code in a way that minimizes its overhead.
  • OSCAR Parallel Compiler : Automatically generates parallel C code from sequential C code. This tool exploits both loop parallelism and task parallelism.

Quick Start

System Requirements

OSCAR Multicore Suite has been confirmed to run on the following systems.

  • Linux
    • Ubuntu 18.04 LTS (x86_64)
    • Ubuntu 20.04 LTS (x86_64)
  • macOS
    • macOS Catalina (10.15) with Xcode 12.4 (x86_64)
    • macOS Big Sur (11) with Xcode 12.5 (x86_64/arm64)
  • Windows
    • WSL2 with Windows 10 (x64)


OSCAR Multicore Suite binaries are released in the OSCAR-Multicore-Suite repository on GitHub; see the release page of the repository. (For WSL2, use the Linux binary.)


  • To install OSCAR Multicore Suite, execute the following command.

     $ sudo ./install

  • The installer automatically copies the binaries and their settings files to /opt/oscartech/v5.0beta1.
  • Before using OSCAR Multicore Suite, run the following command to set the environment variables.

     $ source /opt/oscartech/v5.0beta1/bin/set_env.sh

Additional installation commands for macOS

First, install the Xcode command line tools with the following command if you have not installed them before.

     $ xcode-select --install

For macOS 10.15 or later, you need to remove the quarantine attribute. The following shell script runs xattr -dr com.apple.quarantine on each binary.

     $ /opt/oscartech/v5.0beta1/bin/mac_rm_quarantine.sh

For Apple Silicon devices, the files must be code-signed. You can ad-hoc sign them by running the following script, which runs xattr -cr and codesign -s on each binary.

     $ /opt/oscartech/v5.0beta1/bin/mac_sign.sh

OSCAR Multicore Estimator

Main Flow

  1. Analyze each C source file and check the cost and dependencies of its functions.
  2. Check whether the functions are suitable for parallelization.

First Step

Tools for analysis: otc-est-1file, makefile, etc.

In the makefile, set CC to otc-est-1file. All compiler options (include paths and macro definitions) must still be passed to otc-est-1file. Every function in the C files is analyzed, and the results are printed to the console.

Makefile after changes

If OSCAR Multicore Suite is executed from the command line:

  $ OT_CC=gcc otc-est-1file -D_MACRO1 -c func1.c

otc-est-1file analyzes all functions in the specified C file and prints the following information to the console.

The sample file test3.c is located in /opt/oscartech/v5.0beta1/share/sample/est.

There are two functions (func2() and func()) in test3.c, and the cost of func2() is larger than that of func().

Second Step

Run otc-est with the target function specified. This produces more detailed information about the target function.

Makefile after changes

If OSCAR Multicore Suite is executed from the command line:

  $ OT_FUNC=func2 otc-est -D_MACRO1 -c func1.c

otc-est analyzes the specified function and prints the following information to the console.

According to this analysis, func2() contains one loop, and the loop is parallelizable (DOALL). The estimated speed-up from parallelizing this loop is 4.0 times over sequential execution.

If you have multiple functions to analyze, change the target function name and run otc-est again.

How to read the information

[Block name] (Execution Time[ms]) : Filename:line no. [Function name (if the block is a function call)]
Processor Num : Number of processors specified by the option
Estimated Speed Up : Estimated speed-up of the code when using [Processor Num] processors
Maximum Speed Up : Theoretical maximum parallelism of the code
Total Time : Estimated execution time of the input code [ms]
Parallelized Time : Estimated parallel execution time of the input code when using [Processor Num] processors [ms]
Critical Path Time : Estimated execution time of the critical path of the input code [ms]
Cost Model Arch : Name of the SHIM file used to estimate execution time
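
The relationship between the time fields and the speed-up fields can be sketched as follows. The formulas are an assumption based on the conventional definitions (speed-up = sequential time / parallel time, upper bound = sequential time / critical path time); this page does not state how the tool computes them. The struct and function names are illustrative and are not part of OSCAR Multicore Suite.

```c
#include <assert.h>

/* Illustrative container for the time fields of the summary (all in ms).
   Not a type defined by OSCAR Multicore Suite. */
struct est_summary {
    double total_time;         /* "Total Time" */
    double parallelized_time;  /* "Parallelized Time" */
    double critical_path_time; /* "Critical Path Time" */
};

/* Assumed definition: sequential time divided by parallel time. */
double estimated_speed_up(const struct est_summary *s)
{
    assert(s->parallelized_time > 0.0);
    return s->total_time / s->parallelized_time;
}

/* Assumed definition: sequential time divided by critical path time,
   i.e. the bound that no number of processors can exceed. */
double maximum_speed_up(const struct est_summary *s)
{
    assert(s->critical_path_time > 0.0);
    return s->total_time / s->critical_path_time;
}
```

Under these assumptions, a total time of 100 ms with a parallelized time of 25 ms gives an estimated speed-up of 4.0, and a critical path time of 10 ms caps the speed-up at 10.0.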

LOOP_PARALLEL_INFO: output loop parallelism
Format: (Filename:lineno) [loop parallelism info] (Function name)

[loop parallelism info] includes the following messages.

loop was recognized as DOALL : the loop can be parallelized
loop was recognized as DOSUM : the loop can be parallelized as a reduction loop
outer loop is DOALL or DOSUM : an enclosing loop is DOALL or DOSUM, so parallelize the outer loop instead
include OUTER_JUMP : the loop cannot be parallelized because it contains a jump out of the loop (break, return, or goto)
has loop carried dependence : the loop cannot be parallelized because of data dependencies between iterations
To parallelize this loop, private the variable(s) : the loop can be parallelized if the listed variables can be made thread-private
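
To make the categories concrete, here are small C loops of each kind. They are textbook examples chosen for illustration; the function names are hypothetical, and which message the tool actually prints for each one has not been verified against otc-est.

```c
#include <assert.h>
#include <stddef.h>

/* DOALL candidate: each iteration touches only its own element. */
void scale(double *a, size_t n, double k)
{
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}

/* DOSUM candidate: the only dependence between iterations is the
   accumulation into sum, which is a reduction. */
double total(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* OUTER_JUMP: the early return jumps out of the loop. */
long find_first(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return (long)i;
    return -1;
}

/* Loop-carried dependence: iteration i reads the value written by
   iteration i - 1. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}

/* Privatization: tmp is shared by all iterations, but every iteration
   writes it before reading it, so a per-thread copy removes the conflict. */
void square_plus_one(const double *in, double *out, size_t n)
{
    double tmp;
    for (size_t i = 0; i < n; i++) {
        tmp = in[i] * in[i];
        out[i] = tmp + 1.0;
    }
}
```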

DATA_DEP_INFO: data dependencies for task-level parallelism
Format: [Block name] : -> [Block name for dependencies] : (Variable names for dependencies)

The file location of each block name is shown in “ANALYSIS SUMMARY”.
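
As an illustration of the kind of dependency this report describes, consider the sketch below. The function and the block labels A to D are hypothetical; the real block names and the exact report lines are chosen by the tool and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function split into four blocks (A to D) by comments. */
int combine(const int *samples, size_t n)
{
    assert(samples != NULL && n > 0);

    /* Block A: writes sum. */
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];

    /* Block B: reads sum, so it depends on A through the variable sum. */
    int mean = sum / (int)n;

    /* Block C: also reads sum, so it depends on A but not on B. */
    int doubled = sum * 2;

    /* Block D: reads mean and doubled, so it depends on B and C. */
    return mean + doubled;
}
```

Blocks B and C share no variable that either of them writes, so once A has finished they could run as parallel tasks; D must wait for both.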

The tools are controlled by the following environment variables.

OT_ARCH : Specifies the architecture name used for cost estimation. Two architectures are currently supported:
  raspi4 (default) : Raspberry Pi 4
  xeon : Intel Xeon Platinum 9282
OT_CC : Specifies the compiler used for binary generation (e.g. gcc, clang). If this variable is left blank, binary generation is turned off.
OT_CNUM : Specifies the number of cores to parallelize for (default: 4).
OT_FUNC : Specifies the name of the target function to analyze (default: main). For “int func(void)”, set this variable to just “func”.

OSCAR Multicore Profiler

Prepare Runtime Library

Copy the following files to the same directory as the target C file.



Compilation and Execution

Run otc-prof with the target function specified. The instrumented profiler code is written out as a .prof.c file.

  $ OT_FUNC=func2 otc-prof -D_MACRO1 -c func1.c

Compile the generated .prof.c file together with profile.c and run the program; the results are written to an XML file.
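
For readers unfamiliar with instrumented profiling, the sketch below shows the general idea of wrapping a block with time-measurement code by hand. It is not the code that otc-prof generates and does not use the runtime in profile.c; the struct, the function names, and the choice of clock_gettime with nanosecond units are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-block counters, loosely mirroring the profile_count
   and profile_cost attributes of the XML output. */
struct block_profile {
    uint64_t count;  /* number of times the block was executed */
    uint64_t cost;   /* accumulated elapsed time (nanoseconds, assumed unit) */
};

/* Read a monotonic clock and convert it to nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* A loop wrapped with hand-written measurement code. */
double profiled_sum(const double *a, int n, struct block_profile *p)
{
    assert(a != NULL && p != NULL);
    uint64_t start = now_ns();

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i];

    p->cost += now_ns() - start;  /* accumulate elapsed time */
    p->count += 1;                /* count one execution of the block */
    return sum;
}
```

Measuring around the whole loop rather than inside each iteration keeps the number of clock reads small, which is one common way to limit measurement overhead.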

Structure of the XML file

<module>: name = “Function name”

  <block>: id = “Block name” kind = “Block kind” profile_count = “Number of executions” profile_cost = “Execution time” estimated_ratio = “Percentage of the software's execution time”

 <module name="conj_grad">
   <block id="1" kind="block" estimated_ratio="73.190346">
     <block id="1" kind="bb" profile_id="120" profile_flg="1" profile_count="16" profile_cost="526" estimated_ratio="0.000451">
======================== chop ===========================
     <block id="4" kind="loop" profile_id="123" profile_flg="1" profile_count="16" profile_cost="86053345" estimated_ratio="73.782555">
       <block id="1" kind="bb" profile_id="129" profile_flg="1" profile_count="400" profile_cost="0" estimated_ratio="0.000000"> </block>
       <block id="2" kind="loop" profile_id="130" profile_flg="1" profile_count="400" profile_cost="67814367" estimated_ratio="58.144367">
         <block id="1" kind="bb" profile_id="137" profile_flg="1" profile_count="560000" profile_cost="0" estimated_ratio="0.000000">
======================== chop ===========================

This XML was generated for the conj_grad() function.

  • loop4 inside conj_grad() was profiled; it accounts for 74% of the application's execution time.
  • loop2 inside loop4 was profiled; it accounts for 58% of the application's execution time.

Case Study

Parallelization of TeraStitcher using OSCAR Parallel Compiler

OSCAR Parallel Compiler was applied to TeraStitcher. It automatically accelerated the “displcompute” phase of TeraStitcher by parallelizing both the compute_NCC() function and the compute_NCC_map() function.

As a result, OSCAR Parallel Compiler achieved a 3.4 times speed-up on an Intel Core i7 6850K (4 cores) and a 6.8 times speed-up on an AMD Ryzen Threadripper 2990WX (32 cores).

TeraStitcher version 1.11.11 (1ef5a44)
dataset: example dataset (distributed in Demo page of TeraStitcher)

Feedback and Contributions

If you want to file issues or make feature/enhancement requests, use the OSCAR-Multicore-Suite repository on GitHub.

Known Issues

Known issues are described in the Release Notes in the OSCAR-Multicore-Suite repository on GitHub.