CUDA, OpenCL, and C++ development, or conversion from MATLAB, producing highly efficient, readable, unit-tested code
All delivered code is unit tested automatically with the latest testing infrastructure
Fast, efficient implementations on CPU and/or GPU
CPU optimization using SIMD (SSE/AVX) and parallel programming, including atomic operations, lock-free data structures, etc. Finding hotspots; detecting and analyzing various types of bottlenecks.