Development in CUDA, OpenCL, or C++, or conversion from MATLAB, producing clean, readable, unit-tested code
Every code deliverable is unit tested automatically using up-to-date testing infrastructure
Fast, efficient implementations on CPU and/or GPU
CPU optimizations using SIMD (SSE/AVX) and parallel programming, including atomic operations and lock-free data structures. Finding hotspots, and detecting and analyzing various types of bottlenecks.
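
As a small illustration of the atomic operations and lock-free techniques mentioned above, here is a minimal C++ sketch. The function names `count_with_atomics` and `atomic_store_max` are hypothetical, chosen only for this example, and the relaxed memory ordering assumes the values are read only after the worker threads have been joined.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: several threads increment one shared counter
// without a mutex, using an atomic read-modify-write operation.
std::uint64_t count_with_atomics(unsigned num_threads,
                                 std::uint64_t increments_per_thread) {
    std::atomic<std::uint64_t> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (std::uint64_t i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering is enough here: only the final total
                // matters, and join() below synchronizes with each thread.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return counter.load(std::memory_order_relaxed);
}

// Hypothetical helper: lock-free "store if larger" built on a
// compare-and-swap retry loop.
void atomic_store_max(std::atomic<int>& target, int value) {
    int current = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `current` with the latest
    // value, so the loop re-checks whether `value` is still larger.
    while (current < value &&
           !target.compare_exchange_weak(current, value,
                                         std::memory_order_relaxed)) {
    }
}
```

Calling `count_with_atomics(4, 100000)` returns 400000 no matter how the threads interleave, which a plain non-atomic counter would not guarantee. On most toolchains the code needs to be built with thread support enabled (for example `-pthread` with GCC or Clang).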