Monday 4 September 2017

Video from ARM presenting our collaborative DNN co-design technology

ARM shared a video from Embedded Vision Summit'17 with a brief demonstration of our open-source technology to collaboratively optimize deep learning applications (SW/HW/model co-design) across diverse hardware and software stacks.
It is an ongoing project bringing industry, academia and end-users together to collaboratively co-design more efficient software and hardware for emerging workloads such as deep learning.

Wednesday 31 May 2017

Difference between MILEPOST GCC (machine learning based self-tuning compiler) and Collective Knowledge Framework

I recently received several questions about the differences between the MILEPOST GCC compiler and the Collective Knowledge framework. This motivated me to write this slightly nostalgic post about the R&D history behind MILEPOST GCC and our CK framework.

MILEPOST GCC is an extended GCC which includes:

1) Interactive Compilation Interface aka ICI - a plugin based framework to expose or change various information and optimization decisions inside compilers at fine-grain level via external plugins. I originally developed it for Open64 and later collaborated with Zbigniew Chamski and colleagues from Google and Mozilla to make it a standard plugin framework for GCC.

2) Feature extractor developed by Mircea Namolaru from IBM as an ICI plugin to expose low-level program features at a function level (see available features here). It was also extended by Jeremy Singer (ft57–65).

However, to keep MILEPOST project complexity under control, I decided to separate MILEPOST GCC from an infrastructure to auto-tune workloads, build models and use them to predict optimizations. Therefore, I developed the first version of the cTuning framework to let users auto-tune GCC flags for shared benchmarks and data sets, use MILEPOST GCC to extract features for these benchmarks, build predictive models (possibly on the fly, i.e. via active learning), and then use them to predict optimizations for previously unseen programs (using ICI to change optimizations).
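As a rough illustration of the last step (predicting optimizations for a previously unseen program from its features), here is a minimal nearest-neighbour sketch. It is not the cTuning or MILEPOST GCC implementation: the program names, feature vectors and "best flags" are invented for illustration, and a real setup would use the features exposed by the ICI plugin and flags found by actual auto-tuning.

```python
import math

# Hypothetical training set: program name -> (feature vector, best flags found
# by auto-tuning). Features and flag choices are invented for illustration.
TRAINING = {
    "matmul":  ([12.0, 3.0, 40.0], "-O3 -funroll-loops"),
    "qsort":   ([30.0, 18.0, 22.0], "-O2 -fno-inline"),
    "stencil": ([10.0, 2.0, 55.0], "-O3 -ftree-vectorize"),
}


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_flags(features, training):
    """Return the most similar training program and the flags tuned for it."""
    nearest = min(training, key=lambda name: distance(features, training[name][0]))
    return nearest, training[nearest][1]


# Features of a previously unseen program (also invented).
name, flags = predict_flags([11.0, 2.0, 50.0], TRAINING)
print(name, flags)
```

In practice the features would need to be normalised and the model validated on held-out programs; the sketch only shows the shape of the workflow.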

However, since it was still taking far too long to train models (my PhD students, Yuriy Kashnikov and Abdul Memon, spent 5 months preparing experiments in 2010 for our MILEPOST GCC paper), we decided to crowdsource autotuning via a common repository across diverse hardware provided by volunteers and thus dramatically speed up the training process. Accelerating the training process and improving the diversity of the training set are the main practical reasons why my autotuning frameworks use crowdtuning mode by default nowadays ;) …

The first cTuning framework turned out to be very heavy and difficult to install and port (David Del Vento and his interns from NCAR used it in 2010 to tune their workloads and provided lots of useful feedback - thanks guys!). This motivated me to develop a common research SDK (Collective Knowledge aka CK) to simplify, unify and automate general experiments in computer engineering.

The CK framework lets the community share their artifacts (benchmarks, data sets, tools, models, experimental results) as customizable and reusable Python components with a JSON API. So, you can take advantage of already shared components to quickly prototype your own research workflows such as benchmarking, multi-objective autotuning, machine-learning based optimization, run-time adaptation, etc. That is, rather than re-building numerous ad-hoc in-house tools or scripts for autotuning and machine-learning based optimization, which rarely survive after PhD students are gone, you can now participate in collaborative and open research with the community, reproduce and improve collaborative experiments, and build upon them ;) … That’s why ACM is now considering using CK for unified artifact sharing (see CK on the ACM DL front page).
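To make the idea of "Python components with a JSON API" more concrete, here is a minimal sketch of a JSON-in/JSON-out dispatcher. It is not the CK API itself: the `REGISTRY`, `component` and `access` names, the `dataset`/`describe` component and the `return`/`error` fields are all assumptions made for this illustration.

```python
import json

# Hypothetical registry of shared components: (module, action) -> function.
REGISTRY = {}


def component(module, action):
    """Decorator that registers a function under a module/action pair."""
    def register(func):
        REGISTRY[(module, action)] = func
        return func
    return register


@component("dataset", "describe")
def describe_dataset(request):
    """Toy component: summarise a list of numbers passed in the request."""
    values = request.get("values", [])
    if not values:
        return {"return": 1, "error": "no values supplied"}
    return {"return": 0, "count": len(values), "min": min(values), "max": max(values)}


def access(request_json):
    """Dispatch a JSON request to a registered component and return JSON."""
    request = json.loads(request_json)
    func = REGISTRY.get((request.get("module"), request.get("action")))
    if func is None:
        response = {"return": 1, "error": "unknown module or action"}
    else:
        response = func(request)
    return json.dumps(response, sort_keys=True)


print(access('{"module": "dataset", "action": "describe", "values": [3, 1, 2]}'))
```

Because every component takes and returns plain JSON-compatible dictionaries, components written by different people can be chained into a workflow or called from other languages without sharing any Python-specific types.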

You can also take advantage of the integrated, cross-platform CK package manager, which can prepare your workflow and install missing dependencies on Linux, Windows, MacOS and Android.

For example, see the highest-ranked artifact from CGO’17, shared as a customizable and portable CK workflow at GitHub.

To conclude my nostalgic overview of the MILEPOST project and CK ;) — MILEPOST GCC is now added to the CK as a unified workflow while taking advantage of a growing number of shared benchmarks, data sets, and optimization statistics (see CK GitHub repo).

I just didn’t have time to provide all the ML gluing, i.e. building models from all optimization statistics and features shared by the community at cKnowledge.org/repo. But it should be quite straightforward, so I hope our community will eventually help implement it. We are now particularly interested in checking the prediction accuracy of different models (SVM, KNN, DNN, etc.) and in finding extra features which improve optimization prediction.

Friday 10 March 2017

Enabling open and reproducible computer systems research: the good, the bad and the ugly

14 March 2017, CNRS webinar, Grenoble, France
Slides are now available here!

A decade ago my research nearly stalled. I was investigating how to crowdsource performance analysis and optimization of realistic workloads across diverse hardware provided by volunteers and combine it with machine learning [1]. Often, it was simply impossible to reproduce crowdsourced empirical results and build predictive models due to the continuously changing software and hardware stack. Worse still, lack of realistic workloads and representative data sets in our community severely limited the usefulness of such models.

All these problems motivated me to develop an open-source framework and repository (cTuning.org) to share, validate and reuse workloads, data sets, tools, experimental results and predictive models, while involving the community in this effort [2]. This experience, in turn, helped us initiate so-called Artifact Evaluation (AE) at the premier ACM conferences on parallel programming, architecture and code generation (CGO, PPoPP, PACT and SC). AE aims to independently validate experimental results reported in the publications, and to encourage code and data sharing.

I would like to invite you to my webinar “Enabling open and reproducible research at computer systems conferences: the good, the bad and the ugly” at CNRS Grenoble on 14 March 2017, 1:30PM (UTC+1). I will share our practical experience organizing Artifact Evaluation over the past three years, along with encountered problems and possible solutions. You can find further info at this GitHub page including links to the video stream and the pad for notes.

On the one hand, we have received incredible support from the research community, ACM, universities and companies. We have even received a record number of artifact submissions at the CGO/PPoPP'17 AE (27 vs 17 two years ago), sponsored by NVIDIA, dividiti and the cTuning foundation. We have also introduced Artifact Appendices and co-authored the new ACM Result and Artifact Review and Badging policy now used at Supercomputing.

On the other hand, the use of proprietary benchmarks, rare hardware platforms, and totally ad-hoc scripts to set up, run and process experiments all place a huge burden on evaluators. It is simply too difficult and time-consuming to customize and rebuild experimental setups, reuse artifacts and eventually build upon others’ efforts - the main pillars of open science!

I will then present Collective Knowledge (CK), our humble attempt to introduce a customizable workflow framework with a unified JSON API and a cross-platform package manager, which can automate experimentation and enable interactive articles, while automatically adapting to the ever evolving software and hardware [3]. I will also demonstrate a practical CK workflow for collaboratively optimizing deep learning engines (such as Caffe and TensorFlow) and models across different compilers, libraries, data sets and diverse platforms from constrained mobile devices to data centers (CK-Caffe on GitHub / Android app to crowdsource DNN optimization) [4].

Finally, I will describe our open research initiative to publicly evaluate artifacts and papers which we have successfully validated at CGO-PPoPP’17, and plan to keep building upon in the future [5]. 

I am looking forward to your participation and feedback! Please feel free to contact me at Grigori.Fursin@cTuning.org or grigori@dividiti.com if you have any questions or comments!

References
[3] “Collective Knowledge: towards R&D sustainability”, Proceedings of the Conference on Design, Automation and Test in Europe (DATE), 2016
[4] “Optimizing Convolutional Neural Networks on Embedded Platforms with OpenCL”, IWOCL'16, Vienna, Austria, 2016
[5] “Community-driven reviewing and validation of publications”, Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering @ PLDI’14, Edinburgh, UK

Wednesday 15 February 2017

Our CGO'07 paper on machine learning based workload optimization received the CGO "Test of Time" award!

I had a really nice surprise at the last International Symposium on Code Generation and Optimization (CGO) - our CGO'07 research paper on "rapidly selecting good compiler optimizations using performance counters", co-authored with my colleagues from INRIA and the University of Edinburgh, won the "test of time" award! This award recognises outstanding papers published at CGO one decade earlier whose influence is still strong today!

When preparing that paper, I really suffered a lot from the continuously changing software and hardware stack when performing and processing huge numbers of experiments to build and train models which could predict optimizations. That experience eventually motivated me to continue my work on machine learning based optimization as a community effort [1,2] while sharing all my benchmarks, data sets, models, tools and scripts as customizable and reusable components. It also motivated me to develop an open-source framework and repository to crowdsource empirical experiments (such as multi-objective optimization of deep learning and other realistic workloads) across diverse hardware and inputs provided by volunteers, which later became known as Collective Knowledge (CK).
    Therefore, I would really like to thank the community for such strong support of our open and reproducible research initiative during the past 10 years and for all the constructive feedback and help in developing a common experimental infrastructure and methodology!

    For example, CK now assists various Artifact Evaluation initiatives at the premier ACM conferences on parallel programming, architecture and code generation (CGO, PPoPP, PACT, SC), which aim to encourage sharing of code and data, and independently validate experimental results from published papers.
    We also use CK to crowdsource benchmarking and optimization of realistic workloads across embedded devices such as mobile phones and tablets, while publicly sharing all optimization statistics for further collaborative analysis and mining.
    dividiti (a startup based in Cambridge, UK) and the cTuning foundation (a non-profit research organization) also use the above technology to lead interdisciplinary research with ARM, General Motors and other companies to build faster, smaller, more power-efficient and more reliable software and hardware.
    Hope you will also join our community effort to accelerate computer systems' research and enable cheap and efficient computing from IoT devices to supercomputers!



    Tuesday 17 January 2017

    Artifact Evaluation discussion session at CGO/PPoPP'17

    News:  notes from this joint CGO-PPoPP AE session are now available online.

    We would like to invite all researchers to an open CGO-PPoPP'17 Artifact Evaluation discussion on February 6 (Monday) at 17:15-17:45 (room 400/402, Hilton Austin, Texas, USA).

    The program is the following:
    • Briefly presenting Artifact Evaluation results for CGO'17 and PPoPP'17

    • Announcing joint CGO/PPoPP'17 distinguished artifact awards:
      • $500 cheque presented by Grigori Fursin from dividiti for the highest-ranked artifact implemented using Collective Knowledge (an open-source framework to share artifacts as customizable and reusable Python components with a JSON API, automate software installation/detection and quickly prototype cross-platform experimental workflows).
    • Discussing how to improve future AE and make it more scalable, introduce a new option of open reviews, discuss open challenges in computer engineering, and share knowledge about tools and techniques to enable collaborative and reproducible computer systems' research.
    We had a record number of artifact submissions this time: 27 vs 17 two years ago. It is really great to see that researchers are now taking AE seriously, but it also highlighted new issues:

    1) A growing number of diverse artifacts made it somewhat difficult to find AE members with appropriate knowledge, skills and access to rare hardware and software.

    2) Ad-hoc experimental setups placed a considerable burden on AE members and the committee when installing, running and processing very complex experiments, particularly when a native environment is required (for example, for performance analysis and tuning) and Docker/VM images are not suitable.

    3) It is still not clear whether we are ready to demand full validation of all experiments from a paper or should still allow partial validation. However, we do understand that the complexity of experiments and the lack of common experimental frameworks and methodology make full validation of some experiments really challenging, if possible at all.

    Note that to solve some of these issues we tried "open reviewing" for the first time this year: for example, we asked the community to help us evaluate several open-source artifacts already publicly available at the time of submission. It turned out very well (see links to public discussions) since we managed to find researchers with access to rare hardware and appropriate skills. Furthermore, public comments helped authors communicate with reviewers directly (note that reviewers can still be anonymous) and fix all encountered issues immediately rather than waiting for the rebuttal.

    We really want to know your opinions and suggestions about how to solve these issues and improve AE. Therefore we hope you will be able to join us at this discussion session! Also, do not hesitate to contact the Artifact Evaluation Steering Committee directly! Remember that new AE procedures may affect you at future conferences!

    Looking forward to your participation and suggestions!

    Monday 9 January 2017

    Exciting internships at dividiti (deep learning, runtime adaptation, SW/HW co-design)

    We wish you a very happy and successful New Year!

    If you are passionate about performance analysis and optimization, run-time adaptation and SW/HW co-design, as well as collaborative and reproducible experimentation, we would like to draw your attention to several exciting internships at dividiti available for HiPEAC PhD students:
    1. Collective Knowledge on Deep Learning (apply here).
    2. Crowdtuning and runtime adaptation of open-source CPU/GPU libraries (apply here).
    3. Solving grand challenges in computer systems via knowledge sharing and crowdsourcing (apply here).
    You can find general information about HiPEAC internships here. Our internships will be for 3-6 months between February and December 2017 in our fantastic office in Cambridge, UK. Please apply before 1 February 2017!

    Collective Knowledge on Deep Learning

    You will contribute to our growing suite of open-source tools for crowd-benchmarking and crowd-tuning of deep learning applications (CK-Caffe, CK-TensorFlow, CK-TinyDNN, CK-TensorRT, etc.), being developed in collaboration with our customers and partners. We aim to collectively grow optimisation knowledge on deep learning to meet the performance, prediction accuracy and cost requirements for deployment on a wide range of form factors - from sensors to self-driving cars.

    Sounds interesting? Please read more about our initiatives in the latest HiPEAC newsletter (1, 2), try out our Android app and... apply!

    Crowdtuning and runtime adaptation of open-source CPU/GPU libraries

    Several open-source libraries are readily available (e.g. OpenBLAS, MAGMA, ViennaCL, clBLAS, CLBlast). Unfortunately, in terms of performance they generally trail behind closed-source libraries (e.g. Intel's MKL, NVIDIA's cuBLAS). First, developers typically expose only a few optimization parameters (“knobs”) for tuning, as it’s a very tedious, time-consuming and hardware-specific process. Second, developers have no effective means for optimization knowledge transfer between projects.

    You will contribute to an ambitious and exciting open-source initiative to enable library crowd-tuning via our Collective Knowledge framework and repository. This initiative will allow the community to easily compare various implementations of library routines across different data sets and diverse hardware, gradually expose more and more optimization choices, continuously crowd-tune such routines, share optimization statistics in a public repository, and automatically assemble the best and possibly adaptive solution for a given platform.
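As a rough sketch of what "automatically assemble the best and possibly adaptive solution for a given platform" could look like, the example below aggregates crowdsourced timings and picks the fastest routine variant per platform and input size. Everything in it is an assumption made for illustration: the platform names, variant names and timings are invented, and this is not the Collective Knowledge implementation.

```python
from collections import defaultdict
from statistics import median

# Hypothetical crowdsourced measurements:
# (platform, routine variant, matrix size, execution time in ms).
MEASUREMENTS = [
    ("phone-gpu", "gemm_tiled_8", 256, 4.1),
    ("phone-gpu", "gemm_tiled_8", 256, 4.3),
    ("phone-gpu", "gemm_tiled_16", 256, 5.0),
    ("phone-gpu", "gemm_tiled_8", 1024, 90.0),
    ("phone-gpu", "gemm_tiled_16", 1024, 71.0),
    ("phone-gpu", "gemm_tiled_16", 1024, 69.0),
    ("desktop-gpu", "gemm_tiled_16", 256, 0.4),
    ("desktop-gpu", "gemm_tiled_8", 256, 0.9),
]


def best_variants(measurements):
    """Map (platform, size) to the variant with the lowest median time."""
    times = defaultdict(list)
    for platform, variant, size, ms in measurements:
        times[(platform, size, variant)].append(ms)
    best = {}
    for (platform, size, variant), samples in times.items():
        score = median(samples)  # median is robust to noisy volunteer runs
        key = (platform, size)
        if key not in best or score < best[key][1]:
            best[key] = (variant, score)
    return {key: variant for key, (variant, _) in best.items()}


def select(table, platform, size):
    """Adaptive choice: use the variant tuned for the nearest known size."""
    sizes = [s for (p, s) in table if p == platform]
    if not sizes:
        raise KeyError(platform)
    nearest = min(sizes, key=lambda s: abs(s - size))
    return table[(platform, nearest)]


table = best_variants(MEASUREMENTS)
print(select(table, "phone-gpu", 300))
print(select(table, "phone-gpu", 900))
```

The lookup table plays the role of the shared optimization statistics: as volunteers contribute more measurements, the table is rebuilt and the selection adapts without changing the calling code.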

    Sounds interesting? Please read more about our initiatives in the latest HiPEAC newsletter (1, 2), and apply!

    Solving grand challenges in computer systems via knowledge sharing and crowdsourcing

    You will contribute to solving grand challenges in computer systems research by sharing research artefacts and crowdsourcing experimentation! Please read more about our approach and startup by following the links below and apply!