ProAct Analytical Platform

Design & Development from Scratch

Characteristics

  • 6 – specialists in the MK project team
  • 46 – procurement datasets of individual countries
  • 2 – procurement datasets of World Bank- and IDB-financed contracts worldwide
  • 120 – countries covered by the datasets
  • 60 – data ranges
  • 21 – contracts
  • 5+ – suppliers
  • 1 – buyers across 120 countries

ProAct – Global Procurement Anticorruption & Transparency platform prototype development

Contractor

The World Bank

Collaborators

Government Transparency Institute

Delivery period

Prototype: January 2020 – June 2021
Modifications: May 2022 – July 2022

Goal/Business challenge

  • Provide easy access to globally available public procurement data from 48 datasets.
  • Enable users to identify, analyze and monitor public procurement performance and integrity risks to inform preventive actions, process improvements, and policy reform.
  • Help prevent corruption and promote transparency and integrity in public expenditures on goods, works, and services.

Results

Value delivered:

  • optimal solutions that fit the customer’s budget and timeframes;
  • reduced AWS maintenance costs and automated dataset uploads;
  • fast performance, customization, and scalability of the product.

Services

Services provided:

  • Architecture development
  • Infrastructure development on AWS
  • Front- and Back-end code development
  • DevOps optimization and support

The project consists of the following parts

  1. Web server VM, which runs the front-end and back-end servers;
  2. ELS – Elasticsearch with the processed data (indexes of buyers, suppliers, etc.);
  3. On-demand EMR cluster for dataset processing, with results stored in ELS;
  4. S3 buckets: the EMR cluster bucket, used to store the EMR processing code (Scala code for Spark), and the datasets bucket, used to store raw datasets;
  5. Lambda function, which fires the EMR cluster when a new dataset is uploaded to the S3 bucket (see the sketch below).
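
A minimal sketch of the trigger described in item 5: an AWS Lambda handler that reads the S3 upload event and starts an on-demand EMR cluster with a Spark step pointing at the processing code. The bucket names, jar path, instance sizes, and role names are illustrative placeholders, not the project’s actual configuration.

```python
# Hypothetical Lambda handler: fire an on-demand EMR cluster when a dataset lands in S3.
import boto3

emr = boto3.client("emr")

def handler(event, context):
    # The S3 PUT event carries the bucket and key of the uploaded dataset.
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]

    response = emr.run_job_flow(
        Name=f"proact-import-{key.split('/')[-1]}",
        ReleaseLabel="emr-6.2.0",                      # placeholder release
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": False,      # terminate when the import step finishes
        },
        Steps=[
            {
                "Name": "process-dataset",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "s3://emr-cluster-bucket/processing.jar",  # placeholder path to the Scala/Spark build
                        f"s3://{bucket}/{key}",                    # raw dataset to process
                    ],
                },
            }
        ],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return {"jobFlowId": response["JobFlowId"]}
```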

Datasets Upload

The CI/CD pipeline is organized so that uploading a new dataset (or several datasets) to the root of the S3 folder triggers the processing script, and the dataset(s) start to be processed and calculated. During this process the following steps are performed, each confirmed in the Slack channel (see the sketch after this list):

  • Check the file structure to confirm that it is a dataset
  • Read, transform, and normalize the input data
  • Check if Elasticsearch is up
  • Notify if the cluster is not ready to scale
  • Check if Elasticsearch is ready to upscale
  • Start the import via the CI/CD pipeline
  • Start the Elasticsearch upscale
  • Notify when the dataset import finishes
  • Notify when Elasticsearch stops
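
As an illustration of two of the steps above, the helpers below post a per-action confirmation to the Slack channel and check whether Elasticsearch is up before the import starts. The webhook URL, cluster endpoint, and helper names are assumptions for the sketch, not the project’s actual code.

```python
# Hypothetical pipeline helpers: Slack confirmation and Elasticsearch health check.
import json
import urllib.request

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook
ES_ENDPOINT = "https://els.example.internal:9200"               # placeholder ELS endpoint

def notify_slack(message: str) -> None:
    """Send a one-line confirmation for a pipeline action to the Slack channel."""
    payload = json.dumps({"text": message}).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)

def elasticsearch_is_up() -> bool:
    """Return True if the cluster health endpoint reports green or yellow status."""
    try:
        with urllib.request.urlopen(f"{ES_ENDPOINT}/_cluster/health") as resp:
            health = json.loads(resp.read())
        return health.get("status") in ("green", "yellow")
    except OSError:
        return False

# Example usage inside the pipeline:
# if not elasticsearch_is_up():
#     notify_slack("Cluster is not ready to scale, import postponed")
```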

When several datasets are uploaded together, they are queued for processing.
If there is a queue, or a dataset is very large, several EMR clusters are built; in this case the user sees the corresponding notifications in Slack.
Uploading a new dataset for a country that already has an old dataset in the system does not require the operator to delete the old dataset; only the new one needs to be uploaded.
During processing, the import script calculates the values and compares the new data with the data already in the system. Where it finds differences, the stored data is updated with the values from the new dataset (a sketch of this update behaviour follows below).
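
The actual import runs as Scala/Spark code on EMR; the sketch below only illustrates, in Python, the update behaviour just described: each record of the new dataset is compared with the stored document and only the differing fields are written back. The index name, id field, and endpoint are assumptions for the example.

```python
# Illustrative compare-and-update logic (not the production Scala import job).
from elasticsearch import Elasticsearch, NotFoundError, helpers

es = Elasticsearch("https://els.example.internal:9200")  # placeholder ELS endpoint

def upsert_changed(records, index="contracts", id_field="contract_id"):
    """Compare each new record with the stored document and upsert only the differences."""
    actions = []
    for record in records:
        doc_id = record[id_field]
        try:
            current = es.get(index=index, id=doc_id)["_source"]
        except NotFoundError:
            current = {}
        # Keep only the fields whose values differ from what is already stored.
        changed = {k: v for k, v in record.items() if current.get(k) != v}
        if changed:
            actions.append({
                "_op_type": "update",
                "_index": index,
                "_id": doc_id,
                "doc": changed,
                "doc_as_upsert": True,  # create the document if it does not exist yet
            })
    if actions:
        helpers.bulk(es, actions)
```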

EMR Cluster

Repo with Scala code for the EMR cluster. The repo has a .gitlab-ci.yml file, which is responsible for building the Scala code and uploading it to the EMR cluster bucket.

Repo with Terraform code used to create the EMR cluster on demand for processing, along with upscaling ELS (for timely import of massive amounts of data from EMR) and downscaling it back when processing is finished. The Lambda function that fires the EMR import when a dataset is uploaded to the bucket is located in the python/lambda folder.
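
The on-demand scaling is owned by the Terraform code; the call below is only a rough boto3 equivalent of what an ELS upscale or downscale amounts to. The domain name, instance type, and instance counts are placeholders.

```python
# Rough illustration of resizing the Elasticsearch (ELS) domain around a large import.
import boto3

es_service = boto3.client("es")

def scale_els(domain_name="proact-els", instance_count=2,
              instance_type="r5.large.elasticsearch"):
    """Resize the Elasticsearch domain before a large EMR import (or back down afterwards)."""
    es_service.update_elasticsearch_domain_config(
        DomainName=domain_name,
        ElasticsearchClusterConfig={
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    )

# Upscale before a big import, downscale once processing has finished:
# scale_els(instance_count=4)
# scale_els(instance_count=2)
```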

An update to the Lambda code triggers deployment of the code to AWS.

Generic Workflow description
