10. Requirement Analysis/Scoping

Introduction

Some clients are moving from Github to Jira, so the relevant data they hold in Github needs to be migrated. This data includes many entities that must be migrated from the source system (Github, in our case) to the target system (Jira). The two systems are products from different ecosystems, and they offer little out-of-the-box support for migrating between them.

The aim of this document is to identify the Github entities that need to be migrated to Jira, decide on the migration steps, and produce a report of the entities that were imported successfully (or failed).

Requirement

We have to pull the data out of Github and push it into Jira. The Github entities identified for migration to Jira are as follows:

Github Entity	Description
Issues	Issues can be bugs, enhancements, change requests or any other requests related to the repository
Comments	A thread of discussion on an issue. Each issue can have multiple comments, and each comment can have multiple attachments.
Attachments	Attachments can only be attached to comments. The issue description is treated as the first comment on the issue, so attachments in the description are covered as well.
Assignee	User account to which the issue is assigned
Projects	Github also has something called projects, which at first may look synonymous with projects in Jira but are not. A single Github repository can have multiple projects, and projects in Github are similar to boards in Jira. The confusion arises from the shared name.
Milestones	These are similar to sprints in Jira; note that a Github milestone has a due date but no separate start date
Labels	These are similar to labels in Jira (a multi-select field)
Users	The accounts to which issues can be assigned
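Several of the entities above (assignee, labels, milestone) appear as fields of each issue returned by the Github REST API. The sketch below flattens one issue object into a record; it assumes the standard field names of the REST issue object (`number`, `title`, `state`, `body`, `assignee`, `labels`, `milestone`). Comments are not embedded in that object (it only carries a comment count), so they would be fetched separately. `IssueRecord` and `to_record` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class IssueRecord:
    """Flattened view of one Github issue and the entities attached to it (hypothetical)."""
    number: int                      # Github issue number, kept for cross-verification
    title: str
    state: str                       # "open" or "closed"
    body: str                        # issue description (treated as the first comment)
    assignee: Optional[str]          # login of the assignee, or None
    labels: list[str] = field(default_factory=list)
    milestone: Optional[str] = None  # milestone title, or None


def to_record(issue: dict[str, Any]) -> IssueRecord:
    """Map a Github REST issue object to an IssueRecord."""
    assignee = issue.get("assignee")
    milestone = issue.get("milestone")
    return IssueRecord(
        number=issue["number"],
        title=issue["title"],
        state=issue["state"],
        body=issue.get("body") or "",  # body may be null in the API response
        assignee=assignee["login"] if assignee else None,
        labels=[label["name"] for label in issue.get("labels", [])],
        milestone=milestone["title"] if milestone else None,
    )
```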

Current State of the Github Account

Entity	Count	Details
Repository	1	Name - SiftScience/code
Issues	16,801	2,861 open, 13,940 closed
Projects	8	Set up Digicert access for sre@, GCS costs Q2 2020, bigtable backup 2020, Migrate expr from AWS to GCP, Prod model GCP migration (groundwork), Diagnose Mongo Issues Jan 2020, Mongo Upgrade, Ubuntu 18
Milestones	174	51 open, 123 closed
Labels	116
Users	31	Unique assignees with “sift” in their username
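As a rough sizing aid for the API approach, the counts above can be turned into a request estimate. The sketch below assumes 100 items per page (the maximum page size for Github list endpoints), a limit of 5,000 requests per hour for an authenticated account (the usual figure, but it should be checked for the client's plan), and one comments request per issue.

```python
import math

PER_PAGE = 100               # maximum page size for Github REST list endpoints
RATE_LIMIT_PER_HOUR = 5_000  # ASSUMPTION: typical authenticated REST limit

# Counts taken from the "Current State of the Github Account" table
counts = {"issues": 16_801, "milestones": 174, "labels": 116}


def list_requests(total: int, per_page: int = PER_PAGE) -> int:
    """Number of paginated list requests needed to fetch `total` items."""
    return math.ceil(total / per_page)


def estimate(entity_counts: dict[str, int]) -> tuple[int, float]:
    """Return (total requests, hours at the assumed rate limit)."""
    # one paginated sweep per entity type
    paged = sum(list_requests(n) for n in entity_counts.values())
    # ASSUMPTION: at least one comments request per issue (more if >100 comments)
    per_issue = entity_counts["issues"]
    total = paged + per_issue
    return total, total / RATE_LIMIT_PER_HOUR


total, hours = estimate(counts)
print(total, round(hours, 2))  # 16974 3.39
```

The per-issue comment requests dominate the total. The issues list endpoint also returns pull requests, so the real page count may be higher; fetching comments through the repository-wide endpoint `GET /repos/:owner/:repo/issues/comments` would reduce the request count considerably.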

Prerequisites for migration

The following prerequisites are required for the migration:

User account with full admin access to the Github APIs and to data download from Github
User account with full admin access for pushing data to Jira
A separate system we can log in to and work on (this migration involves client data, so we have to use a cloud instance rather than our local machines)
Confirmation of the repository/repositories to be migrated
Confirmation of the users to be migrated (the client has to confirm whether user accounts should be created in Jira)
Confirmation of the data to be migrated as a whole (all entities, e.g. repository, issues, labels, milestones, …)
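Once the Github credentials are received, the admin-access prerequisite can be checked with `GET /repos/:owner/:repo`: for an authenticated request the response includes a `permissions` object with `admin`, `push` and `pull` flags. The sketch below is a minimal check using only the Python standard library; the `GITHUB_TOKEN` environment variable in the usage comment is a hypothetical name.

```python
import json
import urllib.request
from typing import Any

API = "https://api.github.com"


def get_repo(owner: str, repo: str, token: str) -> dict[str, Any]:
    """Fetch repository metadata with GET /repos/:owner/:repo."""
    req = urllib.request.Request(
        f"{API}/repos/{owner}/{repo}",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def missing_permissions(repo_json: dict[str, Any]) -> list[str]:
    """Return the permissions the authenticated account lacks on this repository.

    The `permissions` object is only present for authenticated requests, so a
    missing object is reported as lacking every permission.
    """
    perms = repo_json.get("permissions", {})
    return [p for p in ("admin", "push", "pull") if not perms.get(p, False)]


# Usage (token variable name is hypothetical):
# repo = get_repo("SiftScience", "code", token=os.environ["GITHUB_TOKEN"])
# print(missing_permissions(repo))  # expect [] for a full admin account
```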

Approaches for migration

Pulling data from Github - there are basically two ways to pull data out of Github.

First Way - Using the export option in the UI

Way to pull data	Using the export option in the UI
Details	There is an option to export Github data on the account settings page. The data is exported as a compressed archive. The archive contains your profile data, plan, and any email addresses connected to your account, in addition to the issues, pull requests, comments, reviews, releases, projects, events, attachments, milestones, settings and more for each of your repositories, along with basic information about the users who have interacted with them. The export should be done with an account that has admin privileges so that all the data is included in the archive.
Pros	It is easier to execute and requires less effort. The data is in JSON format, which can be easily processed.
Cons	The exported JSON data contains no IDs (unique identifiers), so we may not be able to cross-verify against Github after the migration ends, especially if some entities fail to import. Cross-verification could become tedious, as we may have to sift through many issues manually, and this can be a hurdle when generating the report.
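If the export option is used, a first sanity check is to count the records per entity in the archive and compare them with the "Current State of the Github Account" table. The sketch below assumes the archive is a gzipped tarball in which each entity type is stored in JSON files named after the entity (for example `issues_000001.json`), each holding a JSON array; this layout is an assumption and should be verified against the real export. `summarize_export` is a hypothetical helper.

```python
import io
import json
import tarfile
from collections import Counter


def summarize_export(archive_bytes: bytes) -> Counter:
    """Count records per entity in a gzipped tar export of Github data.

    ASSUMPTION: each entity type is stored in JSON files whose base name starts
    with the entity name followed by "_" (e.g. issues_000001.json).
    """
    counts: Counter = Counter()
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not (member.isfile() and member.name.endswith(".json")):
                continue
            data = json.load(tar.extractfile(member))
            base = member.name.rsplit("/", 1)[-1][: -len(".json")]
            entity = base.split("_", 1)[0]
            # arrays count as one record per element; a single object counts once
            counts[entity] += len(data) if isinstance(data, list) else 1
    return counts


# Usage (file name is hypothetical):
# with open("github-export.tar.gz", "rb") as f:
#     print(summarize_export(f.read()))
```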

Second Way - Using the API to make requests and fetch the data

Way to pull data	Using the developer API
Details	Github data can be viewed as the following hierarchy: the organization is at the top, followed by its repositories. Each repository has issues, comments and attachments. Labels can cut across issues belonging to different repositories. Milestones belong to a repository, and there can also be organization-wide projects shared by multiple repositories.
Pros	We can generate a report and cross-verify the migrated data manually. We can hand this report, with the ID of each entity, over to the client so that they can cross-verify at their end too. We have more control over what information we pull from Github. We have access to a unique ID for each Github entity, so if something fails we can identify exactly what failed.
Cons	Extra effort is required to implement a bridge that pulls data from Github and generates a CSV file that Jira can understand.
APIs required for migration	`GET /orgs/:org` - get organization details by organization name; `GET /repos/:owner/:repo` - get a repository by owner and repository name; `GET /repos/:owner/:repo/issues` - get the issues of a repository (this endpoint returns only open issues unless `state=all` is passed, and it also returns pull requests). Most of these APIs include links in their responses to the related objects we need.
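Pulling all 16,801 issues means following pagination. Github list endpoints return at most 100 items per page and advertise the next page in the `Link` response header. The sketch below, using only the standard library, requests `state=all` so that closed issues are included, follows `rel="next"` links, and drops pull requests, which the issues endpoint marks with a `pull_request` key. It does not handle rate limiting or retries; the helper names are hypothetical.

```python
import json
import re
import urllib.request
from typing import Any, Iterator, Optional

API = "https://api.github.com"
# Matches entries such as <https://api.github.com/...?page=2>; rel="next"
_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="([^"]+)"')


def next_link(link_header: Optional[str]) -> Optional[str]:
    """Return the rel="next" URL from a Github Link header, if any."""
    if not link_header:
        return None
    for url, rel in _LINK_RE.findall(link_header):
        if rel == "next":
            return url
    return None


def only_issues(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop pull requests, which the issues endpoint also returns."""
    return [item for item in items if "pull_request" not in item]


def fetch_issues(owner: str, repo: str, token: str) -> Iterator[dict[str, Any]]:
    """Yield every issue (open and closed) of a repository, following pagination."""
    url: Optional[str] = f"{API}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/vnd.github+json"}
    while url:
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
            url = next_link(resp.headers.get("Link"))
        yield from only_issues(page)
```

Keeping each issue's `number` and `id` from this stream is what later allows the report and cross-verification that the export option cannot provide.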

Process of Migration

Download the data from Github (using either of the two methods above), and generate a report for the downloaded data if the API (second) method was used
Process and transform the data
Write the transformed data to a CSV file in a format that Jira Server understands
Upload the data to Jira Server and check that all issues were migrated properly
Write code to pull the data back from Jira Server and cross-verify it against the report generated while pulling data from Github (only if the API method is used)
Take a backup of Jira Server
Upload the backup to Jira Cloud and check whether anything has failed (ideally, if the Jira Server backup contains all the data, there should be no issues in this step)
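The transform and CSV steps above can be sketched as follows. This assumes the Jira CSV importer accepts a multi-valued field such as Labels as repeated columns with the same header, one value per column, and that the Github issue number is kept in its own column (the header "Github Issue Number" is hypothetical) so it can be mapped to a Jira field during import and used for cross-verification. Because Jira labels cannot contain spaces, spaces are replaced with underscores here; that convention is an assumption the client should confirm. Status values are left as Github's `open`/`closed` for mapping in the importer.

```python
import csv
import io
from typing import Any


def issues_to_jira_csv(issues: list[dict[str, Any]]) -> str:
    """Render Github issue dicts as CSV text for the Jira CSV importer (sketch)."""
    max_labels = max((len(i.get("labels", [])) for i in issues), default=0)
    header = ["Summary", "Description", "Status", "Assignee", "Github Issue Number"]
    header += ["Labels"] * max_labels  # ASSUMPTION: repeated columns for multi-value fields
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    for issue in issues:
        assignee = issue.get("assignee") or {}
        # ASSUMPTION: spaces replaced with "_" because Jira labels cannot contain spaces
        labels = [label["name"].replace(" ", "_") for label in issue.get("labels", [])]
        row = [
            issue["title"],
            issue.get("body") or "",
            issue["state"],
            assignee.get("login", ""),
            issue["number"],
        ]
        row += labels + [""] * (max_labels - len(labels))  # pad so every row has the same width
        writer.writerow(row)
    return out.getvalue()
```

Usage: `issues_to_jira_csv(list_of_issue_dicts)` returns CSV text that can be written to a file and uploaded through the Jira Server CSV import wizard, where each column is mapped to a Jira field.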

Task Breakdown and Estimation

Task	Estimation (man-hours)	Comment
Get credentials for admin for Github from client	NA
Get separate VM for development work	NA
Set up the development environment and test the Github account credentials
Implement pulling data from Github using the API (second method)
Implement Github data processing and transformation
Implement exporting the transformed data in CSV format (compatible with Jira Server)
Generate basic reports
Testing
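The reporting and cross-verification work can start from a simple set comparison. The sketch below assumes two inputs: the Github issue numbers recorded in the report while pulling data through the API, and the Github issue numbers read back from Jira Server (which presumes the original number was imported into some Jira field). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerificationReport:
    migrated: set[int]            # Github issue numbers found in both systems
    missing_in_jira: set[int]     # exported from Github but not found in Jira
    unexpected_in_jira: set[int]  # found in Jira but absent from the Github report

    def summary(self) -> str:
        return (f"{len(self.migrated)} migrated, "
                f"{len(self.missing_in_jira)} missing, "
                f"{len(self.unexpected_in_jira)} unexpected")


def verify(github_numbers: set[int], jira_numbers: set[int]) -> VerificationReport:
    """Compare the Github report with issue numbers read back from Jira."""
    return VerificationReport(
        migrated=github_numbers & jira_numbers,
        missing_in_jira=github_numbers - jira_numbers,
        unexpected_in_jira=jira_numbers - github_numbers,
    )
```

Listing `sorted(report.missing_in_jira)` gives the exact Github issues that failed to import, which is the per-entity detail the client can check at their end.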