10. Requirement Analysis/Scoping

Introduction

Some clients are moving from Github to Jira, so the relevant data they hold in Github needs to be migrated. This data may include many entities that need to be migrated from the source system (Github, in our case) to the target system (Jira Cloud). The two systems belong to different ecosystems and provide little out-of-the-box support for migrating data between them.

The aim of this document is to identify the Github entities that need to be migrated to Jira Cloud, decide on the migration steps, and report which entities were imported successfully and which failed.

Note: Throughout the document Jira refers to Jira Cloud

Requirements

We have to pull the data out of Github and push it to Jira. We have identified the following Github entities that need to be migrated to Jira:

Sl No#	Github Entity	Description
1	Issues	Issues can be bugs, enhancements, change requests or any other requests related to the repository
2	Comments	A thread of discussion on an issue. Each issue can have multiple comments and each comment can have multiple attachments.
3	Attachments	Attachments can be attached only to comments; the issue description is treated as the first comment on the issue
4	Assignee	User account to which the issue is assigned
5	Projects	Github also has something called projects, which at first may look synonymous with projects in Jira but are not. A single Github repository can have multiple projects, and projects in Github are similar to boards in Jira. The confusion arises from the naming.
6	Milestones	These are similar to sprints in Jira, with their own start and end dates
7	Labels	These are similar to labels in Jira (multi-select control)
8	Users	Users are the accounts to which issues can be assigned

Current State of the Github Account

Sl No#	Entity	Count	Details
1	Repository	1	Name - SiftScience/code
2	Issues	16,801	2,861 (open), 13,940 (closed)
3	Projects	8	Set up Digicert access for sre@, GCS costs Q2 2020, bigtable backup 2020, Migrate expr from AWS to GCP, Prod model GCP migration (groundwork), Diagnose Mongo Issues Jan 2020, Mongo Upgrade, Ubuntu 18
4	Milestones	174	51 (open), 123 (closed)
5	Labels	116
6	Users	31	Unique assignees with “sift” in their username

Prerequisites for migration

The following are the prerequisites for the migration:

User account with full admin access for APIs and data download from Github
User account with full admin access for pushing data to Jira
Separate system on which we can log in and work (this migration involves client data, so we have to use a cloud instance rather than our local machines)
Confirmation of the repository/repositories to be migrated
Confirmation of the users to be migrated (the client has to confirm whether the user accounts should be created in Jira or not)
Confirmation of the data to be migrated as a whole (all entities, e.g. repository, issues, labels, milestones)

Approaches for migration

Pulling data from Github - There are three ways to pull the data out of Github

First Way - Using the export option in the UI

Way to pull data	Using the export option in the UI
Details	There is an option to export Github data on the account settings page. The data is exported as a compressed archive. The archive contains your profile data, plan, and any email addresses connected with your account, in addition to the issues, pull requests, comments, reviews, releases, projects, events, attachments, milestones, settings and more for each of your repositories, along with basic information about the users who have interacted with them. The export should be done from an account with admin privileges so that all the data is included in the archive.
Pros	It is easy to execute and requires little effort. The data is in JSON format, which can be processed easily.
Cons	The exported JSON data contains no ids (unique identifiers), so we may not be able to cross-verify against Github after the migration ends or when a few imports fail. Cross-verification could become tedious, as we may have to sift through many issues manually. This can prove to be a hurdle while generating the report.

Second Way - Using the API to make the request and fetch the data

Way to pull data	Using developer API
Details	Github data can be viewed as the following hierarchy: the organization is at the top, followed by repositories. Each repository has issues, comments and attachments. Labels can cut across different issues belonging to different repositories. Milestones belong to a repository, and there can also be organization-wide projects shared by multiple repositories.
Pros	We can generate a report and cross-verify the migrated data manually. We can hand over this report, with the ids of each entity, to the client so that they can cross-verify at their end too. We have more control over what information we pull from Github. We have access to unique ids for each entity in Github, so if something fails we can identify exactly what failed.
Cons	Extra effort is required to implement a bridge that pulls data from Github and generates a CSV file that Jira can understand. It is not possible to get the attachments using the API.
APIs required for migration	`GET /orgs/:org` - get organization details. `GET /repos/:owner/:repo` - get a repository by owner and repository name. `GET /repos/:owner/:repo/issues` - get the issues of a repository. Most of these APIs include links in their responses to the objects we need.
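
The API approach can be sketched roughly as follows. This is a minimal illustration, not the utility's actual code: the function names (`next_page_url`, `http_get`, `iter_issues`) and the token-based authentication are assumptions made for the example. It relies on two documented behaviours of the issues endpoint: results are paginated through the `Link` response header, and pull requests are returned alongside issues, marked by a `pull_request` key.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None."""
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "next":
            return match.group(1)
    return None


def http_get(url, token):
    """Fetch one page; returns (parsed JSON body, Link header or None)."""
    request = urllib.request.Request(url, headers={
        "Authorization": f"token {token}",  # assumed: a personal access token
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def iter_issues(owner, repo, fetch):
    """Yield every issue (open and closed) of a repository, skipping pull requests.

    `fetch` is any callable that takes a URL and returns (items, link_header).
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    while url:
        items, link_header = fetch(url)
        for item in items:
            if "pull_request" not in item:  # pull requests are not migrated
                yield item
        url = next_page_url(link_header)
```

In real use, `fetch` could be `lambda url: http_get(url, token)`. Passing it in as a parameter keeps the paging logic testable without network access.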

Third Way (Hybrid Approach) - Using the manual export to fetch the attachment details and the API to fetch all other details

Way to pull data	Using Organization data export and API
Details	We can combine the best of the two approaches above. In this approach we export the organization data using the manual export (or the organization migration API) to get hold of the attachments attached to issues (we cannot get attachments using the APIs), and then fetch the other information about the organization, such as repositories, issues, comments, labels, milestones, releases and projects, using the relevant Github APIs.
Pros	We can download all the data for an organization using this approach
Cons	The export option in the Github UI for a user account cannot export organization data. It only allows us to download the data that belongs to that account, not to any organization. We may have to use Github’s asynchronous organization migration API to get around this problem.
High level steps for pulling the data	Export the organization data using either the UI or the organization migration API (for attachments): `POST /orgs/:org/migrations` - to start generating the organization data; `GET /orgs/:org/migrations/:migrationId` - to check the status of the migration process; `GET /orgs/:org/migrations/:migrationId/archive` - to download the compressed file with the organization data and attachments. Then use the Github API to pull the data for the other entities (repositories, issues, comments, projects, milestones, labels).
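
Because the organization migration API is asynchronous, the status endpoint has to be polled until the archive is ready. A minimal sketch of that loop follows. The function name `wait_for_export`, its parameters and the polling interval are assumptions made for this example; the state values (`pending`, `exporting`, `exported`, `failed`) are the ones the status endpoint reports in its `state` field.

```python
import time


def wait_for_export(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll a migration until it finishes.

    `get_state` is any callable returning the current `state` string, for
    example one that calls GET /orgs/:org/migrations/:migrationId.
    Returns True when the archive is ready and False when the export failed.
    """
    for _ in range(max_polls):
        state = get_state()
        if state == "exported":
            return True   # the archive can now be downloaded
        if state == "failed":
            return False  # the migration has to be started again
        sleep(poll_seconds)  # still "pending" or "exporting"
    raise TimeoutError("migration did not finish within the polling budget")
```

Once this returns True, the archive endpoint can be called to download the compressed file.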

Process of Migration

Download the data from Github (using any of the three methods above)
Transform the data by processing it
Write the transformed data to a CSV file in a format understood by Jira
Upload the CSV data to Jira server and check whether all the data was imported properly (we do this step first so that we do not mess up the cloud instance by importing directly into it; the server is easier to fix as we have more control over it)
If the CSV data was imported properly into Jira server, then upload the same data to Jira Cloud
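
The transform and write steps above might look like the sketch below. It assumes the issue dictionaries have the shape returned by the Github issues API (`id`, `title`, `body`, `state`, `assignee`, `labels`). The column names are illustrative, and "Github Id" in particular is a hypothetical custom field that would have to exist in Jira and be mapped during import. Multiple labels are written as repeated `Labels` columns, which is how the Jira CSV importer expects multi-value fields.

```python
import csv
import io


def issues_to_csv(issues):
    """Render Github issue dicts as CSV text, one row per issue."""
    max_labels = max((len(issue["labels"]) for issue in issues), default=0)
    header = ["Summary", "Description", "Status", "Assignee", "Github Id"]
    header += ["Labels"] * max_labels  # one column per label value

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for issue in issues:
        labels = [label["name"] for label in issue["labels"]]
        labels += [""] * (max_labels - len(labels))  # pad to a fixed width
        assignee = (issue.get("assignee") or {}).get("login", "")
        writer.writerow([
            issue["title"],
            issue.get("body") or "",
            issue["state"],
            assignee,
            issue["id"],  # kept so failures can be traced back to Github
        ] + labels)
    return buffer.getvalue()
```

Carrying the Github id into every row is what later allows the imported data to be cross-verified against the source.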

Importing Data in JIRA

Data can be imported into Jira Cloud in CSV format. The following links describe how to prepare the CSV file for both data and attachments.
Jira server - https://confluence.atlassian.com/adminjiraserver087/importing-data-from-csv-998872306.html
Jira Cloud - https://confluence.atlassian.com/adminjiracloud/importing-data-from-csv-776636762.html

Note: To import attachments, the attachment data must be available over HTTP/HTTPS so that Jira server/Cloud can access it directly

High Level Tasks

Sl No	Task	Comment
1	Get Github admin credentials and Jira access from the client	Action required from the client. Admin access is required to export the attachment data
2	Get separate VM for development work	Action required from the client
3	There should be a mapping document to show which field in Github maps to which field in Jira	Empyra team to create the mapping and get confirmation from the customer
4	Installations on development environment & testing of Github account credentials	Yes we have
5	There should be a way for the utility to pull data from Github	Yes we have
6	There should be a way for the utility to transform the data from Github format to Jira understandable format	Analysis is done. Transformation to be worked upon.
7	There should be a way for the utility to export the transformed data to CSV format understandable by JIRA	Action from Empyra team
8	There should be a way for the utility to generate report of data pulled from Github	Action from Empyra team
9	There should be a way for the utility to generate report of data exported to CSV format	Action from Empyra team
10	There should be a way for the utility to check the data in Jira cloud using api calls and generate report of data exported	Action from Empyra team
11	Testing	Action from Empyra team
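
The reporting tasks above (8 to 10) all come down to comparing sets of ids between two stages of the migration. A minimal sketch, assuming each stage can produce the list of Github issue ids it has seen; the function name `reconcile` and the report keys are illustrative only.

```python
def reconcile(source_ids, imported_ids):
    """Compare the ids pulled from Github with the ids found after import."""
    source, imported = set(source_ids), set(imported_ids)
    return {
        "migrated": sorted(source & imported),    # present on both sides
        "missing": sorted(source - imported),     # pulled but never imported
        "unexpected": sorted(imported - source),  # imported but not in the pull
    }
```

The same function can be applied between any two stages, for example pulled data against the CSV file, or the CSV file against Jira.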

Doubts and Clarifications

Question	Details	Clarification
Do we need to migrate pull request details?	Github repositories also have pull requests raised by users. If we have to migrate them, they have to be migrated to Bitbucket Cloud	No, there is no need to migrate repositories
Do we need to migrate the closed issues?	There are some closed issues in the repository that somebody has already worked on and resolved. Do we have to migrate them as well?	Yes
Do we have to create epics in Jira Cloud for issues marked with label ‘epic’?	Some issues are labelled 'Epic'. Do we have to create an epic in Jira Cloud for each of them?	Yes
Do we have to migrate users also?	There are users in the SiftScience account. Do we have to migrate these users to Jira Cloud too?	Mapping between Github users and Jira users is required
What kind of ticket should we create in Jira for each issue in Github?	Github has only one issue type, called issues, whereas Jira has a separate issue type field. How should we identify which issue is a bug, task, story or any other type?	Open Story What if the issue type doesn’t have any label?
Does the client already use Jira Cloud, or will they start using it after the Github migration?	We want to understand whether Jira Cloud currently holds any data/configuration.	Client is already using Jira Cloud
Do we have a list of the labels that we need to support for migration?	The labels in Github should be migrated to Jira. Sift uses labels for different purposes: some may mark issue types such as bugs and feature requests, while others may mark a specific development technology.	Yes, we have the list
What if an issue in Github is marked with two labels that are treated as different projects?	E.g. if issue AP-1 has two labels, project 1 and project 2, in Github, then under which Jira project should it be placed when it is moved to Jira: project 1 or project 2?	Generate a report of such issues having more than one project label
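
Two of the clarifications above, deriving the issue type from labels and reporting issues that carry more than one project label, can be sketched as follows. The label-to-type table, the default type of Story and the function names are assumptions made for illustration; the real values would come from the confirmed label list and field mapping.

```python
# Hypothetical mapping from Github label names (lower-cased) to Jira issue types.
TYPE_BY_LABEL = {"epic": "Epic", "bug": "Bug"}


def issue_type(issue, default="Story"):
    """Pick a Jira issue type from the first label that has a known mapping."""
    for label in issue["labels"]:
        mapped = TYPE_BY_LABEL.get(label["name"].lower())
        if mapped:
            return mapped
    return default  # assumed fallback when no label identifies the type


def route_issues(issues, project_labels):
    """Assign issues to Jira projects by label and collect the problem cases.

    Returns (routed, ambiguous, unrouted): issue numbers grouped per project,
    (number, labels) pairs for issues with several project labels, and the
    numbers of issues that carry no project label at all.
    """
    routed, ambiguous, unrouted = {}, [], []
    for issue in issues:
        names = [label["name"] for label in issue["labels"]]
        matches = [name for name in names if name in project_labels]
        if len(matches) == 1:
            routed.setdefault(matches[0], []).append(issue["number"])
        elif matches:
            ambiguous.append((issue["number"], matches))  # goes in the report
        else:
            unrouted.append(issue["number"])
    return routed, ambiguous, unrouted
```

The `ambiguous` list is the report requested in the last clarification. How those issues are finally assigned remains a decision for the client.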

Dependency from the Client

Item	Details	Status
Project List	Client has to create three more projects	Done
Labels	Done	Done
Issue Types	Done	Done
Field Mapping from JIRA to Github		Done
Milestone	Done(We have only one sprint in JIRA. We will have to migrate all the issues under this sprint)	No Need to migrate Milestones
User List	We need usernames of the users from JIRA	Done
Access to JIRA		Done

Tasks for POC

Task	Details	Aim
Migrate: 10 closed and 10 open issues. Migrate: 5 milestones and 5 projects with attachments	In Github, create four milestones and add a few issues under each. Now close two of the milestones and keep the other two open. Also tag issues in these milestones with projects, then try to migrate the data	To check if we can create closed sprints in Jira or not
Migrate User data	In Github, invite some users to join the organization. Migrate these users to Jira	To check if we have enough information to create a user in Jira from Github
Migrate Comments	In Github, create comments under different issues. Migrate these comments under issues in Jira	To check if we can migrate the comments under a project and an issue or not
Migrate Attachments	In Github, create an attachment. Migrate this attachment to Jira	To check if we can migrate the attachment to the proper issue or not
Check the release version	Try to migrate releases from Github to Jira	To see if we can migrate releases data or not

Work Breakdown(Stories & Estimations)

#	Story	Tasks	Status
1	Utility should be able to pull data from Github
		Utility should be able to pull api data from Github	Done
		Utility should be able to download the migration data	Done
		Utility should be able to save the data locally to the hard drive	Done
		Utility should be able to join the organization migration(attachment) data to data pulled from api	Done
		Utility should be able to generate the report based on data pulled from Github	Done
2	Utility should be able to transform the data in JIRA compatible format
		Utility should be able to read and transform the data into CSV format compatible with JIRA	Done
		Utility should be able to generate the report of the transformed data	Done
3	Configure the test JIRA server instance to emulate the cloud
		Create users in the jira instance	Done
		Create all projects with same name as labels in Github and custom fields	Done
		Person should be able to import the data in Jira server	Done
		Utility should be able to generate a report to validate the data in JIRA server against the data imported in CSV format	Done
		Validate the migrated data	Done
4	Import the transformed data into Jira cloud
		Person should be able to import the data in JIRA cloud	Done
		Validate the migrated data	Done