
Introduction

Some clients are moving from Github to Jira, so the relevant data they hold in Github needs to be migrated. This data may include many entities that need to be migrated from the source system (Github, in our case) to the target system (Jira). The two systems are products from different ecosystems, and they do not provide much scope for migrating between them out of the box.

The aim of this document is to identify the entities in Github that need to be migrated to Jira, decide on the steps of the migration, and produce a report of the entities that were imported successfully (or failed).

Requirement

We have to pull the data out of Github and push it to Jira. We have identified the following Github entities that need to be migrated to Jira:

| Github Entity | Description |
| --- | --- |
| Issues | Issues can be bugs, enhancements, change requests or any other requests related to the repository. |
| Comments | A thread of discussion on an issue. Each issue can have multiple comments, and each comment can have multiple attachments. |
| Attachments | Attachments can be attached to comments only. The contents of the issue description are treated as the first comment on the issue. |
| Assignee | The user account to which the issue is assigned. |
| Projects | Github also has something called projects, which at first may look synonymous with projects in Jira, but they are not. A single Github repository can have multiple projects, and projects in Github are similar to boards in Jira. The confusion arises from the naming. |
| Milestones | Similar to sprints in Jira, with their own start and end dates. |
| Labels | Similar to labels in Jira (a multi-select control). |
| Users | Accounts to which issues can be assigned. |

Current State of the Github Account

| Entity | Count | Details |
| --- | --- | --- |
| Repository | 1 | Name - SiftScience/code |
| Issues | 16,801 | 2,861 open, 13,940 closed |
| Projects | 8 | Set up Digicert access for sre@; GCS costs Q2 2020; bigtable backup 2020; Migrate expr from AWS to GCP; Prod model GCP migration (groundwork); Diagnose Mongo Issues Jan 2020; Mongo Upgrade; Ubuntu 18 |
| Milestones | 174 | 51 open, 123 closed |
| Labels | 116 | |
| Users | 31 | Unique assignees with “sift” in their username |

Prerequisites for migration

The following are the prerequisites for the migration:

  • A user account with full admin access for the APIs and data download from Github

  • A user account with full admin access for pushing data to Jira

  • A separate system on which we can log in and work (this migration involves client data, so we have to use a cloud instance rather than a local machine)

  • Confirmation of the repository or repositories to be migrated

  • Confirmation of the users to be migrated (the client has to confirm whether the user accounts should be created in Jira or not)

  • Confirmation of the data to be migrated as a whole (all the entities, e.g. repository, issues, labels, milestones, ...)

Approaches for migration

Pulling data from Github - There are two ways to pull the data out of Github:

  • First Way - Using the export option in the UI

Way to pull data

Using the export option in the UI

Details

Github offers an option to export data from the account settings page. The data is exported as a compressed archive. The archive contains your profile data, plan, and any email addresses connected with your account, in addition to the issues, pull requests, comments, reviews, releases, projects, events, attachments, milestones, settings and more for each of your repositories, along with basic information about the users who have interacted with them. The export should be run from an account with admin privileges so that all of the data is included.

Pros

It is easier to execute and requires less effort.
The export contains data in JSON format, which can be easily processed.

Cons

The exported JSON data contains no ids (unique identifiers), so we may not be able to cross-verify against Github once the migration ends, particularly if there are a few failures.
It may become tedious, as we may have to sift through many issues manually to cross-verify the migration. This can prove to be a hurdle while generating the report.

  • Second Way - Using the API to make the request and fetch the data

Way to pull data

Using developer API

Details

Github data can be viewed in terms of the following hierarchy. The organization is at the top, followed by repositories. Each repository has issues, comments and attachments. Labels can cut across issues belonging to different repositories. Milestones belong to a repository, and there can also be organization-wide projects shared by multiple repositories.

Pros

We can generate a report and cross-verify the migrated data manually. We can hand this report over to the client with the ids of each entity so that they can cross-verify at their end too.
We have more control over what information we pull from Github.
We have access to unique ids for each entity in Github, so if something fails we can identify exactly what has failed.

Cons

Extra effort is required to implement a bridge that pulls data from Github and generates a CSV file that Jira can understand.
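
The cross-verification report mentioned in the pros above could be sketched roughly as follows. This is a minimal sketch, not a settled design: it assumes each migrated Jira issue carries the original Github issue number in a field, here given the hypothetical name "GitHub Issue Number", and that the Jira issues have already been read back as a list of dictionaries. The function name and the report layout are likewise illustrative.

```python
def verification_report(github_issues, jira_rows, key="GitHub Issue Number"):
    """Compare issue numbers pulled from Github with those found in Jira.

    github_issues: list of dicts as returned by the Github API (each has "number").
    jira_rows: list of dicts read back from Jira; `key` is a hypothetical field
               that is assumed to hold the original Github issue number.
    """
    expected = {str(issue["number"]) for issue in github_issues}
    found = {str(row[key]) for row in jira_rows if row.get(key)}
    return {
        "expected": len(expected),
        # Present on both sides: migrated successfully.
        "migrated": sorted(expected & found, key=int),
        # In Github but not in Jira: failed or skipped.
        "missing": sorted(expected - found, key=int),
        # In Jira but never pulled from Github: worth a manual look.
        "unexpected": sorted(found - expected),
    }
```

The "missing" list is what would be handed to the client as the set of failed entities, which is only possible because the API approach exposes a unique id per issue.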

APIs required for migration

GET /orgs/:org - get organization details by organization name
GET /repos/:owner/:repo - get a repository by owner and repository name
GET /repos/:owner/:repo/issues - get the issues of a repository
Most of these APIs include links in their responses to the related objects we need
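
As an illustration of the issues endpoint listed above, the following is a minimal sketch of pulling every issue of a repository. It relies on the Github REST API paginating results through the `Link` response header and on the `state` and `per_page` query parameters. The function names, the injectable `get` parameter (used so the loop can be exercised without network access) and the decision to skip pull requests are assumptions made for this sketch.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def next_page_url(link_header):
    """Return the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    for url, rel in re.findall(r'<([^>]+)>;\s*rel="([^"]+)"', link_header):
        if rel == "next":
            return url
    return None


def http_get(url, token):
    """Fetch one page; returns (parsed JSON body, Link header or None)."""
    request = urllib.request.Request(url, headers={
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link")


def fetch_all_issues(owner, repo, token, get=http_get):
    """Collect open and closed issues of one repository, page by page."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    issues = []
    while url:
        page, link_header = get(url, token)
        # The issues endpoint also returns pull requests, which carry a
        # "pull_request" key; this sketch assumes they are out of scope.
        issues.extend(item for item in page if "pull_request" not in item)
        url = next_page_url(link_header)
    return issues
```

Each returned issue also contains URLs such as `comments_url`, which can be fetched with the same `get` helper to pull the comments of that issue.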

Process of Migration

  • Download the data from Github using either of the two methods above. (Generate the report for the downloaded data if the API (second) method was used.)

  • Transform the data by processing it

  • Write the transformed data to a CSV file in a format understood by Jira server

  • Upload the data to Jira server and check that all the issues have been migrated properly

  • Write the code to pull data from Jira server and cross-verify it against the report generated while pulling data from Github (only if the API method is used)

  • Take a backup of the Jira server

  • Upload the backup to Jira cloud and check whether anything has failed (ideally, if the Jira server backup has all the data, there should not be any issue in this step)
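
The transform and CSV steps above might look roughly like the sketch below. It is illustrative only: the column set, the "GitHub Issue Number" column (a hypothetical custom field for carrying the source id), the mapping of closed/open to "Done"/"To Do", and the replacement of spaces in label names with hyphens are all assumptions that would need to be agreed with the client and matched to the target Jira configuration. The one Jira-specific convention it relies on is that the CSV importer accepts multiple values for a field when the column header is repeated, which is why "Labels" appears once per label slot.

```python
import csv

# Fixed columns; "GitHub Issue Number" is a hypothetical custom field.
HEADER = ["GitHub Issue Number", "Summary", "Description", "Status", "Assignee"]


def issue_to_row(issue, max_labels):
    """Map one Github issue (API JSON as a dict) to one CSV row."""
    # Jira labels cannot contain spaces; the hyphen is a choice made here.
    labels = [label["name"].replace(" ", "-") for label in issue.get("labels", [])]
    labels += [""] * (max_labels - len(labels))  # pad so every row has equal width
    assignee = issue.get("assignee") or {}       # unassigned issues have null here
    return [
        issue["number"],
        issue["title"],
        issue.get("body") or "",
        "Done" if issue["state"] == "closed" else "To Do",  # assumed status mapping
        assignee.get("login", ""),
    ] + labels


def write_jira_csv(issues, out):
    """Write all issues to `out` (a text file object) as an importable CSV."""
    max_labels = max((len(i.get("labels", [])) for i in issues), default=0)
    writer = csv.writer(out)
    # One "Labels" column per label slot, all with the same header.
    writer.writerow(HEADER + ["Labels"] * max_labels)
    for issue in issues:
        writer.writerow(issue_to_row(issue, max_labels))
```

A typical call would open the target file with `open("issues.csv", "w", newline="", encoding="utf-8")` and pass it as `out`; the file name is only an example.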

Task Breakdown and Estimation

Task

Estimation (in man-hours)

Comment

Get admin credentials for Github from the client

NA

Get separate VM for development work

NA

Installations on development environment & testing of Github account credentials

Implement pulling data from Github using the API (second method)

Implement the Github data processing and data transformation

Implement the ability to export the transformed data in CSV format (compatible with Jira server)

Ability to generate basic reports

Testing
