Introduction
Some clients are moving from Github to Jira, so the relevant data they hold in Github has to be migrated. This data may include many entities that need to be migrated from the source system (Github, in our case) to the target system (Jira Cloud). The two systems belong to different ecosystems and provide little out-of-the-box support for migrating between them.
The aim of this document is to identify the Github entities that need to be migrated to Jira Cloud, decide on the migration steps, and report which entities were imported successfully (or failed).
Note: Throughout the document, Jira refers to Jira Cloud unless Jira server is named explicitly.
Requirement
We have to pull the data out of Github and push it to Jira. We have identified the Github entities that need to be migrated to Jira. They are as follows:
Github Entity | Description |
---|---|
Issues | Issues can be bugs, enhancements, change requests or any other requests related to the repository |
Comments | A thread of discussion on an issue. Each issue can have multiple comments, and each comment can have multiple attachments. |
Attachments | Attachments can be added to comments only; the issue description is treated as the first comment on the issue. |
Assignee | User account to which the issue is assigned |
Projects | Github also has something called projects, which at first may look synonymous with projects in Jira, but they are not. A single Github repository can have multiple projects, and projects in Github are similar to boards in Jira. The confusion arises from the naming. |
Milestones | These are similar to sprints in Jira, with their own start and end dates |
Labels | These are similar to labels in Jira (a multi-select control) |
Users | Users are the accounts to which issues can be assigned |
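The correspondences in the table can be captured as a small lookup for use during the transformation step. This is only a sketch: the names `GITHUB_TO_JIRA` and `jira_concept` are illustrative, and the rows the table does not map explicitly (issues, comments, attachments, assignee, users) are assumed to map to their same-named Jira counterparts.

```python
# Hypothetical lookup: Github entity (as named in the table) -> closest Jira concept.
GITHUB_TO_JIRA = {
    "Issues": "issue",            # assumption: same-named counterpart
    "Comments": "comment",        # assumption: same-named counterpart
    "Attachments": "attachment",  # assumption: same-named counterpart
    "Assignee": "assignee",       # assumption: same-named counterpart
    "Projects": "board",          # from the table: Github projects resemble Jira boards
    "Milestones": "sprint",       # from the table: milestones resemble sprints
    "Labels": "label",            # from the table: labels resemble Jira labels
    "Users": "user",              # assumption: same-named counterpart
}


def jira_concept(github_entity: str) -> str:
    """Return the Jira concept for a Github entity, or fail loudly if unmapped."""
    try:
        return GITHUB_TO_JIRA[github_entity]
    except KeyError:
        raise ValueError(f"No Jira mapping defined for {github_entity!r}") from None


print(jira_concept("Projects"))
```

Failing on an unmapped entity, rather than returning a default, keeps any entity that still needs a decision from being silently dropped.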
Current State of the Github Account
Entity | Count | Details |
---|---|---|
Repository | 1 | Name - SiftScience/code |
Issues | 16,801 | 2,861 open, 13,940 closed |
Projects | 8 | Set up Digicert access for sre@, |
Milestones | 174 | 51 open, 123 closed |
Labels | 116 | |
Users | 31 | Users with “sift” in their username |
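These counts give a rough idea of how many list requests the API approach would need. The sketch below assumes 100 items per page, which is the maximum page size for Github REST list endpoints; the function name is illustrative, and comments and attachments are left out because their totals are not known yet.

```python
import math

PER_PAGE = 100  # assumption: maximum page size for Github REST list endpoints

# Counts taken from the table above.
COUNTS = {"issues": 16801, "milestones": 174, "labels": 116}


def pages_needed(item_count: int, per_page: int = PER_PAGE) -> int:
    """Number of paginated requests needed to list item_count items."""
    return math.ceil(item_count / per_page)


for entity, count in COUNTS.items():
    print(f"{entity}: {pages_needed(count)} page(s)")
```

With the counts above this comes to 169 pages for issues and 2 each for milestones and labels. The Github issues endpoint also returns pull requests, so the real number of issue pages may be higher.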
Prerequisites for migration
The following prerequisites are required for the migration:
User account with full admin access for APIs and data download from Github
User account with full admin access for pushing data to Jira
Separate system on which we can log in and work (this migration involves client data, so we have to use a cloud instance rather than a local machine)
Confirmation of the repository or repositories to be migrated
Confirmation of the users to be migrated (the client has to confirm whether the user accounts should be created in Jira or not)
Confirmation of the data to be migrated as a whole (all the entities, e.g. repository, issues, labels, milestones, ...)
Approaches for migration
Pulling data from Github - There are two ways to pull the data out of Github
First Way - Using the export option in the UI
Way to pull data | Using the export option in the UI |
Details | There is an option to export the Github data in the account settings page. The data is exported in a compressed format. The archive contains your profile data, plan, and any email addresses connected with your account, in addition to the issues, pull requests, comments, reviews, releases, projects, events, attachments, milestones, settings and much more for each of your repositories, along with basic information about the users who have interacted with them. Run the export from an account with admin privileges so that all of the data is included in the archive. |
Pros | It is easier to execute and requires less effort. |
Cons | The exported JSON data contains no ids (unique identifiers), so we may not be able to cross-verify against Github once the migration ends or when a few records fail. |
Second Way - Using the API to make the request and fetch the data
Way to pull data | Using developer API |
Details | Github data can be viewed as the following hierarchy: the organization is at the top, followed by repositories. Each repository has issues, comments and attachments. Labels can cut across issues belonging to different repositories. Milestones belong to a repository, and there can also be organization-wide projects shared by multiple repositories. |
Pros | We can generate a report and cross-verify the migrated data manually. We can hand over this report, with the ids of each entity, to the client so that they can cross-verify at their end too. |
Cons | Extra effort is required to implement a bridge that pulls data from Github and generates a CSV file that Jira can understand. |
APIs required for migration |
|
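As an illustration of the API approach, the sketch below walks the paginated "list repository issues" endpoint of the Github REST API and follows the `Link` response header to the next page. It is a minimal sketch, not the agreed implementation: the function names, the injectable `get` parameter and the token handling are assumptions made for illustration, closed issues are included via `state=all`, and pull requests are skipped pending the open question about them.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def http_get(url, token):
    """Fetch one page; returns (parsed JSON body, Link header or '')."""
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",  # assumption: a personal access token
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get("Link", "")


def next_page_url(link_header):
    """Extract the rel="next" URL from a Link header, or None on the last page."""
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None


def iter_issues(owner, repo, token, get=http_get):
    """Yield every issue of a repository, open and closed, skipping pull requests."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues?state=all&per_page=100"
    while url:
        page, link_header = get(url, token)
        for item in page:
            # The issues endpoint also returns pull requests; they carry this key.
            if "pull_request" not in item:
                yield item
        url = next_page_url(link_header)
```

Each yielded item carries the Github `id` and `number` of the issue, which is what makes the cross-verification report possible. Passing a different `get` function allows the pagination logic to be exercised without network access.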
Process of Migration
Download the data from Github using either of the two methods above. If the API (second) method is used, generate the report for the downloaded data.
Transform the data by processing it
Write the transformed data to a CSV file in a format understood by Jira server
Upload the data to Jira server and check that all the issues have been migrated properly
Write the code to pull data from Jira server and cross-verify it against the report generated while pulling data from Github (only if the API method is used)
Take a backup from the Jira server
Upload the backup to Jira Cloud and check whether anything has failed (ideally, if the Jira server backup has all the data, there should not be any issue in this step)
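The transform and CSV steps above could look roughly like the sketch below. It assumes issues shaped like the Github REST API issue objects (`title`, `body`, `state`, `assignee`, `milestone`, `labels`, `id`). The column names are illustrative assumptions, since the actual field mapping is chosen during the Jira CSV import, and the `Github ID` column is a hypothetical field added so that records can be traced back later. Multi-valued fields such as labels are written as one repeated column per value.

```python
import csv
import io


def issue_to_row(issue, max_labels):
    """Flatten one Github issue into a CSV row, padding labels to a fixed width."""
    # Assumption: Jira labels cannot contain spaces, so they are replaced here.
    labels = [label["name"].replace(" ", "-") for label in issue.get("labels", [])]
    labels += [""] * (max_labels - len(labels))
    assignee = issue.get("assignee") or {}
    milestone = issue.get("milestone") or {}
    return [
        issue["title"],
        issue.get("body") or "",
        issue["state"],
        assignee.get("login", ""),
        milestone.get("title", ""),  # milestones are treated as sprints
        issue["id"],                 # kept for cross-verification
        *labels,
    ]


def issues_to_csv(issues):
    """Return CSV text with one 'Labels' column per label slot."""
    max_labels = max((len(i.get("labels", [])) for i in issues), default=0)
    header = ["Summary", "Description", "Status", "Assignee", "Sprint", "Github ID"]
    header += ["Labels"] * max_labels
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    for issue in issues:
        writer.writerow(issue_to_row(issue, max_labels))
    return buffer.getvalue()
```

Using the `csv` module rather than joining strings by hand means commas, quotes and line breaks inside issue titles and descriptions are escaped correctly.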
Task Breakdown and Estimation
Task | Estimation (man-hours) | Comment |
---|---|---|
Get credentials for admin for Github from client | NA | |
Get separate VM for development work | NA | |
Installations on development environment & testing of Github account credentials | | |
Implement pulling data from Github using the API (second method) | | |
Implement the Github data processing and data transformation | | |
Implement the ability to export the transformed data in CSV format (compatible with Jira server) | | |
Ability to generate basic reports | | |
Testing | | |
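For the "basic reports" task, one possible shape is a comparison between the Github ids recorded while pulling the data and the ids found after the import. The sketch assumes the Github id of each issue was carried into Jira in some field and has been read back as a plain list; how that list is obtained is outside the sketch, and all names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationReport:
    migrated: list = field(default_factory=list)    # present in both systems
    missing: list = field(default_factory=list)     # in Github, not found in Jira
    unexpected: list = field(default_factory=list)  # in Jira, not in the Github pull

    def summary(self) -> str:
        return (f"migrated={len(self.migrated)} "
                f"missing={len(self.missing)} "
                f"unexpected={len(self.unexpected)}")


def build_report(github_ids, jira_ids) -> MigrationReport:
    """Compare the ids pulled from Github with the ids found after import."""
    source, target = set(github_ids), set(jira_ids)
    return MigrationReport(
        migrated=sorted(source & target),
        missing=sorted(source - target),
        unexpected=sorted(target - source),
    )


report = build_report([1, 2, 3, 4], [2, 3, 4, 9])
print(report.summary())
```

The `missing` list is the part the client would most likely want to see, since it names exactly which entities failed to import.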
Questions
1. Do we need to migrate pull request details?
2. Do we need to migrate the closed issues?
3. Some issues are marked as epics, so do we need to create an epic for those?
4. Will the customer give a list of users and user metrics?
5.
POC
Migrate 2 Closed and 2 Open Milestones along with Project and Repo
Migrate user-related data
Migrate Comments
Migrate Attachments
Check the release option