10. Requirement Analysis/Scoping
Introduction
Some clients are moving from Github to Jira, so the relevant data they hold in Github has to be migrated. This data may include many entities that need to be migrated from the source system (Github) to the target system (Jira Cloud). The two systems belong to different ecosystems and provide little out-of-the-box support for migrating between them.
The aim of this document is to identify the Github entities that need to be migrated to Jira Cloud, decide on the steps of the migration, and produce a report of the entities that were imported successfully (or failed).
Note: Throughout the document Jira refers to Jira Cloud
Requirements
We have to pull the data out of Github and push it to Jira. We have identified the following Github entities that need to be migrated to Jira:
Sl No# | Github Entity | Description |
---|---|---|
1 | Issues | Issues can be bugs, enhancements, change requests or any other requests related to the repository |
2 | Comments | A thread of discussion on an issue. Each issue can have multiple comments and each comment can have multiple attachments. |
3 | Attachments | Attachments can be attached to comments only; the contents of the description are treated as the first comment on the issue |
4 | Assignee | User account to which the issue is assigned |
5 | Projects | Github also has something called projects, which at first may look synonymous with projects in Jira but are not. A single Github repository can have multiple projects, and projects in Github are similar to boards in Jira. The confusion arises from the naming convention. |
6 | Milestones | These are similar to sprints in Jira with their own start and end date |
7 | Labels | These are similar to labels in Jira (multi-select control) |
8 | Users | Users are nothing but accounts to which issues can be assigned |
Current State of the Github Account
Sl No# | Entity | Count | Details |
---|---|---|---|
1 | Repository | 1 | Name - SiftScience/code |
2 | Issues | 16,801 | 2,861 (open), 13,940 (closed) |
3 | Projects | 8 | Set up Digicert access for sre@, |
4 | Milestones | 174 | 51 (open), 123 (closed) |
5 | Labels | 116 | |
6 | Users | 31 | Users with “sift” in their username |
Prerequisites for migration
The prerequisites for the migration are as follows:
User account with full admin access for APIs and data download from Github
User account with full admin access for pushing data to Jira
Separate system on which we can log in and work (this migration involves client data, so we have to use a cloud instance rather than our local machines)
Confirmation of the repository/repositories to be migrated
Confirmation of the users to be migrated (the client has to confirm whether the user accounts should be created in Jira or not)
Confirmation of the data to be migrated as a whole (all the entities, e.g. repository, issues, labels, milestones and so on)
Approaches for migration
Pulling data from Github - There are three ways to pull the data out of Github
First Way - Using the export option in the UI
Way to pull data | Using the export option in the UI |
Details | There is an option to export the Github data in the account settings page. The data is exported in the compressed format. The archive will contain your profile data, plan, and any email addresses connected with your account in addition to the issues, pull requests, comments, reviews, releases, projects, events, attachments, milestones, settings and much more for each of your repositories along with basic information about the users who have interacted with them. This export should be done using the account with admin privileges so that you get all the data exported in the compressed format |
Pros | It is easier to execute and requires less effort. |
Cons | The exported JSON data contains no ids (unique identifiers), so we may not be able to cross-verify against Github after the migration ends, or to trace the records that failed. |
Second Way - Using the API to make the request and fetch the data
Way to pull data | Using developer API |
Details | Github data can be viewed as the following hierarchy: the organization is at the top, followed by repositories. Each repository has issues, comments and attachments. Labels can cut across issues belonging to different repositories. Milestones belong to a repository, and there can also be organization-wide projects shared by multiple repositories. |
Pros | We can generate a report and cross verify the migrated data manually. We can hand over this report with ids of each entity to client so that they can cross verify at their end too. |
Cons | Extra effort is required to implement such a bridge to pull data from Github and generate a CSV file that Jira can understand |
APIs required for migration | |
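As an illustration of the API approach, the following is a minimal sketch of pulling every issue of one repository through the Github REST API (`GET /repos/{owner}/{repo}/issues`), page by page. The function names `http_get_json` and `iter_issues` are hypothetical and not part of any existing utility. The page fetcher is passed in as a parameter so the paging logic can be exercised without network access. Pull requests are skipped because the issues endpoint returns them as well and they are out of scope for this migration.

```python
import json
import urllib.request
from typing import Callable, Iterator

API_ROOT = "https://api.github.com"


def http_get_json(url: str, token: str) -> list:
    """Fetch one page of JSON from the Github REST API (hypothetical helper)."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_issues(owner: str, repo: str, get_page: Callable[[str], list],
                per_page: int = 100) -> Iterator[dict]:
    """Yield every issue (open and closed) of a repository, skipping pull requests."""
    page = 1
    while True:
        url = (f"{API_ROOT}/repos/{owner}/{repo}/issues"
               f"?state=all&per_page={per_page}&page={page}")
        batch = get_page(url)
        if not batch:
            break  # an empty page means there is nothing left to fetch
        for item in batch:
            # The issues endpoint also returns pull requests; they carry a
            # "pull_request" key and are not migrated.
            if "pull_request" not in item:
                yield item
        page += 1
```

With a real token this could be called as `list(iter_issues("SiftScience", "code", lambda url: http_get_json(url, token)))`. Each yielded issue carries its Github `number` and `id`, which is what makes the later cross-verification report possible.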
Third Way (Hybrid Approach) - Using the manual export to fetch the attachments and the API to fetch all other details
Way to pull data | Using Organization data export and API |
Details | We can combine the best of the two approaches above. We export the organization data using the manual export (or the organization migration API) to get hold of the attachments attached to issues (we cannot get attachments using the APIs), and then fetch the other information about the organization, such as repositories, issues, comments, labels, milestones, releases and projects, using the relevant Github APIs |
Pros | We can download all the data for an organization using this approach |
Cons | The export option in the Github UI for a user account cannot export organization data. It only allows us to download the data that belongs to that account, not to any organization. |
High level steps for pulling the data | |
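The joining step of the hybrid approach can be sketched as follows. This is only an illustration under stated assumptions: each attachment record in the export is assumed to carry an `issue` field holding the web URL of the issue it belongs to, and that URL is matched against the `html_url` field of the issue returned by the API. The `issue` field name and the function name `attach_export_data` are assumptions and should be checked against a real export before use.

```python
from collections import defaultdict


def attach_export_data(api_issues, export_attachments):
    """Return copies of the API issues with an added "attachments" list.

    Assumption: every export record has an "issue" key with the issue's
    web URL; records without it are ignored here.
    """
    by_issue = defaultdict(list)
    for attachment in export_attachments:
        issue_url = attachment.get("issue")
        if issue_url:
            by_issue[issue_url].append(attachment)

    merged = []
    for issue in api_issues:
        enriched = dict(issue)  # shallow copy, the input is left untouched
        enriched["attachments"] = by_issue.get(issue["html_url"], [])
        merged.append(enriched)
    return merged
```

Keeping the API issue as the base record means every merged entry still has its Github id and number for the report, while the attachments come from the export.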
Process of Migration
Download the data from Github (using any of the three methods above)
Transform the data by processing it
Write the transformed data to a CSV file in a format understood by Jira
Upload the CSV data to a Jira server instance and check whether all the data was imported properly (we run this step first so that we do not mess up the cloud instance by importing directly into it; a server instance is easier to fix because we have more control over it)
If the CSV data was imported properly into Jira server, then upload the same data to Jira Cloud
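The transform and CSV-writing steps above can be sketched as below. This is a minimal illustration, not the actual utility. It assumes issues in the shape returned by the Github REST API (`title`, `body`, `state`, `assignee.login`, `number`, `labels[].name`). The column names are our own choice and are mapped to Jira fields during the import, and `Github Issue Number` is a hypothetical custom field used only for cross-verification. Multi-value fields such as labels are written as repeated columns with the same header, and spaces in label names are replaced because Jira labels cannot contain spaces.

```python
import csv
import io


def issues_to_csv(issues, out):
    """Write Github issues to `out` as CSV rows for the Jira CSV importer."""
    # One "Labels" column per label of the most-labelled issue.
    max_labels = max((len(i.get("labels", [])) for i in issues), default=0)
    header = ["Summary", "Description", "Status", "Assignee",
              "Github Issue Number"] + ["Labels"] * max_labels

    writer = csv.writer(out)
    writer.writerow(header)
    for issue in issues:
        labels = [label["name"].replace(" ", "_")
                  for label in issue.get("labels", [])]
        labels += [""] * (max_labels - len(labels))  # pad to a fixed width
        assignee = (issue.get("assignee") or {}).get("login", "")
        writer.writerow([
            issue["title"],
            issue.get("body") or "",
            issue["state"],
            assignee,
            issue["number"],
            *labels,
        ])
```

For a dry run the output can be written to an `io.StringIO()` buffer and inspected before producing the real file.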
Importing Data in JIRA
Data can be imported into Jira Cloud using the CSV format. The following links describe how to generate the CSV for both data and attachments.
Jira server - https://confluence.atlassian.com/adminjiraserver087/importing-data-from-csv-998872306.html
Jira Cloud - https://confluence.atlassian.com/adminjiracloud/importing-data-from-csv-776636762.html
Note: To import attachments, the attachment data must be available over http/https so that Jira server/Cloud can access it directly
High Level Tasks
Sl No | Task | Comment |
---|---|---|
1 | Get admin credentials for Github and Jira access from the client | Action required from Client; admin access is required to export the attachment data |
2 | Get separate VM for development work | Action required from Client |
3 | There should be a mapping document to show which field in Github maps to which field in Jira | Empyra team to create the mapping and get confirmation from the customer |
4 | Installations on development environment & testing of Github account credentials | Yes we have |
5 | There should be a way for the utility to pull data from Github | Yes we have |
6 | There should be a way for the utility to transform the data from Github format to Jira understandable format | Analysis is done. Transformation to be worked upon. |
7 | There should be a way for the utility to export the transformed data to CSV format understandable by JIRA | Action from Empyra team |
8 | There should be a way for the utility to generate report of data pulled from Github | Action from Empyra team |
9 | There should be a way for the utility to generate report of data exported to CSV format | Action from Empyra team |
10 | There should be a way for the utility to check the data in Jira cloud using api calls and generate report of data exported | Action from Empyra team |
11 | Testing | Action from Empyra team |
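The three reports in tasks 8 to 10 can be combined into a single reconciliation, sketched below. It assumes each stage can produce the list of Github issue numbers it handled: pulled from Github, written to CSV, and found in Jira after import. How the Jira-side list is obtained is left open here, and the function name `reconcile` is hypothetical.

```python
def reconcile(pulled, exported, imported):
    """Compare the issue numbers seen at each stage of the migration."""
    pulled, exported, imported = set(pulled), set(exported), set(imported)
    return {
        "pulled": len(pulled),
        "exported": len(exported),
        "imported": len(imported),
        # Pulled from Github but never written to the CSV file.
        "lost_in_transform": sorted(pulled - exported),
        # Written to the CSV file but not found in Jira afterwards.
        "failed_import": sorted(exported - imported),
    }
```

For example, `reconcile([1, 2, 3], [1, 2], [1])` reports issue 3 as lost in the transform and issue 2 as a failed import.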
Doubts and Clarifications
Question | Details | Clarification |
---|---|---|
Do we need to migrate pull request details? | Github repositories also have pull requests raised by users. If we have to migrate them, they would have to be migrated to Bitbucket Cloud | No, there is no need to migrate repositories |
Do we need to migrate the closed issues? | There are some closed issues in the repository that somebody has already worked on and resolved. Do we have to migrate them as well? | Yes |
Do we have to create epics in Jira Cloud for issues marked with the label 'epic'? | Some issues carry the label 'Epic'. Do we have to create an epic in Jira Cloud for such issues? | Yes |
Do we have to migrate users also? | There are users in the SiftScience account. Do we have to migrate these users to Jira Cloud too? | Mapping between Github users and Jira users is required |
What kind of ticket should we create in Jira for each issue in Github? | Github has only one issue type, called issues, whereas Jira has a separate issue type field. How should we identify which one is a bug, task, story or any other type? | Open Story What if the issue types doesn’t have any label? |
Does the client already use Jira Cloud, or will they start using it after the Github migration? | We want to understand whether Jira Cloud currently holds any data/configuration. | Client is already using Jira Cloud |
Do we have a list of labels that we need to support for migration? | The labels in Github should be migrated to Jira. Sift uses labels for different purposes: some mark issue types such as bugs and feature requests, while others mark the specific development technology. | Yes we have the list |
What if an issue in Github is marked with two labels that are considered different projects? | E.g. if an issue AP-1 has two labels, project 1 and project 2, in Github, then under which Jira project should this issue be put when moving it to Jira: project 1 or project 2? | Generate a report of such issues having more than one (project) label |
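Two of the clarifications above lend themselves to a short sketch: choosing a Jira issue type from the Github labels, and reporting issues that carry more than one project label. The label-to-type table and the set of project labels are inputs supplied by the caller. The default type `"Story"` for issues without a type label is an assumption, since that clarification is still open. Both function names are hypothetical.

```python
def pick_issue_type(labels, type_labels, default="Story"):
    """Return the Jira issue type for the first matching type label.

    `type_labels` maps a Github label to a Jira issue type; its order
    decides which label wins when several match. The default is an
    assumption, not a confirmed rule.
    """
    lowered = {label.lower() for label in labels}
    for label, issue_type in type_labels.items():
        if label.lower() in lowered:
            return issue_type
    return default


def project_conflicts(issues, project_labels):
    """List (issue number, project labels) for issues with several project labels."""
    report = []
    for issue in issues:
        names = [label["name"] for label in issue.get("labels", [])]
        hits = sorted(name for name in names if name in project_labels)
        if len(hits) > 1:
            report.append((issue["number"], hits))
    return report
```

For instance, `pick_issue_type(["Epic", "backend"], {"epic": "Epic"})` returns `"Epic"`, and an issue labelled with both `project 1` and `project 2` ends up in the conflict report instead of being assigned to a Jira project silently.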
Dependency from the Client
Item | Details | Status |
---|---|---|
Project List | Client has to create three more projects | Done |
Labels | Done | Done |
Issue Types | Done | Done |
Field Mapping from JIRA to Github | | Done |
Milestone | Done (we have only one sprint in JIRA, so all the issues will have to be migrated under this sprint) | No Need to migrate Milestones |
User List | We need usernames of the users from JIRA | Done |
Access to JIRA | | Done |
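Once the user list is available, applying it to the assignees could look like the sketch below. It assumes the mapping is a plain dictionary from Github login to Jira username supplied by the client. The function name `map_assignees` is hypothetical. Logins without a mapping are collected so they can be reported back instead of being dropped silently.

```python
def map_assignees(issues, user_map):
    """Map each issue number to a Jira username and list unmapped Github logins."""
    mapped = {}
    unmapped = set()
    for issue in issues:
        login = (issue.get("assignee") or {}).get("login")
        if login is None:
            continue  # unassigned issue, nothing to map
        if login in user_map:
            mapped[issue["number"]] = user_map[login]
        else:
            unmapped.add(login)
    return mapped, sorted(unmapped)
```

The second return value is the list to send back to the client when the user list is incomplete.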
Tasks for POC
Task | Details | Aim | Status |
---|---|---|---|
Migrate: 10 closed and 10 open issues; 5 milestones and 5 projects with attachments | In Github, create two four milestones and add a few issues under each | To check if we can create closed sprints in Jira or not | |
Migrate User data | In Github invite some users to join the organization | To check if we have enough information to create a user in Jira from Github | |
Migrate Comments | In Github create comments under different issues. | To check if we can migrate the comments under a project and an issue or not | |
Migrate Attachments | In Github create an attachment and migrate this attachment to JIRA | To check if we can migrate the attachment to the proper issue or not | |
Check the release version | Try to migrate releases from Github to Jira | To see if we can migrate releases data or not | |
Work Breakdown(Stories & Estimations)
# | Story | Tasks | Status |
---|---|---|---|
1 | Utility should be able to pull data from Github | | |
| | | Utility should be able to pull api data from Github | Done |
| | | Utility should be able to download the migration data | Done |
| | | Utility should be able to save the data locally to the hard drive | Done |
| | | Utility should be able to join the organization migration(attachment) data to data pulled from api | Done |
| | | Utility should be able to generate the report based on data pulled from Github | Done |
2 | Utility should be able to transform the data in JIRA compatible format | | |
| | | Utility should be able to read and transform the data into CSV format compatible with JIRA | Done |
| | | Utility should be able to generate the report of the transformed data | Done |
3 | Configure the test JIRA server instance to emulate the cloud | | |
| | | Create users in the jira instance | Done |
| | | Create all projects with same name as labels in Github and custom fields | Done |
| | | Person should be able to import the data in Jira server | Done |
| | | Utility should be able to generate a report to validate the data in JIRA server against the data imported in CSV format | Done |
| | | Validate the migrated data | Done |
4 | Import the transformed data into Jira cloud | | |
| | | Person should be able to import the data in JIRA cloud | Done |
| | | Validate the migrated data | Done |