Hello Guys,
This blog is part zero of eight part series of porting your RDBMS database to a NO-SQL wide column key value pair database.
Just an introduction!
To begin with, I have been working on a use case where the reporting web application and the analytics application was to be segregated and expected to point to another highly scalable and reliable database, as currently the applications are pointing to the Transactional database where the IoT based devices are continuously pumping in data and the amount of data being queried by the reporting applications was enormous. The web application is built in .Net technology and MS SQL Server was used as transactional database. Lots of logic is embedded into stored procedures and also, there are more than thousand stored procedures. Having said that I would not mind calling the database as intelligent database.
After lots of research and discussion we decided to leverage Hadoop Ecosystem for the reporting and analytics. Having expertise in Microsoft technology and being more well versed with RDBMS data modeling, you are bound to be skeptical on how Hadoop Ecosystem will help to resolve your problem. The more you read on Hadoop ecosystem, the more you are confused. Also worth noting is there is nothing like stored procedures in Hadoop Ecosystem. So the question was, where do I port the logic embedded in the stored procedure? Anyways, I started looking over strategies to make Hadoop ecosystem part of the infrastructure.
We started speaking with consulting firms and started looking at various flavors of Hadoop Ecosystem. MapR also approached us and provided us with a workable architecture. Logically it did make sense but we wanted to have a feel of it and see it working and so we decided to implement few use cases as proof of concept (POC). MapR professional services (PS) helped us in implementing the POC. We did look at various options like either using MapR-FS (HDFS) and Mapr-DB (HBASE). Mapr-DB was found more feasible to our use case so we started architecting the solution using Mapr-DB. HDFS is good for Static data set or data set that can be refreshed or synced at low frequecy or high intervals. On the other hand Mapr-DB is good for high volume transactional database or near real time sync. Also note Mapr-DB is multi-model No-SQL database. It can have tables in form of multi column family wide column key-value pair or can be a document database.
One thing I learnt and would like to share is there is no way to port the stored procedures into Hadoop Ecosystem, do not waste time to research on it. I will be sharing the steps we executed. For now, I am just listing them but I will explain each step more thoroughly, so it will provide clear picture. Here are the steps:
- List sample use cases or reports of different categories or types
- As per the use cases, model the sample data set
- Create the wide column key-value pair data model. I will be using this sample RDBMS data model to explain and create the wide column key-value pair data model.
- Row-key design
- Data Import.
- Create C# application to fetch data from MapR-DB
- Implement change data capture (CDC) for continuous flow of data
- Using Drill create view for Tableau live connection.
The above steps are common and should work for HBASE or any other No-SQL wide column / key value pair database. I hope it helps you to port your application to Hadoop Ecosystem or other No-SQL databases.
Thanks for reading!
Ashfaq Chougle
Continue to Part One: Sample Use Cases - From SQL Server to Wide Column No-SQL