
Big Data and Hadoop: A high-level overview for the layperson


By Sid Richardson, PMP, CSM

I have been in the data warehousing practice since 1994, when I implemented a successful Distributed Data Warehouse for a flagship banking product, followed by co-developing Oracle’s Data Warehouse Methodology. In August 1997, I was invited to speak at the Data Warehouse Institute Conference in Boston.

Over the years, I’ve researched and implemented what I would consider some small-scale Big Data systems. I have an ongoing interest in Big Data and want to share what I’ve learned in a high-level overview for the layperson or busy executive.

What is Big Data?

Big Data describes an IT approach for processing the enormous amounts of information now available from social media, email, log files, text, camera/video feeds, sensors, website clickstreams, Radio Frequency Identification (RFID) tags, audio, and other sources, in combination with existing computer files and database data.

In the 1990s, three major trends converged to create Big Data: “Big” Transaction Data, “Big” Interaction Data, and “Big” Data Processing.

In 2001, Big Data was defined by Doug Laney, former Vice President and Distinguished Analyst with the Gartner Chief Data Officer (CDO) research and advisory team. Mr. Laney defined Big Data by the “three Vs”:

    1. Velocity – Speed of incoming data feeds.
    2. Variety – Unstructured data, social media, documents, images.
    3. Volume – Large quantities of data.

IBM decided to add two more Vs:

    1. Veracity – Accuracy and trustworthiness of the data.
    2. Value – The business worth that can be derived from the data.

Why do we need Big Data?

In a nutshell: We need Big Data because there is an enormous amount of data to process.

As noted by The Economist, the abundance of data and of tools to capture, process, and share all this information already exceeds the available storage space (and the number of eyes on the planet to review and analyze it all!).

According to Forbes’s 2018 article, “How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read,” there are 2.5 quintillion bytes of data created each day. And, over the last two years alone, 90 percent of the data in the world was generated.

Clearly, the creation of data is expanding at an astonishing pace—from the amount of data being produced to the way in which it’s re-structured for analysis and used. This trend presents enormous challenges, but it also presents incredible opportunities.

You’re probably thinking: alright, I get the Big Data thing, but why couldn’t data warehouses perform this role? Well, data warehouses are large, complex, and expensive projects that typically run 12 to 18 months, and they have high failure rates across all industries; Gartner once estimated that as many as 50 percent of data warehouse projects would achieve only limited acceptance or fail entirely.

A new approach to handle Big Data was born: Hadoop.

What is Hadoop?

In a nutshell, Hadoop is a Java-based framework governed by the Apache Software Foundation (ASF) that initially addressed the ‘Volume’ and ‘Variety’ aspects of Big Data by providing a distributed, fault-tolerant, batch data-processing environment (one record at a time, but designed to scale to petabyte-sized files).

Hadoop was created to substantially reduce the cost of storing massive volumes of data for analysis. It does so by emulating a distributed parallel-processing environment: many inexpensive commodity processors and disks are networked together, rather than relying on dedicated hardware and storage solutions.
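
Hadoop’s original processing engine, MapReduce, works in three steps: map, shuffle, and reduce. As a rough illustration of the model (not of Hadoop’s actual Java API), here is a single-machine Python sketch of the classic word-count job, with each phase simulated in turn:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["big data needs big storage", "hadoop stores big data"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["big"])   # 3
```

In real Hadoop, the map and reduce functions run in parallel on many nodes and the shuffle moves data across the network; the logic per record, however, is exactly this simple.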

Why Hadoop?

The Challenges with Hadoop

There is limited understanding of Hadoop across the IT industry. Hadoop also has operational limitations and performance challenges; you need several extended components to make it work and to make it reliable. And Hadoop is becoming more fragmented, pulled in different directions by commercial players trying to leverage their own solutions.

In summary…

The Hadoop Framework addresses a number of previous challenges facing the processing of Big Data for analysis. The explosion in deployment of data capture devices across all industries world-wide necessitated a more cost-effective way to store and access the massive volumes of data accumulating by the second!

I hope this blog post has provided you with a better understanding of some key Big Data and Hadoop concepts and technologies. Have you worked with Big Data and/or Hadoop? Let us know your thoughts and experiences in the comments!

P.S. If you have gotten this far and are curious where the name Hadoop comes from, here you go! The name ‘Hadoop’ was coined by one of the sons of Doug Cutting, a software designer and advocate of open-source search technology. Mr. Cutting’s son had given the name ‘Hadoop’ to his toy elephant, and Mr. Cutting used it for his open-source project because it was easy to pronounce.

About the Author: Mr. Richardson’s passion is Data Warehousing, Business Intelligence, Master Data Management, and Data Architectures. He has helped Fortune 500 companies in the US, Europe, Canada, and Australia lead large-scale corporate system and data initiatives and teams to success. His experience spans 30 years in Information Technology, specifically in data warehousing, business intelligence, information management, data migrations, converged infrastructures, and, most recently, Big Data. Mr. Richardson’s industry experience includes finance and banking, government, utilities, insurance, retail, manufacturing, telecommunications, healthcare, large-scale engineering, and transportation.

KAI Partners Staff Profile: The Data Architect


There are many paths to success and while not everyone takes the same path, we often manage to arrive at the same destination. In our KAI Partners Staff Profile series, we share interviews and insight from some of our own employees here at KAI Partners. Our staff brings a diversity in education, professional, and life experience, all of which demonstrate that the traditional route is not necessarily the one that must be traveled in order to achieve success.

Today, we bring you the journey of Ajay Bhat, Senior Data Architect at KAI Partners, Inc., who works as an Enterprise Data Architect for one of KAI Partners’ public sector clients. His role involves managing a variety of data management activities and architecting solutions to meet the client’s needs.

KAI Partners, Inc.: How did you get into your line of work?

Ajay Bhat: My first job, as a GET (Graduate Engineer Trainee), was assisting with Business Process Reengineering and helping implement Enterprise Resource Planning (ERP). Though a Mechanical Engineer by background, I was introduced in that first job to the various IT tools used for ERP implementation. Over time, I got trained in different ERP software packages.

KAI: Are there any certifications or trainings you’ve gone through that have helped in your career?

AB: Keeping up with technology is something I have always enjoyed. I have completed certifications in Oracle, Java, and SAS, and I have done some self-paced courses in Big Data technologies and Data Science. I also went back to school to get my MBA in Business Intelligence from the University of Colorado Denver.

KAI: What is your favorite part about your line of work and why?

AB: Problem solving is my favorite part of my job. When I go to work, there is always an issue to resolve that involves some aspect of critical thinking. Using technology to implement solutions is another thing I like about my job.

KAI: What is one of the most common questions you receive from clients, and what counsel or advice do you give them?

AB: Depending on the project, the questions vary, but most frequently I am asked how I am able to switch roles on a project so quickly. One day I may be a database programmer; another day a DBA, a data modeler, a BI analyst, or a data architect. Switching between roles is something I do frequently. My answer is that any role is a series of small logical steps. It may seem overwhelming from a distance, but if we break it down into those steps, it is doable. This applies directly to any problem solving I do in my day-to-day life as well.

Now that we’ve learned more about Ajay’s data architecture work, here’s a little more about him!

Quick Q&A with Ajay:

Daily, must-visit website(s):

Preferred genre of music or podcast to listen to: Classic jazz, Bollywood music

Best professional advice received: At the end of the day, it is just another day at work; do your best.

Book you can read over and over again: Autobiography of a Yogi by Paramahansa Yogananda

Most-recent binge-watched show: I don’t binge watch now, but did binge “24” a while ago

About Ajay: Ajay currently supports a public sector client in Data Management. Outside of work, he loves outdoor activities, racquetball, running, and a game of chess. He also practices meditation regularly.

What the KAI Partners Team is Thankful for in 2017


From the KAI Partners team to yours, we wish you a happy, healthy, and stress-free Thanksgiving holiday.

Planning for Test Data Preparation as a Best Practice


By Paula Grose

After working in and managing testing efforts on and off for the past 18 years, I have identified a best practice that I use in my own testing projects and recommend for other testing projects as well.

This best practice is test data preparation, which is the process of preparing the data to correlate to a particular test condition.

Oftentimes, preparing data for testing is a big effort that people underestimate and overlook. When you test the components of a new system, it’s not as simple as just identifying your test conditions and then executing the test—there are certain factors you should take into account as you prepare your test environment. This includes what existing processes, if any, are in place to allow for the identification or creation of test data that will match to a test condition.

A test case may consist of multiple test conditions. For each test condition, you must determine all the test data needs. This includes:

  • Input data
  • Reference data
  • Data needed from other systems to ensure synchronization between systems
  • Data needed to ensure each test will achieve its expected result
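
One way to make these per-condition data needs concrete is a simple readiness checklist. The Python sketch below is purely illustrative (the condition names, category labels, and record structure are hypothetical, not from any particular project); it records which categories of data have been staged for each test condition and flags the gaps:

```python
# Hypothetical categories of test data a condition may require,
# mirroring the four needs listed above.
DATA_CATEGORIES = ["input", "reference", "cross_system", "expected_result"]

# Hypothetical readiness status per test condition.
conditions = {
    "TC-01 new account": {"input": True, "reference": True,
                          "cross_system": False, "expected_result": True},
    "TC-02 account transfer": {"input": True, "reference": True,
                               "cross_system": True, "expected_result": True},
}

def missing_data(conditions):
    # For each condition, list the categories of data not yet prepared.
    return {name: [c for c in DATA_CATEGORIES if not ready.get(c, False)]
            for name, ready in conditions.items()}

gaps = missing_data(conditions)
print(gaps["TC-01 new account"])   # ['cross_system']
```

Even a small tracker like this makes it obvious, before execution day, which conditions still need data work.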

Planning for test data preparation can greatly reduce the time required to prepare the data. At the overall planning stage for testing, there are many assessments that should be conducted, including:

  • Type of testing that will be required
  • What testing tools are already available
  • Which testing tools may need to be acquired

If, at this point, there are no existing processes that allow for easy selection and manipulation of data, you should seek to put those processes in place. Most organizations have a data guru who is capable of putting processes in place for this effort—or at least can assist with the development of these processes.

The goal is to provide a mechanism that will allow the selection of data based on defined criteria. After you do this, you can perform an evaluation as to whether the existing data meets the need—or identify any changes that must be made. If changes are required, the process must facilitate these changes and provide for the loading/reloading of data once changes are made.
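
The selection mechanism can start as small as a filter over existing records against defined criteria. A minimal sketch, with hypothetical record fields and values:

```python
# Hypothetical pool of existing test records.
pool = [
    {"id": 101, "account_type": "checking", "balance": 0},
    {"id": 102, "account_type": "savings", "balance": 250},
    {"id": 103, "account_type": "checking", "balance": 500},
]

def select(records, **criteria):
    # Return the records matching every key/value pair in the criteria.
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

matches = select(pool, account_type="checking")
print([r["id"] for r in matches])   # [101, 103]
```

Once selection works, evaluating whether the returned data meets the test condition (and what must change) becomes a focused review rather than a hunt.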

One word of caution concerning changing existing data: You must be certain that the existing data is not set up for another purpose. Otherwise, you may be stepping on someone else’s test condition and cause their tests to fail. If you don’t know for sure, it is always better to make a copy of the data before any changes are made.
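
The copy-before-change precaution can be as simple as snapshotting the records you are about to modify, so they can be restored if the data turns out to belong to someone else’s test condition. A minimal sketch (the record structure is hypothetical):

```python
import copy

# Hypothetical test records shared by several testers.
records = [{"id": 1, "status": "active"}, {"id": 2, "status": "closed"}]

# Take a deep copy before modifying anything.
snapshot = copy.deepcopy(records)

records[0]["status"] = "suspended"   # the change needed for our test

# Restore the original data when done, or if a conflict is discovered.
records = copy.deepcopy(snapshot)
print(records[0]["status"])   # active
```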

About the Author: Paula Grose worked for the State of California for 33 years, beginning her IT career as a Data Processing Technician and, over time, performing all aspects of the Systems Development Life Cycle. She started out executing a nightly production process and progressed from there. As a consultant, Paula has performed IV&V and IPOC duties focusing on business processes, testing, interfaces, and data conversion. She currently leads the Data Management Team for one of KAI Partners’ government sector clients. In her spare time, she is an avid golfer and enjoys spending time with friends and playing cards and games.

Civic & Gov Tech Showcase Event Recap


Photo Credit: Innovate Your State

By Guest Blogger Tony Oliver, Penny Wise Consulting Group

One of KAI Partners’ own partners, Tony Oliver of Penny Wise Consulting Group, recently attended the second annual Civic & Gov Tech Showcase in Sacramento. According to event sponsors Innovate Your State and the City of Sacramento Mayor’s Office for Innovation & Entrepreneurship, “The Civic & Gov Tech Showcase is an opportunity to connect civic minded entrepreneurs, government leaders and potential investors to showcase innovation and encourage collaboration and support of new technologies to improve government.” We asked Tony to share with us his thoughts on the event, so take it away, Tony!

It was an interesting event, with the first 30 minutes focusing on discussing the benefits of Sacramento for companies either looking to move from the Bay Area or seeking to open local offices.

After a 30-minute break, there were two hours of quick, 8-minute presentations by startups focused on government or civic issues. The main thread weaving them together: these are the kinds of projects that the bigger tech outfits once would (or would not) take on, often only when million-dollar contracts were involved. Now, however, these smaller outfits are offering counties and cities an opportunity to fix big problems with their nimbler, web-enabled software.

A few of these startups caught my eye, including:

  • A.R.G.O Labs: More diverse than the others, its focus is on leveraging civic data science tools to deliver public services more efficiently
  • Caravan Studios: More of a framework for solving problems; their presentation featured a concrete case: disseminating information to disadvantaged students from the nation’s 100k public schools on where to find free activities and meals during the summer
  • CityGrows: Streamlines and provides a transparent view of how permits, processes, etc. are offered by the cities
  • Civic CrowdAnalytics: Crunches civic data with Natural Language Processing, i.e., sentiment analysis
  • MeWe: Inspection software used to make a big dent in the backlog of public sector inspections
  • Organizer: A volunteer-focused startup
  • Pinpoint Predictive: More of a people-analytics play, similar to Civic CrowdAnalytics above
  • ShiftSpark: Citizen lobbying
  • SpeakEasy Political: Templates and assistance on direct marketing campaigns for issues
  • Support Pay: Combination of communication platform and payment and verification system for child support expenses and collaboration
  • Voter: Tinder-like, used to connect with like-minded voters

It sounds like this year’s Civic & Gov Tech Showcase featured some great startups that can help positively influence our government and political processes! KAI Partners thanks Tony for his thoughts on this event and will be looking at these companies—and others making their way to the Sacramento area—in the future.

About the Author: Tony Oliver is a project manager by trade, a marketing guru by profession, and a lifelong learner from birth. His best trait is an inquisitive mind, which drives his desire to understand not just the “what” but also the “how” and more importantly, the “why” and “why not?” Tony is experienced in supply, pricing, demand, and consumption analysis and holds an MBA in marketing from a top 20 school (UNC Chapel Hill) and an undergraduate English Literature degree from Georgetown University. With 15+ years of experience with Intel and Cisco, Tony is fully bilingual (English, Spanish) with a working knowledge of French, as well as a seasoned public speaker and instructor of Project Management and Presentation Skills courses.
