Category Archives: Internet of Things

Big Data and Hadoop: A high-level overview for the layperson

By Sid Richardson, PMP, CSM

I have been in the data warehousing practice since 1994, when I implemented a successful Distributed Data Warehouse for a flagship banking product, followed by co-developing Oracle’s Data Warehouse Methodology. In August 1997, I was invited to speak at the Data Warehouse Institute Conference in Boston.

Over the years, I’ve researched and implemented what I would consider some small-scale, junior Big Data systems. I have an interest in Big Data and want to share what I’ve learned about Big Data and Hadoop as a high-level overview for the layperson or busy executive.

What is Big Data?

Big Data describes an IT approach for processing the enormous amounts of information now available from social media, emails, log files, text, camera/video, sensors, website clickstreams, Radio Frequency Identification (RFID) tags, audio, and other sources, in combination with existing computer files and database data.

In the 1990s, three major trends converged to create what we now call Big Data: “Big” Transaction Data, “Big” Interaction Data, and “Big” Data Processing.

In 2001, Big Data was defined by Doug Laney, former Vice President and Distinguished Analyst with the Gartner Chief Data Officer (CDO) research and advisory team, in terms of the “three Vs”:

    1. Velocity – Speed of incoming data feeds.
    2. Variety – Unstructured data, social media, documents, images.
    3. Volume – Large quantities of data.

IBM later added two more Vs:

    1. Veracity – Accuracy and trustworthiness of the data.
    2. Value – The business value that can be derived from the data.

Why do we need Big Data?

In a nutshell: We need Big Data because there is an enormous amount of data to process.

As The Economist has noted, the abundance of data, and of the tools to capture, process, and share all this information, already exceeds the available storage space (and the number of eyes on the planet to review and analyze it all!).

According to Forbes’s 2018 article, “How Much Data Do We Create Every Day? The Mind-Blowing Stats Everyone Should Read,” 2.5 quintillion bytes of data are created each day, and 90 percent of the data in the world was generated in the last two years alone.

Clearly, the creation of data is expanding at an astonishing pace, from the amount of data being produced to the way in which it is restructured for analysis and used. This trend presents enormous challenges, but it also presents incredible opportunities.

You’re probably thinking: alright, I get the Big Data thing, but why couldn’t data warehouses perform this role? Well, data warehouses are large, complex, and expensive projects that typically run 12 to 18 months and have high failure rates (Gartner once estimated that as many as 50 percent of data warehouse projects would achieve only limited acceptance or fail entirely).

A new approach to handle Big Data was born: Hadoop.

What is Hadoop?

In a nutshell, Hadoop is a Java-based framework governed by the Apache Software Foundation (ASF) that initially addressed the ‘Volume’ and ‘Variety’ aspects of Big Data and provided a distributed, fault-tolerant, batch data processing environment (processing one record at a time, but designed to scale to petabyte-sized files).

Hadoop was created to substantially reduce the cost of storing massive volumes of data for analysis. It does so by networking many inexpensive, existing commodity processors and their storage into a distributed, parallel processing environment, rather than relying on dedicated hardware and storage solutions.
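
To make the “one record at a time” batch model concrete, here is a minimal sketch of the classic word-count job written against the standard Hadoop MapReduce Java API (it follows the stock Apache example; the input and output paths are command-line placeholders, not part of any particular system):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // The mapper sees one input record (one line of text) at a time
        // and emits a (word, 1) pair for every word it finds.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private final Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // The reducer receives every count emitted for a given word, possibly
        // from many machines, and sums them into a single total.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS directory of text files
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // results directory (must not already exist)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The same two small functions run unchanged whether the input is a few megabytes on a laptop or petabytes spread across thousands of commodity nodes; Hadoop takes care of splitting the input, distributing the work, retrying failures, and reassembling the results.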

Why Hadoop?

In short, because it offered a far cheaper, more scalable, and more fault-tolerant way to store and process massive volumes of varied data than traditional data warehouse platforms could.

The Challenges with Hadoop

There is limited understanding of Hadoop across the IT industry. Hadoop has operational limitations and performance challenges; you need to rely on several extended components to make it work and to make it reliable. And Hadoop is becoming more fragmented, pulled in different directions by commercial players trying to leverage their own solutions.

In summary…

The Hadoop Framework addresses a number of previous challenges facing the processing of Big Data for analysis. The explosion in deployment of data capture devices across all industries world-wide necessitated a more cost-effective way to store and access the massive volumes of data accumulating by the second!

I hope this blog post has provided you with a better understanding of some key Big Data and Hadoop concepts and technologies. Have you worked with Big Data and/or Hadoop? Let us know your thoughts and experiences in the comments!

P.S. If you have gotten this far and are curious where the name Hadoop comes from, here you go! The name ‘Hadoop’ was coined by one of the sons of Doug Cutting, a software designer and a creator and advocate of open-source search technology. Mr. Cutting’s son had given the name ‘Hadoop’ to his toy elephant, and Mr. Cutting used it for his open-source project because it was easy to pronounce.

About the Author: Mr. Richardson’s passion is Data Warehousing, Business Intelligence, Master Data Management, and Data Architectures. He has helped Fortune 500 companies in the US, Europe, Canada, and Australia lead large-scale corporate system and data initiatives and teams to success. His experience spans 30 years in the Information Technology space, specifically in data warehousing, business intelligence, information management, data migrations, converged infrastructures, and, most recently, Big Data. Mr. Richardson’s industry experience includes the finance and banking, government, utilities, insurance, retail, manufacturing, telecommunications, healthcare, large-scale engineering, and transportation sectors.

How We Can Promote Workforce Development in the Sacramento Region

By Terry Daffin

I recently had the opportunity to attend the Golden Sierra Workforce Tech Forum: Occupations & Skills in an Automated World, hosted by Valley Vision and Golden Sierra Workforce Board.

Valley Vision “…inspires leaders to think big and collaborate on bold, long-term solutions that improve people’s lives,” and Golden Sierra’s Workforce Board “…is an industry-led board of directors who identify and solve problems within key economic sectors in the tri-county region (Placer, El Dorado, and Alpine).”

As the Project Manager for KAI Partners’ KAIP Academy and the Community Manager for co-working and incubation space The WorkShop – Sacramento, I was especially interested in hearing firsthand what employers are looking for in terms of workforce development for their organization.

There were many great panelists at the forum, including Sean Moss, Senior Estimator and Project Manager for McGuire and Hester; Gordon Rogers, Project Principal of the Owen Group; Annette Smith-Dohring, Workforce Development Manager for Sutter Health; Bernadette Williams, CMI Operations Manager at VSP; and Joseph Taylor, Assistant Professor at CSU Sacramento.

Each panelist was asked to describe what they believe the biggest educational need is for graduating students entering the workforce. Here’s a sampling of what they said:

  • A gap in technically skilled labor—employees are either highly skilled/specialized or have few technical skills
  • Up-skilling: providing training on the Internet of Things and Artificial Intelligence to the existing workforce
  • Critical thinking skills: many graduates are under-prepared when they enter the workforce

I left the forum with the question, “What can we do to close these gaps?” As a training provider, it’s clear we need to help industry and education align their efforts so that the workforce can stay updated on new methods, software/programming languages, and other emerging skills.

Here are a few ways to stay on top of digitalization and close the skills gap:

  1. Industry and education leaders should seek out training programs that will prepare students for critical thinking, data and business analytics, problem solving, and soft skills necessary to enter the workforce and immediately become productive.
  2. Students should be encouraged to seek out internships and work-based learning opportunities (especially those that provide educational units for their participation).
  3. Employees should be encouraged to widen their professional development by taking certification courses (especially those that provide professional development units).

There is a lot we can do to close the skills gap and promote workforce development in our region. KAIP Academy is excited to offer training courses and programs for building up a more highly skilled Sacramento.

About the Author: Terry Daffin is an Executive Consultant within KAI Partners. He has worked in the IT industry for more than 30 years and has over 25 years of project management experience. As a public sector consultant in the health care industry, Mr. Daffin assisted in the development and implementation of Project Management Offices that include project management, service management, lean agile and traditional product development lifecycles, and governance processes. He has been an innovation advocate and evangelist for 15 years and has implemented innovative processes for projects that he has been engaged on since 2001. Mr. Daffin currently works as the Project Manager of the KAIP Academy, KAI Partners’ training division and is the Community Manager at KAI Partners’ new co-working space, The WorkShop – Sacramento, focused on connecting innovative start-ups and the public sector.

Get Ready for the Internet of Things

By Jason Hardi

Now more than ever, there is a buzz surrounding the “Internet of Things.” What is the “Internet of Things”? It’s a fast-evolving planetary infrastructure upgrade that is capturing and analyzing more data than ever.

More specifically, the Internet of Things, or “IoT,” involves an infrastructure move towards increasing machine-to-machine communication (called M2M) built on cloud computing, IPv6, 5G networks, and billions of mobile, virtual, and instantaneous connections to computer “smart devices.”

Just how many devices can IPv6 handle? According to Wikipedia, the length of an IPv6 address is 128 bits, compared with 32 bits in IPv4. Therefore, the address space has 2^128, or approximately 3.4 × 10^38, addresses. Doing the math, this translates to 340,282,366,920,938,463,463,374,607,431,768,211,456 potential devices, each with a unique IP address. Read aloud, that number is:

340 undecillion, 282 decillion, 366 nonillion, 920 octillion, 938 septillion, 463 sextillion, 463 quintillion, 374 quadrillion, 607 trillion, 431 billion, 768 million, 211 thousand and 456
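
If you want to check the arithmetic yourself, a couple of lines of Java reproduce the figure exactly (this is just a sanity check on the math, not part of any IoT stack):

    import java.math.BigInteger;

    public class Ipv6AddressSpace {
        public static void main(String[] args) {
            // IPv6 addresses are 128 bits long, so the address space holds 2^128 values.
            BigInteger addresses = BigInteger.valueOf(2).pow(128);
            System.out.println(addresses);
            // Prints 340282366920938463463374607431768211456
            // (roughly 3.4 x 10^38 unique addresses).
        }
    }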

We live in a world where “smart devices” are everywhere, and with the IoT, smart devices will become ever-present: Always there, always on, and always exchanging data 24/7/365.

While the foundation of the IoT centers on “smart devices,” in reality these are nothing more than sensors and reporting devices that send data to the cloud, where it feeds “Big Data” algorithms and research.

Cloud-based applications leverage that data and enable real-time analysis of everything from where you are and what you purchase to airplane telemetry and more—all of it coming from sensors and enabled by the push to build the next-generation 5G network. The cloud lets those sensors capture and report data anytime, anywhere.
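
As a rough illustration of that sensor-to-cloud flow, the sketch below simulates a “smart device” posting a single temperature reading to a cloud ingestion endpoint over HTTP, using only the Java standard library. The URL, device ID, and JSON field names are made up for the example; a real deployment would more likely use an authenticated endpoint or a lightweight messaging protocol such as MQTT:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Instant;
    import java.util.Locale;

    public class SensorReporter {
        public static void main(String[] args) throws Exception {
            // Pretend this value came from a physical temperature sensor.
            double temperatureC = 21.7;

            // A minimal JSON payload; the device ID and field names are illustrative only.
            String json = String.format(Locale.ROOT,
                "{\"deviceId\":\"bridge-sensor-042\",\"temperatureC\":%.1f,\"timestamp\":\"%s\"}",
                temperatureC, Instant.now());

            // Hypothetical cloud ingestion endpoint (placeholder URL).
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/iot/ingest"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Cloud responded with HTTP " + response.statusCode());
        }
    }

On the cloud side, readings like this one land in storage and stream-processing systems where the “Big Data” analysis described above takes place.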

Real-time examples of how this emerging technology can save lives include adding “smart sensors” to bridges to measure stress, weather-related conditions such as ice, cracks, and movement in real time. Such information can predict failure, offering the tremendous advantage of saving lives before a failure occurs.

The IoT has also enabled real-time software updates for the next generation of electric cars, such as the Tesla. Now, instead of taking your car in for service, it is automatically updated at night while you sleep—always evolving and always updating.

According to Fool.com, major corporations have been investing in IoT for years. Monsanto and other agriculture companies use IoT to make planting and harvesting food easier, faster, and more efficient by utilizing data from sensors on farm equipment and plants, satellite images, and weather tracking in order to increase food production.

General Electric (GE) is using IoT to help liquefied natural gas (LNG) plants decrease their downtime by pinpointing potential problem areas before they become major issues—a savings of up to $150 million a year, according to MIT Sloan Management Review (“Big Idea: Competing With Data & Analytics,” Research Highlight, October 21, 2016).

With the IoT comes ubiquitous real-time monitoring that can improve our lives and help make the world a safer place. While the possibilities are endless, they also demand discernment about just how much monitoring should be allowed, given practical concerns about the privacy of our lives and the ethics of big data tracking.

At KAI Partners, we specialize in these highly complex, integrated projects, where cross-functional technologies provide leading solutions in the cloud, complex network architectures, and highly evolved, leading-edge platforms.

About the Author: Jason Hardi has been in the Information Technology field for over 25 years. He began his working life as a Marine Biologist, where he saw the need to develop an early advanced statistical analysis program for biologists. The application, formerly called “Hyper Stats,” was subsequently marketed and sold at colleges across the country. Mr. Hardi then entered the Information Technology field as a System Operator working in mainframe shops and has enjoyed advancing from entry-level positions up to Project Director and Advisor.