Posted on: November 24, 2020 | 2 min read

Unstructured Data Vs Structured Data – What’s Needed in Your Data Science Initiative?

Data comes in many forms, and the form it takes influences how your storage infrastructure should be designed. An effective data environment should be easily accessible and should centralize data from across your organization's different applications and source systems.

There are two main categories of data: structured and unstructured.

Structured data is akin to what you would see in a database table or an Excel sheet. It is easy to manipulate, is often stored in databases, and is accessed using SQL queries. A common infrastructure approach for structured data is a data warehouse: a collection of SQL tables holding data collected and standardized from multiple source systems. This approach allows data scientists to easily find and query the data they need, regardless of where in the business it originated.
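To make the warehouse idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real data warehouse. The table name `sales`, its columns, and the sample rows are all hypothetical and chosen only for illustration; the point is that once records from different source systems sit in one standardized table, a single SQL query can answer a question across all of them.

```python
import sqlite3


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create one standardized table and load rows from two source systems."""
    conn.execute(
        "CREATE TABLE sales (region TEXT, source_system TEXT, amount REAL)"
    )
    # Hypothetical sample rows; 'crm' and 'erp' are illustrative source systems.
    rows = [
        ("East", "crm", 120.0),
        ("East", "erp", 80.0),
        ("West", "crm", 50.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def total_by_region(conn: sqlite3.Connection) -> list:
    """Sum sales per region, regardless of which system the row came from."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
build_warehouse(conn)
totals = total_by_region(conn)
print(totals)
```

A production warehouse would use a dedicated platform rather than SQLite, but the querying pattern is the same.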

 

(Mina Nacheva, 2018)

Unstructured data is more varied and can take the form of pictures, video, documents, audio, and more. These data types cannot be easily stored within a SQL table, making a data warehouse a less-than-ideal solution for storage. Cloud providers have developed several infrastructure solutions for storing unstructured data, including blob storage and data lakes. These offerings allow data of any type to be stored and accessed within file structures of varying degrees of organization and rigidity.
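As a rough illustration of how blob-style storage treats data, the sketch below uses a temporary local directory in place of a cloud storage service. The helper names `put_blob`, `get_blob`, and `list_blobs`, along with the keys and byte contents, are hypothetical and do not correspond to any cloud provider's API. It shows that the store only sees opaque bytes addressed by a path-like key, so images, audio, and documents can sit side by side.

```python
import tempfile
from pathlib import Path


def put_blob(root: Path, key: str, data: bytes) -> Path:
    """Store raw bytes under a path-like key, creating folders as needed."""
    path = root / key
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def get_blob(root: Path, key: str) -> bytes:
    """Read the bytes back; the store never interprets the content."""
    return (root / key).read_bytes()


def list_blobs(root: Path, prefix: str = "") -> list[str]:
    """List stored keys that start with the given prefix."""
    keys = (p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
    return sorted(k for k in keys if k.startswith(prefix))


# A temporary local directory stands in for a storage account or bucket.
lake_root = Path(tempfile.mkdtemp())

# Placeholder bytes; real files would be actual image, audio, or text content.
put_blob(lake_root, "images/logo.png", b"\x89PNG placeholder")
put_blob(lake_root, "audio/call.wav", b"RIFF placeholder")
put_blob(lake_root, "docs/report.txt", b"Quarterly report text")

print(list_blobs(lake_root))
print(list_blobs(lake_root, prefix="images/"))
```

How strictly the key layout is organized is a design choice, which is what the "varying degrees of organization and rigidity" above refers to.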

As the amount of data being collected has increased drastically in recent years, the concept of "big data" has arisen, along with infrastructure solutions to accommodate it. Distributed file systems such as HDFS allow massive amounts of data to be stored across many machines, where it can then be processed in parallel using frameworks such as MapReduce or Spark.
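The MapReduce processing model can be sketched in a few lines of plain Python. This is a single-process illustration of the idea only, not the Hadoop or Spark API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the two text chunks are assumptions made for the example. In a real cluster, each chunk would live on a different machine and the map and reduce steps would run in parallel.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk: str) -> list:
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs) -> dict:
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict) -> dict:
    """Reduce step: combine each key's values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Each string stands in for a block of a large file stored on a separate node.
chunks = ["big data big storage", "data lake data warehouse"]

mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because each chunk is mapped independently, the work can be spread over as many machines as there are chunks, which is what makes the model suit very large datasets.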

Written by CCG, an organization in Tampa, Florida, that helps companies become more insights-driven, solve complex challenges and accelerate growth through industry-specific data and analytics solutions.

Topic(s): Data & AI