References:

Module 1: Module 1.pptx (Google Drive)

Module 2: Module 2.pdf (Google Drive)

Module 3: yet to complete


Module 1

1. Compare structured and unstructured data

Aspect | Structured Data | Unstructured Data
Format | Pre-defined schema (rows, columns), relational database tables | No fixed format, cannot be organized in tables
Storage & Management | Easier to store and manage with legacy solutions, requires less storage | Challenging to store and manage, requires more storage
Data Type Examples | Numbers, dates, strings, spreadsheets, CRM, sales, finance | Images, audio, video, Word documents, emails, social media, surveys, multimedia
Share of Enterprise Data | About 20% (Gartner estimate) | About 80% (Gartner estimate)
Processing | Can use SQL and conventional analytics; query performance usually high | Cannot use standard SQL, requires advanced tools for extraction and analysis
Flexibility | Rigid structure: less flexible, more consistent | Highly flexible, adaptable to any content or format
Business Use | Transactional data, operational reports | Sentiment analysis, customer interactions, multimedia archiving
Querying | Easily queried | Difficult to query, needs special processing

Structured data is vital for business transactions and quick analytics, while unstructured data holds rich contextual information but is complex to extract, analyze, and leverage without specialized technologies and approaches (e.g., machine learning, NLP, computer vision).
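A small illustration of the Processing and Querying rows, written as a sketch and not taken from the module slides. It assumes Python with the standard-library sqlite3 and re modules; the table name sales, the sample rows, and the helper names total_by_region and extract_amounts are all hypothetical. The structured case is answered with one SQL query, while the unstructured case first needs an extraction step (here a simple regular expression standing in for the "advanced tools" the table mentions).

```python
import re
import sqlite3


def total_by_region(rows):
    """Structured data: a fixed schema lets plain SQL aggregate directly."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Each result row is a (region, total) pair, so dict() builds the mapping.
    totals = dict(
        conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    )
    conn.close()
    return totals


def extract_amounts(text):
    """Unstructured data: free text has no columns, so values are extracted first."""
    # Match dollar amounts such as $120 or $49.99 (illustrative pattern only).
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]


if __name__ == "__main__":
    rows = [("North", 120.0), ("South", 80.0), ("North", 50.0)]
    print(total_by_region(rows))

    email_body = "Hi team, the North office closed a deal worth $120 today. Another $50 is expected."
    print(extract_amounts(email_body))
```

The regular expression only works because the sample email happens to follow a predictable pattern; real emails, images, or audio would need NLP, computer vision, or similar techniques, which is the point of the comparison above.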

Reference: Pages 2–6


2. Characteristics of unstructured data

Unstructured data displays several distinct, critical characteristics: