A retail chain’s IT manager logs every point-of-sale transaction, customer movement, inventory change, and supplier notification into separate systems. Servers fill up and backup windows stretch longer, yet a straightforward question at the weekly management meeting goes unanswered: which product category sells most, in which store, and at what time of day? The data exists; the answer does not. This situation exposes the most frequently overlooked gap in big data discussions: the distance between collecting data and using it, which stems less from technical capacity than from a conceptual deficit.
Big data is commonly defined by three properties: volume, velocity, and variety. These properties describe what collected data looks like, not what should be collected or why. For managers, the more useful framework is a different set of three questions: what to collect, why to collect it, and how it will be used. Building any data infrastructure without working through these three questions in sequence produces a burden that grows harder to manage over time. Storage costs may fall, but the cost of making sense of data — staff hours, software licences, analysis time — does not.
‘What to collect?’ is not a technical cataloguing exercise; it is first and foremost an exercise in defining the business question. A manufacturing company that wants to improve customer satisfaction needs to identify which data serves that goal: delivery lead times, return rates, complaint records, perhaps post-sale service durations. Everything outside that scope, such as sensor readings from the production line or historical supplier pricing, generates only storage cost unless it is tied to a separate, clearly stated business question. Designing from purpose to data tends to work better than guessing from data toward purpose.
‘Why collect it?’ clarifies the distinction between institutional memory and real-time decision support. Some data is retained to document the past: accounting records, contracts, personnel files. These serve legal obligations or audit requirements, and their retention policies are well defined. Other data is collected to support operational decisions: inventory levels, sales trends, customer segmentation. When these two categories blur together, archive management grows heavier and the data needed for timely decisions gets lost in unnecessary noise. Translating this distinction into a written data policy is the first concrete step any organisation should take before scaling up its data infrastructure.
‘How will it be used?’ is the question most often deferred and most expensive to ignore. Many companies collect data first and leave the analysis layer for later. When the use case is not defined upfront, the structure of the collected data frequently turns out to be incompatible with the analysis tool chosen later. A data model designed for sales reporting, for example, falls short when someone needs customer behaviour analysis. Restructuring after the fact typically costs more than designing correctly from the start. For this reason, ‘how will it be used?’ must be answered not only with the technical team but together with the managers who will actually make decisions based on the output: which report, at what frequency, for which decision.
In practice, the greatest obstacle is that most companies have not yet reached the organisational maturity needed to answer all three questions together. The IT department builds the technical infrastructure, but the business units do not provide a clear requirements definition; they ask for more data without specifying which decisions it should support. This gap does not surface in meeting rooms. It shows up in real data projects: during implementation, in budget discussions, in the weeks when reports go unused. A mid-size company looking to raise its data management maturity needs not just an external consultant but an internal coordinator who owns these three questions and keeps them connected.
For SME managers, the practical decision criterion is straightforward: before committing to any data collection investment, write a single-sentence business question for each data source. If the answer to ‘why are we collecting this?’ is ‘it might be useful someday’ or ‘our competitors are doing it too,’ deferring that data source protects both the budget and the organisation’s analytical capacity. Big data demands big clarity before it demands big volume.
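One way to make this criterion routine is to keep a small register of data sources and check each entry before approving the investment. The following is only a minimal sketch under stated assumptions: the `DataSource` class, the `should_defer` and `review` functions, and the sample source names are all hypothetical, and the list of vague reasons contains just the two examples quoted above.

```python
from dataclasses import dataclass

# Assumption: only the two vague justifications named in the text are listed here.
# A real register would extend this list to match the organisation's own habits.
VAGUE_REASONS = (
    "it might be useful someday",
    "our competitors are doing it too",
)


@dataclass
class DataSource:
    """Hypothetical register entry for one data source."""
    name: str
    business_question: str  # one-sentence answer to "why are we collecting this?"


def should_defer(source: DataSource) -> bool:
    """Return True when the source lacks a concrete business question."""
    reason = source.business_question.strip().lower().rstrip(".")
    if not reason:
        return True  # no stated question at all
    return any(vague in reason for vague in VAGUE_REASONS)


def review(sources):
    """Split source names into (collect, defer) lists, preserving input order."""
    collect = [s.name for s in sources if not should_defer(s)]
    defer = [s.name for s in sources if should_defer(s)]
    return collect, defer


if __name__ == "__main__":
    # Illustrative entries only; the names and questions are made up.
    register = [
        DataSource("return_rates", "Which product lines drive the most returns?"),
        DataSource("line_sensors", "It might be useful someday."),
    ]
    print(review(register))
```

The check is deliberately crude: it cannot judge whether a question is a good one, only whether one has been written down at all. The judgement itself stays with the managers who will act on the answer.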
This article was originally written in Turkish by Gökhan MERCANOĞLU on March 14, 2011 and has been automatically translated into English and other languages using machine translation.