Gartner’s latest Market Guide on File Analysis (G00271713 for those with subscription) has arrived. As a file analysis vendor, we obviously welcome analysts investing effort in this part of the market, but in this case I cannot help feeling somewhat disappointed by what Gartner has produced. I confess that I haven’t been able to review comparable work by other analysts but, let’s face it, Gartner are the big kids on the block and so I think they can stand some constructive critique for producing a document that falls short of actually helping buyers choose.

Whether or not analysts cover an area is of course governed by the demands of their customers’ enquiries: when enough people express an interest, research papers follow. Gartner’s first work focused on file analysis was done last year and so, with two years’ thinking under their belts, we might expect more insight this time around to help their customers make better buying decisions. Unfortunately, that seems not to be the case, so after discussions with customers who are looking for help in this area I thought I’d take the opportunity to share my thoughts on how organizations might approach selecting file analysis products.

Personal biases aside, what my colleagues and I see is increasing noise around file analysis, and customers find it incredibly difficult to focus their market research. Time and again, however, it looks to me as though buyers’ requirements fall into one of three high-level camps:

  • First, IT-led storage management. Storage management and rationalization, access and entitlement management, migration planning and execution.
  • Second, legal and compliance. Litigation or regulatory discovery, compliance audit.
  • Third, business value. Critical/valuable data identification, metadata tagging and content value enhancement.

In a relatively new capability space I guess it’s not surprising that so many vendors are messaging file analysis capabilities (I think Gartner tracks more than 50); however, it’s very hard to find any that do a decent job of all of the above. In fact, most only practically address one. That means the first thing buyers can do to help themselves is decide which of the above they are focused on. Sounds simple? It should be, but with such a wide range of potential use cases to choose from, both vendors and buyers find it hard to focus in this way. In a recent customer meeting I found myself having to work across all of these areas, addressing very different stakeholders and needs in each case. At the end, the customer remarked on how difficult it was proving to reach a vendor short-list with so many vendors messaging their file analysis capabilities.

With top-level focus set, the most difficult question remains: how do you map vendors against your focus? The best vendors will be transparent about what they do well; others (intentionally or otherwise) may be less helpful, especially if they are trying to shift their focus from their origins into a more attractive proposition. At this point, to keep this post neutral, I will deliberately not describe our own focus. Instead, I’ll describe the characteristics that should help you match vendors yourself:

  • IT-led storage management. Everyone wants to be bigger and faster, but these guys really major on scale and index performance and should have real references to prove it. SharePoint excepted, their connectors are unlikely to include any ECM platform but will focus on storage and archive solutions (maybe some tape) and, these days, perhaps cloud storage too. Their reporting will focus on file property metadata and access control, and if they classify files, it will likely be limited to sensitive data types or storage and access management use cases and will not easily support classification for use cases such as record identification.
  • Legal and compliance. Often with roots in the eDiscovery space, these vendors have lots of functionality for both file metadata and file contents analysis, and have sophisticated support for the structured full-text review of culled or responsive content. Their strength is the ability to discover and virtually cull data sets before forensically copying/collecting the culled subset of data for review in a specialist tool or off-line environment. Their reporting may struggle to support useful visualization and exploration of large datasets, and taking action on content in place may not be well supported.
  • Business value. Flexible classification and content analysis are key to deriving the metadata required to tag and enhance content value for business use cases. Vendors should support extensible metadata sets with hierarchical classification that can be mapped to taxonomies or business classification schemes and, ideally, metadata rules should be edited through a graphical user interface rather than by manipulating XML files. The solution should allow reporting on any aspect of derived metadata and provide the ability to write metadata patterns and values to files in place, or push them to connected repositories on migration.

These characteristics are generalized, and you will find products that cut across each area or introduce a different spin. The point is that, for a useful down-select in a noisy market, these reflect the capabilities we think typify different vendor approaches. A rule of thumb when looking at the market might be as follows: scale and file/access properties = IT, full-text analysis and review = legal, flexible metadata and classification = business value.

In many cases, a product that is the best fit in one area might stretch to another; however, making it do so could require extra investment in skilled staff or consultants. Hopefully the points I describe in each case will help you down-select and more quickly reach a short-list of vendors who can really help you.
