Big Data Cloud Storage Framework Solutions

Based on a recent meeting of minds at MESS (Media Entertainment & Scientific Systems, http://www.meetup.com/MESS-LA), the challenge facing these industries is how to deal with petabyte-scale accumulated data that still needs to be accessible in real time for secure editing by downstream customers and suppliers.
Here’s a multi-phase solution idea:

[Figure: big_data_arch_model]
My idea for a secure big file delivery model. Figured by J. H. Lui (c) 2016

Use torrent technology for access, with authenticated peer-to-peer hosts, private SSL-encrypted trackers/announcers, and encrypted bit streams. This maintains access to the fundamental data source using minimal infrastructure.
Add two-factor authentication to the authentication protocol so that time- and role-based security can be enforced (for example, a given p2p host is authorized to connect to the torrent only during X days, N hours per day, and so on).
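To make the time- and role-based rule concrete, here is a minimal sketch of a connection check. The names `AccessWindow`, `GRANTS`, `is_authorized`, and the host ID are all hypothetical, and the policy shape (weekday set plus daily start/end time) is only one assumed way to express "X days, N hours per day".

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class AccessWindow:
    """One grant: which role, on which weekdays, between which hours."""
    role: str
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time
    end: time


def is_authorized(host_id, now, grants):
    """Return True if any window granted to host_id covers the moment `now`."""
    for window in grants.get(host_id, []):
        in_day = now.weekday() in window.weekdays
        in_hours = window.start <= now.time() < window.end
        if in_day and in_hours:
            return True
    return False


# Hypothetical policy: this host may connect Monday-Friday, 09:00-17:00.
GRANTS = {
    "editor-host-01": [
        AccessWindow("post-production-editor",
                     frozenset({0, 1, 2, 3, 4}), time(9, 0), time(17, 0)),
    ],
}
```

A tracker could run a check like this after the two-factor step succeeds and before it hands out any peer list, so an expired window simply yields no peers.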
Use generic two-factor authentication providers (e.g. Symantec VIP or SAASPass) so that small service providers can access data without excessive overhead cost or dedicated hardware.
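For a sense of what the second factor involves, below is a sketch of a time-based one-time password check using the open HOTP/TOTP algorithms (RFC 4226 and RFC 6238). It does not use the Symantec VIP or SAASPass APIs; it only assumes the chosen provider issues codes of this general kind. The function names and the `drift_steps` tolerance are illustrative choices.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    return hotp(secret, int(unix_time) // step, digits)


def verify_totp(secret: bytes, submitted: str, unix_time: float,
                step: int = 30, drift_steps: int = 1) -> bool:
    """Accept the code for the current step or a neighbouring step (clock drift)."""
    counter = int(unix_time) // step
    for delta in range(-drift_steps, drift_steps + 1):
        candidate = counter + delta
        if candidate < 0:
            continue
        if hmac.compare_digest(hotp(secret, candidate), submitted):
            return True
    return False
```

In use, the tracker would call `verify_totp(shared_secret, code_from_user, time.time())` during the handshake; the shared secret stays with the provider or the tracker, never with the torrent payload.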
Store the data source files using a torrent+sharding+bit-slicing protocol (similar to the Facebook imaging storage model). Without authenticated access to the cloud torrent, any individual data chunk or shard grabbed by a sniffer becomes useless.
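One way to get the property that a single captured shard is useless is XOR-based splitting, sketched below. This is a swapped-in illustration, not the torrent+sharding+bit-slicing protocol itself and not Facebook's storage design; `split_into_shards` and `join_shards` are hypothetical names. Note the trade-off: every shard is as large as the original data, so this variant costs n times the storage.

```python
import secrets


def split_into_shards(data: bytes, n: int) -> list:
    """Split data into n shards; all n are needed to rebuild it.

    The first n-1 shards are random bytes. The last shard is the data XORed
    with all of them, so any single shard looks like random noise.
    """
    if n < 2:
        raise ValueError("need at least two shards")
    shards = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    last = bytearray(data)
    for shard in shards:
        for i, byte in enumerate(shard):
            last[i] ^= byte
    shards.append(bytes(last))
    return shards


def join_shards(shards: list) -> bytes:
    """XOR all shards together to recover the original data."""
    out = bytearray(len(shards[0]))
    for shard in shards:
        for i, byte in enumerate(shard):
            out[i] ^= byte
    return bytes(out)
```

Each shard could then be seeded from a different host, so that only an authenticated peer allowed to fetch every shard can reassemble a usable chunk.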
Segregate and divide the data files using a role-based security architecture (e.g. Scene 1 is needed by post-production editor X during time period N). Individual torrent participants can select the individual virtual file segments they need for their work, without downloading data chunks unrelated to them. Similarly, the time+role based security described above prevents access to or from data segments that are not authorized for that endpoint. Password protection could even be added to individual sensitive segments to provide one more layer of security.
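The segment selection described above can be sketched as a filter over a per-segment access list. The segment names, role names, and the `SEGMENT_ACL` mapping are invented for illustration; a real torrent client would translate the allowed list into its own per-file download selection.

```python
# Hypothetical access list: which roles may fetch which virtual file segment.
SEGMENT_ACL = {
    "scene-01.mov": {"editor", "colorist"},
    "scene-02.mov": {"editor"},
    "score.wav": {"sound-mixer"},
}


def select_segments(role, requested, acl):
    """Split a participant's request into (allowed, denied) segment names.

    Segments missing from the access list are denied by default.
    """
    allowed, denied = [], []
    for name in requested:
        if role in acl.get(name, set()):
            allowed.append(name)
        else:
            denied.append(name)
    return allowed, denied
```

Denying unknown segments by default keeps a typo or a newly added file from becoming accidentally public.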
Use a Google Drive/Dropbox style OS protocol to mount the torrent sources on end-user workstations with transparent access. Any mechanism that provides adequate latency for the block replication should be sufficient. Rather than mounting the same cloud torrent on every local workstation, use a local NFS server as the home base for the cloud mount (at WAN speed), then export that mount locally (at LAN speed) to the workstations that need it. That way there is only one point of entry to and from the cloud torrent, which the end user can adequately firewall locally. This is a solution for the end consumers that need access to the largest portion of the cloud data set.
The source data hives can use a multi-path networking protocol ( https://jhlui1.wordpress.com/2015/05/21/multi-path-multiplexed-network-protocol-tcpip-over-mmnp-redundant-connections ) to further split and sub-divide the data streams (which are already encrypted), maximizing performance for bandwidth-limited consumer endpoints.
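The general idea of splitting one stream across several connections can be sketched with simple round-robin striping and sequence numbers. This is not the protocol from the linked post; `stripe`, `reassemble`, and the chunk size are assumptions chosen only to show how the receiver can restore order regardless of which path delivers first.

```python
def stripe(data: bytes, paths: int, chunk_size: int = 4) -> list:
    """Deal fixed-size chunks round-robin onto `paths` lanes.

    Each chunk is tagged with its sequence number so the receiver can
    restore the original order.
    """
    lanes = [[] for _ in range(paths)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        lanes[seq % paths].append((seq, data[offset:offset + chunk_size]))
    return lanes


def reassemble(lanes: list) -> bytes:
    """Merge chunks from all lanes and rebuild the stream by sequence number."""
    chunks = [chunk for lane in lanes for chunk in lane]
    chunks.sort(key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in chunks)
```

Because the payload is already encrypted before striping, an observer on any single path sees only a non-contiguous slice of ciphertext.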
Media companies face a rather different data value model: during pre-production the data value is extremely high, but it drops off rapidly after release, once the market has consumed it. The same model at a lower protection level would still work for actual distribution, wherein end subscribers are authenticated for access to a particular resolution or feature set of the original cloud segments (e.g. 8K versus 1K media, audio-only, or with or without Special Features access).
