Home
Schedule
Conference Info
Sponsorship Information
IBM Watson AI Day
Registration
Press Registration
Speakers
Sessions
Sponsors
Exhibitors
JETRO × Six Prefectures of Japan Pavilion Exhibitors
  Media Sponsors
  Topics
  Call For Papers
  Hotel Info
  Past Events
Untitled Document
2017 West
Premium Sponsors
Diamond



Platinum
@DevOpsSummit

Bronze










Untitled Document
2017 West
Keynote Sponsor


Untitled Document
2017 West Exhibitors
























@ThingsExpo











Untitled Document
2017 West Media Sponsors














Untitled Document
2017 East
Premium Sponsors
Diamond



Platinum
@DevOpsSummit

@DevOpsSummit

Silver
@DevOpsSummit


Bronze










Untitled Document
2017 East Exhibitors
@DevOpsSummit




































Untitled Document
2017 East Media Sponsors
















Untitled Document
2016 West
Premium Sponsors
Platinum Plus



Silver
@ThingsExpo

Bronze







Untitled Document
2016 Welcome Reception Sponsor

Untitled Document
2016 West Exhibitors










@DevOps Summit






@DevOps Summit

@WebRTC Summit












@WebRTC Summit









@DevOps Summit

Untitled Document
2016 West Media Sponsors











Untitled Document
2016 East Gold Sponsors

@ThingsExpo

Untitled Document
2016 East Silver Sponsors


@DevOps Summit

Untitled Document
2016 East Bronze Sponsors

Cloud Expo







Cloud Expo

Untitled Document
2016 East Vendor Presentation Sponsors

@DevOps Summit

Untitled Document
2016 East Exhibitors

@DevOps Summit





@ThingsExpo



@DevOps Summit

@ThingsExpo


@DevOps Summit









@DevOps Summit







@DevOps Summit










Untitled Document
2016 East Media Sponsors










Untitled Document
2015 West Gold Sponsors

Untitled Document
2015 West Silver Sponsor


Untitled Document
2015 West Bronze Sponsors

Cloud Expo |@ThingsExpo

Cloud Expo | DevOps Summit


@ThingsExpo





@DevOps Summit

@ThingsExpo


@ThingsExpo

 


Untitled Document
2015 West Exhibitors












@DevOps Summit





@DevOps Summit












@DevOps Summit

@DevOps Summit




@ThingsExpo


@DevOps Summit

 


Untitled Document
2015 West E-Bulletin Sponsors

DevOps Summit

Untitled Document
2015 West
Associate Sponsor

Untitled Document
2015 West Media Sponsor

Untitled Document
2015 East Gold Sponsors


WebRTC Summit

DevOps Summit

Untitled Document
2015 East Silver Sponsors
DevOps Summit
WebRTC Summit

Untitled Document
2015 East Bronze Sponsors

DevOps Summit

Cloud Expo | DevOps Summit
@ThingsExpo

DevOps Summit

DevOps Summit

Untitled Document
2015 East Delegate Bag Sponsors


Untitled Document
2015 East Exhibitors

DevOps Summit


@ThingsExpo



DevOps Summit






Cloud Expo | @ThingsExpo
Internet of @ThingsExpo
@ThingsExpo
DevOps Summit

DevOps Summit
@ThingsExpo
DevOps Summit
DevOps Summit
DevOps Summit
DevOps Summit
DevOps Summit



@ThingsExpo

Untitled Document
2015 East Associate Sponsor

Untitled Document
2015 East
Media Sponsors

Data Lake Plumbers | @BigDataExpo @Schmarzo #BigData #IoT #AI #ML #DL
The data lake is ideal for your data science team as it liberates them from the constraints & limitations of the data warehouse

Many of my blogs promote the business benefits of the data lake, both from a “save me more money” as well as the “make me more money” perspectives. But I fear that I’m making this thing called the data lake sound like a “silver bullet" [1] – just drop the data into the data lake and everything magically works. But much like in the world of data warehousing, there is significant work that needs to be done under the covers – in areas such as metadata management, data governance and security – to ensure that the data lake will perform for a business in a production environment. Many of the processes and techniques we learned in the data warehouse will benefit us here, though there are many new tools to be aware of that can help us in the operationalization task.

I’ve asked an industry expert in metadata management and data governance, Joe DosSantos (follow Joe on twitter: @JoeDosSantos) to co-author this blog with me. Well, to be honest, this mostly reflects Joe’s experience and thinking; I just wanted to get credit for being smart enough to know when to bring someone smarter than me into the conversation!

Data Lake Benefits
You know from previous blogs that there are many benefits to the data lake including:

  • Capture data from wide range of traditional (operational, transactional) and new sources (structured and unstructured) as-is
  • Store all your data in one environment for cross-functional business analysis
  • Support the analytics and data science to uncover new customer, product, and operational insights
  • Empower front-line employees and managers, and drive a more profitable customer engagement leveraging customer, product and operational insights
  • Integrate analytic insights into operational (Finance, Manufacturing, Marketing, Sales Force, Procurement, Logistics) and management systems (Business Intelligence reports and dashboards)

The data lake is ideal for your data science team in that it liberates them from the constraints and limitations of the data warehouse, enabling the data science team to quickly ingest, test and determine if there is any value to different data sets and analytic techniques without having to go through the rigorous operational procedures that govern the data warehouse.

However, this liberty can be quite terrifying in highly regulated environments. Companies have spent years developing governance and stewardship organizations specifically to control patient information, personal contact information, account balances, and other sensitive information. The description above seems to undo all of this work by creating free and easy access to data that should be locked down.

This is why the controls of a data lake need to be very clear. Data that is onboarded into a lake must go through a rigorous set of operational procedures to manage and govern that data set to make sure that it is appropriately tagged and protected, and then provisioned only to people who have the proper authorization. Modern data tools allow for this kind of governance capability to balance the quick and easy access to data that a data scientist needs with the security that good practices (and often the government) demand.

Operationalizing the Data Lake
Operationalizing the data lake requires several non-obvious disciplines, many of which we learned from our data warehouse experiences. These disciplines include data ingestion, indexing, cataloging, metadata management, data governance and security [2].

  1. As with a data warehouse, you will need a method to bring data into your environment. As batch windows became longer and longer in the data warehouse world and business users clamored for increasingly up-to-date information, practitioners began shifting from conventional data ETL (Extract, Transform, Load) to lower latency streaming and micro-batch. This trend was extended in the big data universe with tools like Kafka, a streaming message bus, and with Spring and Sqoop to accelerate data ingest. In the big data world, you might also want to ingest unstructured data sets as well, introducing new tools like Flume. Finally, you may want to trigger complex events based on this data stream and you might do so via Spark, Gemfire, or other in-memory grids. And just to make it more complex, you will likely use several of these tools in combination depending on your data feed needs. Keep in mind that in the world of ELT (Extract, Load, Transform) (note that the order differs from E-T-L), most of these data movements are data dumps. At this point, you have simply collected lots of raw data. It’s now time to make sense of it.
  2. Next, it is useful to tag files that you have ingested. What kind of file is this? What would be useful to know about it so that I could search for it later? Zaloni Bedrock is an example of a tool to apply metadata tags to the files, which is useful for both structured and unstructured data sets.
  3. We mentioned above that one of the key requirements of our data lake is having control over who can have access to specific data sources. Generally speaking, the data loaded in steps 1 and 2 is what we call “Bronze” data, meaning that it is good enough for the business process that created it. Data in these sets will likely be sensitive and your security should reflect it. However, we need to determine rules for how the data should be modified, obfuscated, or deleted in order to make it consumable for broader audience, or what we might call “Silver” status. You need to create business rules to manage data (e.g. birthdays should be masked and social security numbers should be stored as only the last 4). Collibra is an example of a tool for this rules definition and management. It allows data rules to be set up based on logical business entities by business people rather than technologists.
  4. For those people who are familiar with governance concepts, you will recognize the difference between a policy and a control. A policy is like a speed limit sign along the highway. The control is the police officer that pulls you over if you are driving over that speed limit. Data works the same way. While Collibra establishes the policy, you need to create a method for enforcing that policy. To do this, you need to find the logical entities buried in the data (i.e. “oh look, I found a social security number!”). Examples of such products include Global IDs for scanning structured data sets with the operational systems and Waterline for scanning data inside of Hadoop.
  5. Once you have found the data that you want, you want to implement the rules. For this, there is an open source tool called Atlas that contains an orchestration capability called Falcon that helps implement the rules.
    1. Apache Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the complete enterprise data ecosystem.
    2. Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon uses those definitions to auto-generate workflows in Apache Oozie
  6. Now that the data is loaded, you will want to enforce security through your LDAP capability or possibly through Kerberos. There are also tools like Blue Talon that simplify the ability to authorize, provision, protect, enforce and audit data security policies across your data lake.
  7. Finally, audit controls are critical. Cloudera introduced Navigator specifically to allow simple transparency to data history and lineage. Hortonworks will rely on Atlas to provide this capability.

Data that has gone through the above processes creates a view and accessibility of the data that can be made available to a wide set of users – both business analysts and data science teams.

Summary
When you build a house, the vast majority of the creative work is in the features and curbside appeal. That’s the fun part. But without the underlying plumbing, the house would quickly degrade into a money pit.

Consider the metaphor of a retail store: stocking the shelves vs. purchasing goods. When you go to the store, you don’t care about how the goods got there, but the rules for accessing the goods are everywhere. Cigarettes are behind the front desk. Pharmaceuticals must be dispensed with a prescription. Razor blades are under lock and key (for some strange reason). There are policies and enforcements on stocking the shelves so that the shopping experience is clear and easy.

To successfully operationalize the data lake, organizations need to address all of the plumbing requirements outlined in this blog that enable the business users and data science teams to have confidence in the wealth of data that the organization is amassing. The data lake plumbing processes may not be very glamorous, but without them, you’ll end up with a stinky data dump instead of a glorious data lake.

References:

  1. A “silver bullet” is a simple and seemingly magical solution to a complicated problem.
  2. While I mention several tools, this blog is not meant to be an endorsement of these tools nor is this intended to be a comprehensive list of such tools. However, many of these tools are the same tools that we use in our data lake implementations at EMC.

Data Lake Plumbers: Operationalizing the Data Lake
Bill Schmarzo

About William Schmarzo
Bill Schmarzo, author of “Big Data: Understanding How Data Powers Big Business” and “Big Data MBA: Driving Business Strategies with Data Science”, is responsible for setting strategy and defining the Big Data service offerings for Hitachi Vantara as CTO, IoT and Analytics.

Previously, as a CTO within Dell EMC’s 2,000+ person consulting organization, he works with organizations to identify where and how to start their big data journeys. He’s written white papers, is an avid blogger and is a frequent speaker on the use of Big Data and data science to power an organization’s key business initiatives. He is a University of San Francisco School of Management (SOM) Executive Fellow where he teaches the “Big Data MBA” course. Bill also just completed a research paper on “Determining The Economic Value of Data”. Onalytica recently ranked Bill as #4 Big Data Influencer worldwide.

Bill has over three decades of experience in data warehousing, BI and analytics. Bill authored the Vision Workshop methodology that links an organization’s strategic business initiatives with their supporting data and analytic requirements. Bill serves on the City of San Jose’s Technology Innovation Board, and on the faculties of The Data Warehouse Institute and Strata.

Previously, Bill was vice president of Analytics at Yahoo where he was responsible for the development of Yahoo’s Advertiser and Website analytics products, including the delivery of “actionable insights” through a holistic user experience. Before that, Bill oversaw the Analytic Applications business unit at Business Objects, including the development, marketing and sales of their industry-defining analytic applications.

Bill holds a Masters Business Administration from University of Iowa and a Bachelor of Science degree in Mathematics, Computer Science and Business Administration from Coe College.

Presentation Slides
Enterprise architects are increasingly adopting multi-cloud strategies as they seek to utilize existing data center assets, leverage the adv...
"We host and fully manage cloud data services, whether we store, the data, move the data, or run analytics on the data," stated Kamal Shanna...

Register and Save!
Save $405
on your “Golden Pass”!
before October 30, 2017!
Call 201.802.3020


Santa Clara Call for Papers Open
Submit
submit your speaking proposal
for the upcoming WebRTC Summit in
Santa Clara!
[Oct 31- Nov 2, 2017]


WebRTC Summit 2017 West
Sponsorship Opportunities
Please Call
201.802.3021
events (at) sys-con.com
Sponsorship opportunities are now open for WebRTC Summit 2017 Santa Clara, Oct 31-Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, and for WebRTC Summit 2018 New York, June 5-7, 2018, at the Javits Center in New York, NY. For sponsorship, exhibit opportunities and show prospectus, please contact Carmen Gonzalez, carmen (at) sys-con.com.



WebRTC Summit Silicon Valley All-Star Speakers Include

MATTHIEU
Octoblu

MAHADEV
Cisco

MCCARTHY
Bsquare

FELICIANO
AMDG

PAUL
VenueNext

SMITH
Eviot

BEAMER
goTraverse

GETTENS
goTraverse

CHAMBLISS
ReadyTalk

HERBERTS
Cityzen Data

REITBAUER
Dynatrace

WILLIAM-
SON

Cloud
Computing

SCHMARZO
EMC

WOOD
VeloCloud

WALLGREN
Electric Cloud

VARAN-
NATH

GE

SRIDHARA-
BALAN

Pulzze

METRIC
Linux

MONTES
Iced

ARIOLA
Parasoft

HOLT
Daitan

CUNNING-
HAM

ReadyTalk

BEDRO-
SIAN

Cypress

NAMIE
Cisco

NAKA-
GAWA

Transparent
Cloud

SHIBATA
Transparent
Cloud

BOYD
Neo4j

WARD
DWE

MILLER
Covisint

EVAVOLD
Covisint

MEINER
Oracle

MEEHAN
Esri

WITECK
Citrix

LIANG
Rancher Labs

BUTLER
Tego

ROWE
IBM Cloud

SKILLERN
Intel

SMITH
Numerex
WebRTC Summit New York All-Star Speakers Include

CLELAND
HGST

VASILIOU
Catchpoint

WALLGREN
Electric Cloud

HINCH-
CLIFFE

7Summits

DE SOUZA
Cisco

RANDALL
Gartner

ARM-
STRONG

AppNeta

SMALL-
TREE

Cazena

MCCARTHY
Bsquare

DELOACH
Infobright

QUINT
Ontegrity

MALAU-
CHLAN

Buddy Platform

PALIOTTA
Vector

MITRA
Cognizant

KOCHER
Grey Heron

PAPDO
POULOS

Cloud9

HARLAN
Two Bulls

GOLO
SHUBIN

Bit6

PROIETTI
Location
Smart

MARTIN
nfrastructure

MOULINE
Everbridge

MARSH
Blue Pillar

PARKS
SecureRF

PEROTTI
Plantronics

HOFFMAN
EastBanc

WATSON
Trendalyze

BENSON-
OFF

Unigma

SHAN
CTS

MATTELA
Redpine

GILLEN
Spark
Coginition

SOLT
Netvibes

BERN-
ARDO

GE Digital

ROMAN-
SKY

TrustPoint

BEAMER
GoTransverse

LESTER
LogMeIn

PONO
-MAREVA

Google

SINGH
Sencha

CALKINS
Amadeus

KLEIN
Rachio

HOASIN
Aeris

SARKARIA
PHEMI

SPROULE
Metavine

SNELL
Intel

LEVINE
CytexOne

ALLEN
Freewave

MCCAL-
LUM

Falconstor

HYEDT
Seamless

WebRTC Summit Silicon Valley All-Star Speakers Include

SCHULZ
Luxoft

TAM-
BURINI

Autodesk

MCCARTHY
Bsquare

THURAI
SaneIoT

TURNER
Cloudian

ENDO
Intrepid

NAKAGAWA
Transparent

SHIBATA
Transparent

LEVANT-LEVI
testRTC

VARAN NATH
GE

COOPER
M2Mi

SENAY
Teletax

SKEEN
Vitria

KOCHER
Grey Heron

GREENE
PubNub

MAGUIRE
HP

MATTHIEU
Octoblu

STEINER-
JOVIC

AweSense

LYNN
AgilData

HEDGES
Cloudata

DUFOUR
Webroot

ROBERTS
Platform

JONES
Deep

PFEIFFER
NICTA

NIELSEN
Redis

PAOLAL-
ANTORIO

DataArchon

KAHN
Solgenia

LOPEZ
Kurento

KIM
MapR

BROMHEAD
Instaclustr

LEVINE
CytexOne

BONIFAZI
Solgenia

GORBA-
CHEV

Intelligent
Systems

THYKAT-
TIL

Navisite

TRELOAR
Bebaio

SIVARAMA-
KRISHNAN

Red Hat
Cloud Expo New York All-Star Speakers Included

DE SOUZA
Cisco

POTTER
SafeLogic

ROBINSON
CompTIA

WARUSA
-WITHANA

WSO2 Inc

MEINER
Oracle

CHOU
Microsoft

HARRISON
Tufin

BRUNOZZI
VMware

KIM
MapR

KANE
Dyn

SICULAR
Basho

TURNER
Cloudian

KUMAR
Liaison

ADAMIAK
Liaison

KHAN
Solgenia

BONIFAZI
Solgenia

SUSSMAN
Coalfire

ISAACSON
RMS

LYNN
CodeFutures

HEABERLIN
Windstream

RAMA
MURTHY

Virtusa

BOSTOCK
IndependenceIT

DE MENO
CommVault

GRILLI
Adobe

WILLIAMS
Rancher Labs

CRISWELL
Alert Logic

COTY
Alert Logic

JACOBS
SingleHop

MARAVEI
Cisco

JACKSON
Softlayer

SINGH
IBM

HAZARD
Softlayer

GALLO
Softlayer

TAMASKAR
GENBAND

SUBRA
-MANIAN

Emcien

LEVESQUE
Windstream

IVANOV
StorPool

BLOOM-
BERG

Intellyx

BUDHANI
Soha

HATHAWAY
IBM Watson

TOLL
ProfitBricks

LANDRY
Microsoft

BEARFIELD
Blue Box

HERITAGE
Akana

PILUSO
SIASMSP

HOLT
IBM Cloudant

SHAN
CTS

PICCIN-
INNI

EMC

BRON-
GERSMA

Modulus

PAIGE
CenturyLink

SABHIKHI
Cognitive Scale

MILLS
Green House Data

KATZEN
CenturyLink

SLOPER
CenturyLink

SRINIVAS
EMC

TALREJA
Cisco

GORBACHEV
Systems Services Inc.

COLLISON
Apcera

PRABHU
OpenCrowd

LYNN
CodeFutures

SWARTZ
Ericsson

MOSHENKO
CoreOS

BERMING-
HAM

SIOS

WILLIS
Stateless Networks

MURPHY
Gridstore

KHABE
Vicom

NIKOLOV
GetClouder

DIETZE
Windstream

DALRY-
MPLE

EnterpriseDB

MAZZUCCO
TierPoint

RIVERA
WHOA.com

HERITAGE
Akana

SEYMOUR
6fusion

GIANNETTO
Author

CARTER
IBM

ROGERS
Virtustream
Cloud Expo Silicon Valley All-Star Speakers

TESAR
Microsoft

MICKOS
HP

BHARGAVA
Intel

RILEY
Riverbed

DEVINE
IBM

ISAACSON
CodeFutures

LYNN
HP

HINKLE
Citrix

KHAN
Solgenia

SINGH
Bigdata

BEACH
SendGrid

BOSTOCK
IndependenceIT

DE SOUZA
Cisco

PATTATHIL
Harbinger

O'BRIEN
Aria Systems

BONIFAZI
Solgenia

BIANCO
Solgenia

PROCTOR
NuoDB

DUGGAL
EnterpriseWeb

TEGETHOFF
Appcore

BRUNOZZI
VMware

HICKENS
Parasoft

KLEBANOV
Cisco

PETERS
Esri

GOLDBERG
Vormetric

CUMBER-
LAND

Dimension

ROSENDAHL
Quantum

LOOMIS
Cloudant

BRUNO
StackIQ

HANNON
SoftLayer

JACKSON
SoftLayer

HOCH
Virtustream

KAPADIA
Seagate

PAQUIN
OnLive

TSAI
Innodisk

BARRALL
Connected Data

SHIAH
AgilePoint

SEGIL
Verizon

PODURI
Citrix

COWIE
Dyn

RITTEN-
HOUSE

Cisco

FALLOWS
Kaazing

THYKATTIL
TimeWarner

LEIDUCK
SAP

LYNN
HP

WAGSTAFF
BSQUARE

POLLACK
AOL

KAMARAJU
Vormetric

BARRY
Catbird

MENDEN-
HALL

SUPERNAP

SHAN
KEANE

PLESE
Verizon

BARNUM
Voxox

TURNER
Cloudian

CALDERON
Advanced Systems

AGARWAL
SOA Software

LEE
Quantum

OBEROI
Concurrent, Inc.

HATEM
Verizon

GALEY
Autodesk

CAUTHRON
NIMBOXX

BARSOUM
IBM

GORDON
1Plug

LEWIS
Verizon

YEO
OrionVM

NAKAGAWA
Transparent Cloud Computing

SHIBATA
Transparent Cloud Computing

NATH
GE

GOKCEN
GE

STOICA
Databricks

TANKEL
Pivotal Software


Testimonials
This week I had the pleasure of delivering the opening keynote at Cloud Expo New York. It was amazing to be back in the great city of New York with thousands of cloud enthusiasts eager to learn about the next step on their journey to embracing a cloud-first worldl."
@SteveMar_Msft
General Manager of Window Azure
 
How does Cloud Expo do it every year? Another INCREDIBLE show - our heads are spinning - so fun and informative."
@SOASoftwareInc
 
Thank you @ThingsExpo for such a great event. All of the people we met over the past three days makes us confident IoT has a bright future."
Yasser Khan
CEO of @Cnnct2me
 
One of the best conferences we have attended in a while. Great job, Cloud Expo team! Keep it going."

@Peak_Ten


Who Should Attend?
Senior Technologists including CIOs, CTOs & Vps of Technology, Chief Systems Engineers, IT Directors and Managers, Network and Storage Managers, Enterprise Architects, Communications and Networking Specialists, Directors of Infrastructure.

Business Executives including CEOs, CMOs, & CIOs , Presidents & SVPs, Directors of Business Development , Directors of IT Operations, Product and Purchasing Managers, IT Managers.

Download Cloud Expo Show Guide
Cloud Expo Show Guide
Download PDF

Join Us as a Media Partner - Together We Can Rock the IT World!
SYS-CON Media has a flourishing Media Partner program in which mutually beneficial promotion and benefits are arranged between our own leading Enterprise IT portals and events and those of our partners.

If you would like to participate, please provide us with details of your website/s and event/s or your organization and please include basic audience demographics as well as relevant metrics such as ave. page views per month.

To get involved, email Patricia Henderson at patricia@sys-con.com.

Digital Transformation Blogs
CloudEXPO New York 2018, colocated with DXWorldEXPO New York 2018 will be held November 11-13, 2018, in New York City and will bring together Cloud Computing, FinTech and Blockchain, Digital Transformation, Big Data, Internet of Things, DevOps, AI, Machine Learning and WebRTC to one location.
DXWorldEXPO | CloudEXPO are the world's most influential, independent events where Cloud Computing was coined and where technology buyers and vendors meet to experience and discuss the big picture of Digital Transformation and all of the strategies, tactics, and tools they need to realize their goals. Sponsors of DXWorldEXPO | CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities.
Disruption, Innovation, Artificial Intelligence and Machine Learning, Leadership and Management hear these words all day every day... lofty goals but how do we make it real? Add to that, that simply put, people don't like change. But what if we could implement and utilize these enterprise tools in a fast and "Non-Disruptive" way, enabling us to glean insights about our business, identify and reduce exposure, risk and liability, and secure business continuity?