AWS re:Invent 2022 – Swami Sivasubramanian and Data in the Organization
With the amount of data organizations collect paired with tools to store, analyze, and make predictions from that data, it’s tempting to think of data functioning in an organization as it would in the brain. That’s the goal in many ways, but we aren’t quite there yet, says AWS VP of Data and Machine Learning Dr. Swami Sivasubramanian.
On the second full day of AWS re:Invent, Dr. Sivasubramanian talked about the core components of a modern data strategy and how AWS is working to support organizations in those areas. Here are the highlights from his keynote on November 30, 2022.
Driving innovation with cues from neuroscience
It’s not uncommon for data scientists to take inspiration from neuroscience, and Dr. Sivasubramanian jumped right into sharing his vision for how organizations can utilize data more in the way that a human brain does.
In order for this to happen, Sivasubramanian says data should:
- Be centralized
- Be processed automatically
- Flow naturally
- Be easy to visualize
According to Sivasubramanian, these four ingredients form the recipe for the creative spark that has driven so much innovation throughout human history, and it’s how AWS envisions its data products facilitating invention.
How to build a modern data strategy
But to realize that vision of organization as neural network, three things need to happen:
- Organizations must build a future-proof technical foundation with core data services
- They must weave connective tissue across the organization
- They must invest in democratizing data with the right tools and services
On that first point, Sivasubramanian says a data foundation needs scalable tools for every workload in order to remove heavy lifting. He noted that 94 percent of AWS customers use over ten different database and analytics tools, taking advantage of the extensive catalog of database, analytics, and machine learning products AWS offers.
Within the AWS ecosystem, customers can store data, query it, gain insights from it using AI and ML, and govern and catalog that data with tools like Lake Formation and the newly announced DataZone. Governance got a lot of attention in this talk, especially in the ways that it can not only increase security but also break down silos across teams to make data insights more available.
The rest of Dr. Sivasubramanian’s talk expanded on themes introduced in CEO Adam Selipsky’s keynote address on Tuesday, like increased interoperability among AWS products, enhanced security, and greater accessibility of products. This of course entailed a slew of new product and feature announcements.
New products and features for AWS data products
If you attend re:Invent, there’s one phrase you’ll quickly get used to hearing: “Today, I’m pleased to announce…” And yesterday, Swami Sivasubramanian got to repeat it at least a couple times.
Here’s what followed those ellipses:
- New product – Amazon DataZone. While Adam Selipsky technically broke the news of this new product on Tuesday, Sivasubramanian expanded on it, bringing out Shikha Verma, Head of Product for Amazon DataZone, to share a brief demo with the audience. Based on the demo, DataZone makes it much easier to bridge the gap between data producers and data consumers.
- New product – AWS Glue Data Quality. A new tool that reduces or eliminates the need for manual rule creation when it comes to maintaining data quality. The tool automatically generates data quality rules, and AWS claims it reduces manual effort from days to hours.
- New feature – Amazon Athena for Apache Spark. Now Athena users can start running interactive Apache Spark workloads in less than a second, per Sivasubramanian. Athena for Apache Spark integrates with other AWS services like SageMaker and EMR, and it requires no infrastructure management, with a pay-only-for-what-you-use pricing model.
- New feature – Amazon Redshift for Apache Spark. Again, this new integration was announced on Tuesday, but Sivasubramanian provided more details, claiming that users can run Spark on Redshift data up to ten times faster compared to other data warehousing tools. With the addition of Spark support for Redshift and Athena, AWS wants to be the “…best place to run Apache Spark on the cloud,” said Sivasubramanian.
- New feature – Amazon DocumentDB Elastic Clusters. A feature update for DocumentDB offering a fully-managed solution for scaling document workloads of “virtually any size and scale,” with zero impact to application availability or performance.
- New feature – Geospatial ML support for SageMaker. By far the sexiest announcement made in this keynote: SageMaker customers can now acquire and prepare geospatial data with a number of pre-built models and built-in visualization tools. AWS sees big possibilities for this addition, bringing out GM of ML/AI Services at AWS Kumar Chellapilla for a demo on how disaster response teams can use geospatial ML in SageMaker to quickly put together disaster evacuation routes and guide first responders so they can get victims to safety faster. There are, of course, multiple other applications for this, including automotive, retail, agriculture, and urban planning.
- New feature – Amazon Redshift Multi-AZ. Lets customers maximize price performance with high availability by processing reads and writes with underutilized standbys in separate availability zones.
- New feature – Amazon GuardDuty RDS Protection for Amazon Aurora. An enhancement of the security-of-the-cloud responsibility AWS assumes under its shared responsibility model, Amazon GuardDuty will now use ML to detect suspicious activity in relational databases and report on its security findings.
- New feature – Centralized Access Controls for Redshift Data Sharing. Lets you centrally manage access controls for Redshift data using Lake Formation.
- New SageMaker features. SageMaker is getting three new features: role manager, model cards, and model dashboard. Role manager is a user permissions feature, while model cards and model dashboard let you centralize model information and documentation and monitor your model performance from a central location.
- New feature – Auto copy from S3 for Amazon Redshift. This feature makes it easier to continuously ingest data from S3 into Redshift through simple data ingestion pipelines.
- New feature – New connectors for Amazon AppFlow. AppFlow now supports over 50 connectors, making it easy to connect data from multiple disparate sources.
- New feature – New data sources for Amazon SageMaker Data Wrangler. Data Wrangler now supports over 40 data sources.
How AWS is investing in data access through tools and education
To wrap up, Sivasubramanian talked about efforts from AWS to make data education more accessible, primarily through AWS Machine Learning University (MLU). MLU now provides educator training for community colleges and minority-serving institutions (MSIs). Training includes hands-on sessions for educators with a structured curriculum and learning resources.
Additionally, AWS now offers over 150 online courses in machine learning to make education more accessible. This initiative seemed particularly meaningful to Sivasubramanian, who grew up in a rural Indian town with only ten minutes of access to a computer per week in his high school years.
Beyond philanthropic efforts, AWS is making a broader push into low-code and no-code tools and features, making it easier for data consumers without a technical background to gain insights from data.
AWS continues making big investments in data products
Whether it’s database, data query, analytics, or machine learning, Swami Sivasubramanian’s keynote goes to show how AWS is working to make its already robust lineup of data products more secure, user-friendly, and interoperable. Data in the organization may not yet mirror its function in the brain, but that’s the future AWS envisions. Yesterday’s announcements make that clear.
Check back on our blog for more re:Invent 2022 highlights throughout the week. If you aren’t able to attend in person or you simply don’t have the time to watch on demand, we’ll do our best to bring you the most important updates from one of the biggest annual events in cloud computing.