AWS re:Invent 2022 – Swami Sivasubramanian and Data in the Organization

By Forrest Brown | Dec 1, 2022 | Featured, Tech

Given how much data organizations collect, and the tools available to store, analyze, and make predictions from it, it’s tempting to think of data as functioning in an organization the way it does in the brain. That’s the goal in many ways, but we aren’t quite there yet, says Dr. Swami Sivasubramanian, AWS VP of Data and Machine Learning.

On the second full day of AWS re:Invent, Dr. Sivasubramanian discussed the core components of a modern data strategy and how AWS is working to support organizations in those areas. Here are the highlights from his keynote on November 30, 2022.

Driving innovation with cues from neuroscience

It’s not uncommon for data scientists to take inspiration from neuroscience, and Dr. Sivasubramanian jumped right into his vision for how organizations can use data more like a human brain does.

In order for this to happen, Sivasubramanian says data should:

  • Be centralized
  • Be processed automatically
  • Flow naturally
  • Be easy to visualize

According to Sivasubramanian, these four ingredients form the recipe for the creative spark that has driven so much innovation throughout human history, and they are how AWS envisions its data products facilitating invention.

How to build a modern data strategy

But to realize that vision of the organization as a neural network, three things need to happen:

  1. Organizations must build a future-proof technical foundation with core data services
  2. They must weave connective tissue across the organization
  3. They must invest in democratizing data with the right tools and services

On that first point, Sivasubramanian says a data foundation needs scalable tools for every workload in order to remove heavy lifting. He noted that 94 percent of AWS customers use over ten different database and analytics tools, taking advantage of the extensive catalog of database, analytics, and machine learning products AWS offers.

Within the AWS ecosystem, customers can store data, query it, gain insights from it using AI and ML, and govern and catalog that data with tools like Lake Formation and the newly announced DataZone. Governance got a lot of attention in this talk, especially the ways it can both increase security and break down silos across teams to make data insights more widely available.

The rest of Dr. Sivasubramanian’s talk expanded on themes introduced in CEO Adam Selipsky’s keynote address on Tuesday, like increased interoperability among AWS products, enhanced security, and greater accessibility of products. This of course entailed a slew of new product and feature announcements.

New products and features for AWS data products

If you attend re:Invent, there’s one phrase you’ll quickly get used to hearing: “Today, I’m pleased to announce…” And yesterday, Swami Sivasubramanian got to repeat it more than a couple of times.

Here’s what followed those ellipses:

  • New product – Amazon DataZone. While Adam Selipsky technically broke the news of this new product on Tuesday, Sivasubramanian expanded on it, bringing out Shikha Verma, Head of Product for Amazon DataZone, to share a brief demo with the audience. Based on the demo, DataZone makes it much easier to bridge the gap between data producers and data consumers.
  • New product – AWS Glue Data Quality. A new tool that reduces or eliminates the need to write data quality rules by hand. It generates data quality rules automatically, and AWS claims it cuts manual effort from days to hours.
  • New feature – Amazon Athena for Apache Spark. Athena users can now start interactive Apache Spark workloads in under a second, per Sivasubramanian. Athena for Apache Spark integrates with other AWS services such as SageMaker and EMR, and it requires no infrastructure management, with a pay-for-what-you-use pricing model.
  • New feature – Amazon Redshift integration for Apache Spark. This integration was also announced on Tuesday, but Sivasubramanian provided more details, claiming that users can run Spark on Redshift data up to ten times faster than with other data warehousing tools. With Spark support in both Redshift and Athena, AWS wants to be the “…best place to run Apache Spark on the cloud,” said Sivasubramanian.
  • New feature – Amazon DocumentDB Elastic Clusters. A feature update for DocumentDB offering a fully managed solution for scaling document workloads of “virtually any size and scale,” with zero impact to application availability or performance.
  • New feature – Geospatial ML support for SageMaker. By far the sexiest announcement of this keynote: SageMaker customers can now acquire and prepare geospatial data, apply a number of pre-built models, and explore the results with built-in visualization tools. AWS sees big possibilities for this addition, and brought out Kumar Chellapilla, GM of ML/AI Services at AWS, to demo how disaster response teams can use geospatial ML in SageMaker to quickly map out evacuation routes and help first responders get victims to safety faster. There are, of course, many other applications, including in automotive, retail, agriculture, and urban planning.
  • New feature – Amazon Redshift Multi-AZ. Lets customers maximize price performance with high availability by processing reads and writes on compute in separate Availability Zones, rather than leaving standby capacity underutilized.
  • New project – Trusted Language Extensions for PostgreSQL. An open source development kit from AWS that lets developers safely build extensions for PostgreSQL using trusted languages like JavaScript, Perl, and PL/pgSQL.
  • New feature – Amazon GuardDuty RDS Protection for Amazon Aurora. Presented as an enhancement of the security AWS provides under its shared responsibility model, GuardDuty will now use ML to detect suspicious activity, such as anomalous login attempts, in Aurora databases and report its security findings.
  • New feature – Centralized Access Controls for Redshift Data Sharing. Lets you centrally manage access controls for Redshift data using Lake Formation.
  • New SageMaker features. SageMaker is getting three new features: role manager, model cards, and model dashboard. Role manager handles user permissions, model cards centralize model information and documentation, and model dashboard lets you monitor model performance in one place.
  • New feature – Auto copy from S3 for Amazon Redshift. This feature makes it easier to continuously ingest data from S3 into Redshift through simple data ingestion pipelines.
  • New feature – New connectors for Amazon AppFlow. AppFlow now supports over 50 connectors, making it easy to connect data from multiple disparate sources.
  • New feature – New data sources for Amazon SageMaker Data Wrangler. Data Wrangler now supports over 40 data sources.

Phew!

How AWS is investing in data access through tools and education

To wrap up, Sivasubramanian talked about AWS efforts to make data education more accessible, primarily through AWS Machine Learning University (MLU). MLU now provides educator training for community colleges and minority-serving institutions (MSIs). Training includes hands-on sessions for educators with a structured curriculum and learning resources.


Additionally, AWS now offers over 150 online courses in machine learning to make education more accessible. This initiative seemed particularly meaningful to Sivasubramanian, who grew up in a rural Indian town with only ten minutes of access to a computer per week in his high school years.

Beyond philanthropic efforts, AWS is making a broader push into low-code and no-code tools and features, making it easier for data consumers without a technical background to gain insights from data.

AWS continues making big investments in data products

Whether it’s databases, data querying, analytics, or machine learning, Swami Sivasubramanian’s keynote shows how AWS is working to make its already robust lineup of data products more secure, user-friendly, and interoperable. Data in the organization may not yet mirror its function in the brain, but that’s the future AWS envisions. Yesterday’s announcements make that clear.

Check back on our blog for more re:Invent 2022 highlights throughout the week. If you aren’t able to attend in person or simply don’t have time to watch on demand, we’ll do our best to bring you the most important updates from one of the biggest annual events in cloud computing.

About Forrest Brown
Forrest Brown is the Content Manager at NerdRabbit. He lives in Atlanta with his wife and two cats.
