Transforming Data: A Beginner’s Guide to Feature Engineering

Have you ever wondered how machines can understand customer preferences, house prices, or even text messages? The answer lies in feature engineering – one of the most crucial yet often overlooked aspects of machine learning.

What is Feature Engineering?

Feature engineering 💡 transforms raw data into meaningful features that help machine learning models better understand patterns and make more accurate predictions. Think of it like translating raw ingredients into a form ready for cooking. Just as a chef needs properly prepared ingredients to make a delicious meal, a machine-learning model needs well-engineered features to make accurate predictions.

Why is Feature Engineering Important?

Even the most sophisticated machine learning algorithms can fail if fed poor-quality features. Here’s why feature engineering matters:

  1. Better Model Performance: Well-engineered features can capture important patterns in your data that might otherwise be hidden. For example, instead of using raw dates, creating features like “day of the week” or “is_weekend” might better predict shopping behavior.
  2. Domain Knowledge Integration: Feature engineering allows us to incorporate our understanding of the problem into the model. If we’re predicting house prices, we might create a feature that combines square footage and location, knowing that price per square foot varies by neighborhood.
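
The two examples above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names, the sample date, and the house figures are made up for the example and do not come from a real dataset.

```python
from datetime import date

# Fixed English day names, indexed the same way as date.weekday() (Monday = 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def date_features(d: date) -> dict:
    """Turn a raw date into features a model can use more easily."""
    return {
        "day_of_week": DAY_NAMES[d.weekday()],
        "is_weekend": d.weekday() >= 5,  # 5 = Saturday, 6 = Sunday
    }


def price_per_sqft(price: float, square_feet: float) -> float:
    """Combine two raw columns into one domain-informed feature."""
    return price / square_feet


# Hypothetical sample values, chosen only for illustration.
print(date_features(date(2024, 1, 6)))
print(price_per_sqft(500_000, 1_000))
```

In practice you would apply functions like these to every row of your dataset, then check whether the new columns actually help the model.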

Understanding Data Types

Before diving into feature engineering techniques, let’s understand the two main types of data we typically encounter:

Quantitative Data

This is numerical data that you can perform mathematical operations on. For example:

  • Age (25, 30, 45)
  • Temperature (98.6°F, 102.3°F)
  • Sales amount ($100, $250, $500)

Qualitative Data

This represents categories or qualities that can’t be measured numerically. For example:

  • Colors (Red, Blue, Green)
  • Education level (High School, Bachelor’s, Master’s)
  • Customer satisfaction (Very Satisfied, Satisfied, Dissatisfied)

Essential Encoding Techniques for Beginners

When working with qualitative data, we need to convert it into numbers for our machine-learning models. Here are two fundamental encoding techniques:

One-Hot Encoding💡

Imagine you have a “color” feature with values: Red, Blue, and Green. One-hot encoding creates a separate column for each unique value and marks the matching column with a 1:

Color    Red    Blue    Green
Red      1      0       0
Blue     0      1       0
Green    0      0       1

This is perfect for categorical data where there is no natural order between values. Each category is given equal importance, and the model can treat them independently.
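
As a rough sketch of the idea, here is one-hot encoding written in plain Python. The helper name `one_hot_encode` and the `prefix` parameter are assumptions made for this example; libraries such as pandas and scikit-learn provide their own one-hot encoders that you would normally use instead.

```python
def one_hot_encode(values, prefix):
    """Return one dict per input value, with a 0/1 entry for every category."""
    categories = sorted(set(values))  # every unique category, in a stable order
    return [
        {f"{prefix}_{category}": int(value == category) for category in categories}
        for value in values
    ]


# Hypothetical sample data using the colors from the text.
colors = ["Red", "Blue", "Green", "Red"]
for row in one_hot_encode(colors, prefix="color"):
    print(row)
```

Each row has exactly one column set to 1, so no color is treated as "bigger" or "smaller" than another.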

Ordinal Encoding💡

When your categories have a natural order (like education levels), ordinal encoding assigns numbers based on that order:

Education Level    Encoded Value
High School        1
Bachelor's         2
Master's           3
PhD                4

This preserves the relative relationship between categories while converting them to a numerical format the model can understand.
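
The mapping in the table can be sketched in plain Python as follows. The names `EDUCATION_ORDER` and `ordinal_encode` are made up for this example; the key point is that you supply the order yourself, because the code cannot guess that a Master's comes after a Bachelor's.

```python
# Categories listed from lowest to highest, matching the table in the text.
EDUCATION_ORDER = ["High School", "Bachelor's", "Master's", "PhD"]


def ordinal_encode(values, ordered_categories):
    """Map each category to its 1-based position in the given order."""
    mapping = {category: rank
               for rank, category in enumerate(ordered_categories, start=1)}
    return [mapping[value] for value in values]


# Hypothetical sample data.
print(ordinal_encode(["Master's", "High School", "PhD"], EDUCATION_ORDER))
```

A value that is missing from the ordered list raises a `KeyError` here, which is a useful early warning that your data contains a category you did not plan for.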

Tips for Beginners

  1. Start Simple: Begin with basic feature engineering techniques and gradually explore more complex ones as you gain confidence.
  2. Understand Your Data: Before applying any encoding technique, understand what your data represents and how different features relate.
  3. Document Your Process: Track how you’ve engineered your features. This will help you replicate your success and troubleshoot issues.
  4. Validate Your Results: Always check if your feature engineering improves model performance. Sometimes simpler is better!

Remember, feature engineering is both an art and a science. It requires creativity, domain knowledge, and experimentation. As you practice, you’ll develop an intuition for which techniques work best in different situations.

Keep exploring and happy engineering!

Mike

Sources:

What Is One Hot Encoding and How to Implement It in Python

Cisco Identity Services Engine (ISE) version 3.3

 

Simplified Operations

 

New Split Upgrade: Upgrading Cisco ISE has never been easier. With the new Split Upgrade feature, customers have complete control over the upgrade process from the UI. They can upgrade specific ISE nodes in parallel, over multiple iterations, at their convenience, and without experiencing any downtime. Say goodbye to complex and time-consuming upgrades.

 

Controlled Application Restart: Minimize downtime, maximize efficiency. Downtime during certificate renewals can be disruptive. Cisco ISE 3.3 introduces Controlled Application Restart, which lets customers plan the renewal of the ISE administrative certificate instead of having the entire ISE deployment restart at once in an uncontrolled way. Schedule updates during low network usage periods, ensuring a smoother security update process without impacting operations.

 

Navigation improvement: ISE admins do their job through the ISE UI. ISE 3.3 introduces new and improved navigation that lets admins perform their tasks faster, with fewer clicks, and without their screen being hidden while they move between ISE pages. Each ISE admin can now also save the pages they use most frequently, reducing the time it takes to reach those pages.

 

IPv6 Support: In addition to RADIUS, TACACS+, and ISE management over IPv6, customers can now enable additional services over IPv6. The ISE guest portal can be accessed at an IPv6 address and serve guests on an IPv6 network, and both profiling and posture checks are available for IPv6-enabled endpoints.

 

Enhanced Platform Security

 

TPM Chip: Strengthen security with the TPM chip. Security is paramount. Cisco ISE 3.3 with SNS-3700 (or virtual machines supporting vTPM) introduces the TPM chip, a dedicated and secure storage location for sensitive information. With true random number generation for key generation, the TPM chip enhances the security of stored data, providing you with peace of mind.

ISE Cipher Control: By allowing ISE admins to manually disable unwanted and weak ciphers, ISE 3.3 helps customers meet compliance and regulatory requirements without waiting for the next release or patch.

 

TLS 1.3 for ISE admins: ISE admins can now connect to ISE UI over TLS 1.3. TLS 1.3 provides enhanced security and improved performance by reducing latency and eliminating outdated cryptographic algorithms, ensuring stronger encryption and more efficient communication between clients and servers. 

Certificate-Based Authentication for API calls: ISE 3.3 supports certificate-based authentication for API calls. Certificate-based authentication offers stronger security by eliminating the vulnerabilities associated with traditional username and password authentication. It provides robust protection against credential theft, unauthorized access, and phishing attacks, ensuring a higher level of trust for users accessing sensitive systems or resources.

 

Visibility and Compliance

 

AI/ML-based Profiling: Effortlessly identify unknown endpoints with AI/ML Profiling. Unidentified endpoints on the network can be a challenge. Cisco ISE 3.3 employs AI/ML Profiling and multi-factor classification (MFC) to swiftly identify clusters of similar unknown endpoints. This cloud-based ML engine helps customers categorize these devices accurately, making it easier to determine their nature and apply appropriate policies.

 

Unlock Valuable Insights with Wi-Fi Edge Analytics 

Our exclusive Wi-Fi Edge Analytics feature enables customers who use Cisco Catalyst 9800 wireless controllers to exchange data between ISE 3.3 and the controller and to get profiling information from Apple, Intel, and Samsung devices, enhancing endpoint profiling. This information includes endpoint-specific attributes such as model, operating system version, and firmware.

 

Multi-Factor Classification: ISE 3.3 introduces a new way to profile endpoints on the network. The profile is no longer a single descriptive string for the endpoint. Instead, ISE uses Multi-Factor Classification (MFC), which breaks the profile into four categories: Manufacturer, Device Type, Model, and OS. This allows customers to build more granular policies based on the different MFC categories.

 

Posture for ARM-based Windows: For customers moving to computers with ARM processors, ISE 3.3 can now perform posture checks to verify compliance status before granting those endpoints access to the network.

 

Cloud Availability 

 

ISE 3.3 is going to be available on all the supported platforms: AWS, Azure, and Oracle Cloud. Release dates depend on the different cloud vendors:

ISE 3.3 on Azure – Already Available

ISE 3.3 on OCI – Already Available

ISE 3.3 on AWS – Already Available

 

ISE 3.3 Resources:

 

ISE 3.3 download page

ISE 3.3 release notes

Cisco acquired Valtix: What is Valtix?

Valtix is a cloud-native network security company that provides next-generation firewall and web application firewall (WAF) solutions for businesses looking to protect their cloud-based infrastructure. The company was founded in 2018 by seasoned technology executives who recognized the need for a modern approach to network security in the cloud.

Valtix’s cloud-based approach to network security is designed to be both scalable and flexible, allowing businesses to secure their cloud-based infrastructure without having to worry about the complexities of managing hardware or software. By leveraging cloud-native security technologies, Valtix enables businesses to deploy security policies that can be enforced consistently across their entire infrastructure, regardless of the cloud provider or network topology.

One of the key benefits of Valtix’s approach to network security is its ability to provide real-time threat detection and response capabilities. Using advanced machine learning algorithms, Valtix can analyze network traffic in real-time, identifying potential threats and responding quickly to mitigate any risks. This helps businesses stay ahead of the constantly evolving threat landscape and ensure their infrastructure remains secure.

In addition to its advanced threat detection and response capabilities, Valtix also provides businesses with granular control over their network security policies. This allows businesses to tailor their security policies to their specific needs, ensuring that their infrastructure is protected in the most effective way possible. With Valtix, businesses can easily manage their security policies from a centralized dashboard, making it easy to enforce policies consistently across their entire infrastructure.

Valtix’s cloud-based approach also makes it easy for businesses to scale their network security as their needs evolve. Whether they need to protect a small cloud environment or a large, complex infrastructure, Valtix can provide the necessary security solutions to meet their needs. This flexibility allows businesses to focus on growing their business, rather than worrying about managing their network security.

Finally, Valtix’s cloud-native approach to network security is designed to be highly automated, which helps businesses reduce the burden of managing their network security. By automating many of the routine tasks associated with network security, Valtix enables businesses to free up their IT resources to focus on more strategic initiatives.

In conclusion, Valtix is a cloud-native network security company, recently acquired by Cisco, that provides businesses with advanced threat detection and response capabilities, granular control over their security policies, and the flexibility to scale their security solutions as their needs evolve. With its cloud-based approach and automated processes, Valtix helps businesses stay ahead of the constantly evolving threat landscape while reducing the burden of managing their network security.

https://valtix.com/blog/ciscos-intent-to-acquire-our-journey-and-why-it-matters/

Mike

Apronomics: March, 2023

Apronomics is a play on the word ‘macroeconomics’. It seeks to provide a general perspective on three specific domains: Cloud, Digital Transformation, and Web3. This monthly (sometimes twice-monthly) TL;DR (“too long; didn’t read”) digest offers a quick read for those looking for hot topics in Cloud, Digital Transformation, and Web3.

CLOUD

  • Google: Announces the general availability of Dataplex data lineage — a fully managed Dataplex capability that helps you understand how data is sourced and transformed within the organization. (Link)
  • Google: Opens access to Bard, an early experiment that lets you collaborate with generative AI. Bard is powered by a research large language model (LLM), specifically a lightweight and optimized version of LaMDA. (Link)
  • Azure: Announces that GPT-4 is available in preview in Azure OpenAI Service, alongside other AI models including GPT-3.5, ChatGPT, and DALL•E 2. (Link)

DIGITAL TRANSFORMATION

  • Cisco: Announces its intent to acquire Lightspin Technologies Ltd., a privately held cloud security software company. Lightspin’s lightweight agentless solution quickly scans your AWS, Azure, and GCP environments and Kubernetes clusters, covering virtual machines, containers, and serverless. (Link)
  • SAP: SAP and DataRobot announced a joint partnership to enable customers to train ML models on their data residing in SAP HANA Cloud and SAP Data Warehouse Cloud. As a result, enterprises can now get powerful insights and predictive analytics from their business data. (Link)
  • OpenAI: Released GPT-4, a newer natural language processing (NLP) model that can accept both image and text inputs and produce text outputs. GPT-4 still suffers from limitations similar to those of earlier GPT models. Most notably, it “hallucinates” facts and makes reasoning errors. (Link)

WEB3

  • Web3 Games Collective: Yield Guild Games (YGG), Game7, Magic Eden, and Fenix Games formed the Web3 Games Collective (W3GC) to leverage their expertise in creating a wave of breakout blockchain games. (Link)
  • Chainlink: The web3 services platform is launching a self-service, serverless platform to help developers connect their decentralized applications (dApps) to any Web 2.0 API, like an AWS or Meta service. (Link)
  • Bitcoin NFTs: Bitcoin now has on-chain (native) support for NFTs, known as ordinal NFTs. Ordinals use an arbitrary but logical ordering system called ordinal theory to give each individual Bitcoin satoshi a unique number. (Link)

Mike

Apronomics: January, 2023

Apronomics is a play on the word ‘macroeconomics’. It seeks to provide a general perspective on three specific domains: Cloud, Digital Transformation, and Web3. This monthly (sometimes twice-monthly) TL;DR (“too long; didn’t read”) digest offers a quick read for those looking for hot topics in Cloud, Digital Transformation, and Web3.

CLOUD

  • AWS: Expected to reach $100B in 2023, despite economic uncertainty. AWS will announce its fourth-quarter earnings on Feb 2, 2023. A breakdown of AWS revenue over the prior 12 months: Q4 2021: $17.78B, Q1 2022: $18.44B, Q2 2022: $19.74B, Q3 2022: $20.54B. (Link)
  • Azure: Multiyear, multibillion-dollar partnership with OpenAI, best known for ChatGPT, to accelerate AI breakthroughs. As the exclusive cloud provider powering OpenAI, Azure will look to commercialize OpenAI and offer the technology in its native Azure services. (Link)
  • Snowflake: Acquires Myst, a time series forecasting company. Myst offers an AI platform that helps index a sequence of data points over a period of time. This allows historical data to forecast future behaviors. (Link)

DIGITAL TRANSFORMATION

  • Meta: Confirms that it is acquiring Luxexcel, a smart eyewear company. Meta will likely leverage the company’s technology to produce AR glasses. This acquisition aligns with Meta’s corporate strategy when it comes to AR and VR advancements. (Link)
  • Amazon: Sidewalk, Amazon’s long-range, low-bandwidth IoT mesh network, has four new device manufacturing partners bringing smart devices to developers. (Link)
  • Microsoft: Acquires Fungible, a company that offers scale-out capabilities for data center infrastructure using low-power data processing units (DPUs). (Link)

WEB3

  • Ava Labs: Has partnered with AWS to support its Web3 node operations. Ava Labs makes it simple to deploy high-performance solutions for Web3. (Link)
  • Polygon: $MATIC completes a hard fork upgrade to minimize gas fees. Although gas fees will continue to increase during peak demand, they will be aligned with Ethereum’s gas dynamics. (Link)
  • U.S. Gov: The U.S. government seeks to set a basis for legislative and regulatory control of cryptocurrencies. One way the U.S. government considers jurisdiction over cryptocurrencies is through the Commodity Futures Trading Commission, not the SEC. (Link)

Mike

What’s the difference between GCP and AWS Regions?

To understand the global infrastructure of a cloud provider, consider a coffee shop. If an event such as a flood or power outage impacts one coffee shop location, customers can still get their coffee by visiting a different location only a few blocks away.

A cloud provider’s global infrastructure provides high availability and consists of several components: Regions, Zones, and Edge locations.

A Region is an independent geographic area that hosts cloud services. Regions are isolated from one another unless you allow traffic out of a Region. Thinking back to our coffee shop analogy, all the coffee shops in the Northeast could be considered the Northeast Region. If all Northeast coffee shops went out of business, it wouldn’t affect any coffee shops located in the Northwest. A Region consists of Zones.

A Zone is where cloud resources are deployed. It generally consists of two or three independent data centers (in our analogy, coffee shops) located tens of miles apart but close enough to one another to have low latency. Let’s say there are three coffee shops in town and one of them loses power; the other two coffee shops can still serve customers in town. Zones provide high availability for cloud services and applications in the cloud.

An Edge location, also known as a Point of Presence, is part of the cloud provider’s network that places cloud services closer to users, improving their experience and convenience.

Choosing where your applications are located affects qualities like user experience, availability, durability, and latency. 

Comparing Regions and Zones in Google Cloud and AWS

Google and AWS both use Regions to provide Cloud services to customers. 

One difference is that Google has at least three Zones in each Region, whereas AWS uses Availability Zones to provide high availability, with at least two Availability Zones in every AWS Region.

Google Cloud infrastructure is based in five major geographic locations: North America, South America, Europe, Asia, and Australia.

Google Cloud currently supports 106 Zones in 35 Regions.

AWS Cloud infrastructure functions in North America, South America, Europe, the Middle East, Africa, Asia, and Australia.

The AWS Cloud spans 96 Availability Zones within 30 Regions.

The Google and AWS networks have many of the same attributes, with some slight differences! Regardless of which cloud provider you use, selecting a Region should take four key factors into account:

  1. Compliance
  2. Proximity to your customers
  3. Available Services within a Region
  4. Pricing

Mike

“Global Locations – Regions & Zones  |  Google Cloud.” Google, Google, https://cloud.google.com/about/locations/. 

Indeglia, Shaun. “GCP Networking- Regions and Zones.” Medium, Google Cloud – Community, 11 Nov. 2022, https://medium.com/google-cloud/gcp-region-and-zones-4eb4bf1f99ab. 

“Select Geographic Zones and Regions  |  Architecture Framework  |  Google Cloud.” Google, Google, https://cloud.google.com/architecture/framework/system-design/geographic-zones-regions. 

“Whitepapers.” Amazon, Earthpledge Foundation, https://docs.aws.amazon.com/whitepapers/latest/aws-overview/global-infrastructure.html.