Wednesday, March 4, 2020

Amazon Seller Central Customer Search Term Keyword Performance

Amazon has used data science and machine learning to great benefit for itself, but it has not passed those tools down for sellers to use. We are going to analyze the monthly Amazon Sponsored Products Search Term report, looking specifically at the individual keywords and which of them resulted in click thrus for each campaign. This will let us better analyze performance and improve our keyword selection for future campaigns.

I'm going to walk you through one option for analyzing the performance of your Sponsored Products using R, with a Power BI front end that gives your executives and other end users an easy way to interact with the information.

First, log in to your Amazon Seller Central account. Go to Reports > Advertising Reports. From here, select Create report and create a new report.

Once the report is created, save it locally and rename the file to include the month and year so we can maintain our historical files. Open the report and save it as a csv in preparation for importing the file for processing in SQL and R. Close the file once done.
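If you want to script this step, here is a minimal sketch, assuming the report downloads as an .xlsx file and the readxl package is installed; the file names below are only examples, so adjust them to match your actual download.

#a minimal sketch of the rename/convert step; the file names are examples and
#readxl is an assumption, adjust to match your actual download
library(readxl)

src  <- "Sponsored Products Search term report.xlsx"      #assumed download name
dest <- "SponsoredProductsSearchTermReport_Jan_2020.csv"   #Month and Year in the name

rpt <- read_excel(src)                                     #read the Amazon report
write.csv(rpt, dest, row.names = FALSE)                    #save as csv for the SQL import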

We are going to import the file into our Grok database. This is the database used as a repository for all of our data science projects. It is built on SQL Server 2017 Enterprise Edition to take advantage of Memory-Optimized tables, although there are no Memory-Optimized tables in this project; those are reserved for my Product Recommendation Engine. We are going to import the file as a new table to avoid errors on the import. This step really should be automated, but until we spend the time to automate it, follow these instructions:

Right click the Grok database and select Tasks > Import Data > Flat File Source. Select the csv file we saved in the earlier step. In the General section, change the Text qualifier to ". In the Advanced section, change the output column width to 500 for all columns. Set your destination to SQL Server Native Client 11.0 and select the server name where the Grok database lives. Change the destination table name to STGSponsoredProductsSearchTermReport.
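If you would rather not click through the wizard every month, something along these lines could load the csv from R instead; this is just a sketch, the csv path is an assumption, and the server name matches the connections used later in the script. Note that sqlSave picks its own column types, which may differ from the 500-character columns the wizard creates.

#a rough sketch of automating the staging load from R instead of the wizard;
#the csv path is an assumption
library(RODBC)

cnLoad <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=dc1-dwdb1;database=Grok;trusted_connection=yes;")
#check.names = FALSE keeps column names like Campaign Name intact for the SQL used later
stg <- read.csv("SponsoredProductsSearchTermReport_Jan_2020.csv",
                stringsAsFactors = FALSE, check.names = FALSE)
#let sqlSave create STGSponsoredProductsSearchTermReport from the data frame
sqlSave(cnLoad, stg, tablename = "STGSponsoredProductsSearchTermReport",
        rownames = FALSE, fast = FALSE)
close(cnLoad)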

Before we leave the database, we need a few tables created in order to finish processing the data. Here are those table create scripts:


USE [Grok]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[stgR_AmzSearchTerm](
 [Keyword] [varchar](255) NULL,
 [GoodFreq] [float] NULL,
 [PoorFreq] [float] NULL,
 [Good] [float] NULL,
 [Poor] [float] NULL,
 [Targeting] [varchar](255) NULL,
 [RptType] [varchar](255) NULL,
 [Mth] [varchar](255) NULL,
 [Year] [float] NULL
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[R_AmzSearchTerm](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [Keyword] [varchar](255) NULL,
 [GoodFreq] [float] NULL,
 [PoorFreq] [float] NULL,
 [Good] [float] NULL,
 [Poor] [float] NULL,
 [Targeting] [varchar](255) NULL,
 [RptType] [varchar](255) NULL,
 [Mth] [varchar](255) NULL,
 [Year] [float] NULL,
 CONSTRAINT [pk_ID_AmzSearchTermID] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

CREATE TABLE [dbo].[R_AmzSearchTermSummary](
 [ID] [int] IDENTITY(1,1) NOT NULL,
 [StartDate] [datetime] NULL,
 [EndDate] [datetime] NULL,
 [Campaign Name] [varchar](255) NULL,
 [Ad Group Name] [varchar](255) NULL,
 [Targeting] [varchar](255) NULL,
 [Match Type] [varchar](255) NULL,
 [CustomerSearchTerms] [int] NULL,
 [Impressions] [int] NULL,
 [Clicks] [int] NULL,
 [Mth] [varchar](12) NULL,
 [Year] [int] NULL,
 CONSTRAINT [pk_ID_AmzSearchTermSumID] PRIMARY KEY CLUSTERED 
(
 [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO


Now that we have our base data, it is time to move into RStudio. From here, create a new file called R_AmazonSearchKeywordsEncoding.R from the script below. Please note that the encoding is important for this file due to special characters included in the Amazon results. To set the encoding, create the new file and then select from the RStudio menu: File > Save with Encoding and choose UTF-8.
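Even with the script saved as UTF-8, the occasional stray character can still come back from the database. One optional safety net, not part of the original script, is to normalize the search terms right after the sqlQuery call:

#optional safety net (not in the original script): force the fetched search
#terms to UTF-8 and drop any bytes that cannot be converted
dataFetch2$CustomerSearchTerm <- iconv(dataFetch2$CustomerSearchTerm,
                                       from = "", to = "UTF-8", sub = "")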

Change the var.Mth and var.Year variables to match the file being processed. Also change the cn2 and cnWrite connections to point at your database. Now you can run the script, and the keywords for each campaign will be analyzed for click thru performance. The results are written to our Grok database in the stgR_AmzSearchTerm table created earlier.
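If you eventually automate the monthly run, a minimal sketch like this could pull those two values from the command line instead of editing the script each month; it assumes the script is run with Rscript and is not part of the original file.

#a minimal sketch for passing the month and year in from the command line,
#e.g.  Rscript R_AmazonSearchKeywordsEncoding.R Jan 2020
args <- commandArgs(trailingOnly = TRUE)
var.Mth  <- args[1]
var.Year <- as.numeric(args[2])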

#Author:      Todd Palecek (toddpalecek@gmail.com)
#Description: Ingest amazon search terms and transform into good vs bad 
#keywords to be written to a database and then visualized from Power BI

library(RODBC)
library(tm)
library(lsa)
library(ggplot2)
library(doSNOW)
library(SnowballC) #for stem
library(plyr)
library(tidytext)#library(dplyr)
library(wordcloud) #now used thru Power BI
library(stringr) #for str_trim

#variable for report name and key to write to SQL
var.RptType <- 'SponsoredProductsSearchTermReport'
var.Mth <- 'Jan'  
var.Year <- 2020

#database connections
cn2 <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=dc1-dwdb1;database=Grok;trusted_connection=yes;")
cnWrite <- odbcDriverConnect(connection="Driver={SQL Server Native Client 11.0};server=dc1-dwdb1;database=Grok;trusted_connection=yes;")

#sql data select
sqlstate <- "SELECT [Campaign Name]
      ,[Ad Group Name]
      ,trim([Targeting]) Targeting
      ,[Match Type]
      ,trim([Customer Search Term]) CustomerSearchTerm
      ,cast([Impressions] as INT) Impressions
      ,cast([Clicks] as INT) Clicks
     ,(cast([Clicks] as decimal(8,2))) /cast([Impressions] as decimal(8,2))   ClickThruRate
      ,cast([7 Day Total Orders (#)] as int) SevenDayTtlOrders
      FROM [Grok].[dbo].[STGSponsoredProductsSearchTermReport]
      where cast([Impressions] as INT) > 0"

#data
dataFetch2 <-sqlQuery(cn2, sqlstate, rows_at_time=1000, stringsAsFactors = FALSE)
close(cn2)

#cleanse data
dataFetch2$Targeting <- paste(tolower(dataFetch2$Targeting), " ")
dataFetch2$CustomerSearchTerm <- paste(tolower(dataFetch2$CustomerSearchTerm), " ")
#remove blank space at front and end
dataFetch2$Targeting <- str_trim(dataFetch2$Targeting)
dataFetch2$CustomerSearchTerm <- str_trim(dataFetch2$CustomerSearchTerm)
#remove special characters that are not caught by the punctuation removal
dataFetch2$CustomerSearchTerm <-gsub("\\“", "", dataFetch2$CustomerSearchTerm)
dataFetch2$CustomerSearchTerm <-gsub("\\’", "", dataFetch2$CustomerSearchTerm)
dataFetch2$CustomerSearchTerm <-gsub("\\â€", "", dataFetch2$CustomerSearchTerm)

#replace * because a bare * breaks the gsub calls in the loop below
dataFetch2$Targeting <-gsub("\\*", "wildcard", dataFetch2$Targeting)

#click thru rate of 25% or less = Poor (0), above 25% = Good (1)
dataFetch2$ClickPerf <- ifelse(dataFetch2$ClickThruRate <= .25, 0, 1)

#subset for a specific Targeting campaign
AmazonPerf <- subset(dataFetch2,Targeting=='personalized ornament')  

#replace the Targeting words in the Customer Search Term with blanks so we do not reanalyze them
AmazonPerf$CustomerSearchTerm <- gsub("ornaments", " ", AmazonPerf$CustomerSearchTerm)
#remove punctuation, lose the + for exact term
AmazonPerf$CustomerSearchTerm <- gsub('[[:punct:]]', '', AmazonPerf$CustomerSearchTerm)
#remove the stranded s left by plural search terms rather than stemming words
AmazonPerf$CustomerSearchTerm <- gsub(" s ", " ", AmazonPerf$CustomerSearchTerm)
#remove blank space at front and end
AmazonPerf$CustomerSearchTerm <- str_trim(AmazonPerf$CustomerSearchTerm)

#split into good and poor performing datasets
AmazonPerfGood <- subset(AmazonPerf,ClickPerf==1)  
AmazonPerfPoor <- subset(AmazonPerf,ClickPerf==0)  


#Good performing keywords
#overall word frequency, Customer Search Term
c <- AmazonPerfGood[,5]
cGood <- c
#convert data frame to corpus
crude <- Corpus(VectorSource(as.character(c))) 
# Transform document words to lower case
crude <- tm_map(crude, content_transformer(tolower))
## Remove punctuation from documents
#crude <- tm_map(crude, removePunctuation)
# Remove stopwords from the corpus
crude <- tm_map(crude, removeWords, stopwords("english"))
crudeGood <- crude
#Word frequency
myTdm <- as.matrix(TermDocumentMatrix(crude))
FreqMat <- data.frame(ST = rownames(myTdm), 
                      Freq = rowSums(myTdm), 
                      row.names = NULL)

attach(FreqMat)
FreqMat <- FreqMat[order(Freq,decreasing=TRUE),]
detach(FreqMat)
#head(FreqMat)
FreqMatGood <- FreqMat


#poor performing keywords
#overall word frequency, Customer Search Term
c <- AmazonPerfPoor[,5]
cPoor <- c
#convert data frame to corpus
crude <- Corpus(VectorSource(as.character(c))) 
# Transform document words to lower case
crude <- tm_map(crude, content_transformer(tolower))
# Remove stopwords from the corpus
crude <- tm_map(crude, removeWords, stopwords("english"))
# Stem the corpus
#crude <- tm_map(crude, stemDocument, language = "english")
crudePoor <- crude
#Word frequency
myTdm <- as.matrix(TermDocumentMatrix(crude))
FreqMat <- data.frame(ST = rownames(myTdm), 
                      Freq = rowSums(myTdm), 
                      row.names = NULL)

attach(FreqMat)
FreqMat <- FreqMat[order(Freq,decreasing=TRUE),]
detach(FreqMat)
#head(FreqMat)
FreqMatPoor <- FreqMat

FreqComp <- merge(x=FreqMatGood, y=FreqMatPoor, by="ST", all = TRUE)
FreqComp[is.na(FreqComp$Freq.x),"Freq.x"] <- 0
FreqComp[is.na(FreqComp$Freq.y),"Freq.y"] <- 0

#a keyword counts as Good when it appears at least as often in the good set as in the poor set
FreqComp$ClickPerf <- ifelse(FreqComp$Freq.x >= FreqComp$Freq.y, 1, 0)
FreqComp$FreqGood <- FreqComp$Freq.x - FreqComp$Freq.y
FreqComp$FreqPoor <- FreqComp$Freq.y - FreqComp$Freq.x
FreqCompGood <- subset(FreqComp,ClickPerf==1)  
FreqCompPoor <- subset(FreqComp,ClickPerf==0)  


attach(FreqComp)
FreqComp <- FreqComp[order(ST,decreasing=FALSE),]
detach(FreqComp)

colnames(FreqComp) = c("Keyword","GoodFreq","PoorFreq","ClickPerf","Good", "Poor")


#extract the unique Targeting campaigns to loop through later
sku_list <- unique(dataFetch2$Targeting)

#create a blank data frame to populate the results into
rulestoDBALL <- data.frame()

#loop through each Targeting campaign and score its keywords
for (i in 1:length(sku_list)){
  AmazonPerf <- dataFetch2[which(dataFetch2$Targeting==sku_list[i]),]
 
  #replace the Targeting words in the Customer Search Term with blanks so we do not reanalyze them
  #split the Targeting phrase into its individual words
  s <- sku_list[i]
  l <- unlist(strsplit(s, split = " "))
  
  #remove each Targeting word from the search terms
  for (w in l) {
    AmazonPerf$CustomerSearchTerm <- gsub(w, " ", AmazonPerf$CustomerSearchTerm)
  }
  
  #remove punctuation, lose the + for exact term
  AmazonPerf$CustomerSearchTerm <- gsub('[[:punct:]]', '', AmazonPerf$CustomerSearchTerm)
  #remove the stranded s left by plural search terms rather than stemming words
  AmazonPerf$CustomerSearchTerm <- gsub(" s ", " ", AmazonPerf$CustomerSearchTerm)
  #remove blank space at front and end
  AmazonPerf$CustomerSearchTerm <- str_trim(AmazonPerf$CustomerSearchTerm)
  
  #split into good and poor performing datasets
  AmazonPerfGood <- subset(AmazonPerf,ClickPerf==1)  
  AmazonPerfPoor <- subset(AmazonPerf,ClickPerf==0)  
  
  
  #Good performing keywords
  #overall word frequency, Customer Search Term
  c <- AmazonPerfGood[,5]
  cGood <- c
  #convert data frame to corpus
  crude <- Corpus(VectorSource(as.character(c))) 
  # Remove stopwords from the corpus
  crude <- tm_map(crude, removeWords, stopwords("english"))

  crudeGood <- crude
  #Word frequency
  myTdm <- as.matrix(TermDocumentMatrix(crude))
  FreqMat <- data.frame(ST = rownames(myTdm), 
                        Freq = rowSums(myTdm), 
                        row.names = NULL)
  
  attach(FreqMat)
  FreqMat <- FreqMat[order(Freq,decreasing=TRUE),]
  detach(FreqMat)
  FreqMatGood <- FreqMat
  
  #check if the FreqMatGood data frame is empty and replace it with an empty data frame
  if(class(FreqMatGood)=="numeric") {
    FreqMatGood <- NULL
    FreqMatGood <- data.frame(ST=character(),
                              Freq=integer())
  }
  
  FreqMat <- NULL
  #poor performing keywords
  #overall word frequency, Customer Search Term
  c <- AmazonPerfPoor[,5]
  cPoor <- c
  #convert data frame to corpus
  crude <- Corpus(VectorSource(as.character(c))) 
  # Remove stopwords from the corpus
  crude <- tm_map(crude, removeWords, stopwords("english"))

  crudePoor <- crude
  #Word frequency
  myTdm <- as.matrix(TermDocumentMatrix(crude))
  FreqMat <- data.frame(ST = rownames(myTdm), 
                        Freq = rowSums(myTdm), 
                        row.names = NULL)
  
  attach(FreqMat)
  FreqMat <- FreqMat[order(Freq,decreasing=TRUE),]
  detach(FreqMat)
  FreqMatPoor <- FreqMat
  
  #check if the FreqMatPoor data frame is empty and replace it with an empty data frame
  if(class(FreqMatPoor)=="numeric") {
    FreqMatPoor <- NULL
    FreqMatPoor <- data.frame(ST=character(),
                              Freq=integer())
  }
  
  
  FreqComp <- merge(x=FreqMatGood, y=FreqMatPoor, by="ST", all = TRUE)
  FreqComp[is.na(FreqComp$Freq.x),"Freq.x"] <- 0
  FreqComp[is.na(FreqComp$Freq.y),"Freq.y"] <- 0
  FreqComp$FreqGood <- FreqComp$Freq.x - FreqComp$Freq.y
  FreqComp$FreqPoor <- FreqComp$Freq.y - FreqComp$Freq.x

  attach(FreqComp)
  FreqComp <- FreqComp[order(ST,decreasing=FALSE),]
  detach(FreqComp)
  
  colnames(FreqComp) = c("Keyword","GoodFreq","PoorFreq","Good", "Poor")
  
  
  #convert rules to data frame in prep to write to DB
  rulestoDB <- FreqComp
  rulestoDB$Targeting <- (sku_list[i]) 
  rulestoDBALL <- rbind(rulestoDBALL,rulestoDB)
  #clear data sets
  l <- NULL
  s <- NULL
  crude <- NULL
  crudeGood <- NULL
  crudePoor <- NULL
  FreqComp <- NULL
  FreqMatGood <- NULL
  FreqMatPoor <- NULL
  rulestoDB <- NULL
}

rulestoDBALL$RptType <- var.RptType
rulestoDBALL$Mth <- var.Mth
rulestoDBALL$Year <- var.Year

#export data to database
sqlSave(cnWrite,rulestoDBALL,tablename = "stgR_AmzSearchTerm",rownames=FALSE,append = TRUE, fast = FALSE )
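
Before jumping back into SQL, a quick sanity check from R can confirm that the month actually landed in the staging table; this check is illustrative only and is not part of the original script.

#optional sanity check before promoting the staged rows; the stop() message is illustrative
chk <- sqlQuery(cnWrite,
                "SELECT COUNT(*) AS RowsStaged FROM [Grok].[dbo].[stgR_AmzSearchTerm]")
if (chk$RowsStaged == 0) stop("No rows were staged - check the report import")
#close the write connection now that we are done with it
close(cnWrite)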



Now we can jump back into our database and move the staging results into our permanent historical table, R_AmzSearchTerm.


--Insert Search Term Details
INSERT INTO [Grok].[dbo].[R_AmzSearchTerm]
           ([Keyword],[GoodFreq],[PoorFreq],[Good],[Poor],[Targeting],[RptType],[Mth],[Year])
  SELECT [Keyword],[GoodFreq],[PoorFreq],[Good],[Poor],[Targeting],[RptType],[Mth],[Year]
  FROM [Grok].[dbo].[stgR_AmzSearchTerm]

--Change the Month and Year before running
--Insert Campaign Level Summary
INSERT INTO [dbo].[R_AmzSearchTermSummary]
           ([StartDate],[EndDate],[Campaign Name],[Ad Group Name],[Targeting],[Match Type],[CustomerSearchTerms]
           ,[Impressions],[Clicks],[Mth],[Year])   
   SELECT  min([Start Date]) StartDate,max([End Date]) EndDate
      ,[Campaign Name],[Ad Group Name],[Targeting],[Match Type]
      ,count([Customer Search Term]) CustomerSearchTerms
      ,sum(cast([Impressions] as INT)) Impressions
      ,sum(cast([Clicks] as INT)) Clicks
   ,'Jan'
   ,2020
  FROM [Grok].[dbo].STGSponsoredProductsSearchTermReport 
  group by [Campaign Name],[Ad Group Name],[Targeting],[Match Type]


And let's clean up our temp tables for the next run:


truncate table [Grok].[dbo].[stgR_AmzSearchTerm]
GO
drop table [Grok].[dbo].[STGSponsoredProductsSearchTermReport]
GO


Now we can take a look at our Power BI report that we are going to share with our end users and executives to show them how data science can be used to improve decision making.

The report can be downloaded from here:
AmazonSearchTerms

I am using Power BI Desktop, which is free, as well as the online portal to share out the report. The desktop version is great because anyone can install it. Sharing files becomes a bit trickier but is manageable.

The report is broken up into 5 sections.

Keyword Performance - Lists each individual keyword and how it performed. The Performance column is the difference between the Good count (Click Thru rate greater than 25%) and the Poor count (Click Thru rate of 25% or less).

Targeting Campaign Summary - Summary of each campaign. Customer Search Terms is the count of unique search terms that were used. Impressions is the number of times an ad was shown to a customer. Clicks is the number of times an ad was clicked. Click Thru Rate % is Clicks as a percentage of Impressions.

Targeting Keyword Detail - All Keywords as columns and Targeting Campaigns as rows. This becomes useful once you limit it to a single campaign.

Good Keywords - Wordcloud of keywords generating greater than 25% click through.

Poor Keywords - Wordcloud of keywords generating less than 25% click through.

I like to use the report by selecting a keyword to see which Targeting Campaigns it was used in. I then select the Targeting Campaign to see how different words performed within that campaign.