Experiential learning, given audio and video with gesture data from a shopping mall and a farmers' market, essentially drives cognitive and contextual learning by taking actions and observing responses in order to achieve a set of rewards. In short: knowing by doing. In the world of shopping, for example, this means understanding behavior and modeling action-response interactions, both online and offline.

Will Machines Differentiate States?

The current structured approach enables retailers to apply ML effectively across their operations, including demand forecasting, customer segmentation, and operational efficiency, thereby driving growth. However, it does not help frontline staff learn techniques to predict customer flow, optimize queues, determine the options in a choice set, allocate capacity, or assign resources at the various touchpoints that could transform the service by personalizing the experience and optimizing revenue.

We set up action-response pairs, driven by combined speech and object recognition, to achieve rewards and save hours of training: learning by doing. The idea came from how children move through the world narrating what they do; we train the model in a similar manner, on shoppers narrating how they shop.
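The action-response-reward loop described above can be sketched with tabular Q-learning. The environment, states, actions, and reward values below are illustrative assumptions for a toy shopping scenario, not the article's actual training setup.

```python
import random

# Hypothetical shopping states and system actions (illustrative only).
STATES = ["browse", "compare", "checkout"]
ACTIONS = ["recommend", "discount", "wait"]
# Assumed rewards: the responses the system is trying to elicit.
REWARD = {("compare", "recommend"): 0.5, ("checkout", "discount"): 1.0}

def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning: take an action, observe the response (reward),
    and update the action-value estimate -- learning by doing."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        for i, s in enumerate(STATES):
            # Epsilon-greedy: mostly exploit the best known action,
            # occasionally explore a random one.
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: q[(s, x)])
            r = REWARD.get((s, a), 0.0)
            nxt = STATES[i + 1] if i + 1 < len(STATES) else None
            best_next = max(q[(nxt, x)] for x in ACTIONS) if nxt else 0.0
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q
```

After enough episodes, the learned values favor the actions that earned rewards, which is the "set of rewards" the opening paragraph refers to.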

We compiled thousands of video and audio recordings, from supermarkets and farmers' markets only, covering hundreds of different shopping situations and sequences: searching for an item, recognizing it, evaluating its various attributes, inferring latent features, building trade-offs, forecasting future prices, deriving options, and assessing post-purchase cognitive dissonance. Given an image of an item in a supermarket and the same item in a farmers' market, the model learned to price every item automatically and consistently from the data: it leverages video and audio recognition to understand the product's appearance, considers other aspects such as brand, condition, fabric, category, description, historical sales data, and past pricing decisions on similar items, and computes an optimal price for each item against competitor prices in other locations. The model also learned to merge multiple offers from different sellers into a single product ranking, among less obvious uses such as performing assortment-gap analysis against competitors as they were described.
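One simple way to price from "past pricing decisions of similar items", as described above, is nearest-neighbor averaging over historical items with a cap against a competitor price. The feature encoding and price history below are made-up illustrations, not the article's model or data.

```python
# Hypothetical numeric encodings of item features (brand, condition,
# category) paired with illustrative historical prices.
HISTORY = [
    ({"brand": 1.0, "condition": 0.9, "category": 2.0}, 4.50),
    ({"brand": 1.0, "condition": 0.7, "category": 2.0}, 3.80),
    ({"brand": 0.0, "condition": 0.8, "category": 1.0}, 2.10),
]

def distance(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5

def estimate_price(item, k=2, competitor_price=None):
    """Average the prices of the k most similar historical items, then
    cap against a competitor price from another location if given."""
    nearest = sorted(HISTORY, key=lambda h: distance(item, h[0]))[:k]
    price = sum(p for _, p in nearest) / k
    if competitor_price is not None:
        price = min(price, competitor_price)
    return round(price, 2)
```

A full system would learn the feature weights and similarity metric rather than treat all attributes equally, but the structure is the same: similar items anchor the price, and competitor data constrains it.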

The highest layer of the model combines the outputs of the multi-dimensional networks and maps them onto the action-response-reward data.

We ran a model that augments the decision-making process by leveraging experience data to forecast the optimal reward conditions and to provide optimal parameters for executing them. The model's predictions help machines balance costs and profits across thousands of attributes, latent features, and metrics through our cognitive framework. After learning thousands of trade-offs and options, the model learns the cognitive signals that correspond to the audio-video and gesture signals, and associates those signals with action-response pairs in the decision process.
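Balancing costs and profits against forecast reward conditions reduces, in the simplest case, to maximizing expected profit over a set of candidate actions. The demand curve below is an assumed illustration, not a forecast from the article's model.

```python
def expected_profit(price, cost, demand_prob):
    """Expected profit for one offer: purchase probability times margin."""
    return demand_prob(price) * (price - cost)

def best_price(cost, candidates, demand_prob):
    """Pick the candidate price with the highest expected profit."""
    return max(candidates, key=lambda p: expected_profit(p, cost, demand_prob))

# Illustrative demand curve (assumed): purchase probability falls
# linearly as price rises.
demand = lambda p: max(0.0, 1.0 - 0.2 * p)
```

For example, with a unit cost of 1.0 and candidates 2.0, 3.0, and 4.0 under this curve, the middle price wins: a higher margin at 4.0 does not compensate for the lost demand.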

The goal is to understand how machines react to items at trade-offs (combinations of different attributes and price points) and to estimate the right utility for each product the machine would buy in a certain period, ultimately maximizing profit and optimizing the budget.
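A standard way to model utility over attribute-price trade-offs is a linear utility with a multinomial-logit choice rule. The attribute weights and price sensitivity below are assumed for illustration; the article's model would learn them from the audio-video data.

```python
import math

# Illustrative attribute weights and price sensitivity (assumed, not
# learned from the article's data).
WEIGHTS = {"quality": 1.2, "brand": 0.5}
PRICE_SENSITIVITY = 0.8

def utility(attrs, price):
    """Linear utility: weighted attributes minus a price penalty."""
    return sum(WEIGHTS[k] * v for k, v in attrs.items()) - PRICE_SENSITIVITY * price

def choice_probabilities(options):
    """Multinomial-logit share for each (attrs, price) option in a choice set."""
    exps = [math.exp(utility(a, p)) for a, p in options]
    total = sum(exps)
    return [e / total for e in exps]
```

The probabilities express the trade-off directly: an option with better attributes can carry a higher price and still be chosen more often.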

The model forecasts demand and weighs many factors to derive choice sets and options for dynamic price adjustments. By including supply, seasonality, external events related to an item (e.g. a weekend holiday, a supply breakdown, a festival), and market demand and offers, an automatic pricing and feature-adjusting system can test and learn from experience which price is the most profitable, i.e. the estimated "optimal" price.
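The "test and learn from experience" loop can be sketched as an epsilon-greedy bandit over candidate prices. The demand simulator and price grid below are assumptions for illustration; a real system would feed in observed sales instead.

```python
import random

def simulated_profit(price, rng):
    """Assumed demand simulator: a sale happens with probability that
    falls linearly in price; unit cost is fixed at 1.0 (illustrative)."""
    sold = rng.random() < max(0.0, 1.0 - 0.15 * price)
    return (price - 1.0) if sold else 0.0

def epsilon_greedy_pricing(prices, profit_of, rounds=3000, eps=0.1, seed=1):
    """Test-and-learn loop: try candidate prices, track average realized
    profit, and exploit the current best while still exploring."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in prices}
    counts = {p: 0 for p in prices}
    for _ in range(rounds):
        if rng.random() < eps or 0 in counts.values():
            p = rng.choice(prices)          # explore
        else:
            p = max(prices, key=lambda x: totals[x] / counts[x])  # exploit
        totals[p] += profit_of(p, rng)
        counts[p] += 1
    return max(prices, key=lambda x: totals[x] / max(counts[x], 1))
```

Seasonality and external events would enter by conditioning the estimates on context (a contextual bandit) rather than keeping one global average per price.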

Multi-modal knowledge representation learning aims to predict behavior in the decision-making system: to estimate how buyers will behave in the future from patterns in their past behavior, and to improve the utility score. Our SCANN architecture within the cognitive framework, based on consumer buying functions and using only audio-video with gesture recognition plus the cognitive and expected states of the target and its contexts, was configured to process the instructions of each item-mode input and then to check the correctness of the action command and expected response through optimal choice and reward functions, using monadic and paired-comparison tests at the individual and nearest-neighbor levels.
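The monadic and paired-comparison evaluations mentioned above are standard preference tests; the sketch below shows both in their simplest form. It is not the SCANN architecture itself, and the trial data is invented for illustration.

```python
from collections import Counter

def monadic_score(ratings):
    """Monadic test: an item is rated in isolation; score is the mean rating."""
    return sum(ratings) / len(ratings)

# Hypothetical head-to-head outcomes as (winner, loser) pairs from
# simulated paired-comparison trials (illustrative data).
TRIALS = [("apple", "pear"), ("apple", "plum"), ("pear", "plum"),
          ("apple", "pear"), ("plum", "pear")]

def paired_comparison_ranking(trials):
    """Paired comparison: rank items by their head-to-head win rate."""
    wins = Counter(w for w, _ in trials)
    losses = Counter(l for _, l in trials)
    items = set(wins) | set(losses)
    return sorted(items,
                  key=lambda i: wins[i] / (wins[i] + losses[i]),
                  reverse=True)
```

A nearest-neighbor variant would score an item for a given buyer using trials from the most similar buyers rather than the whole pool; that aggregation step is omitted here.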

Finally, the obtained results are merged into output instructions. The model compares the obtained action-responses against the audio-video and hand-motion recognition results twice for each item and then fuses the two comparisons. First, the keywords from video and gesture recognition are retrieved from the action signals. Second, the action signal is processed by expected-response segmentation, and the response-for-action segmentation results are compared with the keywords from video and gesture recognition. Finally, the two comparison results are fused to obtain the final command, which improves the accuracy of decision-making.
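The two-pass compare-and-fuse step can be sketched as keyword-overlap scoring followed by a thresholded average. The scoring rule, threshold, and keyword sets are assumptions for illustration; the article's segmentation models are far richer than set intersection.

```python
def keyword_match(expected, recognized):
    """Fraction of expected keywords found among the recognized ones."""
    return len(set(expected) & set(recognized)) / max(len(set(expected)), 1)

def fuse_command(action_keywords, recognized, response_segments, threshold=0.5):
    """Two comparisons, then fusion: (1) action-signal keywords vs the
    video/gesture recognition output, (2) response-for-action segments vs
    the same output; accept the command only if the fused score clears
    the threshold."""
    first = keyword_match(action_keywords, recognized)
    second = keyword_match(response_segments, recognized)
    score = (first + second) / 2
    return ("accept" if score >= threshold else "reject"), score
```

Requiring agreement from both passes is what buys the accuracy gain: a command that matches only one modality's evidence is rejected instead of executed.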