feat: Making feast vector store with open ai search api compatible #6121
patelchaitany wants to merge 5 commits into feast-dev:master
Signed-off-by: Chaitany patel <patelchaitany93@gmail.com>
```python
if requested_features is None:
    requested_features = []
if "distance" not in requested_features:
    requested_features.append("distance")
```
🔴 RemoteOnlineStore mutates caller's requested_features list, causing duplicate 'distance' in response
In RemoteOnlineStore.retrieve_online_documents_v2, lines 351-354 mutate the requested_features parameter in-place by appending "distance". This list is the same object passed by reference from feature_store.py:_retrieve_from_online_store_v2 (line 3066). After the remote store call returns, feature_store.py:3089 builds features_to_request = requested_features + ["distance"], which now produces a list with "distance" appearing twice (since it was already appended). This causes _populate_response_from_feature_data at feature_store.py:3121-3128 to add a duplicate "distance" entry in the response metadata and results, producing a malformed OnlineResponse when using the remote online store.
Suggested change:

```diff
 if requested_features is None:
     requested_features = []
+requested_features = list(requested_features)
 if "distance" not in requested_features:
     requested_features.append("distance")
```
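The aliasing problem described in this comment can be reproduced without Feast. The sketch below is a minimal illustration; the function names (`add_distance_in_place`, `add_distance_copy`, `caller`) are hypothetical, and `caller` is only a simplified stand-in for `_retrieve_from_online_store_v2`, not Feast code.

```python
from typing import Callable, List, Optional


def add_distance_in_place(requested_features: Optional[List[str]]) -> List[str]:
    """Mirrors the reviewed code: appends to the caller's own list object."""
    if requested_features is None:
        requested_features = []
    if "distance" not in requested_features:
        requested_features.append("distance")
    return requested_features


def add_distance_copy(requested_features: Optional[List[str]]) -> List[str]:
    """Mirrors the suggested fix: works on a private copy of the list."""
    if requested_features is None:
        requested_features = []
    requested_features = list(requested_features)  # defensive copy
    if "distance" not in requested_features:
        requested_features.append("distance")
    return requested_features


def caller(store_fn: Callable[[Optional[List[str]]], List[str]]) -> List[str]:
    """Hypothetical caller that, like feature_store.py, appends "distance" itself."""
    requested = ["title", "embedding"]
    store_fn(requested)  # the store may or may not mutate `requested`
    return requested + ["distance"]


print(caller(add_distance_in_place))  # "distance" appears twice
print(caller(add_distance_copy))      # "distance" appears once
```

The list returned by `add_distance_copy` still contains `"distance"`, so whatever is forwarded to the remote server keeps the field the author needs; only the caller's list is left untouched.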
The requested features list is modified on purpose: otherwise, the RemoteOnlineStore response does not include the distance field.
…patelchaitany/feast into enh/openai-compatibel-store-api
```python
if f.type == "eq":
    return {"term": {field: fmt_val}}
elif f.type == "ne":
    return {"bool": {"must_not": [{"term": {field: fmt_val}}]}}
elif f.type in ("gt", "gte", "lt", "lte"):
    return {"range": {field: {f.type: fmt_val}}}
elif f.type == "in":
    if not isinstance(f.value, list):
        raise ValueError(
            f"'in' filter requires a list value, got {type(f.value)}"
        )
    return {"terms": {field: fmt_list}}
elif f.type == "nin":
    if not isinstance(f.value, list):
        raise ValueError(
            f"'nin' filter requires a list value, got {type(f.value)}"
        )
    return {"bool": {"must_not": [{"terms": {field: fmt_list}}]}}
```
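For readers who want to experiment with the mapping in the hunk above outside Feast, here is a standalone sketch of the same operator-to-query-DSL translation. It is an illustration under assumptions, not the PR's implementation: `ComparisonFilter` is a hypothetical dataclass shaped like an OpenAI comparison filter (`key`, `type`, `value`), the target `field` is passed in rather than derived from `has_value_num`, the value formatting done by `fmt_val`/`fmt_list` is omitted, and the `value_num` field name in the usage lines is assumed.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union

Scalar = Union[str, int, float, bool]


@dataclass
class ComparisonFilter:
    """Hypothetical stand-in for an OpenAI-style comparison filter."""

    key: str
    type: str  # eq, ne, gt, gte, lt, lte, in, nin
    value: Union[Scalar, List[Scalar]]


def translate_comparison_filter(f: ComparisonFilter, field: str) -> Dict[str, Any]:
    """Translate one comparison filter into an Elasticsearch query DSL clause."""
    if f.type == "eq":
        return {"term": {field: f.value}}
    if f.type == "ne":
        return {"bool": {"must_not": [{"term": {field: f.value}}]}}
    if f.type in ("gt", "gte", "lt", "lte"):
        return {"range": {field: {f.type: f.value}}}
    if f.type in ("in", "nin"):
        if not isinstance(f.value, list):
            raise ValueError(
                f"'{f.type}' filter requires a list value, got {type(f.value)}"
            )
        clause: Dict[str, Any] = {"terms": {field: f.value}}
        # "nin" is expressed as the negation of the "in" clause.
        return clause if f.type == "in" else {"bool": {"must_not": [clause]}}
    raise ValueError(f"Unsupported comparison filter type: {f.type!r}")


# ASSUMPTION: "price.value_num" is an illustrative numeric field name.
print(translate_comparison_filter(ComparisonFilter("price", "gte", 10), "price.value_num"))
print(translate_comparison_filter(ComparisonFilter("category", "nin", ["a", "b"]), "category.value_text"))
```

Folding `in` and `nin` into one branch keeps the list validation in a single place; the PR keeps them as separate branches, which is equivalent.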
🔴 Elasticsearch term query on analyzed text field causes string filters to silently fail
The new _translate_comparison_filter method generates term/terms queries against the value_text field (e.g. {"term": {"category.value_text": "Category-0"}}). However, value_text is mapped as "type": "text" in the ES index (sdk/python/feast/infra/online_stores/elasticsearch_online_store/elasticsearch.py:260). ES text fields are analyzed (lowercased, tokenized on whitespace/punctuation), but term queries match against raw unanalyzed tokens. This means an eq filter for "Category-0" would look for the exact token "Category-0", but the indexed tokens are ["category", "0"] after analysis — so the filter silently returns no matches. This affects eq, ne, in, and nin operators for any string value that is capitalized, hyphenated, or multi-word.
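The silent miss can be illustrated in a few lines. This is a rough simulation, not Elasticsearch: `approx_standard_analyzer` only approximates the `standard` analyzer by splitting on non-alphanumeric characters and lowercasing (the real tokenizer follows Unicode word-segmentation rules, so it differs on inputs such as decimals), and `term_matches` models the fact that a `term` query looks up its input unchanged.

```python
import re
from typing import List


def approx_standard_analyzer(text: str) -> List[str]:
    """Rough approximation of the `standard` analyzer for simple ASCII input:
    split on runs of non-alphanumeric characters, then lowercase each token."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def term_matches(indexed_tokens: List[str], term: str) -> bool:
    """A `term` query is not analyzed: it matches only an identical token."""
    return term in indexed_tokens


text_tokens = approx_standard_analyzer("Category-0")  # what a text field indexes
keyword_tokens = ["Category-0"]                      # a keyword field keeps the raw value

print(text_tokens)
print(term_matches(text_tokens, "Category-0"))     # miss on the analyzed text field
print(term_matches(keyword_tokens, "Category-0"))  # hit on the keyword field
```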
Fix: add a keyword sub-field to value_text mapping
Change the value_text mapping from {"type": "text"} to {"type": "text", "fields": {"keyword": {"type": "keyword"}}}, and update _translate_comparison_filter to use f"{f.key}.value_text.keyword" for term/terms queries instead of f"{f.key}.value_text". This preserves full-text search on the analyzed field while enabling exact matching via the keyword sub-field.
Prompt for agents
Two changes are needed to fix the Elasticsearch string filter bug:
1. In sdk/python/feast/infra/online_stores/elasticsearch_online_store/elasticsearch.py, in the create_index method (around lines 258-260), change the value_text mapping from:
"value_text": {"type": "text"}
to:
"value_text": {"type": "text", "fields": {"keyword": {"type": "keyword"}}}
2. In the same file, in the _translate_comparison_filter method (around lines 438-464), when has_value_num is False (i.e., the text branch), change the field from:
field = f"{f.key}.value_text"
to:
field = f"{f.key}.value_text.keyword"
This ensures that exact-match filters (eq, ne, in, nin) use the non-analyzed keyword sub-field for reliable matching, while the analyzed text field remains available for full-text search queries.
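Putting both steps together, the sketch below builds the old and new `value_text` mappings and an exact-match query that targets the keyword sub-field. The helpers `value_text_mapping`, `exact_match_field`, and `eq_filter_query` are hypothetical; the numeric field name `value_num` and the rule that decides `has_value_num` from the Python type of the value are assumptions, not taken from the PR.

```python
import json
from typing import Any, Dict


def value_text_mapping(with_keyword: bool) -> Dict[str, Any]:
    """Return the value_text mapping before (False) or after (True) the fix."""
    mapping: Dict[str, Any] = {"type": "text"}
    if with_keyword:
        # Multi-field: the same value is also indexed, unanalyzed, as value_text.keyword.
        mapping["fields"] = {"keyword": {"type": "keyword"}}
    return mapping


def exact_match_field(key: str, has_value_num: bool) -> str:
    """Hypothetical helper: choose the field an eq/ne/in/nin clause targets."""
    if has_value_num:
        # ASSUMPTION: the numeric sub-field is named value_num.
        return f"{key}.value_num"
    # Text branch: target the keyword sub-field, not the analyzed text field.
    return f"{key}.value_text.keyword"


def eq_filter_query(key: str, value: Any) -> Dict[str, Any]:
    """Wrap one exact-match clause in a non-scoring bool filter."""
    # ASSUMPTION: numbers (but not bools) go to the numeric field.
    is_num = isinstance(value, (int, float)) and not isinstance(value, bool)
    field = exact_match_field(key, has_value_num=is_num)
    return {"query": {"bool": {"filter": [{"term": {field: value}}]}}}


print(json.dumps(value_text_mapping(with_keyword=True)))
print(json.dumps(eq_filter_query("category", "Category-0")))
```

One practical consideration: adding the sub-field changes only the mapping, so documents indexed before the change are not populated into `value_text.keyword` until they are reindexed (for example with `_update_by_query`) or the index is recreated.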
What this PR does / why we need it:
This PR makes the Feast vector store API compatible with the OpenAI search API.
The changes add a new REST API endpoint that is compatible with the OpenAI search API. The endpoint also accepts two extra fields in the metadata: features_to_retrieve (which lets the caller retrieve specific features) and content_field (which names the field that holds the main content, i.e. the document chunk).
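As a rough illustration of what a client request might look like, the sketch below assembles a JSON body using standard OpenAI vector store search fields (`query`, `max_num_results`, `filters`) plus the two Feast-specific fields. The endpoint path is not given in this description, and "in the metadata" is the only hint about placement, so the `metadata` object, the helper `build_search_body`, and the feature names used are all assumptions for illustration.

```python
import json
from typing import Any, Dict, List, Optional


def build_search_body(
    query: str,
    max_num_results: int = 10,
    filters: Optional[Dict[str, Any]] = None,
    features_to_retrieve: Optional[List[str]] = None,
    content_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper that assembles a search request body."""
    body: Dict[str, Any] = {"query": query, "max_num_results": max_num_results}
    if filters is not None:
        # OpenAI-style comparison filter: {"type": ..., "key": ..., "value": ...}
        body["filters"] = filters
    # ASSUMPTION: the Feast-specific fields travel inside a "metadata" object.
    metadata: Dict[str, Any] = {}
    if features_to_retrieve:
        metadata["features_to_retrieve"] = features_to_retrieve
    if content_field:
        metadata["content_field"] = content_field
    if metadata:
        body["metadata"] = metadata
    return body


body = build_search_body(
    "how do I materialize features?",
    max_num_results=5,
    filters={"type": "eq", "key": "category", "value": "docs"},
    features_to_retrieve=["title", "chunk_text"],  # hypothetical feature names
    content_field="chunk_text",
)
print(json.dumps(body, indent=2))
```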
What is missing:
Which issue(s) this PR fixes:
#5615
Misc