Skip to content

feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132

Open
soooojinlee wants to merge 6 commits intofeast-dev:masterfrom
soooojinlee:feat/support-nested-collection-types
Open

feat: Support nested collection types (Array/Set of Array/Set) (#5947)#6132
soooojinlee wants to merge 6 commits intofeast-dev:masterfrom
soooojinlee:feat/support-nested-collection-types

Conversation

@soooojinlee
Copy link

@soooojinlee soooojinlee commented Mar 19, 2026

What this PR does / why we need it:

Add 2-level nested collection types: Array(Array(T)), Array(Set(T)), Set(Array(T)), Set(Set(T))

Changes

  • Proto: 4 new ValueType enums (LIST_LIST/LIST_SET/SET_LIST/SET_SET) with RepeatedValue fields
  • Type system: Allow Array/Set mutual nesting with 2-level depth limit, preserve inner type via feast:nested_inner_type tag
  • Serialization: proto conversion, JSON deserialization, PyArrow schema inference support
  • Remote online store: pass correct ValueType via feature_type_map instead of ValueType.UNKNOWN
  • Replace silent String fallback in _str_to_feast_type with ValueError
  • Fix mypy type error in nested collection proto construction

Which issue(s) this PR fixes:

Fixes #5947

Misc

The UUID type feature PR was developed first and uses ValueType enum values 36-41. This PR uses 36-39 for nested collection types. If this PR is merged first, the UUID branch will need to reassign its enum values to avoid conflicts.


Open with Devin

@soooojinlee soooojinlee requested a review from a team as a code owner March 19, 2026 14:30
devin-ai-integration[bot]

This comment was marked as resolved.

@nquinn408
Copy link
Contributor

@soooojinlee , this looks good. Just need to respond to Devin feedback and update branch and I can approve.

soooojinlee and others added 6 commits March 26, 2026 16:17
…-dev#5947)

Add support for 2-level nested collection types: Array(Array(T)),
Array(Set(T)), Set(Array(T)), and Set(Set(T)).

- Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
  backed by RepeatedValue proto messages
- Persist inner type info in Field tags (feast:nested_inner_type),
  following the existing Struct schema tag pattern
- Handle edge cases: empty inner collections, Set dedup at inner level,
  depth limit enforcement (2 levels max)
- Add proto/JSON/remote transport serialization support
- Add 25 unit tests covering all combinations and edge cases

Signed-off-by: Soojin Lee <lsjin0602@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
- Fix remote online store read path to use declared feature types from
  FeatureView instead of ValueType.UNKNOWN, which fails for nested
  collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
- Add Nested Collection Types section to type-system.md with type table,
  usage examples, and empty-inner-collection→None limitation docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
…for nested collection types

- Add nested list handling in proto_json from_json_object (list of lists
  was raising ParseError since no branch matched list-typed elements)
- Fix pa_to_feast_value_type to recognize nested list PyArrow types
  (list<item: list<item: T>>) instead of crashing with KeyError
- Replace silent String fallback in _str_to_feast_type with ValueError
  to surface corrupted tag values instead of silently losing type info
- Strengthen test coverage: type str roundtrip, inner value verification,
  multi-value batch, proto JSON roundtrip, PyArrow nested type inference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue
construction to satisfy mypy's strict type checking.

Signed-off-by: soojin <soojin@dable.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
…n edge case

- Add __eq__/__hash__ to Array and Set so inner element types are compared
  (previously Array(Array(String)) == Array(Array(Int32)) was True)
- Fix nested collection detection in proto_json when first element is None
  by using any() fallback instead of only checking value[0]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
… coverage

- Remove 2-level depth restriction from Array and Set constructors
  to support unbounded nesting per maintainer request
- Make _convert_nested_collection_to_proto() recursive for 3+ levels
- Update error message for nested type inference to guide users
  toward explicit Field dtype declaration
- Add 3+ level tests for Field roundtrip, str roundtrip, and PyArrow conversion

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
@soooojinlee soooojinlee force-pushed the feat/support-nested-collection-types branch from 58dff41 to c1ce5d8 Compare March 26, 2026 07:19
@soooojinlee
Copy link
Author

Addressed Devin review feedback and also removed the 2-level depth restriction for nested collection types — now supports unbounded nesting depth as discussed in #5947.

@nquinn408
Copy link
Contributor

Addressed Devin review feedback and also removed the 2-level depth restriction for nested collection types — now supports unbounded nesting depth as discussed in #5947.

@soooojinlee , thanks so much for working on this. My question is if unbounded, why not just implement like map. Reduce to just SET and LIST instead of SET_LIST, SET_SET, LIST_SET, and LIST_LIST or if too vanilla, you can call it VAR_SET or VAR_LIST but then will clash with MAP. Let me know what you think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support nested set and list as types

2 participants