Class Pipeline

java.lang.Object
com.google.cloud.firestore.Pipeline

@BetaApi public final class Pipeline extends Object
The Pipeline class provides a flexible and expressive framework for building complex data transformation and query pipelines for Firestore.

A pipeline takes data sources, such as Firestore collections or collection groups, and applies a series of stages that are chained together. Each stage takes the output from the previous stage (or the data source) and produces an output for the next stage (or as the final output of the pipeline).

Expressions from com.google.cloud.firestore.pipeline.expressions can be used within each stages to filter and transform data through the stage.

NOTE: The chained stages do not prescribe exactly how Firestore will execute the pipeline. Instead, Firestore only guarantees that the result is the same as if the chained stages were executed in order.

Usage Examples:


 Firestore firestore; // A valid firestore instance.

 // Example 1: Select specific fields and rename 'rating' to 'bookRating'
 Snapshot results1 = firestore.pipeline()
     .collection("books")
     .select(field("title"), field("author"), field("rating").as("bookRating"))
     .execute()
     .get();

 // Example 2: Filter documents where 'genre' is "Science Fiction" and
 // 'published' is after 1950
 Snapshot results2 = firestore.pipeline()
     .collection("books")
     .where(and(equal("genre", "Science Fiction"), greaterThan("published", 1950)))
     .execute()
     .get();
 // Same as above but using methods on expressions as opposed to static
 // functions.
 results2 = firestore.pipeline()
     .collection("books")
     .where(and(field("genre").equal("Science Fiction"), field("published").greaterThan(1950)))
     .execute()
     .get();

 // Example 3: Calculate the average rating of books published after 1980
 Snapshot results3 = firestore.pipeline()
     .collection("books")
     .where(greaterThan("published", 1980))
     .aggregate(average("rating").as("averageRating"))
     .execute()
     .get();
 
  • Method Details

    • addFields

      @BetaApi public Pipeline addFields(Selectable field, Selectable... additionalFields)
      Adds new fields to outputs from previous stages.

      This stage allows you to compute values on-the-fly based on existing data from previous stages or constants. You can use this to create new fields or overwrite existing ones (if there is name overlaps).

      The added fields are defined using Selectable expressions, which can be:

      Example:

      
       firestore.pipeline().collection("books")
           .addFields(
               field("rating").as("bookRating"), // Rename 'rating' to 'bookRating'
               add(5, field("quantity")).as("totalCost") // Calculate 'totalCost'
           );
       
      Parameters:
      field - The field to add to the documents, specified as Selectable expressions.
      additionalFields - The additional fields to add to the documents, specified as Selectable expressions.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • removeFields

      @BetaApi public Pipeline removeFields(String field, String... additionalFields)
      Remove fields from outputs of previous stages.

      Example:

      
       firestore.pipeline().collection("books")
           .removeFields(
               "rating", "cost");
       
      Parameters:
      field - The fields to remove.
      additionalFields - The additional fields to remove.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • removeFields

      @BetaApi public Pipeline removeFields(Field field, Field... additionalFields)
      Remove fields from outputs of previous stages.

      Example:

      
       firestore.pipeline().collection("books")
           .removeFields(
               field("rating"), field("cost"));
       
      Parameters:
      field - The field to remove.
      additionalFields - The additional fields to remove.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • select

      @BetaApi public Pipeline select(Selectable selection, Selectable... additionalSelections)
      Selects or creates a set of fields from the outputs of previous stages.

      The selected fields are defined using Selectable expressions, which can be:

      If no selections are provided, the output of this stage is empty. Use addFields(Selectable, Selectable...) instead if only additions are desired.

      Example:

      
       firestore.pipeline().collection("books")
         .select(
           field("name"),
           field("address").toUppercase().as("upperAddress"),
         );
       
      Parameters:
      selection - The field to include in the output documents, specified as Selectable expressions.
      additionalSelections - The additional fields to include in the output documents,
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • select

      @BetaApi public Pipeline select(String field, String... additionalFields)
      Selects a set of fields from the outputs of previous stages.

      If no selections are provided, the output of this stage is empty. Use addFields(Selectable, Selectable...) instead if only additions are desired.

      Example:

      
       firestore.collection("books")
           .select("name", "address");
      
       // The above is a shorthand of this:
       firestore.pipeline().collection("books")
           .select(field("name"), field("address"));
       
      Parameters:
      field - The name of the field to include in the output documents.
      additionalFields - The additional fields to include in the output documents.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • where

      @BetaApi public Pipeline where(BooleanExpression condition)
      Filters the documents from previous stages to only include those matching the specified BooleanExpression.

      This stage allows you to apply conditions to the data, similar to a "WHERE" clause in SQL. You can filter documents based on their field values, using implementions of BooleanExpression, typically including but not limited to:

      Example:

      
       firestore.pipeline().collection("books")
         .where(
           and(
               gt("rating", 4.0),   // Filter for ratings greater than 4.0
               field("genre").eq("Science Fiction") // Equivalent to eq("genre", "Science Fiction")
           )
         );
       
      Parameters:
      condition - The BooleanExpression to apply.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • offset

      @BetaApi public Pipeline offset(int offset)
      Skips the first `offset` number of documents from the results of previous stages.

      This stage is useful for implementing pagination in your pipelines, allowing you to retrieve results in chunks. It is typically used in conjunction with limit(int) to control the size of each page.

      Example:

      
       // Retrieve the second page of 20 results
       firestore.pipeline().collection("books")
           .sort(field("published").descending())
           .offset(20)  // Skip the first 20 results
           .limit(20);   // Take the next 20 results
       
      Parameters:
      offset - The number of documents to skip.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • limit

      @BetaApi public Pipeline limit(int limit)
      Limits the maximum number of documents returned by previous stages to `limit`.

      This stage is particularly useful when you want to retrieve a controlled subset of data from a potentially large result set. It's often used for:

      • **Pagination:** In combination with offset(int) to retrieve specific pages of results.
      • **Limiting Data Retrieval:** To prevent excessive data transfer and improve performance, especially when dealing with large collections.

      Example:

      
       // Limit the results to the top 10 highest-rated books
       firestore.pipeline().collection("books")
           .sort(field("rating").descending())
           .limit(10);
       
      Parameters:
      limit - The maximum number of documents to return.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • aggregate

      @BetaApi public Pipeline aggregate(AliasedAggregate... accumulators)
      Performs aggregation operations on the documents from previous stages.

      This stage allows you to calculate aggregate values over a set of documents. You define the aggregations to perform using AliasedExpression expressions which are typically results of calling Expression.as(String) on AggregateFunction instances.

      Example:

      
       // Calculate the average rating and the total number of books
       firestore.pipeline().collection("books")
           .aggregate(
               field("rating").average().as("averageRating"),
               countAll().as("totalBooks"));
       
      Parameters:
      accumulators - The AliasedExpression expressions, each wrapping an AggregateFunction and provide a name for the accumulated results.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • aggregate

      @BetaApi public Pipeline aggregate(Aggregate aggregate)
      Performs optionally grouped aggregation operations on the documents from previous stages.

      This stage allows you to calculate aggregate values over a set of documents, optionally grouped by one or more fields or functions. You can specify:

      • **Grouping Fields or Functions:** One or more fields or functions to group the documents by. For each distinct combination of values in these fields, a separate group is created. If no grouping fields are provided, a single group containing all documents is used. Not specifying groups is the same as putting the entire inputs into one group.
      • **Accumulators:** One or more accumulation operations to perform within each group. These are defined using AliasedExpression expressions, which are typically created by calling Expression.as(String) on AggregateFunction instances. Each aggregation calculates a value (e.g., sum, average, count) based on the documents within its group.

      Example:

      
       // Calculate the average rating for each genre.
       firestore.pipeline().collection("books")
           .aggregate(
               Aggregate
                   .withAccumulators(average("rating").as("avg_rating"))
                   .withGroups("genre"));
       
      Parameters:
      aggregate - An Aggregate object that specifies the grouping fields (if any) and the aggregation operations to perform.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • aggregate

      @BetaApi public Pipeline aggregate(Aggregate aggregate, AggregateOptions options)
    • distinct

      @BetaApi public Pipeline distinct(String... fields)
      Returns a set of distinct field values from the inputs to this stage.

      This stage run through the results from previous stages to include only results with unique combinations of values for the specified fields and produce these fields as the output.

      Example:

      
       // Get a list of unique genres.
       firestore.pipeline().collection("books")
           .distinct("genre");
       
      Parameters:
      fields - The fields to consider when determining distinct values.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • distinct

      @BetaApi public Pipeline distinct(Selectable... selectables)
      Returns a set of distinct Expression values from the inputs to this stage.

      This stage run through the results from previous stages to include only results with unique combinations of Expression values (Field, FunctionExpression, etc).

      The parameters to this stage are defined using Selectable expressions, which can be:

      Example:

      
       // Get a list of unique author names in uppercase and genre combinations.
       firestore.pipeline().collection("books")
           .distinct(toUppercase(field("author")).as("authorName"), field("genre"))
           .select("authorName");
       
      Parameters:
      selectables - The Selectable expressions to consider when determining distinct value combinations.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • findNearest

      @BetaApi public Pipeline findNearest(String fieldName, double[] vector, FindNearest.DistanceMeasure distanceMeasure, FindNearestOptions options)
      Performs vector distance (similarity) search with given parameters to the stage inputs.

      This stage adds a "nearest neighbor search" capability to your pipelines. Given a field that stores vectors and a target vector, this stage will identify and return the inputs whose vector field is closest to the target vector, using the parameters specified in `options`.

      Example:

      
       // Find books with similar "topicVectors" to the given targetVector
       firestore.pipeline().collection("books")
           .findNearest("topicVectors", targetVector, FindNearest.DistanceMeasure.COSINE,
              new FindNearestOptions()
                .withLimit(10)
                .withDistanceField("distance"));
       
      Parameters:
      fieldName - The name of the field containing the vector data. This field should store VectorValue.
      vector - The target vector to compare against.
      distanceMeasure - The distance measure to use: cosine, euclidean, etc.
      options - Configuration options for the nearest neighbor search, such as limit and output distance field name.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • findNearest

      @BetaApi public Pipeline findNearest(Expression property, double[] vector, FindNearest.DistanceMeasure distanceMeasure, FindNearestOptions options)
      Performs vector distance (similarity) search with given parameters to the stage inputs.

      This stage adds a "nearest neighbor search" capability to your pipelines. Given an expression that evaluates to a vector and a target vector, this stage will identify and return the inputs whose vector expression is closest to the target vector, using the parameters specified in `options`.

      Example:

      
       // Find books with similar "topicVectors" to the given targetVector
       firestore.pipeline().collection("books")
           .findNearest(
              field("topicVectors"),
              targetVector,
              FindNearest.DistanceMeasure.COSINE,
              new FindNearestOptions()
                .withLimit(10)
                .withDistanceField("distance"));
       
      Parameters:
      property - The expression that evaluates to a vector value using the stage inputs.
      vector - The target vector to compare against.
      distanceMeasure - The distance measure to use: cosine, euclidean, etc.
      options - Configuration options for the nearest neighbor search, such as limit and output distance field name.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • sort

      @BetaApi public Pipeline sort(Ordering... orders)
      Sorts the documents from previous stages based on one or more Ordering criteria.

      This stage allows you to order the results of your pipeline. You can specify multiple Ordering instances to sort by multiple fields in ascending or descending order. If documents have the same value for a field used for sorting, the next specified ordering will be used. If all orderings result in equal comparison, the documents are considered equal and the order is unspecified.

      Example:

      
       // Sort books by rating in descending order, and then by title in ascending order for books with the same rating
       firestore.pipeline().collection("books")
           .sort(
               Ordering.of("rating").descending(),
               Ordering.of("title")  // Ascending order is the default
           );
       
      Parameters:
      orders - One or more Ordering instances specifying the sorting criteria.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • replaceWith

      @BetaApi public Pipeline replaceWith(String fieldName)
      Fully overwrites all fields in a document with those coming from a nested map.

      This stage allows you to emit a map value as a document. Each key of the map becomes a field on the document that contains the corresponding value.

      Example:

      
       // Input.
       // {
       //  "name": "John Doe Jr.",
       //  "parents": {
       //    "father": "John Doe Sr.",
       //    "mother": "Jane Doe"
       //   }
       // }
      
       // Emit parents as document.
       firestore.pipeline().collection("people").replaceWith("parents");
      
       // Output
       // {
       //  "father": "John Doe Sr.",
       //  "mother": "Jane Doe"
       // }
       
      Parameters:
      fieldName - The name of the field containing the nested map.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • replaceWith

      @BetaApi public Pipeline replaceWith(Expression expr)
      Fully overwrites all fields in a document with those coming from a nested map.

      This stage allows you to emit a map value as a document. Each key of the map becomes a field on the document that contains the corresponding value.

      Example:

      
       // Input.
       // {
       //  "name": "John Doe Jr.",
       //  "parents": {
       //    "father": "John Doe Sr.",
       //    "mother": "Jane Doe"
       //  }
       // }
      
       // Emit parents as document.
       firestore.pipeline().collection("people").replaceWith(field("parents"));
      
       // Output
       // {
       //  "father": "John Doe Sr.",
       //  "mother": "Jane Doe"
       // }
       
      Parameters:
      expr - The Expression field containing the nested map.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • sample

      @BetaApi public Pipeline sample(int limit)
      Performs a pseudo-random sampling of the documents from the previous stage.

      This stage will filter documents pseudo-randomly. The 'limit' parameter specifies the number of documents to emit from this stage, but if there are fewer documents from previous stage than the 'limit' parameter, then no filtering will occur and all documents will pass through.

      Example:

      
       // Sample 10 books, if available.
       firestore.pipeline().collection("books")
           .sample(10);
       
      Parameters:
      limit - The number of documents to emit, if possible.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • sample

      @BetaApi public Pipeline sample(Sample sample)
      Performs a pseudo-random sampling of the documents from the previous stage.

      This stage will filter documents pseudo-randomly. The 'options' parameter specifies how sampling will be performed. See SampleOptions for more information.

      Examples:

      
       // Sample 10 books, if available.
       firestore.pipeline().collection("books")
           .sample(Sample.withDocLimit(10));
      
       // Sample 50% of books.
       firestore.pipeline().collection("books")
           .sample(Sample.withPercentage(0.5));
       
      Parameters:
      sample - The Sample specifies how sampling is performed.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • union

      @BetaApi public Pipeline union(Pipeline other)
      Performs union of all documents from two pipelines, including duplicates.

      This stage will pass through documents from previous stage, and also pass through documents from previous stage of the `other` Pipeline given in parameter. The order of documents emitted from this stage is undefined.

      Example:

      
       // Emit documents from books collection and magazines collection.
       firestore.pipeline().collection("books")
           .union(firestore.pipeline().collection("magazines"));
       
      Parameters:
      other - The other Pipeline that is part of union.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • unnest

      @BetaApi public Pipeline unnest(String fieldName, String alias)
      Produces a document for each element in array found in previous stage document.

      For each previous stage document, this stage will emit zero or more augmented documents. The input array found in the previous stage document field specified by the `fieldName` parameter, will for each input array element produce an augmented document. The input array element will augment the previous stage document by replacing the field specified by `fieldName` parameter with the element value.

      In other words, the field containing the input array will be removed from the augmented document and replaced by the corresponding array element.

      Example:

      
       // Input:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tags": [ "comedy", "space", "adventure" ], ... }
      
       // Emit a book document for each tag of the book.
       firestore.pipeline().collection("books")
           .unnest("tags", "tag");
      
       // Output:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tag": "comedy", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tag": "space", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tag": "adventure", ... }
       
      Parameters:
      fieldName - The name of the field containing the array.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • unnest

      @BetaApi public Pipeline unnest(String fieldName, String alias, UnnestOptions options)
      Produces a document for each element in array found in previous stage document.

      For each previous stage document, this stage will emit zero or more augmented documents. The input array found in the previous stage document field specified by the `fieldName` parameter, will for each input array element produce an augmented document. The input array element will augment the previous stage document by replacing the field specified by `fieldName` parameter with the element value.

      In other words, the field containing the input array will be removed from the augmented document and replaced by the corresponding array element.

      Example:

      
       // Input:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tags": [ "comedy", "space", "adventure" ], ... }
      
       // Emit a book document for each tag of the book.
       firestore.pipeline().collection("books")
           .unnest("tags", "tag", new UnnestOptions().withIndexField("tagIndex"));
      
       // Output:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 0, "tag": "comedy", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 1, "tag": "space", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 2, "tag": "adventure", ... }
       
      Parameters:
      fieldName - The name of the field containing the array.
      options - The UnnestOptions options.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • unnest

      @BetaApi public Pipeline unnest(Selectable expr)
      Produces a document for each element in array found in previous stage document.

      For each previous stage document, this stage will emit zero or more augmented documents. The input array found in the previous stage document field specified by the `fieldName` parameter, will for each input array element produce an augmented document. The input array element will augment the previous stage document by replacing the field specified by `fieldName` parameter with the element value.

      In other words, the field containing the input array will be removed from the augmented document and replaced by the corresponding array element.

      Example:

      
       // Input:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tags": [ "comedy", "space", "adventure" ], ... }
      
       // Emit a book document for each tag of the book.
       firestore.pipeline().collection("books")
           .unnest(field("tags").as("tag"));
      
       // Output:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 0, "tag": "comedy", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 1, "tag": "space", ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 2, "tag": "adventure", ... }
       
      Parameters:
      expr - The name of the expression containing the array.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • unnest

      @BetaApi public Pipeline unnest(Selectable field, UnnestOptions options)
      Produces a document for each element in array found in previous stage document.

      For each previous stage document, this stage will emit zero or more augmented documents. The input array found in the specified by Selectable expression parameter, will for each input array element produce an augmented document. The input array element will augment the previous stage document by assigning the Selectable alias the element value.

      Example:

      
       // Input:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tags": [ "comedy", "space",
       "adventure" ], ... }
      
       // Emit a book document for each tag of the book.
       firestore.pipeline().collection("books")
           .unnest(field("tags").as("tag"), UnnestOptions.indexField("tagIndex"));
      
       // Output:
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 0, "tag": "comedy",
       "tags": [ "comedy", "space", "adventure" ], ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 1, "tag": "space", "tags":
       [ "comedy", "space", "adventure" ], ... }
       // { "title": "The Hitchhiker's Guide to the Galaxy", "tagIndex": 2, "tag": "adventure",
       "tags": [ "comedy", "space", "adventure" ], ... }
       
      Parameters:
      field - The expression that evaluates to the input array.
      options - The UnnestOptions options.
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • rawStage

      @BetaApi public Pipeline rawStage(RawStage stage)
      Adds a generic stage to the pipeline.

      This method provides a flexible way to extend the pipeline's functionality by adding custom stages. Each generic stage is defined by a unique `name` and a set of `params` that control its behavior.

      Example (Assuming there is no "where" stage available in SDK):

      
       // Assume we don't have a built-in "where" stage
       Map<String, Object> whereParams = new HashMap<>();
       whereParams.put("condition", field("published").lt(1900));
      
       firestore.pipeline().collection("books")
           .genericStage("where", Lists.newArrayList(field("published").lt(1900)), new RawOptions()) // Custom "where" stage
           .select("title", "author");
       
      Returns:
      A new Pipeline object with this stage appended to the stage list.
    • execute

      @BetaApi public com.google.api.core.ApiFuture<Pipeline.Snapshot> execute()
      Executes this pipeline and returns a future to represent the asynchronous operation.

      The returned ApiFuture can be used to track the progress of the pipeline execution and retrieve the results (or handle any errors) asynchronously.

      The pipeline results are returned as a list of PipelineResult objects. Each PipelineResult typically represents a single key/value map that has passed through all the stages of the pipeline, however this might differ depends on the stages involved in the pipeline. For example:

      • If there are no stages or only transformation stages, each PipelineResult represents a single document.
      • If there is an aggregation, only a single PipelineResult is returned, representing the aggregated results over the entire dataset .
      • If there is an aggregation stage with grouping, each PipelineResult represents a distinct group and its associated aggregated values.

      Example:

      
       ApiFuture<Snapshot> futureResults = firestore.pipeline().collection("books")
           .where(gt("rating", 4.5))
           .select("title", "author", "rating")
           .execute();
       
      Returns:
      An ApiFuture representing the asynchronous pipeline execution.
    • execute

      @BetaApi public com.google.api.core.ApiFuture<Pipeline.Snapshot> execute(PipelineExecuteOptions options)
    • execute

      @BetaApi public void execute(com.google.api.gax.rpc.ApiStreamObserver<PipelineResult> observer)
      Executes this pipeline, providing results to the given ApiStreamObserver as they become available.

      This method allows you to process pipeline results in a streaming fashion, rather than waiting for the entire pipeline execution to complete. The provided ApiStreamObserver will receive:

      • **onNext(PipelineResult):** Called for each PipelineResult produced by the pipeline. Each PipelineResult typically represents a single key/value map that has passed through all the stages. However, the exact structure might differ based on the stages involved in the pipeline (as described in execute()).
      • **onError(Throwable):** Called if an error occurs during pipeline execution.
      • **onCompleted():** Called when the pipeline has finished processing all documents.

      Example:

      
       firestore.pipeline().collection("books")
           .where(gt("rating", 4.5))
           .select("title", "author", "rating")
           .execute(new ApiStreamObserver<PipelineResult>() {
               @Override
               public void onNext(PipelineResult result) {
                   // Process each result as it arrives
                   System.out.println(result.getData());
               }
      
               @Override
               public void onError(Throwable t) {
                   // Handle errors during execution
                   t.printStackTrace();
               }
      
               @Override
               public void onCompleted() {
                   System.out.println("Pipeline execution completed.");
               }
           });
       
      Parameters:
      observer - The ApiStreamObserver to receive pipeline results and events.
    • toProtoValue

      @InternalApi public Value toProtoValue()